The majority of GIS datasets are currently represented in vector format, which is convenient due to storage efficiency but can be difficult to manipulate analytically. To achieve tractability, vector format GIS data is often rasterized (e.g., conformed to a grid-cell representation). The processes involved in vectorization of map data as well as rasterization of vector data have indiosyncratic errors that manifest as representational error in a given GIS system.
Section Overview. In this section, for purposes of orientation, we overview the vectorization of GIS data, then discuss theoretical and practical issues associated with feature and dataset merging. It is important to note that this section does not have as its goal the development of a specific "best" technique or algorithm for performing vectorization or rasterization operations. The material is structured as follows:
Goals. For the purposes of this course, goals for this section include:
FX, where
F denotes the GIS dataset features and X
Rn denotes
the dataset's spatial domain, with n = 2 customary.
PFCP: If N : X -> 2X
denotes a neighborhood on X, then given a point x
X, does
a point y
X
exist such that a|N(x) and
b|N(y) correlate better than
for any other choice of y?
Extension: For each z
X, determine
d(z) = x(z) - y(z)
such that PFCP is satisfied. The resultant image d
(Rn)X is called the
disparity map of a with respect to b.
Definition: Given d, if a (or
b) is compensated for d to yield a (or
b), such that maximal correlation exists between
a|N(x) and
b|N(x) for each x
X, then b and
a (a and b) are said to be
coregistered.
PDSM: Assume that there exists ground truth
ground truth g
FX and a comparison function c : F
× F -> R. If a(x)
b(x) for
some x
X, then what
mapping f : FX × FX
-> FX can be found (if any) such that the
representational error
e = ||c(f(a,b), g)||
is minimized with respect to ||c(a,g)|| and ||c(b,g)||, where ||x|| denotes the norm of x?For example, if c is the subtraction operation, then minimization could be subject to a constraint such as:
|e|
|c(a,g)| + |c(b,g)| .
We next consider the processes of map segmentation into vector data sets, as well as rasterization and merging of vector data sets.
R2
contains |X| points x
X that may be expressed in
terms of the map coordinates (x,y)
X. For example, one could
have the customary coordinates of latitude and longitude, or survey
measure from a given benchmark.
Figure 2.1. Schematic diagram of (a) natural scene and (b)
corresponding 2-D map.
X customarily has measurement error associated with locating an
object on the map a. One way to express such error is via an
abstraction called Circular Error Probable (CEP), as shown in
Figure 2.2. For example, given North-South error
NS
(sometimes called Northing error) or East-West error
EW
(sometimes called Easting error), we can approximate CEP,
denoted by
CEP,
as follows:
CEP =
((
NS2
+
EW2)/2)1/2 ,

Figure 2.2. Schematic diagram of circular error probable.
Figure 2.3. Schematic diagram of error propagation in a
GIS data acquisition, preprocessing, and storage system.
2.1.1. Overview of Vectorization and Rasterization
Map data can be converted to a more compact format that describes (a) the boundary coordinates of regions within the map, and (b) the contents of each region. By coercing each bounded region to have one and only one feature type, one can construct a map from a set of boundary coordinates to a set of feature types, which we call the vectorization map.
High-Level Vectorization Algorithm. In general, vectorization proceeds according to the following steps:
Step 2. Segment the map labelled in pointwise fashion in Step 1 to yield polygonal regions, each of which contain one and only one feature type or attribute set.
Step 3. Given the vertices of the polygons obtained in Step 2, represent each polygon as a path whose segments are described by their vertices. These path segments are called vectors and may be straight or curvilinear. For example, ARC/INFO uses curvilinear vectors that are circular arcs.
Step 4. Associate each polygon (represented as a chain of vectors or their vertices) with the geature type and attributes contained within the corresponding map region.
Step 5. Encode and store the output of Step 4. For example, one could use ARC/INFO, ERDAS, or IDRISI format, and could store the data to disk or digital tape.
In particular, we note that vectorization is concerned primarily with the spatial definition of map neighborhoods or regions in terms of a set of vector components (e.g., line segments or arcs) that delimit a region having one and only one feature type or attribute set. In contrast, rasterization assumes a given spatial segmentation scheme (e.g., an image pixel grid) that is not necessarily data-dependent, then associates one or more feature types or attribute sets per grid cell. A key problem in vectorization is accurately portraying the region boundary with a limited set of vector primitives, whereas rasterization can be seen as primarily emphasizing the resolution of feature type or attribute conflicts within a fixed, prespecified spatial framework.
Discussion. Vectorization is acustomary procedure that has many different implementations. Errors inherent in vectorization include:
High-Level Rasterization Algorithm. The following steps are generally employed in rasterization:
R.
Step 2. Discretize or partition a 2-D spatial map domain X into rectangular blocks of size q×q map units.
Step 3. Resolve feature type conflicts within each block or grid cell on a case-by-case basis, using contextual information as required.
Step 4. Resolve any remaining feature attribute conflicts.
Dataset Merging Problem. The dataset merging problem in GIS can be stated as follows:
FX , with what probability does a(x)
= b(x) ?2.1.2. Rasterization of Vector-format GIS Data.
We have previously shown that the rasterization problem can be reduced to the problem of resolving which feature attribute a grid cell should be labelled. Additionally, we discussed area-based methods for feature label assignment. Three pertinent cases will be discussed in this section, which are listed in increasing order of difficulty, as follows:
Case II. Multiple vector segments connected to form one polyline cross a grid cell, thereby partitioning the cell into two polygons, neither of which could be convex.
Case III. Multiple vector segments connected to form multiple polylines cross a grid cell, thereby partitioning the cell into multiple polygons.
Recall. Given a trapezoid T having parallel sides of length a and b and height h, the area of T is given by
< T > = h · (a + b)/2 .
Observation. If a given polygon A in a vectorized map can be decomposed into trapezoids A1, A2, ..., Ai, ..., An, each of which has parallel sides of length ai and bi, with height hi, then the area of A would be computed as:
< A > =
< Ai > =
hi · (ai + bi)/2 .
R2, a line segment belonging to the sequence of
segments that delimits a vector polygon is itself delimited by the
points xi,xi+1
X. Let the line
segment intersect the y-th grid cell in the rasterized
map domain Y
Z2 at points zy(1) and
zy(2), thereby partitioning the grid cell into two
polygons A1 and A2. Given the vertices of the
grid cell denoted by xy(i)
X,
i = 1..4, as shown in Figure ____b, the
parallel sides of the left-hand trapezoid have length
a1 = || xy(1) - zy(1) || and b1 = || zy(2) - xy(3) ||
and heighth1 = || xy(2) - xy(3) || .
Hence,< A1 > = h1 · (a1 + b1)/2 ,
and symmetrically for the right-hand partition A2.Case Ib: The grid cell is divided into a triangle and an irregular pentagon, as shown in Figure ___b. Given the notational conventions of Figure ___a, the area of the triangular partition A1 is given by
< A1 > = || xy(1) - zy(1) || &183; ||xy(1) - zy(2) ||/2 ,
and the remaining are can be obtained by subtracting < A1 > from the area of the grid cell.
X. Each consecutive pair of vertices
xi,xi+1 defines the slant side of
a trapezoid, as shown in Figure ___a. Summing the areas of the
component trapezoids is a simple matter, unless the sequence reverses
direction, as shown in Figure ___b. In this case, we must employ a
more generalized algorithm for specifying the area of an irregular
polygon, as follows.
Algorithm. Given a map domain X
R2, let the
image a ,
%%%%LEFT OFF HERE%%%%
{Doe97} Doe, J.H. An Introduction to GIS, New York: Bogus Press (1997).