GIS DATABASE MANAGEMENT and lect.3  

Vector Comcepts

DEM Catalogue

SRTM

SRTM

USGS DEMs

Database concepts
Why choose a database approach?
Database models
Creating a database
GIS database applications
Developments in databases

Data input and editing
Methods of data input
Data editing
Towards an integrated database

Data analysis
Learning outcomes
Introduction
Measurements in GIS – lengths, perimeters and areas
Queries
Reclassification
Buffering and neighbourhood functions
Integrating data – map overlay
Spatial interpolation
Analysis of surfaces
Network analysis

Analytical modelling in GIS
Process models
Modelling physical and environmental processes
Modelling human processes
Modelling the decision-making process
Problems with using GIS to model spatial processes

Output: from new maps to enhanced decisions
Maps as output
Non-cartographic output
Spatial multimedia
Mechanisms of delivery
GIS and spatial decision support

GIS DATA MODELS

Data Model

SPATIAL OBJECTS AND DATABASE MODELS
A. INTRODUCTION

B. POINT DATA

C. LINE DATA
Network entities
Network characteristics
Attributes
Networks as linear addressing systems

D. AREA DATA
1. Environmental/natural resource zones
2. Socio-economic zones
3. Land records
Areal coverage
Holes and islands

E. REPRESENTATION OF CONTINUOUS SURFACES
General nature of surfaces
Data structures for representing surfaces
Spatial interpolation

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

This unit continues the development of basic concepts about representing reality as spatial data. Here we look at how the representation of reality in the form of entities is handled with the spatial objects points, lines and areas.

UNIT 11 - SPATIAL OBJECTS AND DATABASE MODELS

Compiled with assistance from Timothy L. Nyerges, University of Washington


A. INTRODUCTION


the objects in a spatial database are representations of real-world entities with associated attributes

the power of a GIS comes from its ability to look at entities in their geographical context and examine relationships between entities

thus a GIS database is much more than a collection of objects and attributes

in this unit we look at the ways a spatial database can be assembled from simple objects
e.g. how are lines linked together to form complex hydrologic or transportation networks
e.g. how can points, lines or areas be used to represent more complex entities like surfaces?


B. POINT DATA

the simplest type of spatial object

choice of entities which will be represented as points depends on the scale of the map/study
e.g. on a large scale map - encode building structures as point locations
e.g. on a small scale map - encode cities as point locations

the coordinates of each point can be stored as two additional attributes

information on a set of points can be viewed as an extended attribute table
each row is a point - all information about the point is contained in the row
each column is an attribute
two of the columns are the coordinates

overhead - Point data attribute table
here northing and easting represent y and x coordinates

each point is independent of every other point, represented as a separate row in the database model

C. LINE DATA
Network entities


infrastructure networks
transportation networks - highways and railroads
utility networks - gas, electric, telephone, water
airline networks - hubs and routes

natural networks
river channels

Network characteristics

a network is composed of:
nodes - junctions, ends of dangling lines
links - chains in the database model

diagram







valency of a node is the number of links at the node
ends of dangling lines are "1-valent"
4-valent nodes are most common in street networks
3-valent nodes are most common in hydrology

a tree network has only one path between any pair of nodes, no loops or circuits are possible
most river networks are trees

Attributes

examples of link attributes:
direction of traffic, volume of traffic, length, number of lanes, time to travel along link
diameter of pipe, direction of gas flow
voltage of electrical transmission line, height of towers
number of tracks, number of trains, gradient, width of most narrow tunnel, load bearing capacity of weakest bridge

examples of node attributes:
presence of traffic lights, presence of overpass, names of intersecting streets
presence of shutoff valves, transformers

note that some attributes (e.g. names of intersecting streets) link one type of entity to another (nodes to links)

some attributes are associated with parts of network links
e.g. part of a railroad link between two junctions may be inside a tunnel
e.g. part of a highway link between two junctions may need pavement maintenance

many GIS systems require such attributes to be attached to the network by splitting existing links and creating new nodes
e.g. split a street link at the house and attach the attributes of the house to the new (2-valent) node
e.g. create a new link for the stretch of railroad which lies inside the tunnel, plus 2 new nodes

this requirement can lead to impossibly large numbers of links and 2-valent nodes
e.g. at a scale of 1:100,000, the US rail network has about 300,000 links
the number of links would increase by orders of magnitude if new nodes had to be defined in order to locate bridges on links

Networks as linear addressing systems

often need to use the network as an addressing system, e.g. street network

address matching is the process of locating a house on a street network from its street address
e.g. if it is known that the block contains houses numbers from 100 to 198, house #124 would probably be 1/4 of the way along that link

points can be located on the network by link number and distance from beginning of link
this can be more useful than the (x,y) coordinates of points since it links the points to a location on the network

this approach provides an answer to the problem of assigning attributes to parts of links
keep such entities (houses, tunnels) in separate tables, link them to the network by link number and distance from beginning of link
need one distance for point entities, two for extended entities like tunnels (start and end locations)
the GIS can then compute the (x,y) coordinates of the entities if needed

links need not be permanently split in this scheme


D. AREA DATA

is represented on area class maps, choropleth maps

boundaries may be defined by natural phenomena, e.g. lake, or by man, e.g. forest stands, census zones

there are several types of areas that can be represented

1. Environmental/natural resource zones

examples include
land cover data - forests, wetlands, urban
geological data - rock types
forestry data - forest "stands", "compartments"
soil data - soil types

boundaries are defined by the phenomenon itself
e.g. changes of soil type

almost all junctions are 3-valent

2. Socio-economic zones

includes census tracts, ZIP codes, etc.

boundaries defined independently of the phenomenon, then attribute values are enumerated

boundaries may be culturally defined, e.g. neighborhoods

3. Land records
land parcel boundaries, land use, land ownership, tax information

Areal coverage

overhead - Areal coverage
1. entities are isolated areas, possibly overlapping

any place can be within any number of entities, or none
e.g. areas burned by forest fires
areas do not exhaust the space

2. any place is within exactly one entity
areas exhaust the space
every boundary line separates exactly two areas, except for the outer boundary of the mapped area
areas may not overlap

any layer of the first type can be converted to one of the second type
each area may now have any number of fire attributes, depending on how many times it has been burned - unburned areas will have none

Holes and islands

areas often have "holes" or areas of different attributes wholly enclosed within them
diagram







the database must be able to deal with these correctly
this has not always been true of GIS products

cases can be complex, for example:
Lake Huron is a "hole" in the North American landmass
Manitoulin Island is a "hole" in Lake Huron
Manitoulin Island has several large lakes, including one which is the largest lake on an island in a lake anywhere
some of these lakes have islands in them

some systems allow area entities to have islands
more than one primitive single-boundary area can be grouped into an area object
e.g. the area served by a school or shopping center may have more than one island, but only one set of attributes

diagram

SPATIAL OBJECTS AND DATABASE MODELS
. REPRESENTATION OF CONTINUOUS SURFACES

examples of continuous surfaces are:
elevation (as part of topographic data)
rainfall, pressure, temperature
population density

potential must exist for sampling observations everywhere on an interval/ratio level

General nature of surfaces

critical points
peaks and pits - highest and lowest points
ridge lines, valley bottoms - lines across which slope reverses suddenly
passes - convergence of 2 ridges and 2 valleys

faults - sharp discontinuities of elevation - cliffs

fronts - sharp discontinuities of slope

slopes and aspects can be derived from elevations

Data structures for representing surfaces

traditional data models do not have a method for representing surfaces
therefore, surfaces are represented by the use of points, lines or areas

note: the following series of three overheads on Tiefort Mountains all represent the same area
1. points - grid of elevations overhead - Elevation represented as points

DEM or Digital Elevation Model
based on sampling the elevation surface at regular intervals
result is a matrix of points
much digital elevation data available in this form

2. lines - digitized contours overhead - Elevation represented as lines
from DLG hypsography layer, identical to those on the printed map, plotted directly from stereo photography
based on string object type
a line connecting sampled points of equal elevation
elevation is attribute
could be done for rainfall, barometric pressure etc.

3. areas - TIN (Triangulated irregular network) overhead - Triangulation of a terrain surface
overhead - Elevation represented as areas

note: perspective diagram is developed from the triangulated surface (TIN created by M.P. Kumler, USGS)
sample points often located at peaks, pits, along ridges and valleys
sampling can be varied depending on ruggedness of the surface
a very efficient way of representing topography
result is TIN composed of nodes, lines and triangular faces

Spatial interpolation

frequently when using continuous data we wish to estimate values at specific locations which are not part of the point, line or area dataset
these values must be determined from the surrounding values using techniques of spatial interpolation (see Units 40 and 41)
e.g. to interpolate contours, a regular grid is often interpolated from an irregular scatter of points or densified from a sparse grid


REFERENCES
Burrough, P. A., 1986. Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. See chapter 2 for a review of database models.

Dueker, K. J., 1987. "Geographic Information Systems and Computer-Aided Mapping," American Planning Association Journal, Summer 1987:383-390. Compares database models in GIS and computer mapping.

Mark, D.M., 1978. "Concepts of Data Structure for Digital Terrain Models," Proceedings of the Digital Terrain Models (DTM) Symposium, ASP and ACSM, pp. 24-31. A comprehensive discussion of DEM database models.

Marx, R. W., 1986. "The TIGER System: Automating the Geographic Structure of the United States Census," Government Publications Review 13:181-201. Issues in the selection of a database model for TIGER.

Nyerges, T. L. and K. J. Dueker, 1988. Geographic Information Systems in Transportation, Federal Highway Administration, Division of Planning, Washington, D. C. Database model concerns in transportation applications of GIS.

Peuquet, D.J., 1984. "A conceptual framework and comparison of spatial data models," Cartographica 21(4):66-113. An

excellent review of the various spatial data models used in GIS.


EXAM AND DISCUSSION QUESTIONS

1. How does a natural zone coverage differ from an enumeration zone coverage? Describe the differences in terms of (a) application areas, (b) visual appearance, (c) compilation of data.

2. Compare the various data models for elevation data. Which would you expect to be best for (a) a landscape dominated by fluvial erosion and dendritic drainage patterns, (b) a glaciated landscape, (c) a barometric weather map with fronts, (d) a map of population densities for North America.

3. What data models might be needed in a system to monitor oil spills and potential environmental damage to coastlines? Give examples of appropriate spatial objects and associated attributes.

4. Describe the differences between the data models commonly used in remote sensing, computer assisted design, automated cartography and GIS.

RELATIONSHIPS AMONG SPATIAL OBJECTS
INTRODUCTION
Three types of relationship

B. EXAMPLES OF SPATIAL RELATIONSHIPS
Point-point
Point-line
Point-area
Line-line
Line-area
Area-area

C. CODING RELATIONSHIPS AS ATTRIBUTES
Example - "flows into" relationship
Example - "is contained in" relationship

D. OBJECT PAIRS

E. CARTOGRAPHIC AND TOPOLOGICAL DATABASES
Strict definition of "topological"
Usage of "topological" in GIS

F. PLANAR ENFORCEMENT
Process
Objective

G. RELATIONSHIPS IN RASTER SYSTEMS

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

This final unit in the spatial databases module looks at the complex issue of relationships and how they can be coded. The important concept of planar enforcement, introduced here, is referred to several times in later units.

UNIT 12 - RELATIONSHIPS AMONG SPATIAL OBJECTS

Compiled with assistance from Gerald White, California State University, Sacramento

RELATIONSHIPS AMONG SPATIAL OBJECTS
there are a vast number of possible relationships in spatial data

many are important in analysis
e.g. "is contained in" relationship between a point and an area is important in relating objects to their surrounding environment
e.g. "intersects" between two lines is important in analyzing routes through networks

relationships can exist between entities of the same type or of different types
e.g. for each shopping center, can find the nearest shopping center (same type)
e.g. for each customer, can find the nearest shopping center (different types)

Three types of relationship

1. relationships which are used to construct complex objects from simple primitives
e.g. relationship between a line (chain) and the ordered set of points which defines it
e.g. relationship between an area (polygon) and the ordered set of lines which defines it

2. relationships which can be computed from the coordinates of the objects
e.g. two lines can be examined to see if they cross - the "crosses" relationship can be computed
e.g. areas can be examined to see which one encloses a given point - the "is contained in" relationship can be computed
e.g. areas can be examined to see if they overlap - the "overlaps" relationship

3. relationships which cannot be computed from coordinates - these must be coded in the database during input
e.g. we can compute if two lines cross, but not if the highways they represent intersect (may be an overpass)
some databases allow an entity called a "complex object", composed of "simple objects", e.g. objects representing "house", "lot", "cable", with
associated attributes might be grouped together logically as "account"



B. EXAMPLES OF SPATIAL RELATIONSHIPS
Point-point


"is within", e.g. find all of the customer points within 1 km of this retail store point

"is nearest to", e.g. find the hazardous waste site which is nearest to this groundwater well

Point-line

"ends at", e.g. find the intersection at the end of this street

"is nearest to", e.g. find the road nearest to this aircraft crash site

Point-area

"is contained in", e.g. find all of the customers located in this ZIP code boundary

"can be seen from", e.g. determine if any of this lake can be seen from this viewpoint

Line-line

"crosses", e.g. determine if this road crosses this river

"comes within", e.g. find all of the roads which come within 1 km of this railroad

"flows into", e.g. find out if this stream flows into this river

Line-area

"crosses", e.g. find all of the soil types crossed by this railroad

"borders", e.g. find out if this road forms part of the boundary of this airfield

Area-area

"overlaps", e.g. identify all overlaps between types of soil on this map and types of land use on this other map

"is nearest to", e.g. find the nearest lake to this forest fire

"is adjacent to", e.g. find out if these two areas share a common boundary

RELATIONSHIPS AMONG SPATIAL OBJECTS
CODING RELATIONSHIPS AS ATTRIBUTES

in the database model we can visualize relationships as additional attributes

Example - "flows into" relationship

overhead - Coding relationships as attributes I

option A:
each stream link in a stream network could be given the ID of the downstream link which it flows into
flow could be traced from link to link by following pointers

option B
alternatively the network could be coded as two sets of entities - links and nodes
the links could "point" to their downstream node
the nodes could "point" to the next downstream link

Example - "is contained in" relationship

overhead - Coding relationships as attributes II

given:
locations of 4 wells, with attributes of depth and flow
wells lie in two different counties with attributes of population

we wish to determine how much flow is available in each county:
1. find the containing county of each well (compute the "is contained in" relationship)

store the result as a new attribute, County, of each well

2. using this revised attribute table, total flow by county and add results to the county table
County Population Flow A 20,000 4,500 B 35,000 5,500



D. OBJECT PAIRS

distance is an attribute of a pair of objects

there are other types of information which are similarly attributes of pairs of objects
e.g. flow of commuters between a suburb and downtown
e.g. trade between two countries
e.g. flow of groundwater between a sink and a spring

in some cases these attributes can be attached to an object linking the origin and destination objects
e.g. on a map, trade can be an attribute of an arrow connecting the two countries
thick arrows indicate strong trade
however, such maps quickly become impossibly complex

in general, it is necessary to allow for information which is not an attribute of any one object but of a pair of objects, including:
distance
connectedness - yes or no
flow of goods, trade
number of trips

such attributes cannot necessarily be ascribed to any real object
e.g. commuting flows between a suburb and downtown are not necessarily attributes of any set of links in the transport network
e.g. flow of groundwater between a sink and a spring does not necessarily follow any aquifer or conduit

these are attributes of object pairs

object pairs are important in various kinds of spatial analysis using GIS

attributes of object pairs can be thought of as tables which have one object as rows and the other object as columns with the values in each cell representing the value of the interaction between them
are many different terms for the implementation of this concept - e.g. interaction matrix, turn table, Cartesian product


E. CARTOGRAPHIC AND TOPOLOGICAL DATABASES
Strict definition of "topological"


if a map is stretched and distorted, some of its properties change, including:
distances
angles
relative proximities

other properties remain constant, including:
adjacencies
most other relationships, such as "is contained in", "crosses"
types of spatial objects - areas remain areas, lines remain lines, points remain points

strictly, topological properties are those which remain unchanged after distortion

Usage of "topological" in GIS

a spatial database is often called "topological" if one or more of the following relationships have been computed and stored
connectedness of links at intersections
ordered set of lines (chains) forming each polygon boundary
adjacency relationships between areas

unfortunately the precise meaning of the term has become distorted by use

in general, "topological" implies that certain relationships are stored, making the data more useful for various kinds of spatial analysis

by contrast, a database is called "cartographic" if the above conditions are absent
objects can be manipulated individually

relationships between them are unavailable or are considered unimportant

cartographic databases are less useful for analysis of spatial data
however they are satisfactory for simple mapping of data
many packages designed for mapping only use cartographic database models
a cartographic database can usually be converted to a topological database by computing relationships - the process of "building topology" through planar enforcement


F. PLANAR ENFORCEMENT

objects and their attributes are capable of describing the conditions existing on a map or in reality

variation of a single property like soil type or elevation over a mapped area is achieved by including appropriate attributes for entity types
e.g. elevation described by giving attributes to elevation points
e.g. soil type described by giving attributes to areas

in cases like soil type, the objects used to describe spatial variation must obey certain simple rules
e.g. two areas cannot overlap
e.g. every place must be within exactly one area, or on a boundary

these rules are collectively referred to as planar enforcement
a set of objects obeying these rules is said to be planar enforced

planar enforcement is a very important operation in a vector GIS

Process

begin with a number of unrelated line segments
imagine a number of limp spaghetti noodles lying on a table

the following elements are now defined (terminology from the US Census Bureau for development of digital spatial database concepts): overhead - Planar enforcement
a 0-cell (or node) is identified wherever two noodles cross or a noodle terminates
i.e. all intersections are calculated
1-cell (or link, also "chain", "arc", "edge") is identified for each length of noodle between two consecutive 0-cells (nodes)
a 2-cell (or area, also "face", "polygon") is defined for each group of consecutive 1-cells forming an enclosed area that does not contain any 1-cells that are not part of the boundary
note that these definitions relate directly to the ordinary concept of dimensionality

the results are:
0-cells are either isolated ("points") or adjacent to one or more 1-cells ("nodes")
all 1-cells end in exactly two 0-cells
each line segment (chain) between adjacent 0-cells is assigned to exactly one 1-cell
all 1-cells lie between exactly two 2-cells
every place on the "map" between noodles is assigned to a single 2-cell (the rest of the world is a 2- cell as well, often given the ID zero)

Objective

planar enforcement is used to build objects out of digitized lines (hence the phrase "building topology")

it is a consistent and precise approach to the problem of making meaningful objects out of groups of lines

simple rules can be used to correct some digitizing errors:
a very short 1-cell terminating in a 1-valent 0-cell indicates an overshoot diagram






a long 1-cell terminating in a 1-valent 0-cell very close to another 1-cell indicates an undershoot diagram







planar enforcement is often needed when a set of data is being imported from another system
e.g. if the source is a cartographic database and needs to have relationships computed
e.g. if the database models of the two systems are incompatible, data is transferred as unrelated noodles, then objects are rebuilt

planar enforcement must be applied one layer at a time

planar enforcement concepts are built into many systems


G. RELATIONSHIPS IN RASTER SYSTEMS

in general, it is easier to work with relationships in vector systems
the concept of object is not as natural for raster systems, which model the world as composed of pixels

however, relationships can be handled in raster systems with simple techniques:
overhead - Relationships in raster systems

e.g. a map of county boundaries
in one layer each pixel has a county code attribute which is an ID pointing to an entry in a county attribute table

in a second layer each well location is coded by giving the appropriate pixel an ID pointing to a well attribute table
the "is contained in" relationship can be computed by an overlay operation and stored as an additional column in the well attribute table

only a few raster systems contain this type of capability to extract relationships into attribute tables
most do not handle relationships between spatial objects


REFERENCES
Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment. Clarendon, Oxford. Chapter 2 describes objects, attribute tables and relationships.

Goodchild, M.F., 1988. "Towards an enumeration and classification of GIS functions," Proceedings, IGIS ''87. NASA, Washington DC 2:67-77. Defines and discusses object pairs.

Keating, T., W. Phillips and K. Ingram, 1987. "An integrated topologic database design for geographic information systems," Photogrammetric Engineering and Remote Sensing Vol. 53. Good discussion of topological and cartographic database models.


EXAM AND DISCUSSION QUESTIONS

1. Discuss the use of planar enforcement for street networks, and the problems presented by overpasses and underpasses. Can you modify the basic rules to maintain consistency but allow for such instances?

2. What additional examples of relationships can you devise in each of the six categories used in section B?

3. Why have designers of raster GIS not commonly devised ways of coding spatial relationships between objects in their systems? Is this likely to change in the future, and if so, why?

4. "Topology is what distinguishes GIS from automated cartography". Discuss.

THE VECTOR OR OBJECT GIS
INTRODUCTION
Vector data model

B. "ARCS"
Storing areas

C. DATABASE CREATION
Building topology
Editing
Relationship between digitizing and editing
Edgematching

D. ADDING ATTRIBUTES

E. EXAMPLE ANALYSIS USING VECTOR GIS
Objective
Procedure
Result

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

This unit begins a two part introduction to vector GIS. We have placed these units here since we feel this discussion benefits from an understanding of the previous introduction to spatial data concepts in Units 10 to 12. However, with a little revision, it is possible to move this module so that it follows Units 4 and 5 on raster GIS.

UNIT 13 - THE VECTOR OR OBJECT GIS

Compiled with assistance from Holly Dickinson, State University of New York at Buffalo

THE VECTOR OR OBJECT GIS
INTRODUCTION

Vector data model


based on vectors (as opposed to space-occupancy raster structures)

fundamental primitive is a point

objects are created by connecting points with straight lines
some systems allow points to be connected using arcs of circles

areas are defined by sets of lines
the term polygon is synonymous with area in vector databases because of the use of straight-line connections between points

very large vector databases have been built for different purposes
vector tends to dominate in transportation, utility, marketing applications
raster and vector both used in resource management applications


B. "ARCS"

when planar enforcement is used, area objects in one class or layer cannot overlap and must exhaust the space of a layer

every piece of boundary line is a common boundary between two areas

the stretch of common boundary between two junctions (nodes) has various names
edge is favored by graph theorists, "vertex" for the junctions
chain is the word officially sanctioned by the US National Standard
arc is used by several systems

arcs have attributes which identify the polygons on either side

these are referred to as "left" and "right" by reference to the sequence in which the arc is coded

arcs (chains/edges) are fundamental in vector GIS

Storing areas

two ways of storing areas:

polygon storage
every polygon is stored as a sequence of coordinates
although most boundaries are shared between two adjacent areas, all are input and coded twice, once for each adjacent polygon
the two different versions of each internal boundary line may not coincide
difficult to do certain operations, e.g. dissolve boundaries between neighboring areas and merge them
used in some current GISs, many automated mapping packages

arc storage
every arc is stored as a sequence of coordinates
areas are built by linking arcs
only one version of each internal shared boundary is input and stored
used in most current vector-based GISs

C. DATABASE CREATION

database creation involves several stages:
input of the spatial data
input of the attribute data
linking spatial and attribute data

spatial data is entered via digitized points and lines, scanned and vectorized lines or directly from other digital sources
once the spatial data has been entered, much work is still needed before it can be used

Building topology

once points are entered and geometric lines are created, topology must be "built"
this involves calculating and encoding relationships between the points, lines and areas
this information may be automatically coded into tables of information in the database

Editing

during this topology generation process, problems such as overshoots, undershoots and spikes are either flagged for editing by the user or corrected automatically
automatic editing involves the use of a tolerance value which defines the width of a buffer zone around objects within which adjacent objects should be joined
tolerance value is related to the precision with which locations can be digitized








these edit procedures include such functions as snap, move, delete, split, join, etc.

Relationship between digitizing and editing

digitizing and editing are complementary activities
poor digitizing leads to much need for editing
good digitizing can avoid most need for editing
both can be very labor-intensive

the process used to digitize area objects can affect the need for later editing:

in "blind" digitizing all linework is digitized once as "noodles" in any order
it is unlikely that the building and cleaning operations will be able to automatically sort out area objects unambiguously from the resulting jumble









some systems require the user to identify junctions between digitized "noodles" explicitly
usually by touching a special button on the cursor








mistakes in building topology are less likely

some systems require the user to digitize each individual arc/chain separately









much easier to sort out polygons - less need for editing

some systems support the building of topology "on the fly"
the system searches constantly for complete area objects as digitizing proceeds
the user is informed by a sound or by blinking as soon as the object is detected

Edgematching

compares and adjusts features along the edges of adjacent map sheets

some edgematches merely move objects into alignment

others "join" the pieces together logically - conceptually they become one object
the user "sees" no interruption

an edgematched database is "seamless" - the sheet edges have disappeared as far as the user is concerned

THE VECTOR OR OBJECT GIS
ADDING ATTRIBUTES

once the objects have been formed by building topology, attributes can be keyed in or imported from other digital databases

once added to the database, attributes must be linked to the different objects
attributes can be linked by pointing to the appropriate object on the screen and coding its corresponding object ID into the attribute table

unlike many raster GIS systems, attribute data is stored and manipulated in entirely separate ways from the locational data


E. EXAMPLE ANALYSIS USING VECTOR GIS

compare with example analysis in Unit 4 (The Raster GIS)

Objective

identify areas suitable for logging

an area is suitable if it satisfies the following criteria:
is Jack pine (Black Spruce are not valuable)
is well drained (poorly drained and waterlogged terrain cannot support equipment, logging causes unacceptable environmental damage)
is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality)

Procedure

database consists of three layers
note: polygons do not entirely fill the space in each case
hence, areas not included fall in polygon ID 0


buffer hydrography out to 500 m

merge buffer and lake

extract Jack pine polygons (species = Jack pine)

extract drained soil polygons (drainage = 2, therefore soil = A)

overlay buffer, Jack pine and soil polygons

build topology

extract polygons not in the buffer but in others (buffer = n, Jack pine = y, drainage = y)

Result

loggable area shown in final map


REFERENCES
Beard, M.V. and N.R. Chrisman, 1988. "Zipping: a locational approach to edgematching," The American Cartographer 15:163-72. Describes a solution to the edgematching problem.

Chrisman, N.R., 1990. "Deficiencies of sheets and tiles: building sheetless databases," International Journal of Geographical Information Systems 4:157-67. A more general discussion of building edgematched databases.

ESRI, 1990. Understanding GIS: The ARC/INFO Way, ESRI, Redlands, CA. A general introductory tutorial for ARC/INFO, a well-known contemporary GIS.

Tomlinson, R.F., H.W. Calkins and D.F. Marble, 1976. Computer Handling of Geographical Data. UNESCO Press, Paris. Excellent semi-technical description of CGIS, an early vector-based system.


EXAM AND DISCUSSION QUESTIONS

1. List and describe the processes involved in constructing a vector database by digitizing maps.

2. By using simple sketches, describe and illustrate typical problem cases which lead to difficulties in building area- object topology in a vector database, and the strategies which various GISs use to minimize editing effort.

3. Discuss the applications of GIS, in relation to the vector data model. Give examples of cases where the model

would be particularly inappropriate in comparison with raster.

4. Why did the designers of CGIS choose a vector data model, and yet use scanning as the major method

VECTOR GIS CAPABILITIES
INTRODUCTION

B. SIMPLE DISPLAY AND QUERY
Display
Standard Query Language (SQL)
Boolean operators
SQL extensions for spatial queries

C. RECLASSIFY, DISSOLVE AND MERGE
Steps
Forestry example
City zoning example

D. TOPOLOGICAL OVERLAY
Point in polygon
Line on polygon
Polygon on polygon ("Polygon overlay")
Example
Spurious polygons

E. BUFFERING

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

VECTOR GIS CAPABILITIES
INTRODUCTION

analysis functions with vector GIS are not quite the same as with raster GIS
more operations deal with objects
measures such as area have to be calculated from coordinates of objects, instead of counting cells

some operations are more accurate
estimates of area based on polygons more accurate than counts of pixels
estimates of perimeter of polygon more accurate than counting pixel boundaries on the edge of a zone

some operations are slower
e.g. overlaying layers, finding buffers

some operations are faster
e.g. finding path through road network


B. SIMPLE DISPLAY AND QUERY
Display


using points and "arcs" can display the locations of all objects stored

attributes and entity types can be displayed by varying colors, line patterns and point symbols

may only want to display a subset of the data
e.g. want to display areas of urban landuse with some base map data
select all political boundaries and highways, but only areas that had urban land uses

how would the user do this?
e.g. one of the layers in a database is a "map" of land use, called USE
area objects on this layer have several attributes
one attribute, called CLASS, identifies the area''s land use
for urban land use, it has the value "U"
need to extract boundaries for all areas that have CLASS="U"

Standard Query Language (SQL)

different systems use different ways of formulating queries

Standard Query Language (SQL) is used by many systems

SQL phrase structure:
SELECT &LTattribute name(s)> FROM &LTtable> WHERE &LTcondition statement>

e.g. SELECT FROM USE WHERE CLASS="U"
this selects only the objects for display - no attributes are retrieved by the query

SQL examples using a list of student names:
SELECT name FROM list (selects all names)
SELECT name FROM list WHERE grade = "A" (selects names of students receiving an "A")
SELECT name FROM list WHERE cumgrade > 3.0 (selects names of students with a cumulative gpa greater than 3.0)

SQL operators:
relational: >, &LT, =, >=, &LT=
arithmetic: =, -, *, / (only on numeric fields)
Boolean: and, or, not

Boolean operators

used to combine conditions
e.g. WHERE cumgrade > 3.0 AND grade = "A" (selects students satisfying both conditions only)

Boolean operators can have a spatial meaning in GIS as well
e.g. when two maps are overlayed, areas (polygons) that are superimposed have the "and" condition

a spatial representation is used to illustrate Boolean operators in the study of logic, through the use of diagrams called Venn diagrams
thus GIS area overlay is a geographical instance of a Venn diagram

"XOR" is the "exclusive or" - A xor B means A or B but not both

SQL extensions for spatial queries

some systems allow specifically spatial queries to be handled under SQL e.g. WITHIN operator
SELECT &LTobjects> WITHIN &LTspecific area>

the criteria for these spatial searches may include searching within the radius of a point, within a bounding rectangle, or within an irregular polygon

C. RECLASSIFY, DISSOLVE AND MERGE

reclassify, dissolve and merge operations are used frequently in working with area objects
these are used to aggregate areas based on attributes

consider a soils map:
we wish to produce a map of major soil types from a layer that has polygons based on much more finely defined classification scheme

Steps

1. reclassify areas by a single attribute or some combination

e.g. reclassify soil areas by soil type only

2. dissolve boundaries between areas of same type
by delete the arc between two polygons if the relevant attributes are the same in both polygons

3. merge polygons into large objects
recode the sequence of line segments that connect to form the boundary (i.e. rebuild topology)
assign new ID #''s to each new object

Forestry example

consider a forestry GIS where the forest is divided into "stands", average size 10 ha:
each stand carries a list of attributes, including tree species and average tree age
attributes apply homogeneously to area of each stand
boundary occurs between stands whenever at least one attribute changes

problem: identify all cuttable areas of white spruce
assign new attribute "cuttable" to each stand
value = "y" if white spruce AND age > 50 years
value = "n" otherwise
after assigning new attribute, all others can be dropped

now wish to identify cuttable areas, each may be merger of several individual stands
dissolve boundaries between polygons with same value of "cuttable" attribute
merge polygons into larger objects

City zoning example

need to know how many individual landuse zones have been created in the city and how these are distributed geographically

each land parcel in the city has a zoning attribute attached to it

dissolve boundaries between parcels if the zoning is the same

result can be a map showing large areas of similar zoning classes

VECTOR GIS CAPABILITIES
TOPOLOGICAL OVERLAY

suppose individual layers have planar enforcement (required in many systems, not all)

when two layers are combined ("overlayed", "superimposed") the result must have planar enforcement as well
new intersection must be calculated and created wherever two lines cross
a line across an area object creates two new area objects

topological overlay is the general name for overlay followed by planar enforcement

relationships are updated for the new, combined map

result may be information about relationships (new attributes) for the old (input) maps rather than the creation of new objects
e.g. overlay map of school districts on census tracts
result is map showing every school district/census tract combination
for each combination, the database contains an area object
however, concern may be with obtaining the number of overlapping census tracts as a new attribute of each school district rather than with new objects themselves

Point in polygon

overlay point objects on areas, compute "is contained in" relationship

result is a new attribute for each point
e.g. combine wells and planning districts, find district containing each well

Line on polygon

overlay line objects on area objects, compute "is contained in" relationship

lines are broken at each area object boundary
number of output lines is greater than number of input lines

containing area is new attribute of each output line

e.g. combine streams and counties, find county containing each stream segment

Polygon on polygon ("Polygon overlay")

overlay two layers of area objects

boundaries are broken at each intersection

number of output areas likely greater than the total number of input areas

e.g. input watershed boundaries, county boundaries, output map of watershed/county combinations

after overlay we can recreate either of the input layers by dissolving and merging based on the attributes contributed by the input layer

Example

wish to use find those areas that are the best land for timber harvesting

after overlay, each original layer contributes attributes to the combined layer

we get the final map by selecting the desired attributes of the combined layer
SELECT FROM OVERLAY WHERE Species = "Jack pine" AND Soil = "C"

Spurious polygons

during polygon overlay, many new and smaller polygons are created, some of which may not represent true spatial variations
the small, invalid polygons are called spurious or sliver polygons and can be a major problem in polygon overlay

spurious polygons arise when two lines are overlaid which are actually slightly different versions of the same line
if the same line occurs on two input maps, the digitized versions may be slightly different
in many cases the lines on the source maps have been compiled from different sources, but are nevertheless the same line on the ground
e.g. a road may be part of a county boundary, also the boundary between two fields or two soil types or two vegetation types

the problem cannot be removed by more careful digitizing - more points simply leads to more slivers

some GISs allow the user to set a tolerance value for deleting spurious polygons during overlay operations








if the tolerance is set too high, some legitimate polygons may be deleted
if set too low, some erroneous polygons will remain
deletion rules might also be based on shape, as spurious polygons tend to be long and thin


E. BUFFERING

a buffer can be constructed around a point, line or area
buffering creates a new area, enclosing the buffered object

applications in transportation, forestry, resource management
protected zone around lakes and streams
zone of noise pollution around highways
service zone around bus route (e.g. 300 m walking distance)
groundwater pollution zone around waste site

options available for raster, such as a "friction" layer, do not exist for vector

buffering is much more difficult in vector from the point of view of the programmer

sometimes, width of the buffer can be determined by an attribute of the object
e.g. buffering residential buildings away from a street network:
three types of street (1, 2, 3 or major, secondary, tertiary) with the setbacks being 600 feet from a major street, 200 feet from a secondary street, and only 100 feet from a tertiary street

problems with buffer operations may occur when buffering very convoluted lines or areas










REFERENCES
Documentation for ARC/INFO (user manuals, Understanding GIS) provides an overview of vector GIS functionality for a commonly available system.

Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. Chapter 5 on data analysis.

Lusardi, Frank, 1988. The Database Expert''s Guide to SQL, McGraw-Hill Book Co., New York. Good introduction to Standard Query Language.


EXAM AND DISCUSSION QUESTIONS

1. Compare the buffer function in raster and vector systems, in terms of results, options offered by systems, and flexibility.

2. The skeleton function moves the boundary of an area object inwards rather than outwards. Show how this would work using a simple diagram, and what happens as the amount of movement increases. What objects might be created by this operation, and what uses can you devise for them?

3. What are spurious polygons and what characteristics of data cause them?

4. Section D listed three types of topological overlay between points, lines and areas. Are there others? What applications might they have?

SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS
INTRODUCTION
Review

B. ANALYSIS OF ONE CLASS OF OBJECTS
Using attributes
Using locational information

C. ANALYSIS OF OBJECT PAIRS

D. ANALYSIS OF MORE THAN ONE CLASS OF OBJECTS
Shortest path example
What spatial objects are required?
Spatial interaction example

E. ANALYSIS WHICH DEFINES NEW OBJECTS
Buffer example
Street noise example
Trade area example
Polygon overlay example

F. GIS ANALYSIS FUNCTIONS
Measure
Coordinate transformation
Generate objects
Select a subset of objects
Modify attributes of objects
Dissolve and merge area objects
Generalize or smooth lines
Compute statistics for a set of objects
Topological overlay
Operations on surfaces
Network analysis
Input and output management

REFERENCES

SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS
Review

B. ANALYSIS OF ONE CLASS OF OBJECTS
Using attributes
Using locational information

C. ANALYSIS OF OBJECT PAIRS

D. ANALYSIS OF MORE THAN ONE CLASS OF OBJECTS
Shortest path example
What spatial objects are required?
Spatial interaction example

E. ANALYSIS WHICH DEFINES NEW OBJECTS
Buffer example
Street noise example
Trade area example
Polygon overlay example

F. GIS ANALYSIS FUNCTIONS
Measure
Coordinate transformation
Generate objects
Select a subset of objects
Modify attributes of objects
Dissolve and merge area objects
Generalize or smooth lines
Compute statistics for a set of objects
Topological overlay
Operations on surfaces
Network analysis
Input and output management

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

Having established a basis of the fundamental concepts in GIS data structures, this unit begins a large module looking at how GIS can be used. First we look at how spatial relationships can be analyzed and then present a summary of the range of functions that fall within present GIS capabilities.

UNIT 15 - SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS


A. INTRODUCTION

Review


the types of spatial objects are points, lines, areas, raster cells (Unit 10)

these object types are digital representations of phenomena
are defined by their dimensionality
are sometimes further subdivided
e.g. line objects are divided into chains, strings etc. in DCDSTF

an entity type is a type of phenomenon, e.g. church, city, highway, lake
the same entity type may be represented by different types of objects at different scales
e.g. a city may be a point at one scale, an area at another

in the database, several different types of entities may be represented by the same type of object
e.g. points may represent both cities and churches

an object class is a group of objects of the same type, representing the same type of entity
e.g. cities and churches are different classes of the same type of object (point)

the number and meaning of attributes is the same for all objects in a class
e.g. church: denomination, capacity, date of construction
e.g. city: name, population, date of charter

object attributes may use various measurement scales (i.e. nominal, ordinal, interval, ratio) - see Unit 6

we think of a class of objects and its attributes as a table with rows corresponding to objects and columns corresponding to attributes

classes of objects can be grouped into layers
sometimes only one per layer, depending on the system

the power of a GIS comes from its ability to store relationships among and between objects - see Unit 12
relationships can be between objects of the same class
more often between objects of different classes

relationships can identify object pairs which have their own attributes

using this framework of spatial objects and relationships, the range of analysis possible with a GIS is explored


B. ANALYSIS OF ONE CLASS OF OBJECTS
Using attributes


need only a single attribute table
might be any object class

example using city neighborhoods:
attributes include:
population count: ratio scale
household count: ratio scale
average income: $000s per household, ratio scale
name: nominal scale
average household expenditure on automobile purchases/year: $000s, ratio scale

GIS operations on the attribute table might include:
very primitive forms of analysis or enquiry are possible
list neighborhoods by average income
simple data retrieval, print table
list all neighborhoods with average income greater than $40,000
select records satisfying criterion, print table
compute the mean expenditure on automobile purchases
requires weighting by household count in each area
compute and print result
look for relationship between average income and average expenditure on automobiles
retrieve data and plot a graph

all of these are capabilities of standard databases, e.g. DBase III

none use GIS capabilities, no access to spatial data (locations) required

Using locational information

make a map of average household income, shading each neighborhood accordingly
requires locational information to plot outline of neighborhood

shading might be determined by a function of more than one attribute, e.g. the ratio of household expenditure on automobiles to household income
this type of capability is offered by automated mapping packages

compute the area of each neighborhood and store it as a new attribute
area computed from locational information (digitized outline of neighborhood)
useful in making meaningful maps
e.g. map population density as population divided by area rather than simply mapping population over variable sized areas

other similar measures that can be computed include perimeter, centroid location, distance, e.g. from downtown


C. ANALYSIS OF OBJECT PAIRS

e.g. pairs formed from each combination of neighborhoods, including neighborhoods paired with themselves
with 5 neighborhoods have 15 combinations
with n neighborhoods have n(n+1)/2 combinations
example attributes:
distance
number of commuters in each direction
time to travel by public transit

draw a map of interactions overhead - Net flows between states
map becomes exceedingly complex if all pairs are shown
may need to show only most important flows, e.g. to downtown

analyze commuter trips by time of travel by public transit
e.g. how many take over 20 mins, 40 mins, 1 hour
produces results in tables

SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS
ANALYSIS OF MORE THAN ONE CLASS OF OBJECTS

one of the major strengths of GIS analysis

Shortest path example

find the shortest path through a street network between two places
useful for fire truck dispatching, cabs, delivery vehicles
navigation systems are being developed for mounting in vehicles
able to display current locations and map of surrounding streets, follow vehicle''s path on map
able to compute recommended route to given destination

What spatial objects are required?

links in the network
attributes include length
also factors such as traffic counts, congestion, number of lanes, average speed are important in finding optimum route

nodes in the network
intersections allow route to move from one link to another using link-node relationships
important attributes include presence of traffic light, overpass or underpass

what about turn restrictions?
turn restrictions are not attributes of links or nodes
e.g. "no left turn" is an attribute of a pair of links - cannot turn from link A to link B

thus a link-link object pair is needed (see Unit 12 for more on object pairs)
some systems define a "turn table" which is equivalent to this link-link object pair

what about stop signs?
stop signs are not attributes of nodes but are determined by the direction of entering the node, irrespective of the exit link
thus a link-node object pair is needed

Spatial interaction example

useful for prediction of customer behavior
handout - Spatial interaction data


given 3 shopping centers, represented as points
attributes: parking spaces, number of stores, quality of signage, age of construction

given 5 neighborhoods, represented as areas
attributes: population count, average household income, average age

given information on # of shopping visits by neighborhood by shopping center (information gathered by street interviews)
these are attributes of neither shopping centers or neighborhoods, but of the object pairs
produces 15 object pairs plus attributes, including distance

commonly used model in this situation is a "Spatial Interaction Model" (SIM) - requires information on:
neighborhoods (from attributes of area objects), e.g. average household income
shopping centers (from attributes of point objects), e.g. number of parking spaces
spatial behavior (from attributes of object pairs)
e.g. for each object pair, divide number of trips to appropriate shopping center by population of appropriate neighborhood, plot against distance

handout - Spatial interaction modeling


E. ANALYSIS WHICH DEFINES NEW OBJECTS

many GIS operations produce new spatial objects from old ones

may be same or different type, e.g. points producing points or points producing areas

new objects may have attributes of the old objects which created them

Buffer example

build a buffer zone (area) around a stream network (a layer of line objects)
stream layer has attributes of each stream link, including ID, discharge, length, depth

buffer operation creates area objects
may be: (a) an object for each link (b) one merged object for the entire network

attributes of the new area object:
in case (a) - length, ID, discharge, depth of channel in the buffer object (from link attributes)
in case (b) - total length

Street noise example

street is a line object with attribute "traffic count"

apply an equation to convert "traffic count" attribute to "noise level"

build a buffer of 500 m around line

attach noise level attribute to the new area object

further development:
includes houses as point objects
identify all houses lying in buffer ("point in polygon" operation)
attach noise level attribute to all houses lying inside area object
produce list of all such houses, generate mailing labels from database and mail announcements of meeting to protest noise

Trade area example

given list of customers of shopping center, with home locations (point objects)
create new attribute for each point giving distance to shopping center
calculate average distance from all points

find the "trade area" of the shopping center
e.g. draw a circle with radius equal to the average distance to all customers

produces an area object
attach count of customers within trade area as an attribute of the new object

Polygon overlay example

perhaps the most important operation in GIS

given two classes of area objects
e.g. two maps for the same area, one showing soil types, the other vegetation zones

"overlay" the two classes of objects creating a new set of area objects
every new area object has two sets of attributes - soil type (copied from the soil map) and vegetation (copied from the vegetation map)

F. GIS ANALYSIS FUNCTIONS

functions should be defined independently of technical issues, understandable by users with little technical knowledge of GIS, independently of data model
e.g. "buffer" - does not depend on choice of raster or vector, or require knowledge of technical detail

functions are used to translate needs into specific GIS operations

list of available functions is outgrowth of past GIS user needs
emphasis on resource management applications because of strength of that market sector in last 10 years
however, is no consensus on the possible domain of GIS, the total set of possible functions

some GIS claim as many as 1,000 commands
since functions and operations are defined at a higher level, each function may require several commands

overhead - GIS Analysis Functions

Measure

results become attribute of objects
measure length of line object
measure area or perimeter of area object

Coordinate transformation

results in new coordinates for points
register map to control points, transform coordinates accordingly
change projection, scale, coordinate system (e.g. lat/long, State Plane, military grid)

Generate objects

by user input, e.g. mouse, digitizer tablet
area, line, point objects
circle around point, e.g. for query
grid cell net, lat/long graticule

from existing objects in the database
buffer zones or "corridors" around points, lines, areas
areas around points by assigning everywhere to the nearest point, producing polygons (Thiessen, Voronoi or Dirichlet polygons) - e.g. to create "trade areas"

representative points in the middle of each area object (centroids)

Select a subset of objects

based on attributes, or regions, or "window"

Modify attributes of objects

by user input, e.g. keyboard

by arithmetic based on existing attributes, e.g. find density

by rules using relational and Boolean operators
e.g. if white spruce and age > 50 years then new attribute is "y"

Dissolve and merge area objects

generates new, fewer objects

Generalize or smooth lines

reduce the complexity of a line or area boundary, or smooth it, or reduce the number of digitized points needed to represent it ("weed")
diagram










note: generalization is a very complex topic which is covered in detail in Unit 48

Compute statistics for a set of objects

count them

total or average a selected attribute

compute statistical indices, e.g. standard deviation, correlation

Topological overlay

point in polygon

line on polygon

polygon overlay
sliver polygon removal

Operations on surfaces

mostly for topographic surfaces
recall there are several methods for digital representation, e.g. digitized contours, grid of heights (DEM), mosaic of triangles (TIN) - Unit 11

estimate height at a point

find profile of surface along a line, e.g. a stream profile

compute contours (line objects) from grid of heights, and vice versa

compute grid of slopes, aspects

find area objects of slope or aspect categories, e.g. slope &LT5%

find watershed boundaries from DEM

find the area visible from a point (viewshed)

Network analysis

many types of analysis can be carried out on networks, for transportation planning, utility management, airline scheduling, navigation
find shortest path through the network between selected points
determine whether one point on a stream network is downstream or upstream of another
find the parts of the network which can be reached within a given travel time from a selected point

Input and output management

applications often consist of existing analytical packages running in conjunction with a GIS
GIS does the "housekeeping" - handling data input and providing advanced output capabilities
analytical package solves the problem


REFERENCES
Goodchild, M.F., 1988. "A spatial analytical perspective on GIS," International Journal of Geographical Information Systems 1:327-34. Examines the relationship between spatial analysis and GIS and discusses key issues.

Goodchild, M.F., 1988. "Towards an enumeration and classification of GIS functions," Proceedings, IGIS: The Research Agenda. NASA, Washington, DC, II:67-77. Develops categories of analysis and provides examples.

Unwin, D., 1981. Introductory Spatial Analysis, Methuen, London. A discussion of spatial analysis with a framework much like that provided by GIS: illustrates the range of spatial analysis.

Upton, G.J.G. and B. Fingleton, 1985. Spatial Data Analysis by Example, Vol I: Point Pattern and Quantitative Data, Wiley, New York.


EXAM AND DISCUSSION QUESTIONS

1. Compare the classification of map analysis functions in this Unit with that proposed for raster functions by Berry and referenced in Unit 5.

2. Can GIS functions be described and discussed generically, or is the raster/vector distinction unavoidable?

3. Make a list of 6 (10? 15?) functions which should be included in GIS because of their importance in various kinds of analysis of spatial data but which are not listed in this unit.

4. Define "object pairs" and give examples of their use in applications in (a) hydrology, (b) transportation planning.

5. "Spatial data is distinguished by its wealth of possible relationships between objects, and by the need to qualify such relationships in different ways". Discuss.

6. Design a database for an airline reservation system - what types of entities and relationships would be needed, and what attributes of each entity? Would the concept of an object-pair be useful?

OUTPUT
Compiled with assistance from Jeffrey L. Star, University of California at Santa Barbara

A. INTRODUCTION
Types of output
B. TEXT OUTPUT
Tables
C. GRAPHIC OUTPUT
Graphics peripherals
Raster and vector devices
D. HARDCOPY OUTPUT
Line printers
Dot matrix printers/plotters
Pen plotters
Optical scanners
E. CRTS (CATHODE RAY TUBES)
Storage tube technology
Refresh image technology
Color
Bit planes and palettes
3-D display
Memory and processing components
F. GRAPHICS STANDARDS
REFERENCES
DISCUSSION AND EXAM QUESTIONS
NOTES

OUTPUT
A. INTRODUCTION

output from GIS does not have to be a map
in fact, many GIS are designed with poor map output capabilities
Types of output
text - tables, lists, numbers or text in response to query
graphic - maps, screen displays, graphs, perspective plots
digital data - on disk, tape or transmitted across a network
other, not yet common
computer-generated sound
3D images
B. TEXT OUTPUT
perhaps more important than maps for reporting results of analysis
results might be a list or table of selected objects with attributes
queries might result in numerical results, e.g. totals, distances, areas, counts
text output might be delivered by voice generator, e.g. navigation instructions like &QUOTturn left at next traffic signal"
Tables
e.g. list of all cuttable areas of timber, giving area, species, age, estimate of yield in board feet
list is not of great value without an accompanying map to identify each object in the list
examples of specialized lists:
personalized letter to be mailed to all households within 500 m of a planned expressway
list of all hazardous materials stored within 100 m of a fire, transmitted by FAX to firetruck
driving directions for a garbage collection route
workorder and accompanying map and marked travel route for each service vehicle operated by utility company, giving day''s work locations, nature of work
list and accompanying map of all city voting precincts ranked by degree of support for party in last election
C. GRAPHIC OUTPUT
Graphics peripherals

provide graphic input and output of maps, diagrams and charts
interactive graphics devices allow users to point to objects and identify them in their correct spatial context
an early interactive version of the Space War computer game, developed at MIT in the early 1960s, ran on a PDP-1 computer using video displays
television-like terminals became common in the early 1960''s, and are now the most common way users interact with computer systems
in the next few sections we look at devices for graphic output and the ways their development has influenced GIS
costs are approximate, correct to order of magnitude only
Raster and vector devices
graphic output devices can be classified into raster and vector groups
raster devices build a picture by filling it with uniform picture elements, usually in rows
e.g. line printers, dot matrix printers, scanners, most CRT terminals
the elements of the picture are called pixels or pels
resolution is sometimes expressed in megapels (approximately equal to 106 pixels) 640x480 pixel resolution is 0.3 megapels; 1280x1024 is 1.3 megapels
vector devices build a picture by drawing lines, shading areas etc.
e.g. plotters, storage tube CRT technology
a raster device may be driven by vector commands, which it then converts for display, and vice versa
conversion between raster and vector may thus occur at several points in a GIS between input and output
D. HARDCOPY OUTPUT
Line printers

the first common device for computer output, cost $30,000
capable of printing 200-1,000 lines of text per minute
each line composed of 132 characters in fixed positions
entire line printed simultaneously by impact of hammers on ink ribbon
limited set of printable characters
image must be created row by row from the top
drawbacks for graphical output because:
rectangular cells
resolution is fixed, e.g. 1/6 inch by 1/10 inch (4.2 mm by 2.5 mm)
fixed cell location
only limited set of characters can be printed
difficult to draw continuous lines
shades of grey must be generated by overprinting
e.g. black = A + O + B + V + X
results are best when viewed from a distance
used as the output device for SYMAP, the earliest mapping package released in 1967
most useful for repeated mapping of statistics on a constant base, e.g. census atlas, weekly reports of crime statistics
overhead - Line printer map

Dot matrix printers/plotters
image composed of rows of dots - often printed in blocks, e.g. 9 or 25 rows at a time
to create shades of grey, control the fraction of dots which are printed in any small area
the dots must be selected randomly, or &QUOTdithered", to avoid unwanted patterns
early versions used hammers on ribbons for each dot - cost $500
more recent versions use lasers and xerography technology, resolutions up to 300 dots per inch - cost $2,000
color versions are available - squirt ink from jets of three or four primary colors - cost $2,000
electrostatic plotters use rows of dots, create images of map size - cost $40,000
Pen plotters
create images by moving a pen under computer control
most are incremental - draw a line using large numbers of movements of fixed size in fixed directions
many have two motors, one for x and one for y
get diagonal movement in only eight directions by using combinations of one or both motors
however, step size is so small that lines appear to go equally easily in all directions
diagram
from a GIS perspective, an important advantage is the ability to plot on top of pre-printed base maps
this avoids having to have all of the base map information in digital form
Types of plotters
lowest cost - $2,000 - are desktop size, take standard A4 or similar paper
paper is flat
pens can be changed automatically to generate different colors
mid-range - $25,000 - are map size
paper rolls over drum
movement of pen is parallel to axis of drum
second direction of movement is by rotating drum
problems keeping paper in registration since it may move and stretch during construction of the map/graphic
problems keeping pens moist during long plotting jobs
typical plot jobs can last 3 to 6 hours
top range - $100,000 are high precision
used for drafting, map production
usually flat (&QUOTflatbed"), medium held on exactly flat surface
pen may be replaced by cutting tool on scribe coat
demo - display a map from a plotter
Optical scanners
output on photographic paper
paper mounted on inside of a rotating drum
image created in helical fashion by rotating drum and moving light source along axis of drum
common output devices for remote sensing, image processing
other devices output directly to 35 mm slide film

OUTPUT
CRTS (CATHODE RAY TUBES)
earliest (ca. 1968) could display rows of characters in fixed positions, little use for showing maps or images
Storage tube technology
Tektronix introduced terminals based on storage tube technology ca. 1970
major breakthrough in low-cost graphic display ($5,000)
images drawn by moving electron beam over screen under computer control
image is permanent, not refreshed, so must be erased completely - no selective deletion possible
Refresh image technology
terminals with refreshed images began to replace storage tube technology ca. 1975
significantly lower cost
image redrawn from internal memory 25-50 times per second
image created by lighting dots in fixed positions
resolution determined by number of rows and columns of dots - some common screen resolutions: IBM PC Color Graphics Adapter (CGA) - 320x200 Enhanced Graphics Adapter (EGA) - 640x350 Video Graphics Array (VGA) - 640x480
1280x1024 is a common resolution for high quality graphics
Color
color is created by using groups of 3 dots, glowing red, green and blue respectively when illuminated by different electron guns
different percentages of illumination create different colors
overhead - Colors for RGB
Bit planes and palettes
recall from Unit 3:
a bit is a unit of computer storage - it can be on or off
a byte is a group of 8 bits
a K (kilo) is 1024 - 64K bytes equals 65,536 bytes
to display a black and white image, the terminal or display adapter must have one bit of storage per pixel
320x200 or 64,000 bits or 8,000 bytes for a CGA monochrome image
to display color we use several bits per pixel
2 bits can have 4 combinations of on and off, so can display any of 4 colors
to display any of 16 colors requires 4 bits per pixel
if there are 4 bits per pixel, we say there are 4 bit planes
a device with 20 bit planes (common for high resolution graphics) can display any of 220 colors in each pixel
hint: to convert powers of 2 to powers of 10 approximately, multiply the exponent by 0.3 - 220 is about 106 or 1,000,000 - in fact 1,048,576
the number of bit planes determines how many colors can be displayed simultaneously
this may be different from the number of possible colors
e.g. the VGA has 4 bit planes, allows simultaneous display of 16 colors, but these can be defined using any combination of 64 levels of red, 64 of green and 64 of blue, for a total of 262,144 possible colors
the limited set of colors chosen at any one time is called the palette
the storage requirements of a VGA adapter (4 bit planes, 640x480 resolution) are 4x640x480 = 1,228,800 bits = 153,600 bytes = 150 Kbytes
the requirements for a 20 bit plane, 1280x1024 device are 20x1280x1024 = 26,214,400 bits = 3,276,800 bytes = 3200 Kbytes
3-D display
some vendors are now offering 3-D stereo display devices
these create the illusion of depth by rapidly switching between two images, one for the left eye and one for the right
a filter in front of the screen polarizes the images differently
the user wears eyeglasses containing clear polarizing filters
because each of the two images must be refreshed 25-50 times/second, the display must operate at 50-100 images/second
Memory and processing components
1. Object memory
some display devices have an optional local object memory to store the entire set of objects/vectors in the image to be displayed (often called &QUOTdisplay list")
allows rapid redisplay of objects without new input
most useful for pan or zoom, or rotation of 3-D objects
2. Vector-raster converter
since display is always by pixels it is necessary to compute raster images from vector input
e.g. which dots must be turned on to display this line?
3. Display memory
stores the color number to be displayed in each pixel
4. Color lookup table
identifies the combination of red, green and blue corresponding to each color in the current palette
by changing the color lookup table, can produce very rapid changes of color patterns on the screen without affecting display memory
5. Digital/analog (D/A) converter
converts the digital signal stored in display memory to a voltage applied to the CRT
F. GRAPHICS STANDARDS
the large number of graphics input and output peripherals have a confusing array of data requirements
the instructions sent to a laser printer may have nothing in common with those sent to create the same picture on a desktop plotter
there have been many attempts to create standards for communicating with devices
the format used by the early Tektronix storage tube terminals has been extended many times but is still widely used - known as Tektronix or 4010 format
many devices recognize the format established by Hewlett-Packard and known as HPGL
several companies have introduced common formats and provided &QUOTdrivers" to convert these for specific devices, e.g. DI-3000 and DISSPLA
the most successful recent format is PostScript, recognized by a large number of output devices
REFERENCES
Maguire, D.J., 1989. Computers in Geography, Wiley, New York. A good general introduction to computer use and spatial data: chapters 5 and 11 have excellent reviews of hardware.

DISCUSSION AND EXAM QUESTIONS

1. Low-quality dot matrix printers use 9 hammer pins in a vertical column to create each line of print at 6 lines per inch, and thus achieve a resolution of about 50 dots per inch or 2 dots per mm. Compare the output of this device to the contents of the standard topographic map.

2. Compare the desktop plotter and the color CRT screen as display devices. Which would you find more useful as a location analyst for a major retailer?

3. Cartography is constrained by the two-dimensional nature of its paper medium. What new ways of displaying spatial data can you devise to take advantage of 3-D display capabilities?

4. &QUOTRaster and vector are not only different approaches to constructing a picture, but fundamentally different ways of looking at the world". Discuss.

GRAPHIC OUTPUT DESIGN ISSUES
A. INTRODUCTION
Topics covered

B. LABEL PLACEMENT
Imhof''s basic rules
Overposting
Polygon labeling
Some simple methods

C. PRINCIPLES OF GRAPHICAL EXCELLENCE
Graphical excellence

D. DESIGN OF GRAPHIC OUTPUT
Scale
Base map
General graphic design
Screen display
Scene generation

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

GRAPHIC OUTPUT DESIGN ISSUES
A. INTRODUCTION
Topics covered

B. LABEL PLACEMENT
Imhof''s basic rules
Overposting
Polygon labeling
Some simple methods

C. PRINCIPLES OF GRAPHICAL EXCELLENCE
Graphical excellence

D. DESIGN OF GRAPHIC OUTPUT
Scale
Base map
General graphic design
Screen display
Scene generation

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

This unit introduces some fundamental concepts of graphic design. If you wish to do a more thorough job of this, consider combining this unit with the material in Unit 49.

UNIT 17 - GRAPHIC OUTPUT DESIGN ISSUES


A. INTRODUCTION


previous unit described technical aspects of GIS output

much GIS output is in the form of hard copy maps or graphic displays

design of graphic output is critical if information is to be conveyed effectively to the user
graphic output from GIS is often poorly designed
e.g. colors used randomly without appropriate scaling
conventional scale of colors used to display elevation on standard atlas maps has been optimized over centuries of cartographic experience

design can benefit from principles of cartographic design developed in cartography
screen display introduces a new set of issues because of greater capabilities compared with paper maps

also see more general treatment of visualization of spatial data in Unit 49

Topics covered

technical issues of label placement

general principles of graphical excellence

introduction to principles of map design


B. LABEL PLACEMENT

features shown on maps and displays can be differentiated and identified in various ways:
symbols, e.g. church, bridge
colors
sizes
labels

labels provide the greatest flexibility to attach descriptions to point, line and area features
names of administrative divisions, lakes, rivers etc.
elevations of contours, spot heights
highway numbers

in cartography, positioning labels is a complex and sophisticated process
there have been few attempts to write down the rules used (Imhof, 1975 is a well-known exception)
it has proven difficult to emulate these rules in automated map production or GIS
positioning labels on screen displays is especially difficult because of low resolution (e.g. 640 by 480 pixels), and the importance of speed
by comparison, a plotted map may have an effective resolution of 300 dots per inch, and an hour computing time may be acceptable

Imhof''s basic rules

names on maps should:
be legible
be easily associated with the features they describe
not overlap other map contents
be placed so as to show the extent of the feature
reflect the hierarchy of features by the use of different font sizes
not be densely clustered nor evenly dispersed

it may not be possible to satisfy all of these rules perfectly

the best solution will balance conflicting objectives, e.g. need to associate name with feature vs. need to avoid overlap of contents

label placement is a complex problem because of the vast number of possible positions that have to be searched and the number of conflicting objectives

two labelling problems are particularly significant in automated mapping and GIS:

Overposting

when features are densely packed on a map or screen, it is difficult to keep labels separated
labels may overlap (overposting)
labels must be positioned to avoid overposting, but without destroying the eye''s ability to associate labels with appropriate features

e.g. point features
optimum position for a label is above and to the right
below and to the right is less acceptable
least acceptable positions are to the left
label can be turned (non-horizontal) if necessary, but only by a small amount

overposting is a problem because the computer must search a vast number of possible positions
in practice, must limit the number of positions somehow
some solutions define a fixed number of possible absolute positions, like a raster
other solutions define a fixed number of positions relative to the feature

diagram









Polygon labeling

labelling polygons has become notorious within automated mapping as a difficult and challenging programming problem

the label should be central to the feature, may be reoriented or curved to fit the feature
diagram










in some cases the label may be connected with the feature by an arrow
diagram










Some simple methods

1. label centered on the polygon centroid

problems:
centroid may lie outside the polygon
a long label may have to be multi-line to fit inside
solution fails to meet Imhof''s criterion of showing the extent of the feature

2. variable rectangle positioned inside the polygon
search for feasible positions for a rectangle wholly enclosed within the polygon
ratio of width to height should be as high as possible
solution will not curve the label to fit the feature
largest enclosed rectangle may be in an inappropriate part of the polygon

diagram








3. Skeleton

shrink the polygon by moving its edges inward at a uniform rate
the vertices trace out a network known as the skeleton (discussed in more detail in Unit 33)

diagram






position the label along the central part of the skeleton
best for polygons like Florida which require curved labels

practical labeling methods use combinations of rules for different shapes, sizes of polygons

many developers have used the term expert system to describe label placement software
an expert system works with complex sets of rules in a rule base
the objective of the expert system is to emulate the complex decision process of a cartographer

C. PRINCIPLES OF GRAPHICAL EXCELLENCE

some very broad principles apply to the design of graphic output in general (includes graphics and charts)

the following discussion relies heavily on Tufte (1983)

Graphical excellence

gives the viewer the greatest number of ideas, in the shortest time, with the least ink, in the smallest place
maximize the data/ink ratio
erase non-data ink
erase redundant data-ink
revise and edit the graphic
it is difficult to get a good graphic first time around
mobilize every graphical element, perhaps several times over, to show the data
maximize data density and the number of data entries shown, within reason

if the nature of the data suggests the shape of the graphic, follow that suggestion - otherwise, move toward horizontal graphics about 50% wider than tall

GRAPHIC OUTPUT DESIGN ISSUES
DESIGN OF GRAPHIC OUTPUT

for GIS, graphic output must show:
features appropriately symbolized or labeled
objects computed by the GIS, e.g. buffer zones
relationships

it may be difficult to display the results of some forms of GIS data analysis because of the constraints of 2D display, e.g.:
3D data
interaction data (migration, flows of goods)
global data

Scale

the scale of output should be consistent with input scale
e.g. inappropriate to digitize from 1:1,000,000 map, display at 1:24,000 because data will not be sufficiently accurate
also inappropriate to digitize at 1:24,000, display at 1:1,000,000 without adequate generalization features will be too dense, too detailed
scale on a CRT screen is as important as on a plotted map

in principle a spatial database is "scale-free", but in practice scale is a crude indicator of data accuracy
GISs should record and track scale in the database, but do not

Base map

to be useful, a map must include information for visual locational reference
output of computed information alone is rarely useful need base map features as well
e.g. map of cuttable forest stands
needs to show locations of roads, watersheds, streams and lakes, besides cuttable stands, so user can locate stands on the ground, make decisions based on correct spatial context
particularly important in raster systems
display of a single layer is rarely useful without some form of basemap for locational reference
basemap information will normally be vector, or at higher resolution than the raster
this will be difficult if the raster system does not have vector capabilities

input of basemap information can be expensive
difficult to justify digitizing of data just to support interpretation of graphic output

can plot output on top of pre-printed base map
avoids need to digitize base map information
base map must be accurately registered
some GIS support this function

General graphic design

often desirable to create good-looking finished product
e.g. as part of professional report, presentation
undesirable to have map look "computer-produced", excessively abstract or schematic

high cost of providing cosmetic output functions in GIS
e.g. map border neatlines, symbols, north arrows, legends
complexity of programming for these features may be much greater than for analytic functions
time to plot these features may be high, particularly for pen plotters

some GIS map products are now almost indistinguishable in quality from manual cartography

is appearance really important in a map drawn to support decision-making?
GIS output maps are to be used directly, not destined for walls or map libraries
should GIS products be simple, schematic, avoid high cost of manual cartographic quality?
marketplace seems to say "no"

Screen display

issues are different here because screen is:
smaller, lower resolution than a printed or plotted map
more flexible
zoom, pan, interaction with user, animation, use of color

principles of design of screen displays are still poorly developed
black background or white?
affects perception of color
tradition (PC and mainframe terminals) is black background, Mac and many workstations use white

hard copy map must display as much information as possible to satisfy possible user requirements
because system is interactive, screen can display limited information but provide for access to more
e.g. user "clicks" on or "picks" an object with a mouse, accesses lengthy text description
access to an object''s attributes is not limited by constraints of static display

Scene generation

maps show geographic variation using symbols, objects, other abstractions of reality

GISs do not have to do this - why not show a picture of the reality? - artist''s impression?

scene generation is set of techniques for simulating real physical appearance

e.g. GIS is used to plan a ski area on a mountain which is currently forested
plan could be shown as a map, with contours, green tint for remaining forest, line objects for ski lifts
scene generation would show oblique perspective view, cover hill with trees of varying height
current technology allows appearance of trees to be varied depending on species, age
we are still some way from having hardware fast enough to do this in "real time"


REFERENCES
Freeman, H. and J. Ahn, 1984. "AUTONAP - an expert system for automatic map name placement," Proceedings, First International Symposium on Spatial Data Handling, Zurich.

Imhof, E., 1975. "Positioning names on maps," The American Cartographer 2(2):128-44.

Robinson, A.H., R.D. Sale, J.L. Morrison and P.C. Muehrcke, 1984. Elements of Cartography, 5th edition, Wiley, New York. Excellent source of map design principles.

Tufte, E.R., 1983. The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT. Contains numerous examples of graphical excellence (and its opposite) in map design.

Zoraster, S., 1986. "Integer programming applied to the map label placement problem," Cartographica 23(3):16-27

A session on automatic names placement at AutoCarto 9, Baltimore, April, 1989 provides several reviews of the use of expert systems for map design:

Doerschler, J.S., and H. Freeman, "An expert system for dense- map name placement," pp. 215-224.

Ebinger, L.R., and A.M. Goulette, "Automated names placement in a non-interactive environment," pp. 205-214.

Johnson, D.S., and U. Basoglu, "The use of artificial intelligence in the automated placement of cartographic names," pp. 225-230.

Jones, C.B., and A.C. Cook, "Ruled-based cartographic name placement with Prolog," pp. 231-240.

EXAM AND DISCUSSION QUESTIONS

1. "Map output is essential to GIS, yet GIS designers have ignored or failed to implement many of the well-known principles of map design". Discuss.

2. Examine and discuss a collection of GIS output maps, such as the ARC/INFO Maps volume produced annually by Environmental Systems Research Institute, Redlands, CA, or the slides supplied with Unit 53. How effectively do they implement principles of map design?

3. Discuss the differences in graphic design principles as applied to manually produced maps, GIS hard copy output and screen displays.

4. Review the Cartographic Design chapter of Elements of Cartography, and discuss its contents in the context of GIS output.

MODES OF USER/GIS INTERACTION
A. MODES OF INTERACTION
Product mode
Query mode
Example queries by application
Continuum

B. TYPICAL QUERIES
1. Simple recall of data
2. Where is object A?
3. What is this object?
4. Summarize attributes of objects within distance x
5. Summarize attributes of objects within a region
6. What is the best route?
7. Show all of the objects satisfying the criteria
8. Use of relationship between objects

C. CHARACTERISTICS OF QUERY MODE
User training

D. PRODUCT MODE
Personnel requirements

E. USER INTERFACES
Command driven
Questions or prompts
Menu driven
Icons
Windows
Plain language interfaces

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

MODES OF USER/GIS INTERACTION
MODES OF INTERACTION

Product mode


system generates information products - lists, maps - which are later used for decision-making

user of products does not interact with the system directly

e.g. student records system generates class lists, transcripts which are used by committees, faculty to make decisions about student progress
faculty need know nothing about student record system except some conceptual understanding of (a) what information is in it and (b) what its capabilities are
e.g. can you give me a set of mailing labels for all students who took at least one course in geography last year? - anticipates (a) that the necessary data is in the system and (b) that the system has the functionality to do this

resource manager should have similar levels of understanding of the agency''s GIS
what is stored in it (data layers)
what functions a GIS can perform
e.g. can you make me a map showing all lands visible from this proposed waste disposal site?

market researcher should have similar understanding of capabilities of firm''s GIS
e.g. report numbers of people age 25-34, income over $30,000, living within concentric bands 1/4 mile wide around this proposed restaurant location

Query mode

user interacts directly with system, perhaps through an operator, to obtain answers to queries

Example queries by application

common type of query in land records office - "what easements and zoning restrictions are on the property?"
GIS can handle different ways of identifying property, e.g. by pointing to map, street address, subdivision plan, adjacency to other property

GIS can answer queries like "what police precinct(s) cover(s) this property?" by overlay of precinct layer and check for overlap

common queries for publicly accessible records in a municipal data base:
e.g. locations of water lines, sanitary and storm sewers, current, historical and proposed Official Plan designations, historical property designation, site plan details, state of approvals

navigation - "how do I get there from here?"

Continuum

query and product modes are two extremes on a continuum

vendors position themselves differently on this continuum
e.g. vendors of navigation and utilities management systems provide products at the query end while vendors of resource management systems tend to focus at the product end

choice of mode also depends on frequency of use and training
e.g. in a city records office, a clerk will operate the GIS to determine answers to specific queries while the general public will refer to updated maps printed at regular intervals from the GIS
e.g. travel agent learns commands to formulate queries for airline reservations while the traveler consults printed timetables (generated from same database) needing no technical knowledge of reservation system


B. TYPICAL QUERIES
1. Simple recall of data

given a way of identifying the object by unique attributes (name, ID number, street address, account number)
get list of attributes



other queries require searching for objects satisfying requirements
there are several different types of search queries:

2. Where is object A?

given a way of identifying the object by unique attributes (name, ID number, street address, account number)
show the location of the object on the screen along with its surroundings

scale of the surroundings depends on the application
might show house in relation to neighborhood, then use an inset map to show neighborhood in relation to city

common example of this query is address matching - finding the coordinates or location of a house/customer/lot from its street address

address matching is used extensively in market research, processing of census returns, dispatching emergency vehicles to fires
in some instances address matching is the only required function of a GIS
special case is the automobile navigation system which shows location of the vehicle on a constantly updated map on a small screen next to the driver

3. What is this object?

inverse of (2)

object is identified by pointing ("picked") with an interactive device - mouse, cursor, light pen

system returns attributes of object
e.g. street address of lot, name of owner, price of house at last sale, production of oil well

special case - given a grid of points, estimate the value of its attribute here
e.g. with a DEM, estimate the elevation at the point indicated

4. Summarize attributes of objects within distance x

extension of (3) to multiple objects related by distance

give me (summary of, total of) attributes of objects within distance x of this point (pointing to screen)

common query in site selection
give me totals for potential customers within 1/4 mile rings of this proposed location, broken down by census information, e.g. income, age, sex, occupation

statistics will be stored as attributes of point or area objects (reporting zones) and will be aggregated to respond to queries

this query service is offered by many market research companies
client dials up with coordinates (lat/long) of proposed site, queries database
large chain (e.g. bank, supermarket, convenience store) will do this many thousands of times a year

5. Summarize attributes of objects within a region

extension of (4) to user-defined polygons rather than circles centered on a point

give me (summary of, total of) attributes of objects within this area (outlining polygon on screen)
e.g. tell me how much prime agricultural land is within this area (floodplain of proposed dam)
e.g. tell me how much prime timber was burnt by the fire which burned this (query) area

common query in political, school districting
report numbers of students or voters in proposed district

6. What is the best route?

what is the best (least cost, least impact, fastest) route between these two points?

database model may be discrete (links and nodes of a network) or continuous (raster or grid)

discrete case used to dispatch fire trucks, emergency vehicles, cabs etc.
requires constant updating of link attributes to include road construction, maintenance, congestion

continuous case used to route transmission corridors, pipelines to minimize impact
used by aircraft flying N. Atlantic routes to minimize impact of jetstream if flying from East to West, maximize benefit of tailwind if flying the other way
military applications ("trafficability") in determining routes for e.g. tanks

7. Show all of the objects satisfying the criteria

show all of the objects satisfying the following criteria (defined on their attributes)
e.g. show all oilwells producing over 1 m3 per day

extension of (2) to many objects

8. Use of relationship between objects

some queries require use of relationships between objects

if relationships are not currently stored they must be computed
e.g. show all the links in the stream network downstream of this link - requires the "flows into" relationship between links
e.g. show all the oilwells in the same county as this one - requires the "is contained in" relationship between oilwells and counties
e.g. show the nearest road to this point - requires the "is nearest to" relationship between roads and the point
e.g. show the counties adjacent to this one - requires the "is adjacent to" relationship between counties


C. CHARACTERISTICS OF QUERY MODE

provide soft copy

operate in real time - maximum time allowed for response is a few seconds

often reported verbally over phone

response must be precisely what was required
e.g. phone call to information asking for number for "Bill Smith" does not want 20 possible numbers, only most likely

GIS query system often replaces anecdotal memory of staff
traditional approach to address matching relied on personal knowledge of city - e.g. London cab drivers required to know every street in London as qualification for license
address match by traditional approach can take minutes per enquiry
GIS system provides answers which are more precise, more rapid and can handle many more transactions

User training

query requires a high level of user expertise
user must make frequent use of system in order to justify and maintain the level of familiarity
use must involve a relatively small number of functions unless user is to commit large amounts of time to training
query mode works best when queries are repetitive, e.g. "onecall" operation to identify locations of underground pipes, cables prior to construction - query is always "what is near here"

query requires a friendly interface
menus, icons may be preferred over commands which are tedious to type
on the other hand trained users may prefer cryptic commands to other methods of access, e.g. airline reservations systems are command-driven, have high level of functionality, speed

MODES OF USER/GIS INTERACTION
PRODUCT MODE

product mode is necessary for more complex or recurring information products, often involving manipulation, analysis, and recombining stored data to derive new information

produces hard copy which is useful for several months after production
e.g. phone book valid for 1 year
e.g. airline guide valid for 1 month
e.g. class list valid for 1 semester
e.g. map of current census boundaries good for 10 years

products often need to be updated and reproduced on a regular basis
typical resource management agency may have 50-100 standard products to generate monthly or annually
some agencies may require generation of as many as 100 different map sheets to cover jurisdiction

since there is no immediacy of demand, production can be treated as a batch processing operation according to a planned schedule
contrast this with query mode which places variable load on the system
to maintain minimum response time system must be configured for worst-case load, e.g. 10am

due to the repetitive nature of the update of these products, their production can be pre-programmed through the use of macros
a macro is an ordered list of computer operations designed to generate a standard result
macros, initiated with a single command, will cause the execution of the long and complex set of required operations

Personnel requirements

GIS Analyst
analyst required to translate user needs into products
identify necessary data layers

develop appropriate data collection strategies, plans
design sequences of GIS functions to generate products from layers
design products to meet needs of users
design of products requires personnel with ability to:
conceptualize the sequence of GIS processes required
construct algorithms for compiling the data
design the report format so that the information that is provided is worthwhile to the users
these people must have:
understanding of subject matter to interact effectively with decision-makers who need information
level of technical expertise to develop GIS operation sequences to produce specified product
clear understanding of limitations of technology and data

GIS technicians
need to know technical aspects of the operation of the software and hardware
will use the macros to produce required products

users
generally product mode requires little technical expertise on the part of ultimate users of the data since products will usually be in traditional form - e.g. maps and tables


E. USER INTERFACES

overhead - Types of user interfaces

there are several types of user interfaces:

Command driven

user types commands at generic prompt - e.g. C>

commands usually cryptic

user must absolutely follow system-defined syntax by using precise spelling and punctuation rules
can be frustrating for poor or slow typists

may be very large number of many commands, e.g. near 1,000 in some GISs

online help may reduce need to learn all rules and syntax, especially for the infrequently used commands

"toolbox" metaphor used to describe the collection of possible functions available at any time

Questions or prompts

system asks user for responses in sequence to determine parameters for analysis and output

limited range of operations

common in programs with minimal or restricted functionality

Menu driven

user picks options from menus by pointing or by typing single letters or numbers

menus present the only options which are possible at that time

consequences of choice may be listed beside each option

with familiarity, complex menu systems become tedious to use

may not provide the flexibility of command driven systems

Icons

a form of menu system providing symbolic icons to represent options available

user drives system by pointing to icons for some operations and using menus for others

some users feel that symbolic systems can be more intuitive to learn and operate than verbal ones
e.g. measure of ease of use is whether "a child of 10 can do something useful with the system without any prior training"

Windows

GIS interfaces should take advantage of the nature of spatial data
there are two natural modes of access to spatial data - through maps and through attributes

the more sophisticated systems now use multiple windows with different ones for textual and graphic images

windows allow several views of the map at once
i.e. full area and zoomed in image

Plain language interfaces

efforts to construct plain language interfaces for GIS have not been successful to date
range, syntax of geographic queries is too large
stressing language ignores the importance of visual access through maps, pointing etc.


REFERENCES
Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. See pages 9-10 for a short review of different types of queries.


EXAM AND DISCUSSION QUESTIONS

1. Describe the differences between query and product mode interaction with GIS. Which mode would you select in constructing a GIS for a state department of transportation?

2. Discuss the use of product and query modes in accessing (a) phone directory information, (b) airline flight information, (c) TV program guides.

3. How will geographical access to records most affect people''s lives in the coming decade?

4. What do you anticipate will be the level of production of 1:100,000 quad sheets by the US Geological Survey in the year 2010 - higher, lower than today or zero?

GENERATING COMPLEX PRODUCTS
. STEPS IN DEFINING A GIS PRODUCT

B. EXAMPLE GIS PRODUCT DEFINITION
Decisions
Information needed
Data needed
Processing steps
Summary of functions needed

C. PRACTICAL PROBLEMS
Management demands
Data not available
Data is available, but there are problems
Data in wrong format
Complexity of decision rules

D. SITE SUITABILITY
Spatial search
Assigning suitability
Decision theory
Sensitivity

REFERENCES

EXAM AND DISCUSSION QUESTIONS

EXERCISE

NOTES

GENERATING COMPLEX PRODUCTS
A. STEPS IN DEFINING A GIS PRODUCT


what decisions have to be made?

what information products are needed to make those decisions?
e.g. decision is whether student should be allowed to graduate
information product is student''s transcript, generated from student records database
e.g. decision is where to put access road
information products may include perspective plots, location of timber stands

what data must be input or available to the system to make those products?
need to know geographical coverage required, thematic data needed and sources of data

what GIS functions need to be carried out on the data?


B. EXAMPLE GIS PRODUCT DEFINITION
Decisions


National Forest must manage forest land for multiple uses
one use is recreation which may conflict with other uses, e.g. wildlife, timber harvesting

question is "Where are the most accessible areas which could be considered for the development of recreation facilities?"
accessibility is defined in this example in terms of proximity to public roads
the best areas are large and close to roads

Information needed

a map showing forest lands, classified according to accessibility for recreation
a scale of 1:24,000, or larger if possible

the map should show
zones and associated accessibility classes for Forest Service land
base map information - roads, railroads, cities and towns, Forest Service boundary

Data needed

roads and railroads - from standard 1:24,000 topographic map

Forest Service management area - have been drafted on 1:24,000 topographic map from legal descriptions
shown as many individual areas, most are contiguous but some are not

city and town boundaries - have been drafted on 1:24,000 topographic map from legal descriptions

data will be input as 3 layers overhead - Accessibility analysis data
management area boundaries as area objects (layer A1)
roads and railroads as line objects (layer B1)
city and town boundaries as area objects (layer C1)

Processing steps

overhead - Project flowchart
overhead - Project steps (6 pages)

1. using the forest service areas data (layer A1), assign a new attribute FORESTLAND, value = 1 if area is forest service land, 0 otherwise

2. dissolve boundaries between areas with the same FORESTLAND value, and merge areas to create new area objects with one attribute - FORESTLAND - call the new layer A2

3. using the transportation map B1, select public access roads only - call the new layer B2

4. generate buffers 0.5 miles wide around all objects in layer B2 - call the new layer B3 - assign the attribute INHALF a value of 1 for the buffer area, 0 outside

5. generate buffers 1.0 miles wide around all objects in layer B2 - call the new layer B4 - assign the attribute INONE a value of 1 for the buffer area, 0 outside

6. topologically overlay the objects in layers A2, B3 and B4 (some systems may require two steps to overlay three layers) to obtain layer B5 with area objects with the following attributes: FORESTLAND INHALF

INONE

7. using the city and town boundary layer (C1), assign a new attribute URBAN, value 1 for areas of cities or towns, 0 otherwise

8. topologically overlay the objects in layer C1 with those in layer B5 to obtain layer B6, adding attribute URBAN to the three attributes in B5

9. assign a new attribute ACCESS to the objects in B6 using the following rules:

Value Criteria 0 not forest land (FORESTLAND=0) FU forest land and urban (URBAN=1) 1 forest land, non-urban and within 0.5 miles of rail/public road (INHALF=1) 2 forest land, non-urban, outside 0.5 miles but within 1.0 miles of rail/public road (INONE=1) 3 forest land, non-urban, outside 1 mile of rail/public road (INONE=0)

note that the criteria are ordered so that in each step only areas that have failed all of the previous tests are considered
e.g. test for FU assumes that the object has already failed the prior test FORESTLAND=0
10. Dissolve boundaries and merge areas with the same value of ACCESS

call this new layer B7
assign unique ID numbers to each new object
11. Measure the areas of objects (in hectares) in B7 and assign this attribute to each object as AREA

12. Modify attribute ACCESS for cases where ACCESS ="1" and area is greater than 2500 to "1A": If ACCESS = "1" and AREA >= 2500 then ACCESS="1A"

13. Create a plot showing: Forest Service ownership boundary (layer A2) All roads and railroads (layer B1) All cities and towns (layer C1) Area objects in layer B7, shaded by value of ACCESS attribute and labelled with ID number assigned in step 10

14. Create a list of all area objects in layer B7, showing the following attributes: ID (number assigned in step 10) ACCESS AREA


Summary of functions needed

Assign a new attribute (from existing attributes based on mathematical or Boolean operators)

Dissolve boundaries and merge areas (based on value of specified attribute)

Select objects (based on attributes satisfying mathematical or Boolean operators)

Generate buffers (to a specified width around line objects)

Topologically overlay (two or more layers of area objects to obtain a new layer of area objects)

Measure area (of area objects, assigning values to a new attribute)

Modify an attribute (selectively, based on mathematical or Boolean operators)

Create a plot (of specified classes of objects, showing selected attributes, using various symbol, shade and label options)

Create a list (of a specified class of objects, showing selected attributes, plus subtotals, totals etc.)

once this sequence of operations has been worked out, it is very easy to design a macro which will automatically execute this sequence of steps whenever a layer is updated

GENERATING COMPLEX PRODUCTS
PRACTICAL PROBLEMS

practical problems commonly encountered when trying to produce information products include:

Management demands

to demonstrate the value of the GIS to management, and thus ensure continued support, some useful products will need to be available very early in GIS system implementation
but products cannot be generated until all the needed data have been input
data input, production need to be coordinated so that some useful products appear quickly

mandates change, the responsibilities of the agency change from time to time
"drop everything you''re doing - we need x"
difficult to operate a systematically designed approach to GIS when agency''s responsibilities are poorly defined or too flexible

Data not available

information product may require input data which is not currently available as a digital data layer
data may have to be collected, compiled and input to support the product

most agencies do not plan their data acquisition systematically to support their decision-making mandates
introduction of GIS into an agency often forces much more systematic data planning

has led to the development of interagency committees to allow for sharing of data needed by several agencies

Data is available, but there are problems

scale of data is much too small
e.g. a geology coverage is essential, but none is available at a suitably large scale
tempting to use the small-scale coverage, but results will be questionable

geographical coverage is incomplete, or currency varies, or accuracy varies
e.g. parts of the city are accurately mapped, other parts are poor quality
e.g. topographic maps have widely different update dates
how to warn the user when the quality of data changes within a data layer

Data in wrong format

data may already be digital, but in inappropriate format

e.g. forest management agency has forest inventory maps digitized as raster cells
16 by 16 array of cells for each 1 sq km of the area managed
each cell contains stand number
stand numbers point to attribute table of stand attributes
these data had been used for simple tables, measurement of area based on counting cells
data must now be merged into a vector GIS to support forest management functions
raster/vector conversion creates area objects, boundaries follow old pixel edges

diagram









Complexity of decision rules

rules for assigning new attributes can be complex, difficult to specify precisely

e.g. rules for areas loggable for "Mature Sawlogs":
overhead - Decision rules - Mature Sawlogs

species association: S = spruce SH = spruce/hardwood HS = hardwood/spruce
primary species: WS = white spruce
texture:
requires fine through moderately coarse soils
drainage:

requires inundated through wet soils as logging occurs when soils are frozen

objects must satisfy all of conditions 1 through 5 to be acceptable

assignment of "suitability" attributes can involve as many as 100 attributes


D. SITE SUITABILITY
Spatial search


spatial search uses attributes to search for most suitable or most profitable or least noxious locations for activities or facilities
the activity/facility might require a single point location, a line or an extended area
point location examples: wells, observation towers
line location examples: power transmission, oil and gas pipelines, highways
area location examples: campsites, logging, airports, waste disposal sites

requires measure of suitability derived from many underlying layers or attributes by assignment rules

Assigning suitability

the process of combining many data layers into a single layer of suitability has been called cascading
many tens of layers may be involved in creating one index of suitability
cascading rules can include arithmetic, conditional, logical operations, recoding

e.g. a study to locate a power transmission corridor through an area about 100 km across
used 30,000 cells each 500 m square
used over 100 data layers which were cascaded into a single index of suitability ranging from 0 (impossible for route) to 5 (best)
some example layers: existing power corridor (yes/no) soil capability for agriculture (score from 1 - best - to 7) urban area (value = population density)

some types of spatial search are atomistic
suitability depends only on the characteristics of the place itself

other types of search are holistic

suitability depends not only on characteristics of the place but also on locations of other facilities
e.g. a point is not good for a firetower if there is already a firetower one kilometer away
e.g. it makes no sense to consider individual pixels as locations for a highway route - the route as a whole has to make sense

how to determine rules for assigning weights to contributing factors?
different decision-makers will have different preferences - how to find consensus?
e.g. committee has to recommend route for new power transmission corridor
one member wants to preserve agriculture, suitability gives negative weight to farmland
another wants to preserve natural areas, suitability gives negative weight to wetlands, woodlands, positive weight to farmland
one wishes simply to minimize construction cost of the power line by using the shortest route

Decision theory

provides methods which have been used to implement complex choice rules

a single utility function (SUF) defines the importance given to each value of an attribute by a decision-maker
e.g. to the agricultural representative, the attribute CROP_TYPE may be valued as follows: cornland is worth 0.6, pasture 0.2, irrigated tobacco 0.8

a multiple utility function (MUF) defines the importance given to each attribute (or group of attributes) in the overall measure of suitability
e.g. to the agricultural representative, CROP_TYPE gets 0.9, CONSTRUCTION_COST gets 0.1

SUFs and MUFs can be elicited from decision-makers by systematically presenting combinations of options and asking for preferences
since different individuals give different weights to factors, there are methods for combining preferences which have been expressed using different MUFs

decision theory is important to GIS because of the number of applications using spatial search

see Unit 57 for more on multiple criteria decision making

Sensitivity

many choices have to be made in defining GIS operations, attribute assignment rules, SUFs and MUFs
these choices may be difficult to make
how to balance conflicting objectives?

to assist, it may help to know how sensitive the results are to these choices
when the results are sensitive, choices need to be made carefully and accurately
when the results are insensitive, choices can be made less carefully
e.g. must be very sensitive to the impact on endangered species (cannot use sites with endangered species) while different slope aspects may have little effect on the result

need to distinguish between sensitivity in principle and in practice
in principle, preserving wetlands may be a high priority concern
in practice, there may be no wetlands in the study area
in practice, wetlands may extend across the study area, so any route will have to cross them and create the same impact
Unit 46 discusses sensitivity in more detail


REFERENCES
Massam, B.H., 1980. Spatial Search, Pergamon, London. Excellent discussion of spatial search and applications of decision theory.

French, S., 1986. Decision Theory: An Introduction to the Mathematics of Rationality, Halsted, New York. Good source on decision theory.


EXAM AND DISCUSSION QUESTIONS

1. How much flexibility is there in the sequence of operations in the recreation accessibility example? What changes could be made to the sequence without affecting the result? Can you devise a diagram or flow chart to show this?

2. Describe the relevance of decision theory to spatial search using GIS, with examples.

3. What is the difference between sensitivity in principle and sensitivity in practice in the result of a spatial search?

4. Describe the process you would follow as a consultant working with a resource management agency to determine the information products required from a GIS, and to plan the development of the GIS database.


EXERCISE

The Milk Marketing Board of Dairyland has been developing a network system for the management of the collection and distribution of milk across Dairyland. What is involved is the routing of 425 trucks varying in capacity from 9 000 to 45 000 liters, to collect approximately 2.3 billion liters of milk per year from 9 800 producers. The milk is carried over several thousands of kilometers of roadways to processing plants, within a very confining time period of two days.

Design a GIS database to include information on the road network, production quantities at farms, processing capacities at dairies, requirements at markets (cities) and amounts shipped and shipping costs from each farm to each processor and from each processor to each market. Define the functions the system will need to (a) produce maps of the producers, processors, markets and shipments, (b) produce tables of quantities produced, processed, marketed and shipped, (c) evaluate changes such as closure of a processor, expansion of a market, change in production levels.

Write a proposal for such a system to be submitted to the Milk Marketing Board.

GIS AS ARCHIVES
INTRODUCTION
Archive databases
Project databases

B. NATURE OF ARCHIVES
Currency of data
Use of archives
Data for digital archiving

C. EXAMPLES OF SPATIAL DATA ARCHIVE SYSTEMS
CGIS
MIDAS
STORET
LUNR
NARIS
MLMIS

D. FATE OF ORIGINAL ARCHIVE SYSTEMS
Platform
Static data
Distribution
User interface
Costs vs benefits
Development of general purpose systems

E. SPATIAL DATA ARCHIVES TODAY

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

GIS AS ARCHIVES
. INTRODUCTION

Archive databases


are spatial databases developed as stores of information for general use

provide coverage of some political jurisdiction, e.g. globe, nation, state, province, county

purposes often not clearly articulated

system provides limited functionality oriented toward data retrieval

Project databases

are spatial databases developed to support specific projects

coverage for study area only

purposes usually better articulated

system provides functionality adequate for project


B. NATURE OF ARCHIVES

map library is traditional archive

data is partitioned by both theme and geography
by theme, e.g.:
base maps (usually topographic) include roads, railways, surface hydrology, topography, depending on scale
individual thematic map series, e.g. vegetation, soils, transportation, energy
by geography, e.g.:
USGS 1:24,000 sheets are organized by state, within state by name of sheet
index map shows sheets in correct geographical positions

most atlases and map libraries partition data primarily by geography, secondarily by theme

digital spatial data archives tend to be organized in an opposite way
primary key is thematic

topographic data is produced by USGS using DLG (Digital Line Graph) and DEM (Digital Elevation Model) format
street network data is produced by Bureau of the Census in TIGER format
remotely sensed images are produced by NASA and other space agencies

secondary key is geographical
topographic data organized by map sheet
TIGER organized by county
Landsat images organized by scene

Currency of data

to be suitable for archiving, data must be stable through time
some geographical data never changes - e.g. Census data and satellite images as a representative slice in time never change
some changes rarely - e.g. topography and hydrography
some changes rapidly - e.g. street networks in rapidly developing cities

in some cases geographical data needs to be available both in archived and current forms
e.g. street network data from TIGER
needs to remain in census period format so that census data can be referenced to it
needs to be current for use in navigation systems, postal delivery

advantage of digital archives is that updating, when it does occur, does not require reissuing of hardcopy products
updates can be made to central archive
updated version can be transmitted by high speed links or distributed as digital tapes

Use of archives

archives should be established to meet user needs
however, users and their needs are often not adequately assessed
need to ask:
who really needs this type of information? for what purposes?
would it get used?
does it need to be digital?

consider the example of a commonly used archive, the phone directory
how often does the need to look up phone numbers occur?
must be frequently enough to justify the cost of providing phone books with every phone
however, even this obvious need for access to phone numbers has not yet justified the development of an on-line database accessible to individual phone customers - access must be through operator

(in France the phone directory is available on-line to individuals through an interactive TV system - however, this may be due more to the facts that the TV and telecommunications systems are state-owned and the government is strongly interested in promoting new technology - presumably benefits do not justify the cost of similar systems in North America and elsewhere)

what does this say about the need for digital geographical archives to replace such things as road maps and area code maps?
the phone book archive is very simple compared to the complexity of databases and queries for geographical archives
would a digital road map be used frequently enough to justify its cost and updating?

Data for digital archiving

therefore, data suitable for archiving must
be of sufficiently general use to justify cost
remain current for sufficient period of time, or be capable of constant update
be sufficiently self-explanatory that users do not encounter significant problems of interpretation
e.g. user must have access to definition of each object or attribute or attribute value
e.g. user should have access to a data quality report

GIS AS ARCHIVES
C. EXAMPLES OF SPATIAL DATA ARCHIVE SYSTEMS

many large raster GIS databases have been built as inventories of natural resources, land use, etc.

CGIS

started 1962

Canada Geographic Information System

designed to allow computer-assisted analysis (measuring area, overlaying different themes) of the data collected by the Canada Land Inventory

many technical innovations

one of two claims to original development of the term GIS - the other at Northwestern University (Marble)

MIDAS

1964 - U.S. Forest Service

grid cell overlay and modeling

first full service GIS for natural resource management

STORET

1964 - U.S. Public Health Service

Division of Water Supply and Pollution Control.

standardized the data collected by different organizations relating to water quality, flows, treatment processes and location

LUNR

is a well-documented example of an early inventory project for the State of New York (see Tomlinson et al., 1976)
cells were 1 km square which is too large for most planning purposes
values could be recorded for percentages within cells, but with no information on where these proportions were located in the cell

NARIS

Natural Resource Information System

active in Illinois in 1970s

raster system based on 40 acre cell

funded by Ford Foundation, state agencies

located at Center for Advanced Computation, University of Illinois

software never adopted for any other purpose, despite the large investment

MLMIS

is a successful resource inventory for the State of Minnesota - see Unit 9
the software developed as part of the analysis and delivery effort for MLMIS, EPPL, has been widely distributed for general purpose use
MLMIS cells are 40 acres, but further subdivision is used


D. FATE OF ORIGINAL ARCHIVE SYSTEMS

most of these original archive systems have become obsolete, expensive experiments that are no longer used

Why?

Platform

mainframe systems can not compete with low-cost mini and micro platforms which appeared in the 1980s

mainframes tend to have unique operating systems which limit portability of software and data
many mainframe GIS systems of 1960s and 1970s were never "ported" to newer platforms (e.g. CGIS)

entire investment in system on a large mainframe may have to be abandoned because of costs of converting to new platform or operating system

high cost of keeping large databases on-line (accessible within seconds) on 1970s platforms
on-line storage available on typical 1989 workstation would have cost $000s/week on 1970s mainframe
these recurring costs could not be offset by revenue generated by database
alternative was to store data on tape, incur access delays and maintenance costs

Static data

systems assumed static data, did not allow for updating

Distribution

high cost of communicating with mainframes over long distances in 1970s - no high speed, low cost digital communication links

access effectively restricted to users at central site
difficult for remote site, e.g. county planning office, to justify cost of on-line connection

User interface

command syntax like plain English but rules are still extensive, difficult to learn

how many users would need the data often enough to justify learning the syntax?
e.g. how long does it take a travel agent to forget how to use the airline reservation system?

command-driven interface requires lengthy, accurate typing

difficult to make interface user-friendly, forgiving
terminals of 1970s did not offer color, operated at frustratingly slow communication speeds

Costs vs benefits

use failed to materialize at sufficient levels to justify costs
inadequate user survey
system "over-sold"
inadequate anticipation of high recurring costs
cost over-runs on development

Development of general purpose systems

many of the old, special-purpose in-house systems of the 1970s like NARIS have been replaced by general-purpose, vendor-supplied vector systems in the early 1980s
these use smaller platforms like the VAX, Prime, Sun costing 1/10 of the large mainframes

special-purpose functionality of archival systems will not support other types of data, other needs
high cost of development of system cannot be offset by sales for other uses
by contrast, vendor incurs high development costs of general-purpose GIS but can offset costs by multiple sales into diverse market
simple low-cost general-purpose raster systems offer comparable functionality without development costs

general-purpose systems can survive where general-purpose databases cannot
ultimately, interest in CGIS and many other archival GISs was in system, not data archive


E. SPATIAL DATA ARCHIVES TODAY

cheaper platforms, lower recurring cost allows functionality of older systems to be supplied by low-cost workstation

better awareness of value of digital data archives and GIS-based solutions to resource management has produced a large market for GIS

important archives today are very general databases like TIGER, USGS digital cartographic data, where user community demand is strong and supporting agency has specific mandate for data collection


REFERENCES
Mounsey, H. and R.F. Tomlinson, 1988. Building Databases for Global Science, Taylor and Francis, New York. A review of the current status of global spatial data archives

Tomlinson, R.F., D.F. Marble and H.W. Calkins, 1976. Computer Handling of Spatial Data. UNESCO Press, Paris. Compares and assesses several archival systems, including CGIS, LUNR, MLMIS.

Wiggins, J.C., R.P. Hartley, M.J. Higgins and R.J. Whittaker, 1987. "Computing aspects of a large geographic information system for the European Community,"

International Journal of Geographical Information Systems 1(1):77-78. A description of the Co-ordinated Information on the European Environment (CORINE) program, a multi-national natural resources GIS project.


EXAM AND DISCUSSION QUESTIONS

1. Review and discuss a selected state natural resource data archive, such as MLMIS, CGIS. How does the system compare with NARIS?

2. The volume Building Databases for Global Science listed in the references contains chapters discussing a number of efforts to construct global databases. What particular problems do they present?

3. Design a study to assess the need for a state natural resource data archive, and to evaluate its potential benefits to the user community.

4. The unit drew a parallel between spatial databases and map libraries. Discuss the validity of this analogy, and the dangers in using it to make design decisions for spatial databases.

THE RASTER/VECTOR DATABASE DEBATE
. INTRODUCTION
Raster or vector
Basic issues

B. COORDINATE PRECISION
Raster precision
Location of coordinates in raster
Vector precision
Data precision

C. SPEED OF COMPUTING

D. MASS STORAGE
Raster storage
Vector storage

E. CHARACTERISTICS OF PHENOMENA
Raster sampling
Vector sampling
Features, entities and objects

F. SUMMARY
Combinations of raster and vector

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

THE RASTER/VECTOR DATABASE DEBATE
INTRODUCTION

Raster or vector


Unit 4 introduces definitions of raster and vector

arguments about which was better have been commonplace since the earliest systems were created

raster databases are appealing
simplicity of organization
speed of many operations, e.g. overlay, buffers
especially appealing to the remote sensing community who are used to "pixel" processing

on the other hand, there are many situations in which the raster approach may appear to sacrifice too much detail
cartographers were appalled by the crude outlines of parcels that resulted in the "pinking shear" effect of diagonal boundaries represented by grid cell edges

diagram








surveyors were dismayed by the "inaccuracy" caused by the cells when portraying linear features and points
situations in which the raster approach sacrificed too much detail

however, computing times for overlaying vector based information can be excessive
early polygon overlay routines were error-prone, expensive, slow

today, there are situations in which it is clear that one approach is more functional than the other
e.g. using "friction" layer to control width of buffer is only feasible in raster

e.g. viewshed algorithms to find area visible from a point are feasible with elevation grids (raster DEMs), not with digitized contours
e.g. land survey data can only be represented with precise lines

an important current trend involves linking raster and vector systems, displaying vector data overlying a raster base
raster data may be from a GIS file (perhaps a remotely sensed image) or from a plain scanned image file

therefore, the question has evolved from "Which is best?" to "Under what conditions is which best and how can we have flexibility to use the most appropriate approaches on a case by case basis?"

Basic issues

four issues to the discussions of raster versus vector:
coordinate precision
speed of analytical processing
mass storage requirements
characteristics of phenomena


B. COORDINATE PRECISION
Raster precision


e.g. MLMIS (Minnesota Land Management Information System, see Unit 9)

early version used cell sizes of 40 acres

depended on a rectilinear public land survey
system "created" a state of perfectly square townships, composed of sections that were exactly one mile on each side
variability in the original survey lines were only addressed if mis-alignment was more than an eighth of a mile

locational precision was limited by size of cells - 40 acres, or 1/4 mile on each side
all linear features had to be represented as 1/4 mile wide strips
point features "occupied" 40 acres on the map

at scales smaller than 1:250,000 these conditions were not difficult to accept
for state-wide planning purposes, the database was adequate

what about using smaller cells
five meter cells have been scanned into systems
this size is chosen on the basis that 5 m is the width of a #00 pen line on a 1:24,000 map
of course, still not possible to represent objects smaller than 5 m
e.g. fire hydrants, storm sewer grates, power poles
precision not adequate for facilities managers
on the other hand, at 5 m there is no appreciable loss of information for most natural phenomena, most occur at this scale or smaller

Location of coordinates in raster

in most cases it is unclear whether the center of the cell or one of its corners is the precise location of the coordinate
e.g. top left corner of the grid may be referenced to a specific UTM coordinate, but it is unclear whether that location is in the middle of the top left cell or at the top left corner of that cell
locational precision is thus 1/2 the cell''s width and height

Vector precision

can be encoded with any conceivable degree of precision
precision is limited by the method of internal representation of coordinates
typically 8 or 16 decimal digits are used ("single" or "double" precision)
this limits precision to 1/108 or 1/1016 of the size of the study area respectively
for equivalent raster precision we would need 108 by 108, or 1016 by 1016 cells respectively, neither of which is feasible, even with run length encoding

however this argument may be artificial
real vector data accuracy may be much worse than one line width
e.g. digitizing from a 1:24,000 quadrangle map may appear to allow points to be recorded to the nearest 2 m, from a map that has common errors of 12 m

Data precision

vector precision is true for certain classes of data
data captured from precision survey (Coordinate Geometry - COGO)
plat maps created from land surveyors'' coordinates
political boundaries defined by accurate survey

few natural phenomena have true edges which can be accurately represented as mathematical lines
soils, vegetation types, slopes, wildlife habitats, all have fuzzy boundaries
due to the methods used to record the spatial information
due to the transitional nature of variation in the phenomenon

it can be argued that the fine lines from the vector system gave a false sense of precision
lines on maps are typically 0.5 mm wide and are often assumed to represent the uncertainty in the location of the object
in a raster system uncertainty is automatically reflected in the cell size

true comparison in terms of precision is between raster cell size and the positional uncertainty of a vector object, not the coordinate precision

THE RASTER/VECTOR DATABASE DEBATE
SPEED OF COMPUTING

raster data can be processed very quickly to answer most analytical questions involving overlays, proximity, and
Boolean queries, no calculations are required to determine relative positions between layers

in most cases, analysis requires cell-by-cell comparison of the contents of layers
little arithmetic computation is required beyond simple conditional statements

thus raster systems are suitable for small computers and have low program development costs

the same questions require considerable computation in vector systems
vector topology is complex
complex geometrical problems must be solved, e.g. to find line intersection points
complex geometrical algorithms are needed in polygon overlay to avoid generating spurious polygons
calculation of distances may be complex depending on the projection/coordinate system used

these requirements slow the response times of vector based systems
may prevent large scale work, or multiple users on small computers

on larger machines, the speed factor will still be evident:
may limit the number of concurrent users
may require that complex work to be done in batch mode at non-peak times


D. MASS STORAGE
Raster storage


simplest raster data storage method requires one memory location (e.g. one or two bytes) per cell
this is not at all efficient, but is used by several systems
such systems severely limit the maximum numbers of rows and columns that can be used

file compression is possible through a variety of approaches - see Unit 35
most common are forms of run length encoding
degree of compression depends on spatial variability of data
for very complex data the benefit of run length encoding can be negative - use of run length encoding should be optional
there is a small overhead in packing and unpacking data compared to cell-by-cell storage

Vector storage

use very little storage for simple polygons
memory requirements depend on complexity of objects
also on precision of coordinates (i.e. single or double)

volume also depends on which relationships between objects are stored in the database
some systems store few relationships, require small amount of storage, compute other relationships as needed
other systems offer more elaborate database models, store more relationships, require larger amounts of storage
e.g. system A required 150 Kbytes to store 700 lots, system B required 5 Mbytes to store the same 700 lots, both using vector database models

generally, vector systems should use less mass storage than a raster based system of high enough resolution to emulate the vectors
assuming that the required resolution is defined by the line width on the input document, rather than by the width of the transition zone in reality


E. CHARACTERISTICS OF PHENOMENA
Raster sampling


raster is a regularly spaced sampling of phenomena
reflects lack of knowledge of spatial variation
if we knew where the complex variation occurred, we would sample there more heavily - not wasting samples in areas of little variation
e.g. knowing that population in California is concentrated in Los Angeles basin and San Francisco Bay area, we would collect more data in those areas than in the Mojave Desert
raster is appropriate for remote sensing as the satellite is not intelligent enough to vary its sampling in response to variation on the earth''s surface

data typically collected in raster format is satellite imagery (raw and classified) and elevation data

Vector sampling

vector representation permits more spatial variability in some areas than in others
e.g. rapid variation at edge of area objects, none in the middle

e.g. census tracts are small in urban areas and large in rural

this is appropriate for social, economic, demographic variation which is much more intense in some areas than in others

also appropriate for some natural phenomena
e.g. variation in vegetation cover is more rapid near the Nile than in the Sahara
e.g. variation in geology is instantaneous across a fault

some objects are vector by definition
variation in ownership is instantaneous at edge of lot
variation in county is instantaneous at boundary

data typically collected in vector format are coordinate geometry (surveyor''s records) and legal boundaries

Features, entities and objects

difficult to group cells together as an object with attributes in raster
e.g. connect cells along a road
e.g. connect cells as a numbered forest stand

raster "sees" the world as populated by cells of uniform size
raster arranges geography in fixed sequence - gives "sequential access" to world

vector "sees" the world as populated by entities, represented in the database model as objects
vector arranges geography in any sequence - gives "random access" to data

operations on objects are easier in vector, e.g. analysis on a network - routing vehicles through a road network

a point object must occupy a full cell in raster, this creates some problems:
locations of water wells, emergency call boxes should not be indicated by raster cells, of some arbitrary cell size, in which they lie somewhere
for environmental modelling of water quality, we may need to know precisely where the wells are
to estimate costs of connecting wires to call boxes, we may need to be able to determine distances accurately

on the other hand, some data can only be presented in aggregated form:
e.g. census data is typically aggregated by census tract, both vector and raster representations may be appropriate depending on the application
e.g. sensitive natural resource-related data, such as the location of rare plants or archaeological sites, may be presented in large cells to allow preservation of the phenomena by indicating its presence without identifying its location precisely


F. SUMMARY

the raster/vector debate can be summarized as a series of decision rules, for example: handout - Recommendations for the use of vector and raster structures

Combinations of raster and vector

is the best of both worlds available?
to an extent it is, in two ways

1. can store data in one form and process it in another
needs an efficient algorithm to convert from raster to vector and vice versa
possible to capture and store data in vector mode, yet analyze it in raster form
may save computing time and mass storage
especially important for small machines

2. use combinations of systems which run raster and vector analytical systems in parallel
e.g. install raster and vector systems on the same PC, use conversion functions in one or both systems
e.g. overlay a vector based landuse parcel map over a Landsat image to improve the interpretation of the satellite image
the image might then be used to correct a vector based vegetation parcel map

REFERENCES
Burrough, P.A., 1986, Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. See raster/vector summary on p. 169.

Gahegan, M.N., and S.A. Roberts, 1988. "An intelligent object-oriented geographical information system," International Journal of Geographical Information Systems 2(2):101-110. Discusses an interface to a spatial analysis systems which allows the underlying geographical domain to be represented using a high-level, feature- oriented model.

Star, J.L. and J.E. Estes, 1990. Geographic Information Systems: An Introduction. Prentice Hall, Englewood Cliffs NJ. Chapter 4 summarizes both sides of the raster-vector issue.


EXAM AND DISCUSSION QUESTIONS

1. Summarize the dimensions of the debate between raster and vector database models.

2. Describe the design of a study to compare the amounts of storage used by raster and vector models in creating a digital representation of a specific map? If you feel this cannot be done, explain why not.

3. Summarize the arguments for raster GIS and the application areas in which it has distinct (a) advantages and (b) disadvantages.

4. What factors would determine the appropriate cell size for a raster GIS being developed for a power transmission corridor study?

5. In physics, light appears to behave sometimes as particles, sometimes as waves. Discuss whether a useful analogy exists between this debate in physics and the raster/vector debate in GIS.

THE OBJECT/LAYER DEBATE
A. INTRODUCTION

B. THE LAYER VIEW
Data models

C. THE OBJECT VIEW
Scale
Time
Object orientation
Identity
Inheritance
Encapsulation

D. COUNTER-ARGUMENTS FOR THE LAYER VIEW
Do objects really exist?
Environmental modeling
Gradients
Enlightenment

E. AREAS OF APPLICATION
Resource management (layer view)
Utilities (object view)
Transportation, hydrology (mixed)

F. EXCEPTIONS
Transport networks

G. CONVERSION

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

THE OBJECT/LAYER DEBATE
A. INTRODUCTION


if we see the task of building a database as one of representing the contents of maps, then the choice is between:
raster - divide the map into a sequence of identical, discrete elements and list the contents of each
vector - list the features present on the map and represent each as a point, line or area object

however the real purpose of the database is to represent the world

maps are highly efficient ways of showing geographical variation, but representing the contents of maps is not the same as representing the world
the objective of a map is visual - to capture geographic information and pass it to the user through the processes of visual perception
the objectives of a database - measurement, analysis, modeling - may conflict with the objectives of a map

the raster/vector debate is only one part of a much larger set of issues concerned with representing the world in spatial databases in meaningful ways

this unit looks at the debate between layers and objects

both use databases containing points, lines and areas to describe geographic variation
the difference is in how the contents of the database represent the real world


B. THE LAYER VIEW

the real world is continuous
an infinite number of places exist in the world

locations are specified by some system of coordinates
our ability to specify exact location is limited only by the precision of measuring devices
in principle, location could be specified using coordinates with any number of digits

geography can be described using a number of variables, e.g. elevation, soil type, mean January temperature, population density, county

each of these variables can be determined at any place given suitable measurement instruments

each variable can be conceptualized as a layer
each layer captures the variation of one variable over the surface of the earth

the database can be interrogated to determine the value of any variable at any place
the result can be checked by visiting the specified place (ground truth)

Data models

GIS has developed certain data models for representing the layer view of the world, among them:
raster - continuous geographic variation is approximated by finite-sized pixels
polygon - the world is divided into irregular pieces, and the variable is assumed to be constant within each piece and to change suddenly at each boundary
TIN - the world is divided into triangles, and variation is approximated by a plane within each triangle

each of these models provides a way of capturing the variation of one variable over the earth''s surface

each uses objects of various kinds - points, lines or areas - but the objects exist in the database, for the purpose of describing variation, and not in the real world
e.g. contours are line objects, but exist to capture varying elevation


C. THE OBJECT VIEW

humans see the world as an empty space littered with various types of objects

objects are used in speaking, writing and thinking about the world

we find it more convenient to describe places in relation to objects of known location, than using coordinates of any kind
e.g. the island of Mauritius is described as "in the Indian Ocean", not by its latitude and longitude

objects are not simply artificial constructs used in describing variation (the layer view) but fundamental to our understanding of geography

objects can be points, lines or areas

any place can be occupied by any number of objects, and can be empty ("there''s nothing there")

the layer view is inefficient when variables are defined only over limited geographic areas or classes of objects
e.g. date of last forest fire is defined only over burned areas
e.g. name of city is defined only over city points or areas, but name of county is defined everywhere in the US

Scale

the same object can be represented differently at different scales

e.g. "San Francisco" can suggest:
a city and county with legal geographic limits
the entire metropolitan area of San Francisco Bay

Time

the object view has obvious advantages when well-defined objects move through time, e.g. people

poorly-defined objects, e.g. clouds, present enormous problems
e.g. how to track the movement of an object through time from one image to another when objects are changing form, splitting and merging

Object orientation

is a set of concepts originating in Computer Science and dealing with the design of both databases and processing algorithms
includes programming languages (e.g. Smalltalk) and databases, many of which are still experimental

argues that it is artificial and confusing to separate the definition of objects from the operations performed on them
in GIS, to separate the nature of geographical variation from operations (analysis, modeling) performed on geographical variation

has very recently stimulated debate within GIS about the nature and role of geographical objects

contains several key concepts:

Identity

objects have identity which persists through various kinds of processing
e.g. "Indian Ocean" is an identifiable and persistent geographical concept, even though it is not possible to delimit it precisely at any scale
in GIS, object identity can persist through scale change and also change of graphic representation (e.g. city as point to city as polygon)

Inheritance

when new objects are created they can inherit the properties of their parents
e.g. when a land parcel is created from survey records, it should inherit properties of those records, including the name of the surveyor and the date of the survey

in GIS, it is sometimes desirable to deal with complex objects formed as collections of simpler objects, and to have the complex inherit properties of the simple objects, or vice versa
e.g. the complex object "airport" might be composed of the simpler objects "runway", "tower", "perimeter" etc. Some of the attributes of "airport" should be accessible at the level of the simpler objects, such as "airport name"

Encapsulation

rather than separate the objectives of description and processing, it is desirable to couple objects with the operations performable on them
in GIS, a "line" may be part of a county boundary, a river course, a highway or a contour - each has its own set of appropriate operations


D. COUNTER-ARGUMENTS FOR THE LAYER VIEW
Do objects really exist?


if an object is real, it must be possible to visit a place and determine precisely whether the place lies in or on the object

this is possible for certain mathematically defined objects, e.g. the Northern Hemisphere, the Equator

for the object "house" to be real, every place on the earth''s surface must be either inside or outside

we would need to define the precise footprint of the house
we would need to be able to measure a place''s location exactly, and this is not possible because of limitations in geodesy and surveying technologies

objects like "woodlot", "Indian Ocean" are not precisely defined

in the layer view, very few geographical objects exist in any precise sense

Environmental modeling

theories of atmospheric, oceanic, geophysical processes are compatible with the layer view
e.g. meteorology models fields, a field being a variable defined everywhere in two or three dimensions
examples of such fields are temperature, wind speed and direction, atmospheric pressure

environmental scientists use layers to describe soil and vegetation types, biodiversity, species ranges etc.

much data for environmental modeling comes from remote sensing, which implies a layer view, at least until the data is analyzed and interpreted

Gradients

the object view of the world as populated by discrete objects is less compatible with notions of continuous geographical change:
e.g. gradients, transition zones, zones of uncertainty, fuzziness, slopes, ecotones, clines

Enlightenment

people see the world, learn and talk about it and navigate using objects

scientific understanding often requires a new perspective
e.g. understanding electromagnetic radiation requires the concept of continuous, invisible fields
e.g. atmospheric science requires continuous fields of temperature, pressure

the scientist''s view of the world may have to be radically different from more traditional modes of human thought
the layer view may be the scientist''s way of imposing a more enlightened perspective on the world

THE OBJECT/LAYER DEBATE
AREAS OF APPLICATION

some areas of GIS application are more likely to take the object view, some the layer view

Resource management (layer view)

geographic variation can be described by a relatively small number of variables
there are measurement problems in many areas, e.g. biodiversity

conceptualization does not change radically from one scale to another

difficulties occur in handling information on moving individuals, e.g. tracking grizzly bear, using the layer view

Utilities (object view)

the notion of empty space littered with well-defined objects fits the problems of managing large numbers of pipes, valves, meters etc.

two objects may occupy the same location, but be separated vertically

the notion of a variable measurable everywhere on the earth''s surface has little relevance

Transportation, hydrology (mixed)

roads, railways and streams are mostly well-defined line objects

area objects are sometimes needed for analysis
e.g. lake, noise buffer around highway

some types of hydrologic modeling require a layer view
e.g. overland, subsurface flows are not often confined to objects, better modeled as fields using layers


F. EXCEPTIONS

some information is not well suited to either view

Transport networks

often modeled as line and point objects (links and nodes) lying in the plane

the DIME and TIGER databases of the US Census use a layer view to model street networks
street intersections are nodes (0-cells)
streets between intersections are links (1-cells) bounded by 0-cells
blocks are areas (2-cells) bounded by 1-cells
all places are in exactly one 2-cell or on a 1-cell
as a consequence of this planar enforcement, all crossings of streets are also intersections - both overpasses and grade intersections are 0-cells
this makes it difficult to use the model for routing

in transportation, all information is limited to links and nodes - places that are not on links or nodes cannot have attributes of any kind

the links and nodes form a 1-dimensional space embedded in the geographical 2-dimensional space

objects are restricted to points on the network, or line segments of the network - there is no equivalent of the area object

positions on the network can be specified in terms of link number and distance from the link''s beginning node

in the object view, it would be necessary to create links and nodes wherever attributes change, thus fragmenting the network unnecessarily


G. CONVERSION

it is possible to convert from one view of the world to the other

this is commonly done in the construction of a layer database from digitized "spaghetti", e.g. in digitizing a soil map
the lines on the map (boundaries between soil classes) are digitized as separate objects
objects may cross, one place may be empty or occupied by any number of objects

the "building of topology" or process of planar enforcement essentially converts this object view of the world into a layer
after conversion, every place either lies on a boundary, or has exactly one soil type

REFERENCES
Egenhofer, M. and A.U. Frank, 1987. "Object oriented databases: database requirements for GIS," Proceedings International Geographic Information Systems Symposium: The Research Agenda 2:189-211.

Gahegan, M.N. and S. Roberts, 1988. "An intelligent, object- oriented geographical information system," International Journal of Geographical Information Systems Vol 2:101-10.

Goodchild, M.F., 1989. "Modeling error in objects and fields," in M.F. Goodchild and S. Gopal, editors, Accuracy of Spatial Databases, Taylor and Francis, New York, 107-13. An accuracy perspective on the layer/object debate.

Herring, J.R., 1987. "TIGRIS: topologically integrated geographic information system," Proceedings, AutoCarto 8:282-91.

Herring, J.R., 1989. "A fully integrated geographic information system," Proceedings, AutoCarto 9:828-37.

Maguire, D.J., M.F. Worboys and H.M. Hearnshaw, 1990. "An introduction to object-oriented GIS," Mapping Awareness 4(2):36-39. A good general introduction to the object oriented GIS.

Worboys, M.F., H.M. Hearnshaw and D.J. Maguire, 1990. "Object-oriented data modeling for spatial databases". International Journal of Geographical Information Systems.


EXAM AND DISCUSSION QUESTIONS

1. It has been said that geographical data is special because it is inherently infinite - the world contains an infinite number of variables defined at an infinite number of places. Do you agree? Are there any other examples?

2. Consider a set of driving directions from your house to your supermarket. Do they imply a layer or an object view of the world, or is neither appropriate?

3. Which layers would you identify as most important in describing the natural environment?

HISTORY OF GIS
A. INTRODUCTION

B. HISTORIC USE OF MULTIPLE THEME MAPS

C. EARLY COMPUTER ERA

D. CANADA GEOGRAPHIC INFORMATION SYSTEM (CGIS)
Purpose
Technological innovations
Key innovative ideas in CGIS
Key individual

E. HARVARD LABORATORY
The Harvard packages
Key individuals

F. BUREAU OF THE CENSUS
DIME files
Urban atlases

G. ESRI

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

GIS MARKETPLACE
A. INTRODUCTION
Purpose

B. MARKET POTENTIAL

C. PRODUCT CHARACTERISTICS
Data model
Platforms
Cost
Domain
Performance issues
Query/product mode
Ease of integration
Training needs
User interface
Maintenance
Potential for creativity

D. VENDORS'' PRODUCTS
GIS products and vendors

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

Current information on GIS vendors and products are available in several publications. Trade journals are also a source of current information. You may want to assign an exercise in which students will assess different products on the characteristics that are discussed here.

UNIT 24 - GIS MARKETPLACE

Compiled with assistance from Doug Banting, Ryerson Polytechnical Institute, Toronto


A. INTRODUCTION

Purpose


to review current GIS marketplace

products change all the time
unfortunately this means the material in this unit will become outdated quickly
sources for updating this material are provided in the References

no intention to evaluate/compare/recommend
different application areas are dominated by GISs of different functionality, approach
even within application areas, capabilities vary widely
different sets of capabilities meet different needs
needs of each GIS installation vary widely

no list can be exhaustive
recent surveys list around 1,000 "GIS" vendors
vagueness of definition of GIS makes it difficult to draw the line


B. MARKET POTENTIAL

many reports indicate the rapid growth of the GIS market


C. PRODUCT CHARACTERISTICS

GIS products can be evaluated and compared on a multitude of different aspects
different characteristics are important for different applications and users

variables to compare are include:

Data model

vector, raster or "integrated"

Platforms

range from Macintosh and PC to workstations to mainframes
programs which work on micro-computers are generally very limited in their ability to handle large databases

most major GIS programs now operate on several different platforms

Cost

cost ranges from free (several public domain GISs, e.g. GRASS from US Army Corps of Engineers) to tens of thousands of dollars for the major commercial programs

Domain

related to cost

commercial products represent major investments in product development by a private company
are designed to meet market needs and generally respond fairly quickly to changing needs
are usually well supported by easily accessed customer service departments
are regularly updated and enhanced

public domain refers to the ownership of the software, however, practically speaking, it implies low cost but often a low level of support
public domain programs can be freely copied and shared
further listings of GIS-type products are published regularly in many newsletters and journals

includes:
raster based systems: GRASS - U.S. Army Corp of Engineers
vector based systems:
MOSS - U.S. Bureau of Land Management
SAGIS - National Park Service
ODYSSEY - developed at the Harvard Lab, now available from Department of Geography, University of Washington
ROOTS - Laboratory for Computer Graphics and Spatial Analysis, Harvard University

a third type of program are the educational programs which have been developed as teaching tools
these programs general have significant limitations on the size of the database
work on micro-computers
are very low cost
include extensive tutorial materials
are updated regularly
are not supported by extensive customer service departments

includes:
IDRISI - from Clark University
OSU MAP - from Ohio State University

GIS MARKETPLACE
Performance issues

systems can be arranged on three performance dimensions, each identified with one corner of an equilateral triangle

corner 1 - volume of data
corner 2 - richness of functionality
corner 3 - speed of response









no system can optimize for all three
current systems choose two to optimize, sacrifice the third

any system can be positioned somewhere in the triangle depending on what combination of factors it tries to optimize
the nearer to a corner, the more that factor is important or optimized in the system

"linear" (i.e. network) systems typical of AM/FM-type applications, need fast response, large volume, but can sacrifice functionality to achieve them
locate between corners 1 and 3









"Choropleth" systems are those in which most objects are areas
primary applications are with demographic, socio- economic and natural resource data
for demographic and socioeconomic, areas are reporting zones
for natural resources, areas are defined by homogeneous attributes
these systems focus on integrity of polygon topology rather than the connectivity of vector nets

applications in these areas require sophisticated analysis - simple query is relatively unimportant
users can often afford to wait for product

richness of functionality, volume of data more important than fast response to query
these systems fall between corners 1 and 2









"Precision" systems are those in which absolute locations and boundaries must be identified
these are large scale (high resolution) with lot boundaries, streets as double lines - "parcel" database
analysis is unimportant
mapping (subdivision plans) is very important
these systems need large volume, low functionality, fast response not necessary
CAD (computer assisted design) and computer mapping packages are adequate, topological database model not essential
these are at corner 1









Query/product mode

as noted previously, applications tend to be focused along a continuum of from strictly query oriented systems to strictly product oriented systems

Ease of integration

some GIS programs will import several different types of files from other programs but do not provide export facilities

on the other hand, some GIS programs are designed specifically to operate in parallel with other programs and external databases

Training needs

some programs are easy to operate, some require week long training programs

it is important to note that large, complex command structures are not necessarily a design flaw
knowledgeable users will appreciate the flexibility and adaptability of these large programs
also, ease of learning is not necessarily always a positive characteristic
e.g. many statistical packages do not have simple "idiot-proof" interfaces specifically because their designers wish to ensure that only users who are knowledgeable about the statistical methods will understand how to access them

User interface

different types of interface were reviewed in Unit 18

Maintenance

frequency of updates, cost

Potential for creativity

some programs allow many, flexible "tools" that can be combined in innovative ways to deal with problems not foreseen by the program''s designers


D. VENDORS'' PRODUCTS
GIS products and vendors


note: GIS Sourcebook, 1989 includes an excellent summary of current GIS products and vendors listing cost, date first introduced, support, data structures, functionality
an update will be published in 1990 and it may continue to be an annual publication


REFERENCES
American Farmland Trust, 1985. A Survey of Geographical Information Systems for Natural Resource Decision Making, Washington. Useful if fugitive survey.

GIS Sourcebook, 1989, GIS World, Fort Collins, CO. Includes an excellent summary of current GIS products and vendors

listing cost, date first introduced, support, data structures, functionality. An update will be published in 1990 and may continue to be an annual publication.

Tomlinson, R.F., 1987. "Review of North American Experience of Current and Potential Uses of GIS," International Journal of Geographical Information Systems, 1(3):203- 218.

Wheate, R., 1988. "GIS/Image Processing: A Summary of Software Packages Available for the PC," Operational Geographer, 14:30-33. Excellent review of PC software, including image processing.



Several useful papers appear in the proceedings of an Atlanta GIS conference in 1986, published as Proceedings of GIS Workshop by ASPRS, Falls Church, VA:

Caron, L. and J. Merchant, 1986. "Geographic information systems for non-urban local-level jurisdictions: existing alternatives," p. 110

Fleet, H., 1986. "SAGIS: a full-function public-domain GIS for micro and minicomputers," p. 301.

Reeve, C. and J. Smith, 1986. "The varied viewpoints of GIS justification: a survey of vendors and users," p. 396.

EXAM AND DISCUSSION QUESTIONS

1. Describe the "linear", "choropleth" and "precision" views of the world used in this unit.

2. Why are there so few vector-based GISs designed for small computers (e.g. PC)?

3. Describe the three-corner scheme used in this unit, and explain the nature of each corner or pole. Why is it not possible to develop a system which optimizes all three measures of performance?

4. Compare several different GIS products across the range of characteristics listed in this unit. Make some conclusions about the applications each of the products would be best suited for.

TRENDS IN GIS
A. INTRODUCTION


this unit discusses
trends in computer hardware and software for GISs
new applications of GIS technology
new sources of data


B. HARDWARE
Fast geoprocessing


computing power is often measured in MIPS
Million Instructions Per Second

MIPS measures oversimplify the measurement of computing power but are nevertheless useful as broad bases for comparison
arithmetic calculations can require execution of as little as two and as many as 100 instructions
arithmetic on real numbers is better measured in MFLOPS ("megaflops") or millions of floating point (i.e. decimal) operations per second
MIPS="meaningless information processing statistics"

current personal computers and workstations used for GIS range from 1 to 5 MIPS

within the next five years, 20 to 30 MIPS workstations are likely at roughly similar prices
the power of personal workstations is currently increasing by close to a factor of 2 per year

workstations will likely be coupled to 1,000 MIPS file servers which can extract software and data at high speed for analysis in personal workstations

over the next decade, 1,000 MIPS machines will become common in large organizations

to this point, advances in computing power have always found new areas of application. What new applications of GIS will take advantage of higher speeds?
larger data sets, higher levels of spatial resolution
more complex models
more complex analysis for decision-making
better methods of display and visualization

Parallel Processing

trend toward different computer architectures
away from single processors operating on data in sequence
parallel processors can perform tasks on several different processors simultaneously within the same computer

what GIS processes will be suitable for parallel processing?
analyses which require repeating the same steps everywhere on the map
easier to see applications for raster data than for vector since each pixel is independent
e.g. finding route for a vehicle across a rugged terrain
e.g. image processing applications such as image classification, visualization, scene generation

Memory

trend is toward lower costs for ever larger computer memories
cost of storing large GIS datasets will come down
more data can be placed "on-line" for faster access

the nature of geographical data (high volume, infrequent update) is suitable for optical storage
once written, cannot be changed
CD-ROM - 5 1/4 inch disks with 250 MBytes, enough to store all streets in Los Angeles
once a "master" has been created, the unit cost of CD-ROM data is only about $10
optical WORM - 12 inch disks with 2 GBytes, enough for the contents of 100 topographic maps

erasable optical disk is available (e.g. NEXT computer)
very high density of data

Workstations

"dumb" terminals connected to a central processor are gradually being eliminated in favor of desk top computers

especially popular are "workstations", which have excellent graphic performance and sophisticated user interfaces
the exact distinction between personal computers and workstations is unclear
workstations are generally more powerful
workstations are generally more expensive ($10,000 vs. $4,000)
workstations generally use UNIX operating system rather than DOS
workstations have more powerful graphics capabilities (1280x1024 rather than 640x480)
originally, workstations developed for the scientific and engineering market

workstations function effectively as nodes on a network devoted to GIS processing

a large CPU (a fileserver) for database management and centralized processing may still be required as part of the network, while most processing is done at individual workstations
data can be distributed around the disks on the network

Networks

the dominant hardware system architecture of recent years, the multiuser host, is giving way to multiuser network architectures
the network integrates compute servers, file servers, workstations and shared peripherals
any user can access data, peripherals across the network
the network will likely be linked to other networks through "gateways"

this architecture requires fast and economical data transfer and the availability of powerful workstations
data transfer rates of MBytes/second are common at low cost
only hundreds of bytes per second were possible 15 years ago
this change of cost has had enormous impact on the ways people organize computing

hardware manufacturers are beginning to offer network architectures, and networking among hardware of different vendors is likely to become more available over time
requires interchangeability of parts; standards for communication, data and software; common operating systems

this will lead to reduction in the power of data processing centers in favor of "information management" systems based on control of transactions
the role of the "computer center" based on a large mainframe is changing rapidly

Hardware for specialized processing functions

compute servers, file servers, sort servers (e.g. TRW''s Fast Data Finder) and search servers (e.g. Excel''s Sorting Engine) are now being developed for networks
these are specialized computers attached to networks for specific functions
map overlay using hardware intersecting tools will be developed in the future, perhaps other GIS functions as well

this trend will continue since such hardware tools can provide enhanced system performance for the entire network

Operating systems

continued diversity is likely for the immediate future

UNIX is making gains, perhaps especially in inexpensive machines, and in scientific and engineering applications, but is it likely to become universally supported?

this will make software development and networking more difficult, and puts a premium on GIS software, database management systems, and applications which can work on different vendors'' machines

Peripheral devices

excellent raster devices are now available (e.g., electrostatic and laser printer/plotters) for graphic/cartographic output
costs of these systems are directly related to the size of the product

scanning has not taken over from digitizing in GIS applications, and is unlikely to do so until some means, perhaps artificial intelligence, can be applied to separating extraneous information

Specialized workstations

a data entry device is needed which will allow correction of data as they are acquired
will probably have a large flat display, multiple graphic memory planes, and interactive data capture capabilities
the workstation should be able to "check-out" and "check-in" work areas (e.g. mapsheets) from the larger database maintained by a server on the network

an "electronic sandbox" will be useful for interactive, GIS-based analysis/modeling and land use planning
the design of this workstation will require some very creative thinking

workstations specialized for particular uses (e.g., land planning, water resources, forestry) are likely to be developed as the number of users increases in such specialized fields
analyzing data on the globe (e.g. oceans, atmosphere) will require a specialized workstation which can display data on the globe''s curved surface
e.g. the globe could be "browsed" using a track-ball to rotate the image

as GIS becomes a standard decision support technology, entire conference rooms will be devoted to its use
containing specialized GIS workstations, large GIS display devices, and GIS planning/conference tables

TRENDS IN GIS
SOFTWARE
Database management systems


while present DBMS''s are effective for managing tabular data, they are not effective for the "long transactions" required when cartographically referenced and topologically related data are altered
such "long transactions" occur because simple changes in cartographically referenced data may require changes in topologically related items, and because the whole process of updating geographic information differs from updating tabular data

transactions on geographical data could be confined to a single specialized workstation

queries and analysis typically do not modify the data, so these are easier to execute across the network, e.g. between a workstation and a central fileserver

Relational DBMSs

trend toward using relational DBMSs (often with SQL style user interfaces), because of their "open architecture"
it is becoming easier to exchange one DBMS for another within a GIS

object-oriented database structures are being proposed for GIS use, but they present some problems
e.g. they are not well adapted to storing natural feature information
not well suited to complex spatial analysis
often have embedded proprietary data structures

DBMS versus Fourth Generation Languages

the DBMS approach often involves highly structured application programming, often at the expense of ad hoc query capabilities
user must learn complex rules of syntax
may be a valid approach for static databases which are only used for simple, repetitive queries

the trend in GISs is toward the use of Fourth Generation Languages (4GLs) which provide commands, tools, procedures, and report writers to permit easy ad hoc querying of a database
these provide intelligent interfaces close to natural language, however, definitions become vague, less rigorous
use of 4GL may detract from the GISs ability to perform complex analysis

GIS system integration

the marketplace increasingly demands compatibility between diverse hardware and GIS software

at the same time, GIS software needs to interface to an increasing diversity of DBMSs, because different applications often require different DBMSs
in many applications, records are already stored in a DBMS
when the GIS capability is added to allow geographical access to these records, it must interface with the existing DBMS

Display products

improved cartographic products are certain, since there is both intense user demand and the technology required to support such improvements
map output will continue to be judged against hand- made products

3D displays, overlaid with both cartographic data and representations of the built environment, are likely early developments in GIS technology
but major questions remain about how to gather, compile, model and structure 3D data

because of these problems, it is not clear what kinds of analysis are needed or appropriate for 3D data

Interfaces to other technologies

interfaces between GIS, CADD, remote sensing, image processing, architectural graphics, and other technologies are going to be increasingly easy to create

the differing data types produced by these technologies will be more frequently combined in shared databases

User interfaces

more sophisticated, flexible and well managed graphic user interfaces are inevitable

users are becoming increasingly impatient with software which requires any training or support from the vendor
training and support add high and probably continuing costs to GIS acquisitions


D. NEW APPLICATIONS OF GIS TECHNOLOGY

because GIS technology is becoming more affordable, more reliable, more widely used and better known, new applications of GIS technology are likely to rapidly increase, just as the applications of computer graphics have increased

Modeling and decision support

Geographic Information Modeling System (GIMS) technology will be developed and used in providing decision support in a growing number of fields

e.g. current interest in GIS applications in transportation planning, requires some modifications to standard GIS models, addition of new functions
e.g. modifications to allow lines which cross but do not intersect
e.g. functions to measure distances between objects via the network
e.g. functions to solve standard problems in transportation, such as predicting traffic flows

can applications in areas as different as transportation planning and forestry be served by the same GIS software?
will there be fragmentation of the GIS field by area of application?

Sciences and mathematics

GIS technology will be applied widely in the sciences in the near and middle term future
e.g. 3D capabilities for geology, geophysics, hydrology, mining
GIS modeling in landscape ecology

in the longer term, applications for GIS technology may develop in areas of image processing, e.g. X-rays, other types of medical imaging, where superimposition of data, analysis may have similar importance

global issues - tropical deforestation, acid rain, greenhouse effects, endangered and threatened species, and similar problems are likely to be analyzed using GIS technology in the 1990s
GIS networks, similar to the global weather monitoring and prediction network, may evolve
they will probably make use of super computers, parallel processing, and artificial intelligence to cope with the massive databases and the complex models involved
such models are currently in very rudimentary form, e.g. global climate models used to predict greenhouse warming effects use very large cell sizes (5 degrees lat/long)
GISs will play a role in managing the global environment, perhaps used to identify the world''s most sensitive habitats; then a country''s agreement to conserve these habitats may be exchanged for forgiveness of international debts


E. NEW SOURCES OF DATA
Remote sensing


following Landsat, ERTS, Thematic Mapper and now EOS, and competing with them to supply the demand for satellite- borne imaging systems are the SPOT system and systems which the Japanese, Russians, and others may bring on line in the 1990s

as data resolution increases, costs fall, service becomes more reliable, and user demand increases, satellite remote sensing will become more important in supplying data for GIS use
already the supply of data vastly exceeds our ability to analyze it
better methods of scanning, archiving large amounts of data will be needed

remote sensing cannot provide the information required for many kinds of analyses
in the longer term (and with such new technologies as the fabrication of ultraminiature sensing devices on silicon chips), the trend will be toward "an instrumented universe"
instruments to monitor the globe will be broadcast over the earth, probably telemetering their information to networks of users

Error/uncertainty

as more data and more forms of data are gathered, and there is increased pressure to combine these data for analyses, increasing attention will have to be paid to data error and uncertainty

Data sharing

essential to reduce costs, solve widespread problems, and fully utilize available technology

security considerations, political divisions, and other factors will continue to inhibit sharing

because problems will increasingly be recognized as crossing international boundaries and having global implications, and because the necessary technology is now becoming available, more global databases will be built

in some situations, private businesses will compete with government agencies in supplying data to both government and the public
questions will increase about how to best serve the public interest in such cases
data gathering by government will always be under fire because of its cost

F. CONCLUSION

the immediate future will be a time of explosive growth in the development and use of GIS technology
the signs are that current growth rates will continue

it is very likely that these predictions about the period will prove to have been too conservative
predictions about the value of technological change often prove to be totally wrong
e.g. the development of microchip technology which was the key to cheap computers was driven first by the need to save space and weight in space vehicles


REFERENCES
Dangermond, Jack and Morehouse, Scott. 1987. "Trends in Hardware for Geographic Information Systems," Proceedings AUTOCARTO 8, ASPRS/ACSM, Falls Church, VA. (Also available from ESRI)

Dangermond, Jack, 1987. "Trends in Geographic Information Systems Software," Proceedings IGIS: The Research Agenda, NASA, Washington, DC. (Also available from ESRI)

U.K. Department of the Environment, 1987. Handling Geographic Information, Report of the Committee of Enquiry chaired by Lord Chorley (the "Chorley Report"), Her Majesty''s Stationery Office, London.


DISCUSSION/EXAMINATION QUESTIONS

1. Imagine a few important future developments which you believe ought to occur in GIS technology. Do they depend chiefly on hardware or software, or both? Explain.

2. Select a particular discipline, field, or specialty to which you think GIS technology might be applied. Discuss some specific future applications and their implications for professional practice in that field. What specialized GIS features would these applications require?

3. What are the characteristics of an "ideal" source of data for use in a GIS? List them and discuss each one. Now, are there any potential sources of GIS data which have not yet been utilized but which meet most of these criteria?

4. Design a GIS workstation for the support of global science (i.e. the analysis and modeling of data for the entire globe). What operations, database models and user interface features would it have?

GENERAL COORDINATE SYSTEMS
INTRODUCTION

B. PLANE COORDINATE SYSTEMS - CARTESIAN COORDINATES
Determining Coordinates
Measuring Distance

C. STORING COORDINATES
Integers vs real numbers
Computer Precision
Precision of Cartesian Coordinates
Propagation of coordinate errors

D. PLANE COORDINATE SYSTEMS - POLAR COORDINATES

E. GLOBAL COORDINATES - LATITUDE AND LONGITUDE
Determining Coordinates
Important Terms
Measuring distance
Question of Precision

F. DETERMINING POSITION

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES

DISCRETE GEOREFERENCING
A. INTRODUCTION

B. STREET ADDRESS
Using addresses in GIS
Method
Example - Addmatch using TIGER

C. POSTAL CODE SYSTEMS
US ZIP Codes
Canadian Postal Code
Problems

D. US PUBLIC LAND SURVEY SYSTEM
PLSS References

E. GEOLOC GRID
GEOLOC References
Precision

F. CENSUS SYSTEMS
Converting to georeferences

G. ISSUES CONCERNING DISCRETE GEOREFERENCING
Hooks
Purpose
Standardization

REFERENCES

DISCUSSION OR EXAM QUESTIONS





NOTES

This lecture concludes the module on geocoding. Several important practical issues are raised here that will be important particularly for those who will be working with economic and demographic databases.

UNIT 29 - DISCRETE GEOREFERENCING


A. INTRODUCTION


the georeferencing methods covered so far (latitude- longitude, Cartesian, projections from latitude/longitude to the plane) are continuous
this means that there is no effective limit to precision, as coordinates are measured on continuous scales

will now look at discrete methods - systems of georeferencing for discrete units on the earth''s surface

many of these methods are indirect
this means that the method provides a key or index, which can then be used with a table to determine latitude/longitude or coordinates
for example: a Zip code is an indirect georeference
rather than give latitude/longitude for a place directly, it provides a unique number which can be looked up on a map if coordinates are needed

because these methods are indirect, it is important to consider the precision of these systems
precision is related directly to the size of the discrete unit which forms the basis of the georeferencing system

many methods of indirect or discrete georeferencing are in common use
following are 5 of the most common


B. STREET ADDRESS

the precision of street addresses as georeferences varies:
is highest for apartments or houses in cities
is lowest for rural addresses or post office box numbers, where the address may indicate only that the place is somewhere in the area served by the post office

Using addresses in GIS

general approach is to match address to a list of streets (called address matching or "addmatch")
spelling and punctuation variations make this difficult
e.g., Ave. or Avenue, apartment number before or after street number
a failure rate of 10% is regarded as good, 40% is not uncommon. In such cases it is necessary to find the street by hand, which may take as much as 5 minutes per address in large cities

Method

1. identify the block containing address from table of address ranges in each block
i.e., 551 B St. lies in the block running from 501 to 599

2. estimate position of house using the coordinates of the end points
the exact position of the house can be estimated by linear interpolation
i.e., 551 is roughly half way down the block
such estimates are crude
in many countries (e.g. India) addresses are not sequential along the street, but reflect date of construction
if the street is curved the estimate can be improved by using intermediate points (called shape points)
shape points are associated with the same information that block endpoints have, including building numbers and other georeferences

databases to support addmatching exist in most industrialized countries.
in the US, DIME files were developed for this purpose in the late 1960s by the Bureau of the Census, and are now being replaced by more comprehensive TIGER files
see Unit 8 for an introduction to TIGER

handout - TIGER system: An overview (9 pages)

Example - Addmatch using TIGER

handout - Map of west central Columbia, MO
note: intersection of West Blvd and W. Broadway is W of center

handout - Portion of the TIGER file (Boone County, MO)
demonstration - the solution of this problem could be demonstrated using the TIGER file for Boone County, MO

TIGER files can be readily accessed and displayed using the SAFARI package from Geographic Data Technologies, Inc

Problem: find the latitude and longitude of 950 West Broadway
Procedure:

1. search the TIGER file for Boone County for features with the name "West Broadway" or equivalent (W. Broadway, Broadway W. etc)

get about 30 matches for the length of W. Broadway

2. find the record that lists the address range which includes address 950:
record #6714 covers the block from Greenwood to West Blvd, and includes the following data:
longitude 92.3503 to 92.3527
latitude 38.9519 to 38.9522 (indicating that the street has been coded from east to west)
ZIP code 65203 on both sides
census tract 6 on the left side, 7 on the right
address ranges 900 to 998 on the left, 901 to 999 on the right
no shape points, so we assume the block is straight

3. determine the coordinates of number 950:
assume that the houses are evenly spaced along the street, and that the full range of addresses is used (this is not necessarily a good assumption, but it''s the best that can be done without more information).
longitude is: 92.3503 + {(950-900) * (92.3527-92.3503) / (998- 900)} = 92.3515
latitude is: 38.9519 + {(950-900) * (38.9522-38.9519) / (998- 900)} = 38.9521
note that the results are given to the same precision as the block endpoints
we could have calculated more digits, but they would have been meaningless given the accuracy of the inputs

problems with determining georeferences by address matching:
cases where matching fails (10 - 40% common)
rural areas and box numbers where there are no street addresses
long blocks with uneven houses
street addresses do not always identify a parcel or lot, and some parcels have many street addresses (e.g., apartments, condominiums)

address matching is very commonly used to determine georeferences for marketing and retailing, health and the collection of social statistics


C. POSTAL CODE SYSTEMS

postal code systems have been set up in many countries

these often provide a high level of spatial precision

US ZIP Codes

in the US, zip codes are designed to assist with mail sorting and delivery
the codes are hierarchically nested, states are uniquely identified by one or more sets of the first 2 numbers
a 5 digit ZIP code identifies the area served by a single post office
this gives precision of many city blocks
the 9 digit ZIP potentially provides a much higher level of spatial resolution, but problems exist
buildings may have different codes for different floors
overlapping and fragmented boundaries

Problems:

addresses associated with a single zip code were developed from lists of addresses representing postal walks, rather than from maps. Addresses were seen as points along the streets rather than parcels of land
as a result, the area associated with a single 5 digit zip code does not necessarily have a well- defined geographical boundary
therefore, areas are sometimes not well defined, and they may overlap
it is possible for the faces of a city block to have different ZIPs
- boundaries of the zip code areas have been interpolated and files giving the coordinates of 5 digit ZIP code boundaries are available from a number of vendors

warning: some of these have used simple Thiessen polygons to delineate associated areas
i.e. the area of a ZIP code has been defined as the area closest to the corresponding post office, instead of the true area

overhead - Rennie''s ZIP code map of Los Angeles
note unusual shapes of zones and boundaries

Canadian Postal Code

the first 3 digits of the Canadian postal code define a Forward Sortation Area which is a useful unit for mapping (average population around 20,000) and is hierarchically nested within provinces

the full 6 digits provide resolution of a few block faces

files exist which allow the 6 digit code to be converted to census reporting zones and latitude/longitude

Problems

postal code systems have great potential as discrete georeferences
however, they have not been designed for this purpose, hence the problems noted above
since their purpose is, in principle, internal to the postal system, it is also difficult to ensure stability through time (codes frequently change)

however, there is great demand for statistics based on postal georeferences because of their applications in retailing and marketing and the ease with which they can be merged with customer account data

DISCRETE GEOREFERENCING
D. US PUBLIC LAND SURVEY SYSTEM

PLSS is the basis for land surveys and legal land description over much of the US
unlike the previous systems, it is designed to reference land parcels

because it is a comprehensive, systematic approach it is possible to use it as a georeference
commonly used by agencies such as the Bureau of Land Management and the US Forest Service, and within the oil and gas industry.

packages exist to convert PLSS descriptions to latitude/longitude

PLSS References

handout - US public land survey system (not included, see Strahler and Strahler 1987, pp. 485-487).

begin with a surveyed principal Meridian, several of which were laid out as north-south baselines in the Western US

the area on both sides of the meridian is then blocked off in 6 mile by 6 mile areas, identified by township and range numbers
since this is a square grid system the township and ranges must be offset as one moves NS along the meridians

the 36 square mile sections within each township are numbered from the top in a standard order

each section is divided into four quartersections, and these can be further divided if higher spatial resolution is needed, as for example in describing the location of an oil well

PLSS is most effective where the simple rules were followed closely, however:
much of the Northeast was settled long before the advent of the PLSS
there are major variations in the Southwest where the PLSS runs up against areas of early Spanish land tenure
errors in the early surveys have become embedded in the system and must be replicated in packages which offer PLSS to latitude/longitude conversion


E. GEOLOC GRID

handout - GEOLOC description (3 pages)

an elaborate and more systematic example is provided by the GEOLOC geographical referencing system (see Whitson and Sety, 1987), which can be used to index every 100 acre parcel in the continental US

GEOLOC References

the first level of partition consists of 2 rows and 3 columns, each partition or tile being 25 degrees of longitude by 13 degrees of latitude
these tiles are ordered row by row from the top left (Pacific Northwest) and numbered 1 to 6

at the next level, each tile is divided into 26 rows of one half degree latitude and 25 columns of one degree longitude, the area covered by one 1:100,000 USGS quadrangle.
each of these subtitles is given a two letter designation using a letter to represent the row (A through Z) and one to represent the column (A through Y)

each subtile is divided into 4 rows and 8 columns of 7.5 minute quads, numbered row by row from 1 to 32

at the next level, these are divided into 4 rows and 2 columns, designated by assigning the letters A through H row by row

finally, each of these divisions is divided into 5 rows, lettered A through E, and 10 columns numbered 0 through 9 to produce 50 cells of approximately 100 acres each

an example of a full designator for a 100 acre parcel (in the Los Angeles area) is 4FG19DC6

Precision

hierarchically nested systems like GEOLOC, and to some extent PLSS, allow the user to vary spatial precision depending on the application
4FG19 would identify a 7.5 minute quadrangle, or an area roughly 9 miles across
the full 4FG19DC6 gives an area roughly 2000 ft across


F. CENSUS SYSTEMS

note: this topic was introduced in Unit 8; a handout and overhead from that unit are reproduced here

the major source of social and economic data in many countries is the Census

statistics are collected and reported using a complex system of several different types of reporting zones:
political or administrative units used for reporting (province, county, city, electoral district)
units defined for ease of data collection (block, block group, enumeration district) but often too small to use for data reporting due to privacy regulations
units designed to be homogeneous for ease of analysis (census tract)

in the US the major units are:
block group (formerly enumeration district)
the smallest reporting unit, about 1000 population
census tract
primarily in large cities, about 5000 population, intended for analysis
Minor civil division (mostly on township boundaries)
County
State

overhead - Hierarchy of census areas, 1990 census
handout - US census units


Converting to georeferences

for the larger units, the main method of converting from census zone to georeference is through boundary files, which are digitized boundaries established for most of the major units and readily available from vendors or the Bureau

for a smaller unit such as the block group (formerly ED) it is often possible to obtain from the Census Bureau a representative point or centroid which can be used as a georeference
for units with uneven population distribution the centroid may be located in the area of highest population density


G. ISSUES CONCERNING DISCRETE GEOREFERENCING
Hooks


is useful to consider how many different reference systems are related to specific datasets
i.e. TIGER has street addresses, census zones and lat/long associated with each record
allows linking of many different data sources

Purpose

many of these systems were set up for special purposes, and have only later become the basis for general georeferencing
e.g. post office does not have a mandate to maintain these systems for georeferencing purposes, therefore will only add ZIPs when mail is delivered to the location
zones may change without notice or record
e.g. census is only updated every 10 years

as a result, these systems do not necessarily have "quality control" in the georeferencing sense
no agency maintains a file of new addresses

Standardization

general purpose systems such as GEOLOC use regular divisions of the earth''s surface, while special purpose systems tend to use irregular divisions

in the past, efforts have been made to impose greater regularity on discrete georeferences
e.g. "gridiron" system of rectangular street networks (Washington, DC)
in the last century some city names were changed so that no two places in a single state had the same name
introduction of the ZIP code

however, such standardization efforts generally are not consistent or long-term
rectangular street networks are no longer in fashion
referencing systems such as PLSS are now fairly chaotic despite simple principles
ZIPs are not consistent

given their usefulness, is it possible to set up a single, common system of discrete georeferencing?


REFERENCES
Strahler, A.N. and A.H. Strahler, 1987. Modern Physical Geography, 3rd edition, Wiley, New York. Contains a thorough description of the US PLSS.

U.S. Department of Commerce, Bureau of the Census, 1988. Tiger/Line File: Boone County, Missouri, Technical Documentation, Washington, D.C.

Whitson, J. and M. Sety, 1987. "GEOLOC Geographic Location System", Fire Management Notes, 46:30-32.


DISCUSSION OR EXAM QUESTIONS

1. Determine the resources available to you in geocoding street addresses for your local area. What sources exist for obtaining (a) street index (DIME or TIGER) files, (b) address matching software, (c) maps with address ranges marked on streets? Estimate the time it would take to geocode 1000 addresses in this area using various combinations of these resources. What percentages of hits and misses would you anticipate? Estimate the cost per address which you would have to charge a sponsoring agency for such a project.

2. Discuss the usefulness of the PLSS as a georeferencing system in your local area. How complete is it? What local agencies or organizations make use of the PLSS? What is its relationship to the local system of land tenure?

3. Determine the 5 discrete georeferences described in this unit for your own residence. What problems do you have in doing this? What is the potential or actual precision of each method?

4. Discuss the ways in which the system of discrete georeferencing in the US (or your own country) might be improved. What is the appropriate level or agency of government to sponsor or undertake such an improvement? Which existing system of georeferencing should it be based on? Who are the potential users of such a system, and how might cost be shared?

STORAGE OF COMPLEX OBJECTS
A. INTRODUCTION

B. REPRESENTATION OF SIMPLE SPATIAL OBJECTS

C. STORAGE OF OBJECT ATTRIBUTES

D. REPRESENTATION OF TOPOLOGY
Relationships in networks
Relationships between areas
The CanSIS data structure

E. DISADVANTAGES OF ARC-BASED REPRESENTATIONS

F. OTHER ISSUES ABOUT DATA STRUCTURES

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES



UNIT 30 - STORAGE OF COMPLEX OBJECTS

Compiled with assistance from David H. Douglas, University of Ottawa


A. INTRODUCTION


previous units have been concerned with specifying and transforming locations

GIS deals with objects such as lines and areas occupying extended locations, and with the complex relationships between them

spatial data normally represented in vector systems as:
objects (points, lines and areas)
attributes associated with objects
relationships between objects

this unit considers how to construct objects out of sets of coordinates, and how to create digital representations of attributes and relationships?
many alternatives exist for structuring spatial data within a digital store
here we review some of the most common which have been proven useful by years of experience and application


B. REPRESENTATION OF SIMPLE SPATIAL OBJECTS

spatial objects - points, lines, areas - can be coded as x,y coordinate pairs:
point: (x,y)
line: (x1,y1), (x2,y2), ... , (xn,yn)
area: (x1,y1), (x2,y2), ... , (xn,yn)
note that the digital representation of the three spatial objects is identical, n=1 in the first case
note the convention used throughout this unit:
the name of the record type, followed by a colon, then the items forming the record

to construct a line or area, we simply connect each consecutive pair of points with straight lines

in the case of an area object we might insist that the last point be the same as the first, or alternatively assume that the last point is connected to the first by a straight line to close the area

points need not always be connected by straight lines - see later discussion in this unit

C. STORAGE OF OBJECT ATTRIBUTES

attributes of objects can be stored as tables

for points, the coordinates can be included as two additional attributes for each object, so that the entire data structure can be a simple table

this is not possible for lines and areas because of the variable number of coordinates
the data structure usually consists of two parts:
coordinates in one file, each set representing a single object identified by a unique ID
attributes in a table with one attribute identifying the objects to which each is linked

in various GIS products are a number of different names used for these associated files:
Attributes: Descriptive Data Set (DDS), Polygon Attribute Table (PAT)
Coordinates: Geometry, Image Data Set (IDS), Locational Data. Geography

databases of this type, populated by objects and their attributes, are common in cartographic or CAD (computer assisted design) databases.
many common packages for mapping use this structure
SAS/GRAPH and ATLAS (from Strategic Locations Planning) are examples


D. REPRESENTATION OF TOPOLOGY

the key to a GIS data structure, as distinct from cartographic databases, is the emphasis on the coding of relationships between objects
in GIS, the term topology is used to refer to these relationships between objects

however, the term topology has a much more precise meaning in mathematics
topological properties are those which are preserved when an object is stretched or distorted, and are therefore distinct from geometrical properties
e.g. a circle can be stretched to form any shape of polygon, but no amount of distortion will make it into a cube

there is an enormous range of possible relationships between objects (see Unit 12 for a detailed discussion of relationships)
simple examples include "nearest to", "crosses", "is connected to"
these expressions can be used to relate two objects together
for example, each object might be given an attribute which is the ID of the nearest other object in the same class, thus coding a relationship between pairs of objects

two specific types of relationships are often coded in GIS databases:

relationships in networks
relationships between areas

Relationships in networks

networks consist of two types of objects:
lines, also known as links, edges or arcs
nodes, also known as intersections or junctions

a simple way to code relationships between links and nodes is to give each link two additional attributes - the IDs of the nodes at each end (to-node and from-node)

overhead and handout - Relationships in networks

thus there will be two types of records:
1. arc coordinates: (x1,y1), (x2,y2), ... , (xn,yn)
2. arc attributes: to-node, from-node, length, attributes

using this structure, it is possible to navigate from link to link by searching for links with matching node numbers
the DIME datasets created by the Bureau of the Census for the 1970 Census used this concept to code US street networks
each node or intersection was given a unique ID
the TIGER line files do not have unique node IDs, but the network can be navigated by matching the (x,y) coordinates of link endpoints

however, searching for matches is not particularly efficient

a better data structure would list the links at each node
this could be done by adding a third type of record:
3. node: (x,y), adjacent arcs (positive for to- node, negative for from-node)
see overhead

finally, it is awkward to have to store a variable number of arc IDs for each node, so it might be better to use two files: 1. a simple ordered list which compresses the node file into a string of arc IDs 2. a table in which the position of the first arc in the list is stored for each node
positionarc nodeposition 1 1 a 1 2 -5 b 3 3 3 c 6 4 2 d 9 5 -1

6 -4 7 -2 8 5 9 4 10 -3


Relationships between areas

knowing adjacency is important when working with area objects
many programs are more efficient if we know which areas share common boundaries

many systems store boundaries as several individual arcs and include arc attributes (pointers) which indicate which polygon falls on each side of the arc

by storing common boundaries, instead of complete polygon boundaries, can avoid:
duplication in digitizing
problems which arise when the two versions of each common boundary do not coincide

many systems would store this set of three areas using three datasets:
overhead/handout - Relationships between areas
a polygon attribute table
an arc attribute table
a set of (x,y) pairs representing the arc geometry
note: in ARC/INFO these are referred to as the .PAT, .AAT and .ARC files respectively

disadvantages:
to construct polygons, must search for arcs with correct polygon IDs and then match node numbers
for polygon B above, the result would be arcs 3, 4 and 5, with 5 in reverse order
this data structure cannot represent area objects which are fragmented - islands, for example

The CanSIS data structure

an example of a more fully developed data structure is the database of the Canadian Soil Information System (CanSIS)
developed by the Canadian Department of Agriculture in the 1970s

has four interrelated datasets, with pointers

a very simple summary of the CanSIS structure''s four datasets is: 1. Object: attributes, first-polygon, last- polygon

soil types would be coded as objects
an object can describe many discontiguous polygons sharing the same attributes 2. Polygon: object ID, next-polygon, first-arc, last-arc
here "object" is the object of which the polygon is a part 3. Arc: R-polygon, L-polygon, next-R-arc, next-L- arc, previous-R-arc, previous-L-arc, first-point, last-point
the arc pointers are to the next arcs around the left and right polygons

diagram








first-point and last-point identify the first and last (x,y) pairs of this arc in the point data below 4. Point: (x,y)
the points owned by each arc are stored in sequence in this dataset

note how each type of record points to records of other types
e.g. each object points to the first and last polygons forming the object
e.g. each arc points to the polygons on its left and right, and also to other arcs

STORAGE OF COMPLEX OBJECTS
E. DISADVANTAGES OF ARC-BASED REPRESENTATIONS

areas do not always exhaust the space
the method may be inefficient for coding data sets which consist of isolated polygons, e.g. woodlots in an agricultural area, various types of land use, house footprints on an urban map

areas often overlap
a database of old burns in a forest contains polygons which may overlap and do not exhaust the space, so there are few if any common boundaries

although the great majority of programs work better for arcs than for polygon representations, it is sometimes necessary to rebuild complete polygons from arcs, e.g. for display when a polygon is to be filled


F. OTHER ISSUES ABOUT DATA STRUCTURES

the network and area data structures discussed above reflect common practice in existing GIS, but are far from comprehensive

a data structure must be chosen to balance the need for: 1. efficient processing
arcs are more efficient than polygons for many operations 2. accurate modeling of reality
objects are abstractions of reality; the conditions imposed, e.g. non-overlapping polygons, will affect the accuracy of the abstraction

the conceptual structure of the data which the system presents to the user need not be closely related to the actual data structure
the simple structures described above can be used to present much more complex views to the user

Example: some systems allow the user to work with complex features, which are aggregates of simple features
a simple feature such as a point can be part of several complex features
this idea is useful in utilities applications, where it may be necessary to group together several objects, such as a house, land parcel, pipe, shutoff valve and gas meter, into a complex object ("account")

Example: analysts of spatial information must often deal with the fact that reporting zones, such as counties, change from time to time

Great American History Project - to analyze the spatial distribution of the US population by county since 1800 requires a database which can present the user with different views of the set of counties at different times, as boundaries change
one solution is to define a common set of arcs, but to build them selectively into area objects at each time period
the arcs list contains every line which has ever been a part of a US county boundary
the boundaries of objects (counties) are defined differently at each time period
an arc is part of the network of boundaries at time period t if the polygon IDs on its right and left belong to different objects at time t
data structure would have these record types: 1. Object: attributes at time t 2. Polygon: objects to which polygon belongs at each time 3. Arc: L-polygon, R-polygon


REFERENCES
Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. See Chapter 2.

Haralick, R.M., 1980. "A Spatial Data Structure for Geographic Information Systems," in H. Freeman and G.G. Pieroni, eds., Map Data Processing, Academic Press, New York.

Peuker, T.K., and N. Chrisman, 1975. "Geographic Data Structures," American Cartographer 2(1):55-69.

van Roessel, J.W., and E.A. Fosnight, 1984. "A relational approach to vector data structure conversion," Proceedings, International Symposium on Spatial Data Handling, Zurich, pp. 78-95.


DISCUSSION AND EXAM QUESTIONS

1. Make a list of the kinds of relationships which can exist between pairs of spatial objects, for each pair of points, lines and areas, e.g. point to point, point to line, area to point etc. Are there any examples of relationships between triples of objects, e.g. point-point-point?

2. Write out the CanSIS data structure for a simple map of three or four polygons, forming an equal or smaller number of objects (include the x,y coordinate pairs) (need to include a drawn example).

3. The GIS industry has traditionally provided data models which assume that within any one layer of the database, polygon objects do not overlap, and exhaust the space available. Comment on the degree to which this assumption has limited the application of GIS databases in specific areas. Are these sufficiently significant to warrant a change of data models in the future?

4. Discuss areas of application in which the concept of a complex feature type would be useful. What operations would you want to perform on complex and simple features respectively?

EFFICIENT STORAGE OF LINES - CHAIN CODES
A. INTRODUCTION
Terms

B. TECHNIQUES FOR REPRESENTING IRREGULAR LINES
1. Straight line segments
2. Arcs of circles
3. Splines

C. STORING CHAINS (ARCS)
Advantages and disadvantages
Freeman chain code
Compressing code
Repeating sequences
Summary

D. APPLICATIONS OF CHAIN CODES
Vectorizing raster data
Vectorizing scanner output

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES



UNIT 31 - EFFICIENT STORAGE OF LINES - CHAIN CODES

Compiled with assistance from David H. Douglas, University of Ottawa


A. INTRODUCTION


the previous unit looked at structures which encode certain types of spatial relationships
in this unit we look at the coding of the geometry of lines and areas

overheads - Examples of geographic lines
note the different characteristics of each of these lines
each of these lines suggests different geographic features - can you identify them?

most systems store lines and areas as sequences of points connected by straight lines
is simple, but best for representing lines with sharp breaks of direction, not smooth curves
for example, a meandering river or a railroad could be represented very much more efficiently as a few smooth curves than as a large number of straight lines
this could greatly reduce the effort of digitizing

there are many methods of discretizing a line
this unit looks at some of them, and the applications where they are useful

Terms

because area objects are mostly represented by straight line segments they are often referred to as polygons

a variety of terms are used to describe an irregular line coded as a sequence of straight line segments, particularly when the line is the common boundary between two area objects
these terms include arc, segment, edge, and chain
of these, chain has been established as the standard terminology for US digital cartography when the line is a common boundary, but arc is probably used most often

B. TECHNIQUES FOR REPRESENTING IRREGULAR LINES
1. Straight line segments


if the line on the ground is curved, e.g. a river bank, then straight line segments can only approximate the truth
the degree of approximation depends on the length of straight segment used, but the segments would have to be infinitely short to represent a true curve

thus, straight line segments are a way of representing a continuous line using a finite amount of information, often the coordinates of the segment endpoints

2. Arcs of circles

some systems oriented toward survey data allow both straight lines and arcs of circles between points

for curves, the system must store the radius of curvature as well as the start and end points
it must also correctly identify those segments which are straight lines and those which are arcs

this approach is used in Prime''s System9 GIS

is useful for engineered features, like highways and railroads, which are designed as straight lines and curves

3. Splines

can be used for describing smoothly curving features like meandering rivers
the curve is modelled as a spline function, a mathematical function that passes through specified points but has minimal curvature

splines are often used to smooth the curves drawn by plotters as part of the contouring operation

are more commonly used in CAD than in GIS


C. STORING CHAINS (ARCS)

there is often significant redundancy when a line is stored as a sequence of coordinate pairs
for example, a curved street represented by four points in Columbia, MO might have coordinates as follows:
(38.9519, 92.3503)

(38.9519, 92.3510) (38.9522, 92.3511) (38.9522, 92.3527)


note that the first four digits of each coordinate (latitude,longitude) are the same for every point

can economize greatly on storage by storing the offset from the previous point, in units of 0.0001 degrees:
(38.9519,92.3503) (+00,+07) (+03,+01) (+00,+16)


this would allow every subsequent point to be stored in 4 decimal digits instead of 12, or roughly 12 bits instead of 36
also need one bit each to store the signs of the change in longitude and latitude,
for a total of 14
however this will limit the maximum change between any pair of points to .0099 degrees latitude or longitude

Advantages and disadvantages

the advantage is reduction in storage volume

the disadvantage is loss of generality:
we now have a fixed spatial resolution (.0001 degrees) and maximum distance between points (.0099 degrees latitude or longitude)
this creates problems when using global references and converting coordinates

variations on this technique are particularly applicable for data obtained from scanners and digitizers
these devices have built-in resolution
scanners have a resolution equal to the pixel size of the scanner, and all displacements between points are actually whole numbers of pixels
digitizers also work in discrete units

Freeman chain code

instead of strings of coordinate pairs, systems which rely heavily on scanner input often code lines as lists of incremental movements
a fixed number of move directions is established, usually 8, and assigned the integers 0 through 7 (the "Queen''s case" move set):

diagram








a series of four moves up would be coded as four 0s

the string "01012" would be used to code the following curve:
diagram










each step along the line requires a digit between 0 and 7, or three binary digits between 000 and 111

this technique of incremental coding of lines or chain code is usually associated with Herbert Freeman
however, it was described much earlier in the 19th Century by Francis Galton as a means of transmitting information on fingerprints over a telegraph

see Freeman (1961) for description of a number of algorithms based on chain codes

Compressing code

certain pairs of codes are much more likely to occur in chain codes than others
lines generally turn slowly, and sharp bends are unusual in much geographic data
the next move is most likely the same as the previous one
next most likely is 45 degrees right or left
180 degrees is the least likely next move
thus, a code 4 is most likely to be followed by another 4, or a 3 or a 5, and least likely to be followed by a 0, 1 or 7

this can be exploited by using a code whose length depends on the sharpness of the turn angle:
1 ahead 01 45 degrees right 001 45 degrees left 0001 90 degrees right 00001 90 degrees left 000001 135 degrees right 0000001 135 degrees left


in this way, the total number of binary digits needed to represent a line can be reduced
overhead - Compressing code


Repeating sequences

if the line being coded has long straight stretches, sequences of chain codes will be repeated
overhead - Repeating sequences


for example, to go from one point to the other requires 6 moves to the east and 3 to the northeast (3 1s and 6 2s).
a sequence such as 222222111 would be a serious distortion of the straight line
the straight line is best approximated by the sequence 221221221
repeats the 221 pattern 3 times

straight is always best approximated by a homogeneous mix of codes

to code lines with long straight sections, a way of representing runs of patterns is needed
for example 3(221) might indicate four repeats of the 221 sequence

Summary

handout - Comparison of different encoding schemes

EFFICIENT STORAGE OF LINES - CHAIN CODES
D. APPLICATIONS OF CHAIN CODES

chain codes are useful in two particular applications:

Vectorizing raster data

output from raster data along the boundary separating zones A and B would appear as follows:
A A A B A A B B A B B B B B B B


a line can be assumed along the pixel edges which separate the A''s from the B''s
a vector representation of the line could be coded in chain code as 202020

Vectorizing scanner output

when a line is read by a scanner it looks something like this:
0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0


1''s indicate where the scanner "saw" the line and 0''s where it "saw" the blank spaces around the line

the first step in deriving a vector representation of the line is to "thin" the pixels using a standard algorithm
are many different algorithms available for this
a good source on these is Pavlidis (1982)

the result of thinning will be:
0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0


the line can then be derived by following the remaining skeletal 1s
represented in chain code as 111 (three moves northeast)

usually stored with the x,y coordinates of the line followed with the chain code



chain code is most useful in applications where vector data originated in raster form, from a scanner or an image
since it discretizes both distance and direction (both can take a limited set of possible values), it is not particularly good for more general applications


REFERENCES
Freeman, H., 1961. "On the encoding of arbitrary geometric configurations," Institute of Radio Engineers, Transactions on Electronic Computers, EC10:260-8.

Pavlidis, T., 1982. Algorithms for Graphics and Image Processing, Springer-Verlag, Berlin.


DISCUSSION AND EXAM QUESTIONS

1. Using a sample line drawn on a sheet of paper, write out (a) the Freeman chain code representation of the line using single digits ranging from 0 to 7, (b) its representation using the binary code based on turn angles, and (c) its representation using run encoded chains. Which option gives greater data compression? (express the length of each code in bits, and remember that a number between 0 and 7 requires 3 such bits) (need to include a drawn example).

2. The answer to the previous question depends on the type of line being coded. Discuss the characteristics which would give greatest data compression for each of the three options, and describe examples of lines with these characteristics.

3. The accuracy with which a line is represented as a series of straight line segments depends on the process used to select points during digitizing. Discuss the criteria you would use to select points in order to obtain an accurate representation of a line. How would these criteria change depending on the nature of the line?

4. Using a simple classified raster, write out a vectorized representation using arcs and Freeman chain code. Include the left and right polygons and pointers to adjacent arcs as attributes of each arc, and the classes as attributes of each polygon. Show each line as a coordinate pair for the

beginning point, followed by a chain code using digits in the range 0 through 7.

5. In preparation for the Unit 32, figure out how you would set up a program to determine where two lines, defined by their end points, intersect. Start with a diagram. Under what conditions will your program fail?

SIMPLE ALGORITHMS I - INTERSECTION OF LINES
A. INTRODUCTION

B. DEFINITIONS
Algorithms
Heuristics

C. SIMPLEST CASE
Question
Procedure
Solution
General form
Simple program

D. SPECIAL CASES
Vertical lines
Parallel lines
Solution

E. COMPLEX LINES
Minimum enclosing rectangle
Monotonic sections
Sorting lines

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES



UNIT 32 - SIMPLE ALGORITHMS I - INTERSECTION OF LINES

Compiled with assistance from David H. Douglas, University of Ottawa and David M. Mark, State University of New York at Buffalo


A. INTRODUCTION


the intersection of two lines is a critical operation in GIS
is used in polygon overlay operations, merging and dissolving polygons and lines
is the basis for point in polygon operations and critical for sliver removal

therefore, an efficient algorithm to determine the intersection of two lines is very important in any GIS.

GIS algorithms for complex processes are often built up from simple ones
this section will review a few simple algorithms, later sections will show how they can be built up into complex operations

the first operation here determines if two lines cross
begin by examining the algorithm for two straight lines, then go on to two complex lines or polygons.
eventually will see that this algorithm forms the core of numerous GIS operations, including the point in polygon and polygon overlay processes.

the algorithm illustrates one of the principles of this type of programming, that there are numerous special cases which have to be dealt with.


B. DEFINITIONS
Algorithms


an algorithm is a procedure consisting of a set of unambiguous rules which specify a finite sequence of operations that provides the solution to a problem, or to a specific class of problems

each step of an algorithm must be unambiguous and precisely defined
the actions to be carried out must be rigorously specified for each case

an algorithm must always arrive at a problem solution after a finite number of steps
this must also be a reasonable number of steps

every meaningful algorithm provides one or more outputs

it is preferable that the algorithm be applicable to any member of a class of problems rather than only to a single problem

in general, the cost of obtaining a solution increases with the problem size
if the size of the problem is sufficiently small, even an inefficient algorithm will not cost much to run
consequently, the choice of an algorithm for a small problem is not critical unless the problem has to be solved many times.

Heuristics

an heuristic is a rule of thumb, strategy, trick, simplification, or any other kind of device which drastically limits the search for solutions in large problem spaces
they do not guarantee optimal solutions, in fact they do not guarantee any solution at all
(definition from Feigenbaum, E.A. and J. Feldman, eds., 1963, Computers and Thought, McGraw-Hill, p.6)


C. SIMPLEST CASE
Question


Does the line from (4,2) to (2,0) cross the line from (0,4) to (4,0)? If so, where?

Procedure

find the equations of the two lines, and solve them simultaneously for the intersection
the equation of a line is:
y = a + bx where b is the slope



given two points on the line at (x1,y1) and (x2,y2), the slope b can be determined by the expression:
b = (y1 - y2) / (x1 - x2)


the value of a can then be found by solving the equation using either point

Solution

for line 1 above
b = (2 - 0)/(4 - 2) = 1


using point 1 2 = a + 4 thus a = -2
the equation is y = -2 + x

for line 2 similarly, y = 4 - x

solving simultaneously, the two lines intersect at (3,1)

General form

in general, the two lines
y = a1 + b1x and y = a2 + b2x

intersect at:

xi = - (a1 - a2) / (b1 - b2) yi = a1 + b1xi



however, this merely finds the intersection point between two lines of infinite length passing through each pair of points
it is not yet established that the intersection point lies between the pairs, rather than on the imaginary extensions of one or both lines:

diagram







an intersection point at xi lies between x1 and x2, i.e. on line 1, if:
(x1 - xi) (xi - x2) >= 0



similarly the point lies on line 2 with endpoints (u1,v1) and (u2,v2) if:
(u1 - xi) (xi - u2) >= 0


because either line can be vertical (e.g. x1 = xi = x2) we must also check similar conditions on the y coordinates to be sure the lines intersect

Simple program

overhead/handout - Simple program to compute the intersection of two lines

the handout lists a rudimentary program for determining if two lines cross
x and y are used for line 1, u and v for line 2
input x1,y1 input x2,y2 input u1,v1 input u2,v2

b1 = (y2-y1)/(x2-x1) (A) b2 = (v2-v1)/(u2-u1) (B)

a1 = y1-b1*x1 a2 = v1-b2*u1

xi = - (a1-a2)/(b1-b2) (C) yi = a1+b1*xi if (x1-xi)*(xi-x2)>=0 AND (u1-xi)*(xi-u2)>=0 AND (y1-yi)*(yi-y2)>=0 AND (v1-yi)*(yi-v2)>=0 then print "lines cross at",xi,yi else print "lines do not cross" end if

SIMPLE ALGORITHMS I - INTERSECTION OF LINES
D. SPECIAL CASES

unfortunately this program will get into trouble in certain special cases:

Vertical lines

if line 1 is vertical, the instruction labeled (A) will cause an error because of an attempt to divide by zero, as numerical processors cannot deal with infinity

similarly if line 2 is vertical, line (B) will cause an error

Parallel lines

if the two lines are parallel, line (C) will cause an error

Solution

to deal with these special cases we must make the program a little more complex:
handout - Revised program to calculate intersection


these special cases occur in many simple geometrical algorithms.
they can be avoided to some extent by using different approaches


E. COMPLEX LINES

consider two complex lines of n1 and n2 straight line segments respectively:
diagram









these can be processed for all intersections by looping the simple algorithm, testing every segment in one line against every segment in the other
the amount of work to be done is proportional to the product (n1 x n2)

can reduce the amount of work by using a heuristic that will save future computation
although this requires an additional processing step, overall processing time should be reduced

examples of such methods are:

Minimum enclosing rectangle

a minimum enclosing rectangle (MER) of a line is defined by the minimum and maximum x and y coordinates of the line
diagram








a very rough check for intersection can be made by seeing if the enclosing rectangles of two lines overlap overhead - Minimum enclosing rectangles
if they do not intersect, the lines cannot intersect
if they do intersect, then find the MERs for each straight segment of each line to see which, if any of these intersect

Monotonic sections

can divide each line into sections which are monotonically increasing or decreasing in x and y

monotonic segments are such that:
a straight line parallel to either x or y axis cuts the section at most once
there is a break where ever x or y hits a local maximum or minimum

overhead - Monotonic sections

this sets up conditions which can be used to reduce the amount of work done in looking for intersections.
as the segment continues to increase in one direction, it cannot turn and intersect the other line again

overhead - Intersection of monotonic sections

this is the approach used by ARC/INFO and other vendor GISs
the number of calculations required to determine the intersection of two complex lines drops from a value proportional to (n1 x n2) to a value proportional to (n1 + n2)

Sorting lines

when there are many lines with many intersections to be processed, as in the overlay of two complex coverages
can sort the lines by their ranges so that only lines in similar ranges will be considered together


REFERENCES
Douglas, David H., 1974. "It makes me so cross," a paper distributed by the Laboratory for Computer Graphics and Spatial Analysis, Graduate School of Design, Harvard University, September 1974. (This note has been reprinted numerous times in computing magazines, and in Marble, Calkins and Peuquet, 1984.)

Lee, D.T. and F.P. Preparata, 1984. "Computational geometry: a survey," IEEE Transactions on Computers C-33(12):1072- 1101. A good introduction to basic algorithms for geometrical problems.

Little, J.J. and T.K. Peucker, 1979. "A recursive procedure for finding the intersection of two digital curves," Computer Graphics and Image Processing 10:159-71.

Marble, Duane F., Calkins, Hugh W. and Peuquet, Donna J., eds., 1984. Basic Readings in Geographic Information Systems, Williamsville N.Y., SPAD Systems Ltd.,

Saalfeld, Alan, 1987. "It doesn''t make me nearly as CROSS," International Journal of Geographical Information Systems 1(4), pp 379-386

Taylor, George, 1989. "Letters," International Journal of Geographical Information Systems 3(20):192-3. Another look at the line intersection problem.


DISCUSSION AND EXAM QUESTIONS

1. Why is the "crossing segment" or line intersection problem so important in GIS?

2. Identify the rules which can be used to limit searching for intersections between two lines which are both monotonic in x and y. Each line will be one of four combinations - either increasing or decreasing in x, and either increasing or decreasing in y. You will need to deal with 16 combinations when discussing the options for two lines.

3. Review and discuss the technique for line intersection described in Little and Peucker, 1979.

4. Compare raster and vector approaches to the determination of the intersection between two lines. Are there circumstances under which a raster approach might be preferable?

5. Since geographical data is never perfectly accurate, the special cases identified in the algorithm should never occur precisely. Modify the algorithm to deal with special cases by treating data as imprecise. What are the advantages and potential problems with such an approach?

SIMPLE ALGORITHMS II - POLYGONS
A. INTRODUCTION

B. POLYGON AREA
Objective
Method
Problems
Calculating areas of many polygons at once

C. POINT IN POLYGON ALGORITHM
Objective
Generalization
Strategy
Sample code
Special cases
Fuzzy boundaries
Many points in many polygons

D. CENTROID LOCATION
Method

E. SKELETON
Applications

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES



UNIT 33 - SIMPLE ALGORITHMS II - POLYGONS


A. INTRODUCTION


the previous unit looked at a simple algorithm for determining the intersection of two lines

this unit considers at a second simple algorithm and at some extensions


B. POLYGON AREA
Objective


find the area of a polygon which is defined by a sequence of vertices

Method

common method is to use an algorithm that finds the area of a number of trapezoids bounded by:
a line segment
vertical lines dropped to the x axis
the x axis

overhead - Calculating area of trapezoid
diagram









1. Drop vertical lines to the x axis from two adjacent vertices

find the area of the trapezoid so defined:
A = (x2-x1) * (y2+y1) / 2



diagram








i.e. area is difference in x''s times average of y''s

2. Continue around the polygon, finding the area of each trapezoid defined by each line segment
3. Sum the areas of all trapezoids defined to get the area of the polygon:

(note: must have (x(n+1),y(n+1)) = (x(1),y(1)) )
A = S (x(i+1)-x(i)) * (y(i+1)+y(i)) / 2 )


when x(i+1) &LT x(i) the contribution to area is negative
this allows for trapezoids formed by the lower portion of the polygon to be subtracted so that, in the end, the total A is the area of the polygon it self

Problems

this algorithm works fine when the polygon has holes and overhangs

does not work with polygons whose boundaries self-cross (i.e. figure 8)

if the polygon is digitized counterclockwise its area is negative

problems occur when the polygon has negative y values:
cannot "drop" a perpendicular to the x axis
solutions:
add a large value to all y''s
drop vertical lines to points below the lowest y value

if the coordinate system is very precise, i.e. UTM with large numbers, will lose accuracy as a result of the comparative lack of precision of the computer
therefore, does not work on small polygons in large coordinate systems
solution:
move the origin temporarily up to a location near the vertices
this reduces the size of the coordinate values

Calculating areas of many polygons at once

if polygons are defined by arcs with L and R polygons identified
can do calculations of trapezoid area for each arc
keep a running total of the area of each polygon

L polygon gets negative area
R polygon gets positive area

after processing each arc, we have the areas of each polygon
outside world will have area equal to the negative of the total area of all polygons


C. POINT IN POLYGON ALGORITHM
Objective


determine whether a given point lies inside or outside a given polygon

Generalization

find out which polygon of a set contains each of a set of points

examples
determine which county contains each of a set of customers identified by their coordinates, using a county boundary file
identify the attributes of the land use polygon containing a given house

notation
the point is at (u,v)
the polygon has n points, (x(i),y(i)), i=1,...,n, and is closed by making (x(n+1),y(n+1)) = (x(1),y(1))

Strategy

1. draw a vertical line upwards from the point to infinity
2. count the number of times this line intersects the boundary of the polygon

if the count is odd the point is inside
if it is even the point is outside

Sample code

overhead - Simple point in polygon program
handout - Simple point in polygon program

ni = +1 for i=1 to n if x(i+1) &LT> x(i) then (A) if (x(i+1)-u) * (u-x(i)) >= 0 then (B) if x(i+1) &LT> u or x(i) &LT= u then (C) if x(i) &LT> u or x(i+1) >= u then (D) b = (y(i+1)-y(i)) / (x(i+1)-x(i)) a = y(i)-b * x(i) yi = a+b*u if yi > v then ni = ni*(-1) end if end if end if end if end if next i



b is the slope of the line from (x(i),y(i)) to (x(i+1),y(i+1))
by substituting x=u in the equation y=a+b*x we get the y coordinate of the intersection between this line and the vertical line through x=u
then if y(i) > v the intersection must be above the point, and so is counted

the value of ni alternates between +1 and -1
it is flipped every time an intersection is found
at the end, ni=-1 if the point is in, +1 if the point is out

Special cases

lines (A) through (D) deal with special cases, as follows:
A. if the line from (x(i),y(i)) to (x(i+1),y(i+1)) is vertical there can be no intersection

diagram









B. there can only be an intersection if the ith point is on one side of the vertical line through u and the i+1th point is on the other side

diagram









C.D. if either (x(i),y(i)) or (x(i+1),y(i+1)) lies exactly on the line, i.e. the point at (u,v) lies directly below a vertex, we may miscount:

diagram









solution to this is:
if (x(i),y(i)) lies exactly on the line
count an intersection only if (x(i+1),y(i+1)) lies to the right, or
if (x(i+1),y(i+1)) lies exactly on the line
count an intersection only if (x(i),y(i)) lies to the left

Polygon with isolated islands
diagram









works fine

Polygon has hole with an island
diagram







works fine

Concave polygons
diagram









works fine

What if the point is on the boundary?

some point in polygon routines deal with this explicitly, but this does not
sometimes it finds the point inside, sometimes outside

Fuzzy boundaries

Blakemore (see Blakemore, 1984) described a "fuzzy" generalization of the point in polygon problem:
location of the boundary is uncertain
a band of width epsilon is drawn around the boundary, and represents the band within which the true boundary is assumed to lie
points can now be classified as "definitely out", "probably out", "probably in" and "definitely in"

Many points in many polygons

polygons will likely be stored by arc

check all arcs against vertical lines drawn upward from each point overhead - Many points in many polygons
if an arc between polygons A and B intersects, record an intersection with the boundary of both A and B, irrespective of which side of the arc A and B lie on
this will require a counter for each point/polygon combination
not very efficient

a better algorithm is as follows:

when an arc is found to intersect a vertical line, record the y value of the intersection and the polygon lying below the intersection (right polygon if the arc runs from W to E, left polygon otherwise)
for each point, keep account of: (a) the lowest such intersection found, and
(b) the polygon below that intersection

after all arcs have been examined, the polygon below the lowest intersection is the containing polygon
now have just two things to store for each point

will still need to check every arc against every point, unless some sorting is done first

SIMPLE ALGORITHMS II - POLYGONS
D. CENTROID LOCATION

overhead - Examples of centroid locations

the centroid is potentially useful as a representative, central point in the polygon

the centroid can be defined as the point about which the polygon would balance if it were cut out in plywood of uniform thickness and suspended
unfortunately the centroid is not always inside the polygon, which reduces its usefulness as a central point

some programs calculate the centroid by averaging the x and y coordinates of the polygon vertices
this does not give the centroid

diagram









Method

a better algorithm uses trapezoids dropped to the axes, as in the area algorithm
since each trapezoid consists of a triangle and a rectangle, the centroid of the polygon can be found from a weighted average of triangle and rectangle centroids

overhead - Centroid location code
handout - (cont) Centroid location code

X = S ((y(i) - y(i+1)) (x(i)2 + x(i)x(i+1) + x(i+1)2)/6A) Y = S ((x(i+1) - x(i)) (y(i)2 + y(i)y(i+1) + y(i+1)2)/6A) where A is the area of the polygon



note: as with the area algorithm, the polygon must be coded clockwise and all y coordinates must be non- negative

E. SKELETON

is the network of lines inside a polygon constructed so that each point on the network is equidistant from the nearest two edges in the polygon boundary
nodes on the network are equidistant from the three nearest edges

a skeleton is what remains when a polygon contracts
each of its straight edges moves inward at a constant rate
can be thought of as the opposite of the buffer operation

overhead - Polygon skeleton
at the convex corners of the polygon, the bisectors of their adjacent edges form lines that are traced inward

at each concave corner, the reduced polygon consists of the arc of a circle centered on the corner
as the process continues the bisectors and arcs eventually merge, forming a tree-like structure

as the polygon gets smaller, it may form into two or more islands

ultimately, the polygon is reduced to a point
this point is:
furthest from the original boundary
the center of the largest circle one could draw inside the original polygon

Applications

finding good locations for labels for polygons
label might be drawn:
along the skeleton axis of a polygon
at the point remaining after the polygon has been shrunk to a point

breaking a city block into polygons, one for each face of the block, for labeling


REFERENCES
Blakemore, M., 1984, "Generalization and error in spatial databases," Cartographica 21:131-9.

Shamos, M.I., and F.P. Preparata, 1985. Computational Geometry, Springer-Verlag, Berlin. The standard but technically complex work on geometric algorithms.


DISCUSSION AND EXAM QUESTIONS

1. Discuss the results of using the polygon area and point in polygon algorithms when the polygon is not correctly structured (e.g. unclosed, figure-of-eight).

2. Modify the point in polygon algorithm to determine correctly if the point lies on the boundary of the polygon, in addition to inside or outside.

3. Derive the equations for polygon area and centroid from first principles.

4. Modify the polygon area algorithm to test for and accommodate negative y coordinates.

THE POLYGON OVERLAY OPERATION
A. INTRODUCTION
Traditions of polygon overlay use

B. GENERAL CONCEPTS OF POLYGON OVERLAY OPERATIONS
Operations requiring polygon overlay

C. OVERLAY ALGORITHMS
Objective
Given
Procedure

D. COMPUTATIONAL COMPLEXITY

E. INTERSECTION PROBLEMS
1. Adjacent lines
2. Sliver polygons
Removing slivers during overlay
Removing slivers after overlay

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES



UNIT 34 - THE POLYGON OVERLAY OPERATION


Compiled with assistance from Denis White, Environmental Protection Agency, Corvallis, OR


A. INTRODUCTION

the simple algorithms discussed previously form the basis for one of the most complex operations of vector GIS systems - polygon overlay

Traditions of polygon overlay use

the three traditions of polygon overlay are:
1. Landscape planning

superimposing layers of geographical data (e.g. environmental and social factors) so that their spatial relationships can be used in making land use decisions
a key reference is McHarg, 1969, Design with Nature

2. Set theoretic
a polygon can be thought of as representing a set
when two sets (polygons) A and B are overlaid, we have a graphic interpretation of the set concepts of intersection and union

diagram








the area of overlap of A and B is A.AND.B (intersection)
the combined area is A.OR.B (union)

overhead/handout Sixteen combinations of Boolean operations
it is possible to identify 16 such combinations, or Boolean expressions including e.g.
A.AND.(NOT.B), i.e. all of the area of A which is not overlapped by B
NOT.(A.OR.B), i.e. the outside world (= (NOT.A).AND.(NOT.B)
in most polygon overlay operations it is the intersection which is of most interest, i.e. the area which is common to A and B

3. Area interpolation
Given: the population of area A and the fact that areas A and B overlap Estimate: the population of area B

diagram








this problem can be solved by apportioning the population of A so that the amount in the area covered by B is proportional to the area of overlap

this is a simple way of estimating the population of areas from statistics based on other areal units
e.g. estimating populations of voting districts from populations of census tracts

this method assumes that density is uniform within A, but more realistic versions of the technique exist
see Unit 41


B. GENERAL CONCEPTS OF POLYGON OVERLAY OPERATIONS

in GIS, the normal case of polygon overlay takes two map layers and overlays them
each map layer is covered with non-overlapping polygons

if we think of one layer as "red" and the other as "blue", the task is to find all of the polygons on the combined "purple" layer
diagram










attributes of a "purple" polygon will contain the attributes of the "red" and "blue" polygons which formed it
can think of this process as "concatenating" attributes
usually a new attribute table is constructed that consists of the combined old attributes, or new
attributes formed by logical or mathematical operations on the old ones


number of polygons formed in an overlay is difficult to predict
there may be many polygons formed from a pair of "red" and "blue" polygons, with the same "purple" attributes

when two maps are overlaid, will result in a map with a mixture of 3 and 4 arc intersections
four arc intersections do not generally occur on simple polygon maps

Operations requiring polygon overlay

1. Windowing
the windowing operation, in which a window is superimposed on a map and everything outside the window is discarded, is a special case of polygon overlay

2. Buffering
buffering around points, lines and polygons is another case
buffers are generated around each point or straight line segment
the combined buffer is found by polygon overlay

diagram








3. Planar Enforcement

the process of building points, lines and areas from digitized "spaghetti" (see Unit 12)
wherever intersections occur between lines, the lines are broken and a point is inserted
the result is a set of points, lines and areas which obey specific rules:

THE POLYGON OVERLAY OPERATION
C. OVERLAY ALGORITHMS

to overlay polygons it is necessary first to find all intersections between "red" and "blue" boundary lines

arcs are more efficient than polygons for this operation because:
intersections need to be found only once rather than four times
more information is available about labels (see below)

example of overlay operation:
overheads - Simple overlay example


Objective

overlay two maps of different themes and determine the combined attributes of the new polygons
e.g. overlay a soils map on a vegetation map and create a new set of polygons with a new set of attributes

Given
two overlapping polygons as follows:
red map: a polygon with attribute A
blue map: a polygon with attribute 1
the outside world labelled 0 on both maps

two intersecting arcs defining the boundaries of the polygons:
1. Red Map (light lines): (0,1) (0,3) (2,3) (2,1) (0,1) Polygons - Right: A Left: 0

2. Blue Map (heavy lines): (1,0) (3,0) (3,2) (1,2) (1,0) Polygons - Right: 0 Left: 1


Procedure

after intersections have been found, six new arcs are formed, three from arc 1 and three from arc 2:
1. (0,1) (0,3) (2,3) (2,2) 2. (2,2) (2,1) (1,1) 3. (1,1) (0,1) 4. (1,0) (3,0) (3,2) (2,2) 5. (2,2) (1,2) (1,1) 6. (1,1) (1,0)


because the right and left polygon labels are known for each input arc, we know the labels of the new polygons as soon as the intersections have been found
there are four new polygons
their attributes combine red and blue attributes: 00, A0, A1 and 01

the arc right and left labels, deduced from the geometry of the intersections, are:
Arc Right Left 1 A0 00 2 A1 01 3 A0 00 4 00 01 5 A0 A1 6 00 01


a more complex example:
overhead - Complex overlay example


in this case the right and left polygon labels for arcs 1, 2, 4, 5 and 7 would be known from the geometry of the intersections:
1R: A0 1L: 00 2R: A1 2L: 01 4R: A1 4L: 01 5R: 01 5L: 00 7R: A1 7L: A0


the labels of the remaining arcs must be determined

labels can be passed from one arc to another around a polygon:
3R: must be the same as 2R and 4R 6L: from 2L, 4L


arc 3 was part of the red network, so its soils labels are known, the remaining (blue) part of its left label must be the same as the blue part of its right label
3R is A1 - thus 3L is B1
thus 6R is B1

how to get the blue labels of arc 8?
use a point in polygon routine to find the enclosing blue polygon
use a data structure in which arcs on the inside of the polygon boundary "point" to arcs on the outside of any enclosed islands
e.g. 5R -> 4L -> 6L -> 8L -> 2L -> 5R
thus the labels of arc 8 are 8L: 01, 8R: C1

the final step in the algorithm is to identify all new polygons by following around each polygon from one arc to the next until every right and left side of every arc has been identified with a uniquely numbered polygon


D. COMPUTATIONAL COMPLEXITY

polygon overlay is numerically intensive and time consuming, therefore it is the most complex operation of most vector-based GIS programs

notation: if computing time to process n objects is proportional to n, the computational complexity is "order n", or O(n)
if it is proportional to n2, we way it is "order n squared" or O(n2)

it is important to know:
how long does it take to overlay a given number of polygons?

what affects the amount of time taken?
obviously, the number of arcs and polygons affects the number of computations required
it is usually possible to determine the number of polygons being overlaid
the number of arcs is roughly 3 times the number of polygons
other factors, such as the wiggliness of boundaries, affect the time, but it is difficult to measure these

if n1 polygons are overlaid on n2, how many polygons result? (assuming maps are different)
minimum is n1+n2, polygons on the two maps do not intersect at all
maximum is infinity, lines have infinite wiggliness

typical is 3 or 4 times (n1+n2), discounting spurious polygons

if every one of n1 "red" polygons has to be compared to every one of n2 "blue" polygons, the algorithm will be O(n1n2)

if we could immediately find all "blue" arcs likely to intersect with a given "red" arc, we could build an algorithm which would be O(n1), which means it would be much more efficient for a given size of problem

to find arcs in this way we would need an efficient method of spatial indexing
one of the most successful methods uses the moving band concept:
sort both "red" and "blue" arcs in ascending order of x
process the arcs beginning at the left, moving to the right, sweeping a "band" across the map
only those arcs falling in the band are examined
since the arcs are sorted, we can find those in the band on either red or blue maps quickly

some of the best polygon overlay routines now available in the GIS market operate in close to O(n1+n2) time
a map with tens of thousands of polygons can be overlaid on another map with a similar number in an hour of computing time on a moderate-sized machine


E. INTERSECTION PROBLEMS

because of computer precision, lines will be represented in the computer with great precision even though the accuracy of the representation is low

1. Adjacent lines

the following diagram represents a case where two lines cross
diagram










the following diagram represents a similar case where the two lines do not actually cross
diagram










it is necessary to provide algorithms which distinguish between these very different conditions

2. Sliver polygons

overlay algorithms compute the exact intersections between lines (see Unit 14)

in any overlay operation, it is likely that there will be pairs of lines which should coincide, but do not because of differences in digitizing
these are called slivers, spurious polygons or "coastline weave"

overhead - Overlay of two images

if an arc or polygon of n1 points is overlaid on one of n2 points, up to: ( 2n1n2/(n1+n2) - 3 ) slivers may be generated

two possible approaches: a. delete during the overlay operation, or b. delete afterwards

Removing slivers during overlay

this is the most common approach used in commercial GIS programs

this approach deals with the line as if it were fuzzy
diagram








define a tolerance limit for each line which indicates the amount of uncertainty that exists regarding the true position of the digitized line
provides a band of width "epsilon" around every line
for digitized lines this width may be 1 mm

using epsilon, can then conclude that lines which have a difference in position less than epsilon are the same line
these two lines can then be collapsed to represent a single line

Problem

it is easy to get into difficulty:
lines A and B within epsilon, therefore the same:
lines B and C within epsilon, therefore the same:
but lines A and C are not necessarily within epsilon of each other

polygon overlay routines which do sliver removal "on the fly" must deal with this problem

Removing slivers after overlay

need intelligent criteria to distinguish between slivers and real polygons

criteria for removal include:
area: slivers are small
shape: slivers are long and thin
number of arcs: slivers generally have only 2 bounding arcs while real polygons rarely have only 2
alternating attributes: if a "red" arc between polygons A and B is overlaid on a "blue" arc between polygons 1 and 2, the slivers will alternate between A2 and B1

diagram








junctions: slivers terminate in 4 arc junctions, but 3 arc junctions are more common in real polygons
chaining: slivers tend to occur in chains
it is likely that the confidence with which it can be concluded that two arcs are forming slivers will increase steadily as we work along the arc

i.e. the attribute "sliver" is strongly correlated

if a sliver is detected, it can be replaced by an arc along its center line


REFERENCES
McHarg, I.L., 1969. Design with Nature, Doubleday, New York

Goodchild, M.F., and N. S. Lam, 1980. "Areal Interpolation," Geoprocessing 1:297-312.


DISCUSSION AND EXAM QUESTIONS

1. McHarg described the overlay technique well before the advent of GIS and polygon overlay. Discuss the advantages and possible disadvantages of a computer implementation of the technique.

2. Write out and illustrate the 16 Boolean combinations of two polygons A and B.

3. Review and discuss the alternative forms of areal interpolation described by Goodchild and Lam, 1980.

4. Discuss the relative advantages of raster and vector approaches to polygon overlay. Identify the application areas likely to adopt each method given their advantages.

RASTER STORAGE
A. INTRODUCTION
Why use raster?
Objectives

B. STORAGE OPTIONS FOR RASTER DATA
What if there is more than one layer?
What do raster systems store in each pixel?
Raster/Vector combinations

C. RUN ENCODING
Problems

D. SCAN ORDER
1. Row order
2. Row prime order (Boustrophedon)
3. Morton order
4. Peano scan (also Pi-Order or Hilbert)
Comparing scan orders

E. DECODING SCAN ORDERS
Method
Generalization

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES

The latter portion of this unit and the following two require familiarity with numbering systems in base 2 and 4 as well as techniques for conversion between these and decimal. You may wish to provide your students with some background material on these topics before tackling these units.

UNIT 35 - RASTER STORAGE

Compiled with assistance from Donna Peuquet, Pennsylvania State University


A. INTRODUCTION

Why use raster?


data are acquired in that form from remote sensing, photogrammetry or scanning

is a common way of structuring digital elevation data

raster assumes no prior knowledge of the phenomenon, sampling is done uniformly
knowledge of variability would allow us to sample more heavily in areas of high variability (rugged terrain) and less heavily in smooth terrain

data are often converted to raster as a common format for data interchange

for merging with remote sensing images or DEMs

raster algorithms are often simpler and faster
e.g. buffer zone generation is simpler in raster

raster may be appropriate if the solution requires uniform resolution, e.g. in finding optimum routes for linear features such as power lines, or in inferring the locations of stream networks from DEMs

Objectives

there are many options for storing raster data (many data structures)

some are more economical than others in use of storage

some are more efficient in access and processing speed

this unit looks at some of the options and issues involved

many of these issues were introduced in Unit 4, they are expanded upon here


B. STORAGE OPTIONS FOR RASTER DATA

by convention, raster data is normally stored row by row from the top left

this is the European/North American reading order
is also the order of scan of a TV image

example

the image A A A A A B B B A A B B A A A B

would be stored in 16 memory positions, one for each pixel, in the sequence:
A A A A A B B B A A B B A A A B



What if there is more than one layer?

two options:
1. store the layers separately

this is the normal practice

2. store all information for each pixel together
this requires extra space to be allocated initially within each pixel''s storage location for layers which might be created later during analysis
this is usually difficult to anticipate

(note in remote sensing, these concepts are called "band sequential" and "band interleaved by pixel" respectively)

What do raster systems store in each pixel?

some allow only an integer, in a fixed range, e.g. -127 to +127 (1 byte per pixel) or -32767 to +32767 (2 bytes per pixel)

some allow integers, real (decimal) numbers and mixed alphabetic letters and numbers in each pixel
in this case it helps if the system keeps track of what type of data is stored in each layer and stops the user doing wrong types of analysis on the data
example:
vegetation data is recorded as a class (A thru G) in each pixel
elevation data is recorded as a decimal number (e.g. 100.3 m)
the system should not allow the user to add the pixel values from the two layers (A + 100.3) or perform any other kind of arithmetic operation on the vegetation data

Raster/Vector combinations

many raster-based systems allow vector input Example:
a polygon, defined by its vertices, is input
convert this to a raster
e.g. assign 1 to all pixels inside the polygon, 0 to all outside

some forms of data are really hybrids of raster and vector:
Freeman chain code has finite resolution based on pixels (raster-like) but defines lines and the boundaries of objects (vector-like)
a raster can be used to define objects at fixed resolution if every pixel is given an object number instead of a value
the object numbers are pointers to an attribute table:
Raster ObjectAttributes

23 23 23 24 23 A 100.0 23 23 24 24 24 B 101.1 23 23 24 24 23 23 23 24

this gives us an object with its attributes, plus a list of pixels associated with the object instead of the object''s coordinates

in this sense, a raster is a finite resolution geometry rather than an alternative way of structuring spatial data


C. RUN ENCODING

geographical data tends to be "spatially autocorrelated", meaning that objects which are close to each other tend to have similar attributes
Tobler expressed it this way: "All things are related, but nearby things are more related than distant things"

because of this principle, we expect neighboring pixels to have similar values
so instead of repeating pixel values, we can code the raster as pairs of numbers - (run length, value)
e.g. instead of 16 pixel values in original raster matrix, we have:
4A 1A 3B 2A 2B 3A 1B


produces 7 integer/value pairs to be stored

if a run is not required to break at the end of each line we can compress this further:
5A 3B 2A 2B 3A 1B = 6 pairs


however, it helps to limit the possible size of the run so that we can use less space to store the run length, as the amount of space allocated must be sufficient for the maximum run length

Problems

layers now have different lengths depending on the amount of compression (lengths of runs)

storing all layers together for each pixel now makes no sense

run encoding would be little use for DEM data or any other type of data where neighboring pixels almost always have different values

RASTER STORAGE
D. SCAN ORDER
1. Row order


described already

are there better ways of ordering the raster than row by row from the top left?
other orders may produce greater compression

overhead/handout - Standard scan orders

2. Row prime order (Boustrophedon)

suppose we reverse every other row:
diagram






this has the charming name boustrophedon from the Greek for "how an oxen plows a field"

avoids a long jump at the end of each row, so perhaps the raster would produce fewer runs and thus greater compression

this order is used in the Public Land Survey System: the sections in each township are numbered in this way

one the original raster (page 35-3) it results in:
4A 3B 3A 3B 3A = 5 runs



3. Morton order

overhead/handout - (cont) Standard scan orders

Morton order is the basis of many efforts to reduce database volume
named for Guy Morton who devised it as a way of ordering data in the Canada Geographic Information System
however, this way of ordering or scanning a raster was well known long before Morton
it is associated with the names of several mathematicians and geometers: Hilbert, Peano, and Koch
coincidentally, Morton is the name of the lower left corner county in Kansas

the strategy is to exhaust each area of the map in sequence, whereas row by row order scans from one side to the other
this minimizes the number of large jumps

diagram









this is one of several hierarchical ordering systems
it is built up level by level, repeating the same pattern at each level, as follows
2 3 10 11 14 15 42 43 46 47 58 59 62 63 0 1 8 9 12 13 40 41 44 45 56 57 60 61 2 3 6 7 34 35 38 39 50 51 54 55 0 1 4 5 32 33 36 37 48 49 52 53 10 11 14 15 26 27 30 31 8 9 12 13 24 25 28 29

2 3 6 7 18 19 22 23 0 1 4 5 16 17 20 21



it is only valid for square arrays where the numbers of rows and columns are powers of 2
e.g. 2x2, 4x4, 8x8, 16x16, 32x32, 64x64, etc.

how does it do on our 4x4 array?
5A 3B 1A 1B 2A 2B 2A = 7 runs


which is as long as row by row compression

4. Peano scan (also Pi-Order or Hilbert)

the Peano scan or Pi-order is like boustrophedon in always moving to a neighboring pixel
diagram









the name Peano is associated with both this and Morton orders, though more often with this

it is also hierarchical, but the pattern appears in different orientations at different levels

Comparing scan orders

it is useful to look at a comparison of the compression rates obtained by the different orders (see Goodchild and Grandfield, 1983)
overhead/handout - Scan order comparison


the comparison shown used a number of 64x64 pixel images:
all pixels are either B or W
vary from:
images with large patches of black or white to
very chaotic images in which each pixel is independently black or white
in the table, H indicates the amount of chaos in the image:
the higher H, the larger the patches
low H corresponds to chaotic images
each line in the table gives the numbers of runs required to code the same black and white image
the values in the last line were calculated theoretically

this table shows that scan order makes little difference to data compression
the number of runs is not greatly affected by scan order for a given image

however, the orders which move to adjacent pixels (boustrophedon and Peano) tend to do better than row and Morton

E. DECODING SCAN ORDERS

since Morton and Peano orders are useful but complex, two types of questions arise when they are used:
1. What are the row and column numbers for a given pixel?

2. What is the position in the scan order for a given row and column number?


Method

start by numbering the rows and columns from 0 up:
3 10 11 14 15 2 8 9 12 13 1 2 3 6 7 0 0 1 4 5

0 1 2 3

- row 2, column 3 is position 13 in the Morton sequence

1. How to go from row 2, column 3 to Morton sequence? a. convert row and column numbers to binary representations:

16s 8s 4s 2s 1s 1 0 row 2 1 1 column 3

b. interleave the bits, alternating row and column bits (called bit interleaving):

1 1 0 1 row col row col

c. evaluate this sequence of bits as a binary number:

Answer: 8 + 4 + 1 = 13


so to get the Morton position, interleave the bits of the row and column number

2. How to find row and column number from Morton position 9? a. convert the position number to a binary number
16s 8s 4s 2s 1s 1 0 0 1 (8 + 1 = 9) row col row col

b. separate the bits:

1 0 row = 2 0 1 col = 1


Generalization

can express the row and column number to any base, not just base 2 (binary), and including mixtures of bases

example: row 6, column 15, using base 4 instead of base 2
64s 16s 4s 1s 1 2 row 6 = 1x4 + 2x1 3 3 col 15 = 3x4 + 3x1

interleaving:

1 3 2 3 1x64 + 3x16 + 2x4 + 3x1 = 123

answer: row 6 column 15 is position 123


what does this sequence look like? overhead - Base 4 x base 4 scan order
arrays of 4 rows by 4 columns, scanned row by row, then repeated at higher levels

can generate a wide range of possible scan patterns by interleaving digits of different bases
the principle of digit interleaving is very widespread, and is built into the PLSS and the GEOLOC grid, as well as numerous systems for map indexing and georeferencing


REFERENCES
Abel, D.J., 1986. "Bit interleaved keys as the basis for spatial access in a front-end spatial database management system," Proceedings, Tesseral Workshop #2, Reading, England.

Franklin, W., 1979. "Evaluation of algorithms to display vector plots on raster devices," Computer Graphics and Image Processing 11:377-397.

Goodchild, M.F., and A.W. Grandfield, 1983. "Optimizing raster storage: an examination of four alternatives," Proceedings, AutoCarto 6, Ottawa, 1:400-7.

Peuquet, D., 1981. "An examination of techniques for reformatting digital cartographic data, Part II, The vector-to-raster Process," Cartographica 18(3):21-33.

DISCUSSION AND EXAM QUESTIONS

1. What systems are used for topographic map indexing in the US and other countries? Discuss the use of digit interleaving in this context, using different national examples.

2. The term metadata is used to refer to information carried with a map layer, such as its accuracy, numbers of rows and columns, type of data stored for each pixel, etc. Discuss the importance of metadata in limiting the operations which a user is allowed to perform on a map layer.

3. Raster and vector have developed as two partially independent traditions in GIS. Summarize the dimensions of the raster-vector debate, particularly in the importance of spatial objects in the two systems.

4. All of the scan orders discussed in this unit visit each pixel exactly once. Discuss the potential advantages, if any, of scan orders which visit certain pixels more than once. Give examples.

5. Any raster GIS places restrictions on what can be stored in each pixel of a map and what operations can be carried out. Discuss this point as it applies to IDRISI, and any other raster GIS to which you may have access. Will it let you store an alphabetic value such as A in a pixel and then allow you to carry out arithmetic operations on this layer?

6. Find out what raster storage option (row by row, run encoded, pixel by pixel, layer by layer, etc.) is used by IDRISI and any other raster GIS (GRASS, MAP, etc.) to which you have access.

HIERARCHICAL DATA STRUCTURES
A. INTRODUCTION

B. INDEXING PIXELS
Procedure
Decoding locations

C. THE QUADTREE
Coding quadtrees
Accessing data through a quadtree
Comparison of different data structures

D. VARIANTS OF QUADTREES

E. ADVANTAGES OF HIERARCHICAL DATA STRUCTURES

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES



UNIT 36 - HIERARCHICAL DATA STRUCTURES


A. INTRODUCTION


different scan orders produce only small differences in compression
the major reason for interest in Morton and other hierarchical scan orders is for faster data access

the amount of information shown on a map varies enormously from area to area, depending on the local variability
it would make sense then to use rasters of different sizes depending on the density of information
large cells in smooth or unvarying areas, small cells in rugged or rapidly varying areas
unfortunately unequal-sized squares won''t fit together ("tile the plane") except under unusual circumstances
one such circumstance is when small squares nest within large ones

there are, however, some methods for compressing raster data that do allow for varying information densities


B. INDEXING PIXELS

overhead - Raster to quadtree I and II
handout - Raster to quadtree


consider the 16 by 16 array in which just one cell is different
notation: row and column numbering starts at 0
thus the odd cell is at row 4, column 7

Procedure

begin by dividing the array into four 8x8 quadrants, and numbering them 0, 1, 2 and 3 as in the Morton order
quads 1, 2 and 3 are homogeneous (all A)
quad 0 is not homogeneous, so we divide only it into four 4x4 quads
these are numbered 00, 01, 02 and 03 because they are partitions of the 8x8 quad 0
of these, 00, 01 and 02 are homogeneous, but 03 is divided again into 030, 031, 032 and 033
now only 031 is not homogeneous, so it is divided again into 0310, 0311, 0312 and 0313

what we have done is to recursively subdivide using a rule of 4 until either:

a square is homogeneous or
we reach the highest level of resolution (the pixel size)

this allows for discretely adaptable resolution where each resolution step is fixed

this concept is related to the use of Morton order for run encoding
if we had coded the raster using Morton order, each homogeneous square would have been a run
8x8 squares are runs of 64 in Morton order, 4x4 are runs of 16, etc
the run encoded Morton order would have been:
16A 16A 16A 4A 1A 1B 1A 1A 4A 4A 64A 64A 64A


if we allow runs to continue between blocks we could reduce this to:
53A 1B 202A


i.e. a homogeneous block of 2m by 2m pixels is equivalent to a Morton run of 22m pixels

Decoding locations

the conversion to row and column is the same as for decoding Morton numbers except that in this case the code is in base 4
in the example the lone B pixel is assigned code 0311

1. convert the code to base 2
hint: every base 4 digit converts to a pair of base 2 digits
thus 0311 becomes 00110101

2. separate the bits to get:
row 0100 = 4
column 0111 = 7

so the numbering system is just the Morton numbering of blocks, expressed in base 4

however, sequence and data compression are not the most useful aspects of this concept


C. THE QUADTREE

can express this sequencing as a tree
the top is the entire array
at each level there is a four-way branching
each branch terminates at a homogeneous block

the term quadtree is used because it is based on a rule of 4
overhead - Quadtree


each of the terminal branches in the tree (the ones having values) is known as a leaf
in this case there are 13 leafs or homogeneous square blocks

Coding quadtrees

to store this tree in memory, need to decide what to store in each memory location
there are many ways of storing quadtrees, but they all share the same basic ideas

one way is to store in each memory location EITHER: 1. the value of the block (e.g. A or B), or or 2. a pointer to the first of the four "daughter" blocks at the next level down

all four daughter blocks of any parent always occur together
overhead - Coding quadtrees


thus, the quadtree might be stored in memory as:
Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Contents: 2 6 A A A A A A 10 A 14 A A A B A A

(level):0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4


the content of position 1 is a pointer indicating that the map is subdivided into four blocks whose contents can be found starting at position 2
position 2 indicates that the four parts of the 0 block can be found beginning at position 6
positions 3, 4 and 5 indicate that the other three level 1 blocks are all A and are not further subdivided

Accessing data through a quadtree

consider two ways in which this quadtree may be accessed: 1. find all parts of the map with a given value 2. determine the contents of a given pixel

notation: if the array has 2n by 2n pixels
there are n possible levels in the tree, or n+1 if we count the top level (level 0)
use m for the number of leafs

1. to find the parts of a map with a given value we must examine every leaf to see if its value matches the one required
this requires m steps as there are m leafs

2. to find the contents of a given pixel, start at the top of the tree
if the entire map is homogeneous, stop as the contents of the pixel are known already
if not, follow the branch containing the pixel
do know which branch to follow:
take the row and column numbers, write them in binary, interleave the bits, and convert to base 4

e.g. row 4, column 7 converts to 0311
at each level, use the appropriate digit to determine which branch to follow
e.g. for 0311, at level 0 follow branch 0, at level 1 follow branch 3, etc.
in the worst case, may have to go to level n to find the contents of the pixel, so the number of steps will be n

Comparison of different data structures

summary of the work needed to perform the two types of queries:
OptionFind parts with given valueFind contents of pixel

Quadtree m n

Row by row 22n (a) 1 (b)

Row by row run encoded m (c) m (d)

Morton run encoded m (c) m (d)


notes: (a) must examine every pixel in the array (b) can calculate the position of every pixel and access it directly (c) the number of runs will be approximately the same as the number of leafs
although the orders are different, we decided earlier that order made little difference to the number of runs (d) each run must be examined to see if it contains the pixel

thus, quadtree structures offer definite advantages over other systems for queries

HIERARCHICAL DATA STRUCTURES
D. VARIANTS OF QUADTREES

the octree (or octtree) is a three-dimensional version of the quadtree, based on a rule of 8
the cube is divided recursively into eight pieces
octrees are useful for 3D data, particularly in mining and geology, and in medical imaging
see Unit 42 for more on this topic

global data presents significant problems

we might use a projection such as the Mercator, and represent the data as a raster on this projection
the area and shape of the area represented by each pixel would be significantly distorted, particularly near the poles
the relationships between neighboring pixels would be distorted as well
in reality all of the pixels in the top and bottom rows are neighbors of each other across the poles

diagram








these problems create serious distortions in models based on such data

a more suitable approach devised by Geoffrey Dutton is as follows: overhead/handout Global tesselation
project the globe onto an octahedron, consisting of eight triangles
the vertices of the octahedron are at the poles, and 90 degrees apart around the equator
number these from 0 through 7
each triangle is recursively divided using a rule of 4 into 4 smaller triangles, by connecting the midpoints of the edges
number these 0 (central triangle), 1 (vertically above or below), 2 (diagonally to the left) and 3 (diagonally to the right)
level 20 in this scheme has a resolution (triangle size) of about 1 m on the earth''s surface
its address requires one base 8 digit and 20 base 4 digits, or 43 (binary) bits


E. ADVANTAGES OF HIERARCHICAL DATA STRUCTURES

both coordinates are in a single address
a single number indicates a 2D location
every square meter on the earth''s surface has a consistent 21-digit address

resolution is automatically known from the length of the address
in the global scheme described above, a 21-digit address has 1 m resolution, for 1 km resolution we need only a 13-digit address

in comparison, a lat/long address needs two numbers, and it is not always easy to tell resolution from the way the numbers are presented


REFERENCES
Gargantini, I., 1982. "An effective way to represent Quadtrees," Communications of the ACM 25:905-910.

Mark, D.M., J.P. Lauzon and J.A. Cebrian, 1989. "A review of quad-tree based strategies for interfacing coverage data with digital models in grid form," International Journal of Geographical Information Systems 3(1):3-14.

Samet, Hanan, 1984. "The quadtree and related hierarchical data structures," ACM Computer Surveys 16(2):187-260.

Samet, Hanan, 1989. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA. Contains an excellent review of quadtrees.

Samet, Hanan, 1989. Applications of Spatial Data Structures. Addison-Wesley, Reading, MA. Contains an extensive review of applications of quadtrees.

Shaffer, C.A., H. Samet and R.C. Nelson, 1990. "QUILT: a geographic information system based on quadtrees," International Journal of Geographical Information Systems 4(2):103-32.

Waugh, Thomas C., 1986. "A response to recent papers and articles on the use of quadtrees for geographic information systems," Proceedings, Second IGU Symposium, Seattle, pp. 33-37. An interesting critique of quadtrees.


DISCUSSION AND EXAM QUESTIONS

1. "Hierarchical data structures are one of the few genuinely new concepts to come out of GIS research". Discuss.

2. Discuss the arguments presented for and against quadtrees in Waugh, 1986.

3 Summarize the arguments for and against the octahedral decomposition of the globe presented in the unit, as a means of representing and analyzing global databases.

4. Modify the quadtree concept so that it is suitable as a method of storing digital elevation data. What are the advantages and disadvantages of this approach over a simple raster?

5. What would be the advantages and disadvantages of using Dutton''s global tesselation as the primary means of discrete georeferencing, rather than street address?

6. Three-dimensional spatial databases are of great interest in geology, geophysics, subsurface hydrology, atmospheric science, oceanography and the mining and oil and gas industries. With reference to one or more of these, discuss the suitability of the octtree as a data model.

QUADTREE ALGORITHMS AND SPATIAL INDEXES
A. INTRODUCTION
Definition

B. AREA ALGORITHM
Procedure
Example

C. OVERLAY ALGORITHM
Procedure
Result

D. ADJACENCY ALGORITHM
Problem
Definition
Two cases
Tesseral Arithmetic
Determining Adjacency
Length of common boundary

E. AREA OF A CONTIGUOUS PATCH ALGORITHM
Problem
Method
Algorithm
Results

F. QUADTREE INDEXES
Indexing using quadtrees
Setting up the index
Using the index
Generalizations

G. R-TREE INDEXES
Method
Problem

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES

This unit is very long and deals with more advanced algorithms. Depending on the abilities and interests of your students, you may want to omit the third and fourth algorithms included or consider providing this as extra handouts. Advanced students may be pleased to have the opportunity to examine the more subtle, complex nature of these advanced algorithms. The later section on indexes does not depend on material covered in the earlier sections.

UNIT 37 - QUADTREE ALGORITHMS AND SPATIAL INDEXES


A. INTRODUCTION


the previous unit defined the basic idea of a quadtree

this unit examines how quadtrees are used in several simple processes, including:
measurement of area
overlay
finding adjacent leafs
measuring the area of contiguous patches

in addition, this unit will look at how quadtrees can be used to provide indexes for faster access to vector-coded objects

finally, alternative forms of spatial indexing will be reviewed

Definition

to traverse a quadtree:
begin by moving down the leftmost branch to the first leaf
after processing each leaf in this branch, move back up to the previous branching point, and turn right
this will either lead down to another leaf, or back to a previous branching point

diagram








overhead - First map

several of the following examples use this simple raster and its associated quadtree


B. AREA ALGORITHM
Procedure


to measure the area of A on the map:
traverse the tree and add those leafs coded A, weighted by the area at the level of the leaf

Example

in the example quadtree, elements at level 0 have area 16, at level 1 - area 4, at level 2 - area 1

thus, area of A is:
1 (leaf 00) + 1(leaf 02) + 1 (leaf 03) + 4 (leaf 2) + 1 (leaf 32)

= 8 units



C. OVERLAY ALGORITHM

overhead - Second map
note: this overhead can be physically overlayed on First map

Procedure

to overlay the two maps:
traverse the trees simultaneously, following all branches which exist in either tree
where one tree lacks branches (has a leaf where the other tree has branches), assign the value of the associated leaf to each of the branches
e.g. node 3 is branched on map 1, not on map 2
the leafs derived from this node (30, 31, 32 and 33) have values B, B, A and B on map 1, all 2 on map 2

the new tree has the attributes of both of the maps, e.g. A1, B2

Result

overhead - First map + Second map


D. ADJACENCY ALGORITHM
Problem


find if two leafs (e.g. 03 and 2) are adjacent
Corollary: find the leafs adjacent to a given leaf (e.g. 03)


note that in arc based systems adjacencies are coded in the data structure (R and L polygons), so this operation is simpler with vector based systems

Definition

here adjacent means sharing a common edge, not just a common point
diagram










Two cases

leaf codes are: 1. same length (same size blocks, e.g. 01 and 02) or 2. one is longer than the other (different size blocks, e.g. 03 and 2)

solving this problem requires the use of: 1. conversion from base 4 to binary and back
base 4 because of the "rule of 4" used in constructing quadtrees 2. bit interleaving 3. a new concept called Tesseral Arithmetic

Tesseral Arithmetic

tesseral arithmetic is an alternate arithmetic useful for working with the peculiarities of quadtree addressing

to add binary numbers normally, a "carry" works to the position to the left
e.g. adding 1 to 0001 gives 0010
this is the same as decimal arithmetic except that carries occur when the total reaches 2 instead of 10

in tesseral arithmetic, a "carry" works two positions to the left
e.g. adding 1 to 0001 gives 0100

the reverse happens on subtraction
1000 less 1 is 0010 not 0111, as the subtraction affects only the alternate bits

in other words, if we number the bits from the left starting at 1
adding or subtracting 1 affects only the even- numbered bits
adding or subtracting 2 (binary 10) affects only the odd-numbered bits

Determining Adjacency

handout - Determining adjacency
1. same size blocks:


two leafs are adjacent if their binary representations differ by binary 1 or 10 (decimal 1 or 2) in tesseral arithmetic
example: 01 and 03 are adjacent because 0001 and 0011 differ by binary 10, or decimal 2
example: 033 and 211 are adjacent because in tesseral arithmetic
001111 + 10 = 100101, or 100101 - 10 = 001111



2. different size blocks:
taking the longer of the two codes:
convert it from base 4 to binary
tesseral-add and -subtract 01 and 10 to create four new codes
reject any cases where subtracting was not possible (a "negative" code would have resulted, or a "carry" would have been necessary to the left of the leftmost digit)
discard the excess rightmost digits in the resulting transformed longer codes
convert back to base 4 to get the leaf
the two blocks are adjacent if any of the transformed and truncated codes are equal to the shorter code

example: Are 02 and 2 adjacent?
convert 02 to binary = 0010
0010 + 1 = 0011 0010 + 10 = 1000 0010 - 1 (impossible) 0010 - 10 = 0000


truncating gives 00 and 10
these are equal to 0 and 2 in base 4
therefore, 02 and 2 are adjacent (also 02 and 0 are adjacent)

example: Are 033 and 2 adjacent?
convert 033 to binary = 001111
001111 + 1 = 011010 001111 + 10 = 100101 001111 - 1 = 001110 001111 - 10 = 001101


truncating to two digits gives 01, 10 and 00
these are equal to 1,2 and 0 in base 4
therefore, 033 and 2 are adjacent

example: Find leafs adjacent to 03 in the first map above

method: find the codes of adjacent blocks of the same size, then work down the tree to find the appropriate leaf
(note: can only find equal or shorter codes - equal or bigger leaf blocks)
0011 + 1 = 0110 = 12 : leaf 1 0011 + 10 = 1001 = 21 : leaf 2 0011 - 1 = 0010 = 02 : leaf 02 0011 - 10 = 0001 = 01 : leaf 01


Length of common boundary

the length of common boundary between the two blocks is determined by the level of the longer code
can use this to construct an algorithm to determine the perimeter of a patch
e.g. the length of the A/B boundary in the first example map

diagram

QUADTREE ALGORITHMS AND SPATIAL INDEXES
E. AREA OF A CONTIGUOUS PATCH ALGORITHM
Problem


find the area of a contiguous patch of the same value, e.g. all A
Corollary: How many separate patches of A are there?


note: this is a general method which can be used in both quadtree and vector data structures
i.e. find contiguous sets of quadtree blocks or irregularly shaped polygons, given that adjacencies are known or can be determined

the following example uses the original raster map
note that there are only two contiguous patches; the areas of A and B form only one patch each

Method

handout - Area of a contiguous patch (2 pages)

create a list of leafs, with their associated codes, by traversing the tree

allow space for a "pointer" for each leaf, and give it an initial value of 0 (see handout)

Algorithm

for each leaf i:
find all adjacent leafs j with equal or shorter length codes (4 maximum)
if the adjacent leaf j has the same value, determine which of i and j has the higher (larger value) position in the list, and set its pointer to the lower position
(note: if a pointer has already been changed, it may be changed again or left, the result is the same)

this produces the final pointer list


Results

1. the number of contiguous patches will be equal to the number of zeros
in the example, two pointers are zero, indicating two contiguous patches

2. the value of each patch can be obtained by looking up the values of leafs with 0 pointers
in the example, leafs 00 and 01 have 0 pointers
these have the values A and B respectively

3. to find the area of each patch, select one of the zeros and sum its area plus the areas of any leafs which point to it directly or indirectly
the component leafs of each patch can be found by starting at with a leaf at the end (or beginning) of the list and following the pointers until a 0 is found
e.g. leaf at position 10 (code 33) points to 8, which points to 7, which points to 5, which points to 2, which has a zero pointer
therefore, leaf position 10 (code 33) is part of the same patch as leaf 2 (code 01) and has the value B
the areas can be found by summing the leaf areas
for the example:
A leafs: 00 02 03 2 32 A positions: 1 3 4 6 9 Area of A: 1 + 1 + 1 + 4 + 1 = 8

B leafs: 01 1 30 31 33 B positions: 2 5 7 8 10 Area of B: 1 + 4 + 1 + 1 + 1 = 8


F. QUADTREE INDEXES
Indexing using quadtrees


indexes are used in vector systems to get fast access to the objects in a particular area of a map
very useful in searching for potentially overlapping or intersecting objects
therefore, they are an essential part of a polygon overlay operation

in Unit 34, looked at the usefulness of a simple sort of objects on one axis (e.g. x) in the moving band operation for intersection calculations
now will look at methods which can be thought of as sorting on both axes simultaneously

these use 2D coding systems and a simple one dimension sort

Setting up the index

overhead - Quadtree indexes

steps are:
1. for each object (point, line, area) in the database, find the smallest quadtree leaf which encloses the object

some large objects will have to be classified as NULL, as they span more than one of the four leafs in the first branching (0, 1, 2 and 3)
other smaller objects may be enclosed within a small leaf, e.g. 031

2. sort or index the objects by the enclosing quadtree leafs

Using the index

to find all objects which might intersect an area, line or point of interest
find the quadtree leaf enclosing the object of interest
starting at this point follow up the quadtree through all branching points that contain the original cell and down the quadtree to all branching points and leafs below the cell

example: the area of interest is enclosed in leaf 31 of the original example quadtree (overhead - First map)

the objects which may intersect the area of interest are those in leaf 31 and all leafs above it
thus, these are 3 and the null leaf
objects in other (remote) leafs cannot intersect the area of interest, so need not be checked

example: the area of interest is enclosed in leaf 0
the objects which may intersect the area are in leaf 0, the null leaf and all leafs below 0 - 00, 01, 02, 03
there may be other leafs below these as well such as 010, 011, 012, 013, etc

Generalizations

quadtree indexing is most effective for small objects, particularly points
large objects tend to require large enclosing leafs even though they may not fill much of the space (i.e. highway corridors)
these objects will always need to be checked for intersection
it may pay to subdivide objects so that the pieces fall entirely within smaller leafs

indexing in this way is intuitively more efficient than indexing by x or y alone since the quadtree index is effectively two-dimensional

the divisions at each branching need not be equal in size
it may pay to have some blocks of smaller area and some of larger area, rather than four equal squares at each branching
however, for general efficiency the blocks should be rectangular


G. R-TREE INDEXES

R-tree indexes are a response to the problem of indexing large areas
R stands for "range", a concept similar to MER

Method

overhead - R-tree

find two, possibly overlapping, rectangles (aligned with x and y axes) such that:
as many objects as possible are wholly within one or the other rectangle
there are roughly equal numbers of objects wholly enclosed in each rectangle

the overlap between the rectangles is minimum

indexing is determined by the rectangle in which the object is contained
objects which are wholly within a rectangle are associated with that rectangle
objects which are not wholly within either of the two rectangles are associated with the undivided map

apply the procedure recursively, finding two new smaller rectangles within each existing rectangle

this creates a tree structure similar to the quadtree
every object is associated with some node in the tree

to find the objects which might intersect a given area of interest:
find the smallest rectangle used in the indexing procedure which wholly encloses the area of interest
the objects are those associated with this rectangle and all nodes above and below it in the tree

Problem

although benchmark tests have shown that R-trees are generally more efficient than quadtrees and simple 1-D sorts, they are computationally intensive to construct


REFERENCES
Buchmann, A., O. Gunther, T.R. Smith and Y.-F. Wang. Design and Implementation of Large Spatial Databases, Unit Notes in Computer Science 409, Springer Verlag, Berlin. Contains a collection of papers on spatial data indexing.

Guttman, A, 1984. "R-trees: A dynamic index structure for spatial searching," ACM SIGMOD, pp. 47-57.

Mark, D.M., and J.P. Lauzon, 1984. "Linear quadtrees for Geographic Information Systems," Proceedings, International Symposium on Spatial Data Handling, Zurich, 2:412-430.

Noronha, V., 1988. "A survey of Hierarchical Partitioning Methods for Vector Images," Proceedings, Third International Symposium on Spatial Data Handling, Sydney, Australia, pp. 185-199.

Oosterom, P. van, 1990. "A modified binary space partitioning tree for geographic information systems," International Journal of Geographical Information Systems 4(2):133-46.

The two Samet books listed as references for Unit 36 contain useful discussions of quadtree algorithms.


DISCUSSION AND EXAM QUESTIONS

1. Compare the formal methods of indexing (quadtree, R-tree, 1-D sort) with informal methods in common use (e.g. continents, nation-states, major civil divisions, ZIP codes, etc.).

2. How would you design a study to compare the effectiveness of different indexing schemes? What data would you use? What measures would you compare?

3. Current vector-based systems use a wide variety of indexing schemes. Why is there no consensus as to the best? What methods are best for what purposes and area of application?

4. Devise a means of measuring the Manhattan distance between two quadtree blocks (assume the codes have the same length).

DIGITAL ELEVATION MODELS
A. INTRODUCTION
What is a Digital Elevation Model?
Creation of DEMs
Uses of DEMs
Hydrologic functions on DEMs

B. ESTIMATING ELEVATION

C. ESTIMATING SLOPE AND ASPECT

D. DETERMINING DRAINAGE NETWORKS
Determining the watershed
Determining the network
Characteristics of automatically derived networks

REFERENCES

DISCUSSION AND EXAMINATION QUESTIONS

NOTES



UNIT 38 - DIGITAL ELEVATION MODELS

Compiled with assistance from Brian Klinkenberg, University of British Columbia


A. INTRODUCTION


surfaces such as the surface of the earth, are continuous phenomena rather than discrete objects
to fully model the surface, would need an infinite amount of points

there are various ways of representing continuous surfaces in digital form using a finite amount of storage
Unit 11 introduces spatial database models that are used for continuous surfaces

this unit will look at digital elevation models as one way of representing surfaces and will examine some important algorithms based on DEMs

What is a Digital Elevation Model?

the term digital elevation model or DEM is frequently used to refer to any digital representation of a topographic surface
however, most often it is used to refer specifically to a raster or regular grid of spot heights
this is the definition that is used here
digital terrain model or DTM may actually be a more generic term for any digital representation of a topographic surface, but it is not so widely used

the DEM is the simplest form of digital representation of topography and the most common
a variety of DEMs are available, including coverage of much of the US from the US Geological Survey

the resolution, or the distance between adjacent grid points, is a critical parameter
the best resolution commonly available is 30 m, with a vertical resolution of 1 m
coverages of the entire globe, including the ocean floor, can be obtained at various resolutions

Creation of DEMs

several different methods have been used to create DEM series like those from the USGS
see USGS (1987) for more details on the following

conversion of printed contour lines
existing plates used for printing maps are scanned

the resulting raster is vectorized and edited
contours are "tagged" with elevations
additional elevation data are created from the hydrography layer
i.e. shorelines provide additional contours
finally, an algorithm is used to interpolate elevations at every grid point from the contour data

by photogrammetry
this can be done manually or automatically:
manually, an operator looks at a pair of stereophotos through a stereoplotter and must move two dots together until they appear to be one lying just at the surface of the ground
automatically, an instrument calculates the parallax displacement of a large number of points
e.g. for USGS 7.5 minute quadrangles, the Gestalt Photo Mapper II correlates 500,000 points
extraction of elevation from photographs is confused by flat areas, especially lakes, and wherever the ground surface is obscured (buildings, trees)

there are two techniques for choosing sample points when using manual photogrammetry: 1. profiling
the photo is scanned in rows, alternately left to right and right to left, to create profiles
a regular grid is formed by resampling the points created in this process
because the process tends to underestimate elevations on uphill parts of each profile and overestimate on downhill parts, the resulting DEMs show a characteristic "herringbone" effect when contoured 2. contour following
contour lines are extracted directly from stereopairs during compilation of standard USGS maps
contour data are processed into profile lines and a regular grid is interpolated using the same algorithms used for manual profiling data

DEMs from each source display characteristic error artifacts
e.g. effects of mis-tagged contours in the products of scanned contour lines

Uses of DEMs

determining attributes of terrain, such as elevation at any point, slope and aspect

finding features on the terrain, such as drainage basins and watersheds, drainage networks and channels, peaks and pits and other landforms

modeling of hydrologic functions, energy flux and forest fires

Hydrologic functions on DEMs

the principal components of a drainage basin are its topographic form and the topologic structure of its drainage network
the quantification of these components is tedious and time consuming when accomplished manually
the automated determination of these components is an ideal application of GIS technology

watersheds comprise one method of completely partitioning space and many environmental phenomena can be related to watersheds

determination of the drainage network and the associated drainage divides provides an important first step in the creation of a hydrologic information system

registration and segmentation of digital imagery can be enhanced if use is made of the drainage basin information

knowledge of the drainage divides and of the drainage network can be used to provide better estimates of slopes and aspects (e.g., slopes should break at divides and at channels)

in this unit we look at a number of simple algorithms for DEMs


B. ESTIMATING ELEVATION

to estimate the elevation of some point, we need to know first whether the point of interest is exactly at a point in the raster, or in between

in the first case, the elevation can be taken directly from the database

in the second case, we need to develop some method of interpolation, or estimation of elevation
can use the elevation of the nearest point, but this leads to sharp changes of elevation halfway between points











instead, the normal approach is to fit a plane to the nearby raster points, and use it to estimate elevation at any point










the plane passing through these points is represented as:
z = a + bx + cy



since a plane will generally not pass exactly through all the points
the plane which minimizes the sum of squared elevation differences between the plane and the data at each of the nearby points is often used

can determine the equation of the plane as follows:
use the four nearest grid points (known as the "neighborhood" of the point or the "2x2 window" around the point)
define an origin in the middle of the 2x2 window, and give the four neighboring points the coordinates (-1,-1), (-1,1), (1,-1) and (1,1)










since the four points are evenly spaced, the coefficients in the equation can be calculated from the following:
a = (z1 + z2 + z3 + z4)/4

b = (-z1 + z2 - z3 + z4)/4

c = (-z1 - z2 + z3 + z4)/4


note: the coefficients can be solved using larger neighborhoods, e.g. the nearest 9 points (see handout)
having determined the coefficients, the elevation (z) can be determined from:
z = a + bx + cy

DIGITAL ELEVATION MODELS
C. ESTIMATING SLOPE AND ASPECT

slope and aspect can be calculated from the fitted plane
to estimate these at a raster point, a 3x3 window centered on the point is usually used

slope is calculated from:
/ (b2 + c2)


aspect is calculated from:
tan-1 c/b


normally a "slope map" or "aspect map" will display the attribute values generalized over areas (regions) instead of at points, such that within each area, all slopes fall into a certain range (e.g. 10-15%) or all aspects fall into a certain quadrant (e.g. NW)
to generate such a map, slope or aspect is determined at each raster point, and then these values are aggregated into polygons based on a set of predefined ranges
this way of representing slope or aspect is not as accurate as the original raster form

since both slope and aspect are derivable from elevation by a simple process, is there any need to store them as separate layers?


D. DETERMINING DRAINAGE NETWORKS

a raster DEM contains sufficient information to determine general patterns of drainage and watersheds

think of each raster point as the center of a square cell
the direction of flow of water out of this cell will be determined by the elevations of surrounding cells

algorithms to determine the flow direction generally use one of the following cases: 1. assume only 4 possible directions of flow (up, down, left, right - the Rook''s move directions from chess); or 2. assume 8 possible directions (the Queen''s move directions)

in both cases, number the move directions clockwise from up











water is assumed to flow from each cell to the lowest of its neighbors
if no neighbor is lower, the cell is a "pit" and gets code "0"

combinations which would be hydrologically impossible, such as a 4 to the left of a 6 in the 8 move case, are logically impossible in this scheme

Determining the watershed

a watershed is defined here as an attribute of each point on the network which identifies the region upstream of that point

to find a watershed
begin at the specified cell and label all cells which drain to it, then all which drain to those, etc. until the upstream limits of the basin are defined
the watershed is then the polygon formed by the labeled cells

Determining the network

to draw the drainage network, connect the moves with arrows
a zero on the edge of the array is interpreted as a channel which flows off the area

since in natural systems, small quantities of water generally flow overland, not in channels, we may want to accumulate water as it flows downstream through the cells so that channels begin only when a threshold volume is reached

accumulation of volume proceeds as follows:
start by setting each cell to zero
then beginning at each cell, add one to it and all cells downstream of it, following the directions indicated in the network

to simulate actual stream channels, assume a channel begins only when the accumulated water passing through a cell reaches some critical value
this means that small tributaries in the examples above will be deleted

in the example, channels start only when the flow reaches a volume of 2

the networks found by this process can be thought of as estimates of real channel networks
real networks consist of junctions or forks, links, and sources, and all of these can be identified on the simulated networks

Characteristics of automatically derived networks

how do networks obtained from DEMs differ from real ones?

real streams sometimes branch downstream
but this is impossible using this method, the simulated networks cannot bifurcate

DEM data contains large numbers of ties of elevation, because the vertical resolution is not very high

using this method, water cannot "flow" from one cell to an adjacent cell with the same elevation
as a result, ties can lead to large numbers of unwanted pits










e.g. in this example, using Rook''s case (4 directions) central cell has no outflow direction
to avoid the problem, allow water to flow between neighbors at the same elevation, determining the direction of flow by evaluating local slope (i.e. over a larger window)










e.g. here the local slope is to the south
alternatively, deal with the problem by regarding the cell as a very small lake, and simulating its overflow (see next point)

pits occur frequently on DEMs, largely as a result of data errors

if a cell has no lower neighbors, it is a pit

the pit can be "flooded" to form a "lake" by the following process:

initiate a lake at the elevation of the cell, with a "shoreline" defined by the cell''s perimeter
find the lowest cell adjacent to the shoreline, raise the lake to that level and expand the shoreline to include it
if one of the neighbors is now lower than the lake, it is the outlet: terminate the process
if the lowest neighbor is part of another lake, merge the lakes and continue

the number of streams joining at a junction, known as the valency of the junction, is almost always 3 in reality, but may be as high as 4 with the 4-move case, and 8 with the 8-move case

junction angles are determined by the cell geometry in the simulation, but in reality are a function of the terrain and the erosion process

in areas of uniform slope the technique generates large numbers of parallel streams










in reality streams tend to wander because of unevenness, and the resulting junctions reduce the density of streams in areas of approximately uniform slope

drainage density is very high in the simulations

in many types of terrain, channels are incised, and often the width of the incised channel is too small to show on the DEM
this problem can be dealt with by searching the DEM for possible channels - see Band (1986) for example

Summary

these methods do well on highly dissected landscapes of high drainage density
they do better at modeling watershed boundaries than drainage channels
therefore, ideally, a spatial database for modeling runoff and other processes related to hydrology should include both the DEM and the stream channels themselves (the "blue lines" of a topographic map)


REFERENCES
Band, L.E., 1986. "Topographic partition of watersheds with digital elevation models," Water Resources Research 22(1):15-24.

Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford.

Chapter 3 reviews alternative methods of terrain representation.

Evans, I.S., 1980. "An integrated system for terrain analysis and slope mapping," Zeitschrift fur Geomorphologie 36:274-95.

Marks, D., J. Dozier and J. Frew, 1984. "Automated basin delineation from digital elevation data," Geoprocessing 2:299-311.

O''Callaghan, J.F. and D.M. Mark 1984. "The extraction of drainage networks with lakes," Water Resources Research, 18(2):275-280.

Pfaltz, J.L., 1976. "Surface networks," Geographical Analysis 8:77-93. Discussion of surface-specific points and their relationship to ridge and channel lines.

USGS, 1987. Digital Elevation Models, Data Users Guide 5, US Department of the Interior, USGS, Reston, VA. Describes the creation and data structures of USGS DEMs in detail.


DISCUSSION AND EXAMINATION QUESTIONS

1. Discuss some of the problems encountered with algorithms which extract drainage networks from digital elevation models, and present some possible solutions to those problems.

2. How would the incorporation of hydrologic information-- such as drainage divides and stream networks--into a GIS assist a resource manager?

3. Discuss the problems of obtaining maps of slope and aspect from DEMs.

4. What possible ways are there for displaying a DEM on a computer screen? Discuss the advantages and disadvantages of each from the point of view of a) the users and b) the programmers.

THE TIN MODEL
A. INTRODUCTION
The TIN model
Creating TINs

B. HOW TO PICK POINTS
1. Fowler and Little algorithm
2. VIP (Very Important Points) Algorithm
3. Drop heuristic

C. HOW TO TRIANGULATE A TIN
1. Distance ordering
2. Delaunay triangulation

D. ALTERNATIVE METHODS OF CREATING TINS
Break lines
TINs from contours

E. STORING TINS
1. Triangle by triangle
2. Points and their neighbors
Comparison of the two structures

F. ALGORITHMS ON TINS
Slope and aspect
Contouring
Finding Drainage Networks

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES



UNIT 39 - THE TIN MODEL

Compiled with assistance from Thomas K. Poiker, Simon Fraser University


A. INTRODUCTION


the Triangulated Irregular Network model is a significant alternative to the regular raster of a DEM, and has been adopted in numerous GISs and automated mapping and contouring packages

the TIN model was developed in the early 1970''s as a simple way to build a surface from a set of irregularly spaced points

several prototype systems were developed in the 1970''s

commercial systems using TIN began to appear in the 1980''s as contouring packages, some embedded in GISs

The TIN model

irregularly spaced sample points can be adapted to the terrain, with more points in areas of rough terrain and fewer in smooth terrain
an irregularly spaced sample is therefore more efficient at representing a surface

in a TIN model, the sample points are connected by lines to form triangles
within each triangle the surface is usually represented by a plane

by using triangles we ensure that each piece of the mosaic surface will fit with its neighboring pieces - the surface will be continuous - as each triangle''s surface would be defined by the elevations of the three corner points

it might make sense to use more complex polygons as mosaic tiles in some cases, but they can always be broken down into triangles
for example, if a plateau is eroded by gullies, the remaining plateau would be a flat (planar) area bounded by an irregular, many-sided polygon. In the TIN model it would be represented by a number of triangles, each at the same elevation

diagram









for vector GISs, TINs can be seen as polygons having attributes of slope, aspect and area, with three vertices having elevation attributes and three edges with slope and direction attributes

the TIN model is attractive because of its simplicity and economy

in addition, certain types of terrain are very effectively divided into triangles with plane facets
this is particularly true with fluvially-eroded landscapes

however, other landscapes, such as glaciated ones, are not well represented by flat triangles
triangles work best in areas with sharp breaks in slope, where TIN edges can be aligned with breaks, e.g. along ridges or channels

Creating TINs

despite its simplicity, creating a TIN model requires many choices:
how to pick sample points
in many cases these must be selected from some existing, dense DEM or digitized contours
normally, a TIN of 100 points will do as well as a DEM of several hundred at representing a surface
how to connect points into triangles
how to model the surface within each triangle
this is almost always resolved by using a plane surface
however, if the surface is contoured, the contours will be straight and parallel within each triangle, but will kink sharply at triangle edges

diagram








consequently, some implementations of TIN represent the surface in each triangle using a mathematical function chosen to ensure that slope changes continuously, not abruptly, at the edges of the triangle


B. HOW TO PICK POINTS

given a dense DEM or set of digitized contours, how should points be selected so that the surface is accurately represented?
consider the following 3 methods for selecting from a DEM
all of them try to select points at significant breaks of the surface
such breaks are common on terrain, absent on smooth mathematical surfaces

1. Fowler and Little algorithm

this approach is based on the concept of surface-specific points which play a specific role in the surface
e.g. represent features such as peaks and pits

Procedure

first examine the surface using a 3x3 window, looking at a small array of 9 points at each step
label the 8 neighbors of the central point + if higher, - if lower
a point is a peak if its 8 neighbors are all lower (8 +s)

a point is a pit if its 8 neighbors are all higher (8 -s)
a point is a pass if the +s and -s alternate around the point with at least two complete cycles, e.g
+ + - - - - + + (2 cycles)

or + - + - - + - + (4 cycles)


next the surface is examined using a 2x2 window
except at the edges, every point appears in four positions of the window
a point is a potential ridge point if it is never lowest in any position of the window
a point is a potential channel point if it is never highest in any position of the window

then starting at a pass, search through adjacent ridge points until a peak is reached
similarly, search from the pass through adjacent channel points until a pit is reached

Finishing the TIN

the result of this process is a connected set of peaks, pits, passes, ridge lines and channel lines
Fowler and Little recommend that the number of points in each ridge and channel line be reduced by thinning using a standard thinning algorithm
it may be desirable to add additional points from the DEM which are not on ridges or channels if we can significantly reduce any substantial differences from the real surface by doing so

triangles are built between all selected points

the resulting surface will differ from the original DEM, perhaps substantially in some areas
Comments


the Fowler and Little algorithm is complex
performs better on some types of landscape than others, particularly where there are sharp breaks of slope along ridges, and where channels are sharply incised
it may require substantial "fine tuning" to work well

2. VIP (Very Important Points) Algorithm

unlike the previous algorithm which tries to identify the major features of the terrain, VIP works by examining the surface locally using a window

this is a simplification of the technique used in ESRI''s ARC/INFO
Procedure


each point has 8 neighbors, forming 4 diametrically opposite pairs, i.e. up and down, right and left, upper left and lower right, and upper right and lower left

for each point, examine each of these pairs of neighbors in turn
connect the two neighbors by a straight line, and compute the perpendicular distance of the central point from this line

diagram








average the four distances to obtain a measure of "significance" for the point

delete points from the DEM in order of increasing significance, deleting the least significant first
this continues until one of two conditions is met:
the number of points reaches a predetermined limit
the significance reaches a predetermined limit

Comments

because of its local nature, this method is best when the proportion of points deleted is low

because of its emphasis on straight lines, and the TIN''s use of planes, it is less satisfactory on curved surfaces
diagram










3. Drop heuristic

this method treats the problem as one of optimization
given a dense DEM, find the best subset of a predetermined number of points such that when the points are connected by triangles filled with planes, the TIN gives the best possible representation of the surface

Procedure

start with the full DEM
examine each point in turn
temporarily drop the point and modify the surrounding triangles accordingly

diagram









find the triangle containing the dropped point
measure the difference between the elevation of the point, and the elevation of the new surface at the point
restore the dropped point, storing the calculated elevation difference

continue the process dropping each point in turn

when all the points have been dropped, remove the point which produced the least difference and start the process again
Comments


the TIN will likely be more accurate if the differences are measured not only for the point being dropped, but for all previously dropped points lying within the modified triangles as well, but this would be time- consuming



rather than select points from the DEM, the best solution (in the sense of producing the best possible TIN for a given number of points) may be to locate TIN points at locations and elevations not in the original raster
these points may be chosen from air photographs or ground surveys

THE TIN MODEL
HOW TO TRIANGULATE A TIN

having selected a set of TIN points, these will become the vertices of the triangle network
there are several ways to connect vertices into triangles

"fat" triangles with angles close to 60 degrees are preferred since this ensures that any point on the surface is as close as possible to a vertex
this is important because the surface representation is likely most accurate at the vertices

consider the following two methods for building the triangles
in practice almost all systems use the second

1. Distance ordering

Procedure

compute the distance between all pairs of points, and sort from lowest to highest

connect the closest pair of points
connect the next closest pair if the resulting line does not cross earlier lines

repeat until no further lines can be selected

the points will now be connected with triangles
this tends to produce many skinny triangles instead of the preferred "fat" triangles

2. Delaunay triangulation

by definition, 3 points form a Delaunay triangle if and only if the circle which passes through them contains no other point
diagram










another way to define the Delaunay triangulation is as follows:
partition the map by assigning all locations to the nearest vertex
the boundaries created in this process form a set of polygons called Thiessen polygons or Voronoi or Dirichlet regions overhead - Delaunay triangles from Thiessen polygons
two vertices are connected in the Delaunay triangulation if their Thiessen polygons share an edge

this method produces the preferred fat triangles

the boundary edges on the Delaunay network form the Convex Hull, which is the smallest polygon to contain all of the vertices
Procedure


there are several techniques for building the triangles: 1. since the convex hull will always be part of the Delaunay network
start with these edges and work inwards until the network is complete 2. connect the closest pair which by definition must be a Delaunay edge
search for a third point such that no other point falls in the circle through them
continue working outward from these edges for the next closest point

Problems

Delaunay triangles are not hierarchical
they cannot be aggregated to form bigger triangles
if they are divided into smaller triangles, the results tend to be poorly shaped (not "fat")

D. ALTERNATIVE METHODS OF CREATING TINS
Break lines


methods presented above concentrate on finding TIN vertices, then connecting them with triangles

a major advantage of TINs is their ability to capture breaks of slope, if edges can be aligned with known ridges or channels

this requires a different approach, where "breaklines" are incorporated into the triangle network as edges after the points have been triangulated
the result is generally non-Delaunay, i.e. an edge need not be an edge in the Delaunay network of the vertices

this approach is now incorporated into some TIN software, e.g. the ARC/INFO TIN module

TINs from contours

contours are a common source of digital elevation data

rather than convert from contours to a grid (DEM) and then to a TIN, it is more direct to obtain the TIN from contours directly

a TIN can be created by selecting points from the digitized contour lines

selection may create a triangle with three vertices on the same contour (at the same elevation)

such a "flat triangle" has no defined aspect, causes problems in modeling runoff
several ways of avoiding this problem have been devised


E. STORING TINS

there are basically two ways of storing triangulated networks: 1. Triangle by triangle 2. Points and their neighbors
overhead - Storing TINs


1. Triangle by triangle

in this case, a record usually contains:
a reference number for the triangle

the x,y,z-coordinates of the three vertices
the reference numbers of the three neighboring triangles

since a vertex participates in, on the average, six triangles, repetition of coordinates can be avoided by creating a separate vertex file and referencing them in the triangle files

2. Points and their neighbors

the alternative is to store for every vertex:
an identification number
the xyz coordinates
references (pointers) to the neighboring vertices in clockwise or counter-clockwise order

this structure was the original TIN structure (Peucker et al, 1978)

Comparison of the two structures

both structures are necessary, depending on the purpose
slope analysis needs the first
contouring and other traversing procedures work best with the second

as long as one can be extracted from the other in close to linear time (i.e., without an exhaustive search per point), either will do

the second generally needs less storage space
however, the savings within different TIN structures is minor compared to the reduction of points from the regular grid to the triangular network


F. ALGORITHMS ON TINS
Slope and aspect


compared to the DEM, it is simple to find slope and aspect at some location using a TIN - we simply find the slope and aspect attributes of the containing triangle

Contouring

overhead - Contouring TINs
Example: find the 100 m contour


begin by determining, with a different algorithm, that an edge intersects the contour level of 100 meters

the vertex above the contour level is the "reference point" (r) and the one below the "sub-point" (s)
establish a position in memory for the pair of points which we will call present_edge

computing P1, the first contour point, is then a trivial linear calculation

now shift over to the traversing procedure
look at the third vertex in the triangle and ask whether it is a reference or a sub-point by testing whether it is above or below the contour level
the result replaces the appropriate value in present_edge and the next contour point is calculated.

the vertices in present_edge represent a second triangle whose third vertex is the next candidate

Finding Drainage Networks

two approaches can be used to find drainage networks and watersheds:
1. treat each triangle as a discrete element

as with the DEM, water is passed from one triangle to another, selecting the neighbor in the direction of steepest slope in each case

diagram








2. treat the surface as a mosaic of planes

two forms of flow occur - channel and overland
water flows over each triangle as a continuous sheet, and collects along edges
in this model, it is possible for water to collect in a "channel" between two triangles, flow to a vertex, and flow into the top of one or more triangles
in this case we must allow channel flow down the line of steepest slope from the apex of these triangles

diagram



if there is more than one such triangle, then a bifurcation is implied, with water flowing in more than one direction from the apex, and into more than one drainage basin

this is awkward, as we no longer have a clear definition of watershed

diagram










REFERENCES
Chen, Z., and J.A. Guevara, 1987. "Systematic selection of very important points (VIP) from digital terrain models for construction triangular irregular networks," Proceedings, AutoCarto 8, ASPRS/ACSM, Falls Church, VA, pp. 50-56. A description of ESRI''s VIP approach to constructing a TIN.

Fowler, R.J., and J.J. Little, 1979. "Automatic extraction of irregular network digital terrain models," Computer Graphics 13:199-207.

Heller, M., 1986. "Triangulation and Interpolation of Surfaces," in R. Sieber and K. Brassel (eds), A Selected Bibliography on Spatial Data Handling: Data Structures, Generalization and Three-Dimensional Mapping, Geo- Processing Series, vol 6, Department of Geography, University of Zurich, pp 36 - 45. A good overview with literature, mainly on triangulation.

Mark, D. M., 1975. "Computer Analysis of Topography: A Comparison of Terrain Storage Methods," Geografisker Annaler 57A:179-188. A quantitative comparison of regular grids and triangulated networks.

Mark, D.M., 1979. "Phenomenon-Based Data-Structuring and Digital Terrain Modelling," Geo-Processing 1:27-36. A very interesting conceptual article proposing a phenomenon-based approach to data structuring. Such an approach has to involve expert knowledge of the phenomenon.

Peucker, T.K., R.J. Fowler, J.J. Little and D.M. Mark, 1978. "The Triangulated Irregular Network," Proceedings, American Society of Photogrammetry: Digital Terrain Models (DTM) Symposium, St. Louis, Missouri, May 9-11,

1978, pp 516-540. The basic description of the original TIN project.


DISCUSSION AND EXAM QUESTIONS

1. Argue the differences between the regular grid and the triangular net approaches. Apply the argument to the computation of slope, contouring and visibility.

2. Mark''s article in 1979 argued that the TIN model was more appropriate to the nature of certain geographical phenomena. Do you agree? For what types of landforms is TIN most and least appropriate?

3. Discuss the various methods proposed for selecting TIN vertices from a DEM, and their relative strengths and weaknesses.

4. Describe how information on directions of flow can be obtained from a TIN, and the nature of the extracted stream network. How does this compare to networks derived from DEMs?

SPATIAL INTERPOLATION I
A. INTRODUCTION

B. CLASSIFICATION OF INTERPOLATION PROCEDURES
1. Point Interpolation/Areal Interpolation
2. Global/Local Interpolators
3. Exact/Approximate Interpolators
4. Stochastic/Deterministic Interpolators
5. Gradual/Abrupt Interpolators

C. POINT BASED INTERPOLATION - EXACT METHODS
1. Proximal
2. B-splines
3. Kriging
Variograms
Deriving the variogram
Computing the estimates
4. Manual Interpolation or "eyeballing"

D. POINT BASED INTERPOLATION - APPROXIMATE METHODS
1. Trend Surface Analysis
2. Fourier Series
3. Moving average/distance weighted average

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES



UNIT 40 - SPATIAL INTERPOLATION I

Compiled with assistance from Nigel M. Waters, University of Calgary


A. INTRODUCTION


spatial interpolation is the procedure of estimating the value of properties at unsampled sites within the area covered by existing observations
in almost all cases the property must be interval or ratio scaled

can be thought of as the reverse of the process used to select the few points from a DEM which accurately represent the surface

rationale behind spatial interpolation is the observation that points close together in space are more likely to have similar values than points far apart (Tobler''s Law of Geography)

spatial interpolation is a very important feature of many GISs

spatial interpolation may be used in GISs:
to provide contours for displaying data graphically
to calculate some property of the surface at a given point
to change the unit of comparison when using different data structures in different layers
frequently is used as an aid in the spatial decision making process both in physical and human geography and in related disciplines such as mineral prospecting and hydrocarbon exploration

many of the techniques of spatial interpolation are two- dimensional developments of the one dimensional methods originally developed for time series analysis

this unit introduces spatial interpolation and examines point based interpolation, while the next looks at areal procedures and some applications


B. CLASSIFICATION OF INTERPOLATION PROCEDURES

there are several different ways to classify spatial interpolation procedures:

1. Point Interpolation/Areal Interpolation

point based

given a number of points whose locations and values are known, determine the values of other points at predetermined locations

diagram








point interpolation is used for data which can be collected at point locations e.g. weather station readings, spot heights, oil well readings, porosity measurements
interpolated grid points are often used as the data input to computer contouring algorithms
once the grid of points has been determined, isolines (e.g. contours) can be threaded between them using a linear interpolation on the straight line between each pair of grid points
point to point interpolation is the most frequently performed type of spatial interpolation done in GIS

lines to points
e.g. contours to elevation grids

areal interpolation
given a set of data mapped on one set of source zones determine the values of the data for a different set of target zones
e.g. given population counts for census tracts, estimate populations for electoral districts

diagram









2. Global/Local Interpolators

global interpolators determine a single function which is mapped across the whole region
a change in one input value affects the entire map

local interpolators apply an algorithm repeatedly to a small portion of the total set of points
a change in an input value only affects the result within the window

global algorithms tend to produce smoother surfaces with less abrupt changes
are used when there is an hypothesis about the form of the surface, e.g. a trend

some local interpolators may be extended to include a large proportion of the data points in set, thus making them in a sense global

the distinction between global and local interpolators is thus a continuum and not a dichotomy
this has led to some confusion and controversy in the literature

3. Exact/Approximate Interpolators

exact interpolators honor the data points upon which the interpolation is based
the surface passes through all points whose values are known
honoring data points is seen as an important feature in many applications e.g. the oil industry

proximal interpolators, B-splines and Kriging methods all honor the given data points
Kriging, as discussed below, may incorporate a nugget effect and if this is the case the concept of an exact interpolator ceases to be appropriate

approximate interpolators are used when there is some uncertainty about the given surface values
this utilizes the belief that in many data sets there are global trends, which vary slowly, overlain by local fluctuations, which vary rapidly and produce uncertainty (error) in the recorded values
the effect of smoothing will therefore be to reduce the effects of error on the resulting surface

4. Stochastic/Deterministic Interpolators

stochastic methods incorporate the concept of randomness
the interpolated surface is conceptualized as one of many that might have been observed, all of which could have produced the known data points

stochastic interpolators include trend surface analysis, Fourier analysis and Kriging
procedures such as trend surface analysis allow the statistical significance of the surface and uncertainty of the predicted values to be calculated

deterministic methods do not use probability theory (e.g. proximal)

5. Gradual/Abrupt Interpolators

a typical example of a gradual interpolater is the distance weighted moving average
usually produces an interpolated surface with gradual changes
however, if the number of points used in the moving average is reduced to a small number, or even one, there would be abrupt changes in the surface

it may be necessary to include barriers in the interpolation process
semipermeable, e.g. weather fronts
will produce quickly changing but continuous values
impermeable barriers, e.g. geologic faults
will produce abrupt changes

SPATIAL INTERPOLATION I
C. POINT BASED INTERPOLATION - EXACT METHODS

Lam (1983) and Burrough (1986) describe a variety of quantitative interpolation methods suitable for computer contouring algorithms
in this and the next sections, these are divided into exact and approximate methods
this section deals with exact methods

1. Proximal

all values are assumed to be equal to the nearest known point
is a local interpolator
computing load is relatively light

output data structure is Thiessen polygons with abrupt changes at boundaries
has ecological applications such as territories and influence zones

best for nominal data although originally used by Thiessen for computing areal estimates from rainfall data

is absolutely robust, always produces a result, but has no "intelligence" about the system being analyzed

available in very few mapping packages, SYMAP is a notable exception

2. B-splines

uses a piecewise polynomial to provide a series of patches resulting in a surface that has continuous first and second derivatives
ensures continuity in:
elevation (zero-order continuity) - surface has no cliffs
slope (first-order continuity) - slopes do not change abruptly, there are no kinks in contours
curvature (second order continuity) - minimum curvature is achieved
produces a continuous surface with minimum curvature

output data structure is points on a raster
note that maxima and minima do not necessarily occur at the data points

is a local interpolator
can be exact or used to smooth surfaces
computing load is moderate

best for very smooth surfaces

poor for surfaces which show marked fluctuations, this can cause wild oscillations in the spline

are popular in general surface interpolation packages but are not common in GISs

can be approximated by smoothing contours drawn through a TIN model

see Burrough (1986), Davis (1986) and mathematical aspects in Lam (1983) and Hearn and Baker (1986)
also described in "numerical approximation theory"

3. Kriging

developed by Georges Matheron, as the "theory of regionalized variables", and D.G. Krige as an optimal method of interpolation for use in the mining industry

the basis of this technique is the rate at which the variance between points changes over space
this is expressed in the variogram which shows how the average difference between values at points changes with distance between points

Variograms

diagram




De (vertical axis) is E(zi - zj)2, i.e. "expectation" of the difference
i.e. the average difference in elevation of any two points distance d apart
d (horizontal axis) is distance between i and j

most variograms show behavior like the diagram
the upper limit (asymptote) of De is called the sill
the distance at which this limit is reached is called the range
the intersection with the y axis is called the nugget
a non-zero nugget indicates that repeated measurements at the same point yield different values

in developing the variogram it is necessary to make some assumptions about the nature of the observed variation on the surface:
simple Kriging assumes that the surface has a constant mean, no underlying trend and that all variation is statistical

universal Kriging assumes that there is a deterministic trend in the surface that underlies the statistical variation

in either case, once trends have been accounted for (or assumed not to exist), all other variation is assumed to be a function of distance

Deriving the variogram

the input data for Kriging is usually an irregularly spaced sample of points

to compute a variogram we need to determine how variance increases with distance

begin by dividing the range of distance into a set of discrete intervals, e.g. 10 intervals between distance 0 and the maximum distance in the study area

for every pair of points, compute distance and the squared difference in z values

assign each pair to one of the distance ranges, and accumulate total variance in each range

after every pair has been used (or a sample of pairs in a large dataset) compute the average variance in each distance range

plot this value at the midpoint distance of each range

Computing the estimates

once the variogram has been developed, it is used to estimate distance weights for interpolation
interpolated values are the sum of the weighted values of some number of known points where weights depend on the distance between the interpolated and known points

weights are selected so that the estimates are:
unbiased (if used repeatedly, Kriging would give the correct result on average)
minimum variance (variation between repeated estimates is minimum

problems with this method:
when the number of data points is large this technique is computationally very intensive
the estimation of the variogram is not simple, no one technique is best
since there are several crucial assumptions that must be made about the statistical nature of the
variation, results from this technique can never be absolute


simple Kriging routines are available in the Surface II package (Kansas Geological Survey) and Surfer (Golden Software), and in the GEOEAS package for the PC developed by the US Environmental Protection Agency

4. Manual Interpolation or "eyeballing"

traditionally not a highly regarded method among geographers and cartographers

however, Dutton-Marion (1988) has shown that among geologists this is a very important procedure and that most geologists actually distrust the more sophisticated, mathematical algorithms
they feel that they can use their expert knowledge, modelling capabilities and experience and generate a more realistic interpolation by integrating this knowledge into the construction of the geological surface

attempts are now being made to use knowledge engineering techniques to extract this knowledge from experts and build it into an expert system for interpolation
see Unit 74 for more on this topic

characteristics of this method include:
procedures are local as different methods may be used by the expert on different parts of the map
tend to honor data points
abrupt changes such as faults are more easily modelled using these methods

the surfaces are subjective and vary from expert to expert

output data structure is usually in the form of a contour


D. POINT BASED INTERPOLATION - APPROXIMATE METHODS
1. Trend Surface Analysis


surface is approximated by a polynomial

output data structure is a polynomial function which can be used to estimate values of grid points on a raster or the value at any location

the elevation z at any point (x,y) on the surface is given by an equation in powers of x and y

e.g. a linear equation (degree 1) describes a tilted plane surface:
z = a + bx + cy


e.g. a quadratic equation (degree 2) describes a simple hill or valley:
z = a + bx + cy + dx2 + exy + fy2


in general, any cross-section of a surface of degree n can have at most n-1 alternating maxima and minima
e.g. a cubic surface can have one maximum and one minimum in any cross-section
equation for the cubic surface:
z = a + bx + cy + dx2 + exy + fy2 + gx3 + hx2y + ixy2 + jy3



a trend surface is a global interpolator
assumes the general trend of the surface is independent of random errors found at each sampled point

computing load is relatively light

problems
statistical assumptions of the model are rarely met in practice
edge effects may be severe
a polynomial model produces a rounded surface
this is rarely the case in many human and physical applications

available in a great many mapping packages
see Davis (1973) and Sampson (1978) for non- orthogonal polynomials; Mather (1976) for orthogonal polynomials

2. Fourier Series

approximates the surface by overlaying a series of sine and cosine waves

a global interpolator
computing load is moderate

output data structure is the Fourier series which can be used to estimate grid values for a raster or at any point

best for data sets which exhibit marked periodicity, such as ocean waves

rarely incorporated in computing packages
simple program and discussion in Davis (1973)

3. Moving average/distance weighted average

estimates are averages of the values at n known points:
z = S wizi/ S wi

where w is some function of distance, such as:

w = 1/dk w = e-kd



an almost infinite variety of algorithms may be used, variations include:
the nature of the distance function
varying the number of points used
the direction from which they are selected

is the most widely used method

objections to this method arise from the fact that the range of interpolated values is limited by the range of the data
no interpolated value will be outside the observed range of z values

diagram









other problems include:
how many points should be included in the averaging?
what to do about irregularly spaced points?
how to deal with edge effects?


REFERENCES
Burrough, P.A., 1986. Principles of Geographical Information Systems for land Resources Assessment, Clarendon, Oxford. See Chapter 8.

Davis, J.C., 1986. Statistics and Data Analysis in Geology, 2nd edition, Wiley, New York. (Also see the first, 1973, edition for program listings.)

Dutton-Marion, K.E., 1988. Principles of Interpolation Procedures in the Display and Analysis of Spatial Data: A Comparative Analysis of Conceptual and Computer Contouring, unpublished Ph.D. Thesis, Department of Geography, University of Calgary, Calgary, Alberta.

Hearn, D., and Baker, M.P., 1986. Computer Graphics, Prentice-Hall Inc, Englewood Cliffs, N.J.

Jones, T.A., Hamilton, D.E. and Johnson, C.R., 1986. Contouring Geologic Surfaces with the Computer, Van Nostrand Reinhold, New York

Lam, N., 1983. "Spatial Interpolation Methods: A Review," The American Cartographer 10(2):129-149.

Mather, P.M., 1976. Computational Methods of Multivariate Analysis in Physical Geography, Wigley, New York.

Sampson, R.J., 1978. Surface II, revised edition, Kansas Geological Survey, Lawrence, Kansas.

Waters, N.M., 1988. "Expert Systems and Systems of Experts," Chapter 12 in W.J. Coffey, ed., Geographical Systems and Systems of Geography: Essays in Honour of William Warntz, Department of Geography, University of Western Ontario, London, Ontario.

An important class of interpolation methods is missing here - so called radial basis functions, such as multiquadrics, thin plate spline, thin plate spline with tension, regularized spline with tension and a large number of other flavours of this approach (also sometimes refered to as variational approach). These methods are available in almost every GIS, from ArcINFO, GRASS, SURFER to specialized visualization packages. The description can be found at Mitas, L., Mitasova, H., 1999, Spatial Interpolation. In: P.Longley, M.F. Goodchild, D.J. Maguire, D.W.Rhind (Eds.), Geographical Information Systems: Principles, Techniques, Management and Applications, GeoInformation International, Wiley, 481-492.

DISCUSSION AND EXAM QUESTIONS

1. Are there other techniques for surface generation? How many of the above procedures are commonly used? How would they be ranked in terms of popularity? Give examples from the literature of where they have been used.

2. How does hand contouring rate as an alternative? What did you think of it and have you changed your mind? What are the key features and processes involved in hand contouring?

3. Explain the advantages and disadvantages of manual interpolation as used in hand contouring over computer based interpolation as used in a computer contouring package.

4. Describe the different ways in which spatial interpolation algorithms can be classified.

SPATIAL INTERPOLATION II
A. INTRODUCTION

B. AREAL INTERPOLATION - NON-VOLUME PRESERVING
Procedure

C. AREAL INTERPOLATION - VOLUME-PRESERVING
1. Overlay
2. Pycnophylactic
Boundary conditions

D. SPECIAL CASES OF SPATIAL INTERPOLATION
1. Mapping populated areas
2. Estimating trade areas

E. A GIS PERSPECTIVE ON INTERPOLATION
Expert systems for spatial interpolation algorithms
Conclusion

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES



UNIT 41 - SPATIAL INTERPOLATION II

Compiled with assistance from Nigel M. Waters, University of Calgary


A. INTRODUCTION


this unit continues the examination of spatial interpolation by looking at areal interpolation techniques and some applications

areal interpolation is the problem of transferring data from one set of areas (source reporting zones) to another (target reporting zones)
this is easy if the target set is an aggregation of the source set, but more difficult if the boundaries of the target set are independent of the source set

later we look at applications that do not fall easily into either point or areal interpolation categories


B. AREAL INTERPOLATION - NON-VOLUME PRESERVING

e.g. interpolating population counts from census tracts to school districts

Procedure

overhead - Non-volume preserving areal interpolation

calculate the population density for each source census tract by dividing population by area

identify a centroid for each region
assign to the point located at each centroid, the population density value determined for its enclosing area

using this set of points, interpolate a gridded population density surface using any of the methods described previously

convert each grid cell''s value to a population by multiplying the estimated density by the cell''s area

overlay the interpolated grid on the target map and assign each grid value to each its target region (school district)

calculate the total population in each target region

this method is criticized because:

choosing the center point is ill-defined
inadequacy of point based interpolation methods
most importantly, the total value of each zone is not conserved
e.g. if a source zone is divided into two target zones, the total population of the target zones after interpolation need not equal the population of the source zone

SPATIAL INTERPOLATION II
C. AREAL INTERPOLATION - VOLUME-PRESERVING
1. Overlay


discussed by MacDougall (1976) and Goodchild and Lam (1980)

procedure involves:
overlay of target and source zones
determining the proportion of each source zone that is assigned to each target zone
apportioning the total value of the attribute for each source zone to target zones according to the areal proportions











assumes uniform density of the attribute within each zone
e.g. uniform population density if the attribute is total zone population

2. Pycnophylactic

see Tobler (1979) for the original algorithm

the technique has two objectives: 1. create a smooth surface, no steps
attribute values should not change suddenly at zone boundaries 2. the total value of the attribute within each zone must be correct

procedure: 1. overlay a dense raster on a choropleth map 2. divide each zone''s total value equally among the raster cells that overlap the zone
3. smooth the values by replacing each cell''s value with the average of its neighbors 4. sum the values of the cells in each zone 5. adjust the values of all cells within each zone proportionally so that the zone''s total is the same as the original total

e.g., if the total is 10% low, increase the value of each cell by 10% 6. repeat steps 3, 4 and 5 until no more changes occur

does not require an assumption of homogeneity within zones but rapid variation within zones may affect the quality of interpolation

output is a contour or continuously shaded map

Boundary conditions

at the boundary of the reporting zones, pixels will have neighbors outside the study area and therefore without values
some decision must be made about the behavior of the surface outside the study area
e.g. population density equals zero (a lake or rural area)
e.g. population density unknown, assumed equal to the values of the outermost pixels of the study area


D. SPECIAL CASES OF SPATIAL INTERPOLATION
1. Mapping populated areas


objective is to create a map showing "populated areas", given point population values for a number of cities and towns

this problem arises frequently when populated areas are represented as points

it arises for small reporting zones when boundary files are unavailable, but data includes centroid locations e.g. US or UK census data

are several methods that could be used

a simple approach would be to estimate the populated area using an empirical relationship like:
A is proportional to p0.84


and draw a circle around the point, of radius:
/ (A / p)



Bracken and Martin (1989) have developed methods for replacing ED centroids by disks, the radius of each disk being estimated from the distances to neighboring centroids
the method works very well with UK ED data

an alternative approach might proceed as follows:
establish a critical population density for defining an urban area
spread the population over each urban area so that population density is highest in the center and decreases gradually outwards
e.g. use a normal distribution function










interpolate densities to a raster, accumulating values where the population spread from two urban areas overlap










draw contours at the critical value to define the boundaries of the populated areas

both of these methods fall within the general heading of density estimation
a density is being estimated from a collection of points
see Silverman

2. Estimating trade areas

in marketing, it is often desirable to plot the boundary of a trade area for e.g. a store, given information on the home locations of customers

simplest case is when the location of all customers and non-customers is known
simply draw a boundary contour between them











if the location of non-customers is not known: 1. calculate the average distance to all customers and draw a circle or 2. divide the area into sectors, average the distance to customers within the sectors and draw a distance arc for each sector (see Huff and Batsell, 1977)










these techniques do not pick up islands or holes in the trade area









or 3. give each customer a small probability surface

accumulate values as in the populated areas example
set critical value for delimiting trade area


E. A GIS PERSPECTIVE ON INTERPOLATION

both point and areal interpolation try to estimate a continuous surface
in the point case, the surface has been measured at sample points
in the areal case, the surface of population density is estimated from total population counts in each reporting zone

in other cases it is impossible to conceive of a continuous surface
e.g. if each point is a city and the attribute is city population
if city A has population 1 million and city B 100 km away has population 2 million, there is no reason to believe in the existence of a city half way between A and B with population 1.5 million
in this case, the variable population exists only at the points, not as a continuous surface
in other cases the variable might exist only along lines e.g. traffic density on a street network

we must distinguish here between layer and object views of the world
a continuous surface of elevations is a layer view of the world - there is one value of elevation at an infinite number of possible places in the space
the point map of cities is an object view of the world - the space in between points is empty, and has no value of the population variable
the street map is an object view of the world - the world is empty except where there are streets - only along streets is traffic density defined

spatial interpolation implies a layer view of the world, and it requires special techniques (e.g. density estimation) to apply it to objects such as store customers

Expert systems for spatial interpolation algorithms

a good GIS should include a range of spatial interpolation routines so that the user can choose the most appropriate method for the data and the task

ideally, these routines should provide a natural language interface which would lead the user through an appropriate series of questions about the intentions, goals and aims of the user and about the nature of the data

a number of prototype expert systems for guiding the choice of a spatial interpolation algorithm have been developed

these may be written in the form of:
an expert system shell (Waters, 1988)
in one of the artificial intelligence languages such as Prolog or LISP (see Dutton-Marion, 1988)
or in a high level language such as Pascal (Maslyn, 1987)

Conclusion

if computer contouring and surface generation techniques are to be incorporated successfully into GIS, they must be easy to use and effective
"easy to use" implies that those without a detailed knowledge of the mathematical and statistical characteristics of the procedure should be able to choose the correct technique for displaying a particular data set for a particular purpose
note: statisticians argue that this is not an ideal goal as people may use techniques without a proper understanding of the underlying assumptions

"effective" means that these techniques should be informative, highlighting the essential nature of the data and/or surface and serving the purpose of the researcher/analyst
the researcher''s measure of success will be largely subjective and visual - does the result look right?

this purpose may vary from an attempt to model all the "real" intricacies of the surface to simply trying to highlight the general, spatial trend of the data in order to aid in the decision-making process


REFERENCES
Bracken, I. and D. Martin, 1989. "The generation of spatial population distributions from census centroid data," Environment and Planning A 21:537-44.

Dutton-Marion, K.E., 1988. Principles of Interpolation Procedures in the Display and Analysis of Spatial Data: A Comparative Analysis of Conceptual and Computer Contouring, unpublished Ph.D. Thesis, Department of Geography, University of Calgary, Calgary, Alberta.

Goodchild, M.F., and Lam, N., 1980. "Areal Interpolation: A Variant of the Traditional Spatial Problem," Geo- Processing 1: 297-312.

Huff, D.L. and R.R. Batsell, 1977. "Delimiting the areal extent of a market area," Journal of Marketing Research 14:581-5.

MacDougall, E.B., 1976. Computer Programming for Spatial Problems, Arnold, London.

Maslyn, R.M., 1987. "Gridding Advisor: An Expert System for Selecting Gridding Algorithms," Geobyte 2(4):42-43.

Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.

Tobler, W.R., 1979. "Smooth pycnophylactic interpolation for geographical regions," Journal of the American Statistical Association 74:519-30.

Waters, N.M., 1988. "Expert Systems and Systems of Experts," Chapter 12 in W.J. Coffey, ed., Geographical Systems and Systems of Geography: Essays in Honour of William Warntz, Department of Geography, University of Western Ontario, London, Ontario.

DISCUSSION AND EXAM QUESTIONS

1. What are the main considerations to be aware of in computer contouring? What are the key aspects for the design of an expert system to aid in choosing a computer contouring algorithm within a GIS? How long do you think it will be before such expert systems become widely available?

2. Describe how Tobler''s pycnophylactic method differs from volume-preserving overlay. What model of the underlying spatial distribution is assumed by each? Give examples of phenomena and application which fit each method''s assumptions.

3. Describe the application of areal interpolation in political districting.

4. One test of a spatial interpolation method is that its results would be judged as equal or better to hand contouring by a specialist, e.g. a field geologist, with detailed knowledge of the phenomenon being mapped. How well do the methods discussed in these two units do against this criterion?

TEMPORAL AND THREE-DIMENSIONAL REPRESENTATIONS
A. INTRODUCTION

B. VERTICAL DIMENSION ("3D")
Uses of 3D representations

C. CHARACTER OF THE PHENOMENON
Distribution
Topological complexity
Geometric complexity

D. METHODS OF REPRESENTATION
2 1/2 dimensions
True three dimensional representations
Summary

E. TIME DEPENDENCE
Possible models
Summary

REFERENCES

DISCUSSION AND EXAM QUESTIONS

NOTES




UNIT 42 - TEMPORAL AND THREE-DIMENSIONAL REPRESENTATIONS
Compiled with assistance from John H. Ganter, University of Pennsylvania



A. INTRODUCTION

although the vast majority of GISs currently work only in two dimensions, across the plane, certain applications require the addition of other dimensions, namely time or elevation/depth
most geological applications require a consideration of attributes in the vertical dimension as well as the horizontal ones
temporal variations are important in many economic and social studies
oceanographic and meteorological models need to consider variations both in the vertical and the temporal dimensions

this unit will look briefly at how these additional dimensions can be incorporated into GISs


B. VERTICAL DIMENSION ("3D")

there are two very different ways of looking at representations of the vertical dimension (normally called the third dimension) in GIS

the most commonly recognized is a data structure where a z value (normally elevation) is recorded as an attribute for each data point (x,y)
these z values can be used in a perspective plot to create the appearance of 3 dimensions
this is not true three dimensional representation and is often referred to as "2 1/2 dimensions"

overhead - Perspective plot - Tiefort Mountains, CA, same data used for graphics in Unit 11

these 2 1/2D plots are an attractive way of displaying topography and other continuous surfaces from DEMs or TINs
perspective plots can be computed from any viewpoint
additional layers can be "draped" over the surface using color
"artist''s impressions" can be created by converting classes (e.g. of land cover) to simulated trees, etc.

with powerful computers, it is possible to animate 2 1/2D plots to create simulations of flying over topography
"LA the Movie" was created by Jet Propulsion Lab, Pasadena CA, by draping a Landsat scene over a DEM of LA, then simulating the view from a moving aircraft

true three dimensional representations store data in structures that reference locations in 3D space (x,y,z)
here z is not an attribute but an element of the location of the point
this permits data to be recorded at several points with equal x and y coordinates, e.g., soundings in the ocean or atmosphere, geologic logs of wells

true 3D representations allow:
visualization of volumes
is difficult to understand volumes when they are represented by several orthographic projections
modeling of volumes
algorithms for spatial analysis of volumes are simpler if the data is in a volumetric form

Uses of 3D representations

3D representations of spatial information have several important applications:
designing major developments such as mines, quarries, dams and reservoirs
geologic exploration
scientific explanation of three dimensional processes such as ocean currents
here don''t necessarily know what is being sought
therefore, the structure of the representation can constrain the types of analyses that are performed and what is found


C. CHARACTER OF THE PHENOMENON

a major determinant of the type of representation used is the ''phenomenon-structure'' itself

three dimensional phenomena have several characteristics:

Distribution

1. Continuous
present in some quantity in all places
e.g. land, stratigraphic or piezometric surface
similar to 2D raster representations

2. Discrete
distinct objects which occupy specific locations
e.g. lithology, ore bodies, tunnels, caves
similar to 2D vector objects

Topological complexity

how the object is composed
this has a major effect on the data structure used

overhead - Topological complexity
1. Compound (One class)

composed of identical smaller objects
well casing or well log

ore body composed of smaller bodies

2. Mixed (Multiple classes)
composed of smaller, dissimilar objects
mine composed of shafts, adits, etc. which are hierarchically-arranged either adjacent to, or wholly within each other

3. Interpenetrating (Multiple phenomena)
mixed, but objects may share subsets of each other''s volumes
large-scale structural geology
karst features intersecting with the water table and geologic structures

Geometric complexity

degree to which the representation is irregular or convoluted

involves questions of:
Accuracy


how much is required?
design applications (e.g. tunneling) - must be highly accurate
prospecting or science applications - general relations may be more important

Precision

resolution of measurement and detail of analysis
often depends on scale of examination and nature of phenomenon
may have to filter and generalize to reduce storage and computation burden

TEMPORAL AND THREE-DIMENSIONAL REPRESENTATIONS
D. METHODS OF REPRESENTATION

specific approach taken is a function of:
user''s needs and capabilities
character, distribution and complexity of phenomenon
characteristics of the data available, or the means to collect it

2 1/2 dimensions

1. Single-valued surfaces
single z (elevation) value for each coordinate pair
usually continuous distributions
topologic complexity is low
geometric complexity can be high
defines a surface with no thickness
usually displayed with isolines

geometrically 3-D, but topologically 2-D
suited to visualization and some modeling
available in many mapping and statistical packages

2. Multi-valued surfaces and volumes
more than one z-value for each x,y pair
usually continuous distributions

topological complexity is low
geometric complexity can be high
can be displayed with isolines
may become difficult to comprehend
often subjected to geostatistical analysis in prospecting and scientific research
not as widely available in turnkey systems

True three dimensional representations

1. Boundary representations (B-reps) overhead - B-Rep of a Cave Passage
objects are defined as polyhedra bounded by planes or faces
can be displayed with hidden line removal for easier comprehension
each object can be represented by a number of: faces - flat planes, usually triangular (a mixture of rectangles and hexagons on the overhead) edges - define the edges of the faces (3 per triangular face) points - define the ends of the edges (2 per edge)
suited to discrete objects
topological complexity can be high
geometric complexity can be high
well suited to design, some exploration and explanation applications
widely available in CAD systems
the TIN is a type of B-rep, constrained to be single-valued (i.e. one value of z for every x,y)
requires a powerful user-interface to construct combinatorially-complex objects
each part of the B-rep (planes, edges, points) must be carefully and consistently defined for each application in order to maintain validity
performance degrades rapidly with high geometric complexity

2. Spatial occupancy enumeration (SOE) overhead - SOE of a Mine/Quarry
volume is divided into cubes or voxels
can have on (full) or off (empty) status
or, can have attribute values
vertical resolution is often different (higher) from horizontal resolution, e.g. modeling the atmosphere
objects can be displayed as positive (casts) or negative (molds)
suited to discrete objects or continuous distributions
combinatorial complexity can be very high

geometric complexity can be high, within limits of voxel resolution
suited to exploration and explanation applications, also analytical operations in design
some systems exist for mine modeling, also medical applications
usually produced by converting from B-reps (similar to converting vectors to rasters in 2D)
properties like mass, volume and surface area are quickly computed as Boolean operations or voxel counts
these can be indexed using octtrees (also octrees) overhead - Octree
is an extension of the quadtree concept to 3 dimensions
cells are numbered by starting on one level and using the same pattern as a quadtree, then moving up a level and continuing with the same pattern but numbering from 4 to 7

Summary

these 3D representations are relatively new, so there is little collective experience on how to implement them in the earth sciences and engineering

it may be easiest to utilize technology developed in other fields (mechanical engineering, medicine) and adapt to needs
however, the needs of medical imaging are different from earth science
medical imaging technology is not designed for modelling, it does not need analytical tools for abstraction and interpretation that earth science applications do
medical imaging is time dependent (it is usually necessary to track moving objects between one 3D image and the next) while many earth science applications do not require this


E. TIME DEPENDENCE

time dependence adds a third dimension to spatial data, just as the vertical dimension does
Hagerstrand (1970) has used the vertical dimension to visualize movement in human systems - movements in the plane become trajectories in three dimensions

suggested overhead - Time dependence as a third dimension based on Hagerstrand model (not supplied, see for example Haggett, P., A.D. Cliff, A. Frey, 1977, Locational Models, Edward Arnold, London, p. 16)

computer science deals with time dependence of records in databases
records may be valid only for limited times

the geographical cases are more complex - objects may have limited existence, but may also move, change shape, and change attributes

similarly to the 3D case, the set of database models for time dependent data has not been fully developed

Possible models

1. boundaries of reporting zones change through time
since the boundaries turn on and off rather than move, the solution is to store all boundary lines which ever existed, then to reconstruct objects from the
boundaries at any given time

e.g. Great American History Project stored boundaries for all definitions of US counties since the early 19th Century, reconstructed counties at any Census year from selected boundary pieces (see Unit 30)

2. attributes of objects change through time - define a limited number of time "slices", and store the attributes as separate tables for each time slice
if attributes are needed between time slices, interpolate

3. shapes of objects change through time
define time slices, and store the objects at each slice
may be difficult to identify objects from one time slice to the next because objects may coalesce or split - e.g. kelp beds in the ocean off the coast of California
may be easier to avoid identifying objects, and store classified but unrelated rasters at each time - equivalent to the SOE or voxel solution to 3D data
alternatively, use a 3D space with the vertical dimension as time, populated by 3D objects, e.g. the lines in Hagerstrand''s diagrams - equivalent to B- reps
attach attributes to these objects
if the attributes change through time, we have a problem similar to that of continuously varying attributes on transportation networks

Summary

the main issue is the extent to which objects should be identified - either in 2 or 3D

solutions vary from one extreme of no objects to the other of fully 3D objects:
no objects at all - voxels
e.g. remotely sensed images
objects at each time slice, but unrelated from one time to another - layers
e.g. GAHP US counties
objects at each time slice, related or tracked from one time to another - related layers
e.g. migration data
objects defined continuously in the time dimension - 3D objects
e.g. individual space-time travel behavior

REFERENCES
Bouille, F. 1976. "A Model of Scientific Data Bank and Its Applications to Geological Data," Computers and Geosciences 2: 279-291.

Carter, J.R. 1988. "Digital Representations of Topographic Surfaces: An Overview," in American Congress on Surveying and Mapping and American Society for Photogrammetry and Remote Sensing, Technical Papers 5:54-60.

Ganter, J.H. 1989. "A Comparison of Representations for Complex Earth Volumes," Auto-Carto 9: Proceedings of the Ninth International Symposium on Computer-Assisted Cartography, Baltimore, MD.

Hagerstrand, T., 1970. "What about people in Regional Science?" Papers, Regional Science Association 24:7-21. Discusses the concept of space-time prisms in human spatial behavior.

Kavouras, M. and S. Masry, 1987. "An Information System for Geosciences: Design Considerations," Auto-Carto 8: Proceedings of the Eighth International Symposium on Computer-Assisted Cartography, ASPRS/ACSM, Falls Church, VA, pp. 282-291.

Langran, G., 1989. "A review of temporal database research and its use in GIS applications," International Journal of Geographical Information Systems 3(3):215-32. Can research on time dependence in databases help in representing the effects of time in GIS data?

Langran, G. and N.R. Chrisman, 1988. "A framework for temporal geographic information," Cartographica 25(3):1- 14. Discusses models for representing temporal change in GIS.

Mark, D.M. and J.A. Cebrian, 1986. "Oct-trees: A Useful Data- Structure for the Processing of Topographic and Sub- Surface Data," ACSM/ASPRS Technical Papers 1:104-113.

Raper, Jonathan. Three Dimensional Applications in GIS, Taylor and Francis, New York. A collection of papers on the developing technology of 3D GIS.

Requicha, A.A.G. 1980. "Representations for Rigid Solids: Theory, Methods and Systems," ACM Computing Surveys 12:437-464.

DISCUSSION AND EXAM QUESTIONS

1. You are directed to build a 3-D representation for (a) a civil engineer; (b) a petroleum geologist; (c) a hydrogeologist; (d) a meteorologist. What kinds of questions might you ask each specialist about their job and what they want from the representation?

2. How do the distributions considered by each of the individuals (above) compare?

3. What would be the advantages and limitations of surface representations, B-reps and SOE in different disciplines and applications?

4. Many people have assumed that the problems of moving from 2D to 3D in GIS are comparable to those of moving from 2D to time-dependent 2D. Do you agree?

NEXT PAGE
Hosted by uCoz