DATABASE CONCEPTS I A. INTRODUCTION Two ways to use DBMS within a GIS GIS as a database problem B. CONCEPTS IN DATABASE SYSTEMS Definition Advantages of a database approach Views of the database C. DATABASE MANAGEMENT SYSTEMS Components Types of database systems D. HIERARCHICAL MODEL Summary of features Advantages and disadvantages E. NETWORK MODEL Restrictions Summary F. RELATIONAL MODEL Terminology Examples of relations Keys Normalization Advantages and disadvantages REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 43 - DATABASE CONCEPTS I Compiled with assistance from Gerald White, California State University, Sacramento A. INTRODUCTION very early attempts to build GIS began from scratch, using very limited tools like operating systems and compilers more recently, GIS have been built around existing database management systems (DBMS) purchase or lease of the DBMS is a major part of the system's software cost the DBMS handles many functions which would otherwise have to be programmed into the GIS any DBMS makes assumptions about the data which it handles to make effective use of a DBMS it is necessary to fit those assumptions certain types of DBMS are more suitable for GIS than others because their assumptions fit spatial data better Two ways to use DBMS within a GIS 1. Total DBMS solution all data are accessed through the DBMS, so must fit the assumptions imposed by the DBMS designer 2. 
Mixed solution some data (usually attribute tables and relationships) are accessed through the DBMS because they fit the model well some data (usually locational) are accessed directly because they do not fit the DBMS model GIS as a database problem some areas of application, notably facilities management: deal with very large volumes of data often have a DBMS solution installed before the GIS is considered the GIS adds geographical access to existing methods of search and query such systems require very fast response to a limited number of queries, little analysis in these areas it is often said that GIS is a "database problem" rather than an algorithm, analysis, data input or data display problem B. CONCEPTS IN DATABASE SYSTEMS Definition a database is a collection of non-redundant data which can be shared by different application systems stresses the importance of multiple applications, data sharing the spatial database becomes a common resource for an agency implies separation of physical storage from use of the data by an application program, i.e. program/data independence the user or programmer or application specialist need not know the details of how the data are stored such details are "transparent to the user" changes can be made to data without affecting other components of the system. e.g. change format of data items (real to integer, arithmetic operations) change file structure (reorganize data internally or change mode of access) relocate from one device to another, e.g. from optical to magnetic storage, from tape to disk Advantages of a database approach reduction in data redundancy shared rather than independent databases reduces problem of inconsistencies in stored information, e.g. different addresses in different departments for the same customer maintenance of data integrity and quality data are self-documented or self-descriptive information on the meaning or interpretation of the data can be stored in the database, e.g. 
names of items, metadata avoidance of inconsistencies data must follow prescribed models, rules, standards reduced cost of software development many fundamental operations taken care of, however DBMS software can be expensive to install and maintain security restrictions database includes security tools to control access, particularly for writing Views of the database overhead - Views of the database the database can present different views of itself to users, programmers these are built and maintained by the database administrator (DBA) the internal data representation (internal view) is normally not seen by the user or applications programmer the conceptual view or conceptual schema is the primary means by which the DBA builds and manages the database the DBMS can present multiple views of the conceptual schema to programmers and users, depending on the application these are called external views or schemas overhead - Water district database C. DATABASE MANAGEMENT SYSTEMS Components Data types includes: integer (whole numbers only) real (decimal) character (alphabetic and numeric characters) date more advanced systems may include pictures and images as data types e.g. a database of buildings for the fire department which stores a picture as well as address, number of floors, etc. Standard operations e.g. sort, delete, edit, select records Data definition language (DDL) the language used to describe the contents of the database e.g. attribute names, data types - "metadata" Data manipulation and query language the language used to form commands for input, edit, analysis, output, reformatting etc. some degree of standardization has been achieved with SQL (Structured Query Language) Programming tools besides commands and queries, the database should be accessible directly from application programs through e.g. 
subroutine calls File structures the internal structures used to organize the data Types of database systems several models for databases: tabular ("flat file") - data in a single table hierarchical network relational the hierarchical, network and relational models all try to deal with the same problem with tabular data: inability to deal with more than one type of object, or with relationships between objects e.g. database may need to handle information on aircraft, crew, flights and passengers - four types of records with different attributes, but with relationships between them (e.g. "is booked on" between passenger and flight) database systems originated in the late 1950s and early 1960s largely by research and development of IBM Corporation most developments were responses to needs of business, military, government and educational institutions - complex organizations with complex data and information needs trend through time has been increasing separation between the user and the physical representation of the data - increasing "transparency" D. HIERARCHICAL MODEL early 1960s, IBM saw business world organizing data in the form of a hierarchy rather than one record type (flat file), a business has to deal with several types which are hierarchically related to each other e.g. company has several departments, each with attributes: name of director, number of staff, address each department requires several parts to make its product, with attributes: part number, number in stock each part may have several suppliers, with attributes: address, price diagram certain types of geographical data may fit the hierarchical model well e.g. Census data organized by state, within state by city, within city by census tract diagram the database keeps track of the different record types, their attributes, and the hierarchical relationships between them the attribute which assigns records to levels in the database structure is called the key (e.g. 
is record a department, part or supplier?) Summary of features a set of record "types" e.g. supplier record type, department record type, part record type a set of links connecting all record types in one data structure diagram (tree) at most one link between two record types, hence links need not be named for every record, there is only one parent record at the next level up in the tree e.g. every county has exactly one state, every part has exactly one department no connections between occurrences of the same record type cannot go between records at the same level unless they share the same parent diagram Advantages and disadvantages data must possess a tree structure tree structure is natural for geographical data data access is easy via the key attribute, but difficult for other attributes in the business case, easy to find record given its type (department, part or supplier) in the geographical case, easy to find record given its geographical level (state, county, city, census tract), but difficult to find it given any other attribute e.g. find the records with population 5,000 or less tree structure is inflexible cannot define new linkages between records once the tree is established e.g. in the geographical case, new relationships between objects cannot define linkages laterally or diagonally in the tree, only vertically the only geographical relationships which can be coded easily are "is contained in" or "belongs to" DBMSs based on the hierarchical model (e.g. System 2000) have often been used to store spatial data, but have not been very successful as bases for GIS E. 
NETWORK MODEL developed in mid 1960s as part of work of CODASYL (Conference on Data Systems Languages) which proposed programming language COBOL (1966) and then network model (1971) other aspects of database systems also proposed at this time include database administrator, data security, audit trail objective of network model is to separate data structure from physical storage, eliminate unnecessary duplication of data with associated errors and costs uses concept of a data definition language, data manipulation language uses concept of m:n linkages or relationships an owner record can have many member records a member record can have several owners hierarchical model allows only 1:n example of a network database a hospital database has three record types: patient: name, date of admission, etc. doctor: name, etc. ward: number of beds, name of staff nurse, etc. need to link patients to doctor, also to ward doctor record can own many patient records patient record can be owned by both doctor and ward records network DBMSs include methods for building and redefining linkages, e.g. when patient is assigned to ward Restrictions links between records of the same type are not allowed while a record can be owned by several records of different types, it cannot be owned by more than one record of the same type (patient can have only one doctor, only one ward) Summary the network model has greater flexibility than the hierarchical model for handling complex spatial relationships it has not had widespread use as a basis for GIS because of the greater flexibility of the relational model F. RELATIONAL MODEL the most popular DBMS model for GIS the INFO in ARC/INFO EMPRESS in System/9 several GIS use ORACLE several PC-based GIS use DBase III flexible approach to linkages between records comes closest to modeling the complexity of spatial relationships between objects proposed by IBM researcher E.F. 
Codd in 1970 more of a concept than a data structure internal architecture varies substantially from one RDBMS to another Terminology each record has a set of attributes the range of possible values (domain) is defined for each attribute records of each type form a table or relation each row is a record or tuple each column is an attribute note the potential confusion - a "relation" is a table of records, not a linkage between records the degree of a relation is the number of attributes in the table 1 attribute is a unary relation 2 attributes is a binary relation n attributes is an n-ary relation Examples of relations unary: COURSES(SUBJECT) binary: PERSONS(NAME,ADDRESS) OWNER(PERSON NAME,HOUSE ADDRESS) ternary: HOUSES(ADDRESS,PRICE,SIZE) Keys a key of a relation is a subset of attributes with the following properties: unique identification the value of the key is unique for each tuple nonredundancy no attribute in the key can be discarded without destroying the key's uniqueness e.g. phone number is a unique key in a phone directory in the normal phone directory the key attributes are last name, first name, street address if street address is dropped from this key, the key is no longer unique (many Smith, John's) a prime attribute of a relation is an attribute which participates in at least one key all other attributes are non-prime Normalization concerned with finding the simplest structure for a given set of data deals with dependence between attributes avoids loss of general information when records are inserted or deleted overhead - Normalization consider the first relation (prime attribute underlined): this is not normalized since PRICE is uniquely determined by STYLE problems of insertion and deletion anomalies arise the relationship between ranch and 50000 is lost when the last of the ranch records is deleted a new relationship (triplex costing 75000) must be inserted when the first triplex record occurs consider the second relation: here there are two 
relations instead of one one to establish style for each builder the other price for each style several formal types of normalization have been defined - this example illustrates third normal form (3NF), which removes dependence between non-prime attributes although normalization produces a consistent and logical structure, it has a cost in increased storage requirements some GIS database administrators avoid full normalization for this reason a relational join is the reverse of this normalization process, where the two relations HOMES2 and COST are combined to form HOMES1 Advantages and disadvantages the most flexible of the database models no obvious match of implementation to model - model is the user's view, not the way the data is organized internally is the basis of an area of formal mathematical theory most RDBMS data manipulation languages require the user to know the contents of relations, but allow access from one relation to another through common attributes Example: Given two relations: PROPERTY(ADDRESS,VALUE,COUNTY_ID) COUNTY(COUNTY_ID,NAME,TAX_RATE) to answer the query "what are the taxes on property x" the user would: retrieve the property record link the property and county records through the common attribute COUNTY_ID compute the taxes by multiplying VALUE from the property tuple with TAX_RATE from the linked county tuple REFERENCES Standard database texts: Date, C.J., 1987. An Introduction to Database Systems, Addison-Wesley, Reading, MA. Howe, D.R., 1983. Data Analysis for Data Base Design, Arnold, London. Kent, W., 1983. "A simple guide to five normal forms in relational database theory," Communications of the Association for Computing Machinery 26:120. Tsichritzis, D.C. and F.H. Lochovsky, 1977. Database Management Systems, Academic Press, New York. The relational model in GIS: van Roessel, J.W., 1987. "Design of a spatial data structure using the relational normal forms," International Journal of Geographical Information Systems 1:33-50. 
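The PROPERTY/COUNTY tax query described above can be sketched with an in-memory SQLite database. This is a minimal illustration, not part of the original unit: the table contents (addresses, values, rates) are invented, and only the two relations and the COUNTY_ID linkage come from the text.

```python
import sqlite3

# Build the two relations from the example; sample rows are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PROPERTY (ADDRESS TEXT, VALUE REAL, COUNTY_ID INTEGER)")
con.execute("CREATE TABLE COUNTY (COUNTY_ID INTEGER, NAME TEXT, TAX_RATE REAL)")
con.execute("INSERT INTO PROPERTY VALUES ('12 Oak St', 100000, 1)")
con.execute("INSERT INTO COUNTY VALUES (1, 'Adams', 0.02)")

# "What are the taxes on property x?" -- retrieve the property record,
# link property and county through the common attribute COUNTY_ID,
# then multiply VALUE by TAX_RATE (here 100000 * 0.02).
row = con.execute("""
    SELECT p.ADDRESS, p.VALUE * c.TAX_RATE AS TAXES
    FROM PROPERTY p
    JOIN COUNTY c ON p.COUNTY_ID = c.COUNTY_ID
    WHERE p.ADDRESS = '12 Oak St'
""").fetchone()
print(row)
```

The JOIN clause is the relational join discussed under Normalization: the same mechanism that recombines HOMES2 and COST into HOMES1 links a property tuple to its county tuple here.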
EXAM AND DISCUSSION QUESTIONS 1. Compare the four database models (flat file, hierarchical, network and relational) as bases for GIS. What particular features of the relational model account for its popularity? 2. Polygon overlay has been called a spatial analog of a relational join. Do you agree? 3. Summarize the arguments against organizing spatial databases as flat files. 4. Why do you think the term "relation" was chosen for a table of attributes in the relational model? DATABASE CONCEPTS II A. INTRODUCTION Databases for spatial data The relational model in GIS B. DATA SECURITY Integrity constraints Transactions C. CONCURRENT USERS Three types of concurrent access Checkout/checkin Determining extent of data locking Deadlock D. SECURITY AGAINST DATA LOSS E. UNAUTHORIZED USE Summary REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 44 - DATABASE CONCEPTS II Compiled with assistance from Gerald White, California State University, Sacramento A. INTRODUCTION setting up and maintaining a spatial database requires careful planning, attention to numerous issues many GIS were developed for a research environment of small databases many database issues like security not considered important in many early GIS difficult to grow into an environment of large, production-oriented systems Databases for spatial data many different data types are encountered in geographical data, e.g. pictures, words, coordinates, complex objects very few database systems have been able to handle textual data e.g. descriptions of soils in the legend of a soil map can run to hundreds of words e.g. descriptions are as important as numerical data in defining property lines in surveying - "metes and bounds" descriptions variable length records are needed, often not handled well by standard systems e.g. 
number of coordinates in a line can vary this is the primary reason why some GIS designers have chosen not to use standard database solutions for coordinate data, only for attribute tables standard database systems assume the order of records is not meaningful in geographical data the positions of objects establish an implied order which is important in many operations often need to work with objects that are adjacent in space, thus it helps to have these objects adjacent or close in the database is a problem with standard database systems since they do not allow linkages between objects in the same record type (class) there are so many possible relationships between spatial objects, that not all can be stored explicitly however, some relationships must be stored explicitly as they cannot be computed from the geometry of the objects, e.g. existence of grade separation at street crossing the integrity rules of geographical data are too complex e.g. the arcs forming a polygon must link into a complete boundary e.g. lines cannot cross without forming a node effective use of non-spatial database management solutions requires a high level of knowledge of internal structure on the part of the user e.g. 
user may need to be aware that polygons are composed of arcs, and stored as arc records, cannot treat them simply as objects and let the system take care of the internal structure users are required to have too much knowledge of the database model, cannot concentrate on knowledge of the problem users may have to use complex commands to execute processes which are conceptually simple The relational model in GIS the relational model captures geographical reality through a set of tables (relations) linked by keys (common fields or attributes) each table contains a set of records (tuples) tables are normalized to minimize redundancy of information, maximize integrity in general, the relational model is a convenient way to represent reality each table corresponds to a set of real-world features with common types of attributes the user needs to know which features are stored in which tables however the relational model has certain deficiencies for spatial data many implementations (e.g. ARC/INFO) store only the attribute tables in the relational model, since it is less straightforward to store the geometrical descriptions of objects - such systems have been called "hybrid" most spatial operations are not part of the standard query language of RDBMSs, e.g. find objects within a user-defined polygon, e.g. overlay, e.g. 
buffer zone generation the relational model does not deal easily and efficiently with the concept of complex objects (objects formed by aggregating simple objects) - this concept is more compatible with the hierarchical data model B. DATA SECURITY many systems for small computers, and systems specializing in geometric and geographical data, do not provide functionality necessary to maintain data integrity over long periods of time Integrity constraints integrity constraints are rules which the database must obey in order to be meaningful attribute values must lie within prescribed domains relationships between objects must not conflict, e.g. "flows into" relationship between river segments must agree with "is fed by" relationship locational data must not violate rules of planar enforcement, contours must not cross each other, etc. Transactions transactions may include: modifications to individual data items addition or deletion of entire records addition or deletion of attributes changes in schema (external views of the database) e.g. addition of new tables or relations, redefinition of access keys all of the updates or modifications made by a user are temporary until confirmed system checks integrity before permanently modifying the database ("posting" the changes to the database) updates and changes can be abandoned at any time prior to final confirmation C. CONCURRENT USERS in many cases more than one user will need to access the database at any one time this is a major advantage of multi-user systems and networks however, if the database is being modified by several users at once, it is easy for integrity constraints to be violated unless adequate preventative measures exist changes may interfere and produce loss of integrity e.g. user B may change an object while user A is processing it the results will not be valid for either the old or the new version of the object e.g. 
a dispatching system operator A receives a fire call, sends a request to fire station 1 to dispatch a vehicle, waits for fire station to confirm operator B receives a fire call after A's call but before A confirms the dispatch result may be that both A and B request a dispatch of the same fire truck solution should be to "lock" the first request until confirmed automatic control of concurrent use is based on the transaction concept the database is modified only at the end of a transaction concurrent users never see the effects of an incomplete transaction interference between two concurrent users is resolved at the transaction level Three types of concurrent access unprotected - applications may retrieve and modify concurrently in practice, no system allows this, but if one did, system should provide a warning that other users are accessing the data protected - any application may retrieve data, but only one may modify it e.g. user B should be able to query the status of fire trucks even after user A has placed a "hold" on one exclusive - only one application may access the data Checkout/checkin in GIS applications, digitizing and updating spatial objects may require intensive work on one part of the database for long periods of time e.g. digitizer operator may spend an entire shift working on one map sheet work will likely be done on a workstation operating independently of the main database because of the length of transactions, a different method of operation is needed at beginning of shift, operator "checks out" an area from the database at end of work, the same area is "checked in", modifying and updating the database while an area is checked out, it should be "locked" by the main database this will allow other users to read the data, but not to check it out themselves for modification this resolves problems which might occur e.g. 
user A checks out a sheet at 8:00 am and starts updating user B checks out the same sheet at 9:00 am and starts a different set of updates from the same base if both are subsequently allowed to check the sheet back in, then the second checkin may try to modify an object which no longer exists the area is unlocked when the new version is checked in and modifies the database the amount of time required for checkout and checkin must be no more than a small part of a shift Determining extent of data locking how much data needs to be locked during a transaction? changing one item may require other changes as well, e.g. in indexes in principle all data which may be affected by a transaction should be locked it may be difficult to determine the extent of possible changes e.g. in a GIS user is modifying a map sheet because objects on the sheet are "edgematched" to objects on adjacent sheets, contents of adjacent sheets may be affected as well e.g. if a railroad line which extends to the edge of the mapsheet is deleted, should its continuation on the next sheet be affected? if not, the database will no longer be effectively edgematched should adjacent sheets also be locked during transaction? levels of data locking: entire database level "view" level lock only those parts of the database which are relevant to the application's view record type level lock an entire relation or attribute table record occurrence level lock a single record data item level lock only one data item Deadlock is when a request cannot continue processing normally results from incremental acquisition of resources e.g. request A gets resource 1, request B gets resource 2 request A now asks for resource 2, B asks for resource 1 A and B will wait for each other unless there is intervention e.g. 
user A checks out an area from a spatial database, thereby locking the contents of the area and related contents user B now attempts a checkout - some of the contents of the requested area have already been locked by A therefore, the system must unlock all of B's requests and start again - B will wait until A is finished this allows other users who need the items locked by B to proceed however, this can lead to endless alternating locking attempts by B and another user - the "accordion" effect as they encounter collisions and withdraw it can be very difficult for a DBMS to sense these effects and deal with them D. SECURITY AGAINST DATA LOSS the cost of creating spatial databases is very high, so the investment must be protected against loss loss might occur because of hardware or software failure operations to protect against loss may be expensive, but the cost can be balanced against the value of the database because of the consequences of data loss in some areas (air traffic control, bank accounts) very secure systems have been devised the database must be backed up regularly to some permanent storage medium, e.g. tape all transactions since the last backup must be saved in case the database has to be regenerated unconfirmed transactions may be lost, but confirmed ones must be saved two types of failure: interruption of the database management system because of operating errors, failure of the operating system or hardware, or power failures these interruptions occur frequently - once a day to once a week contents of main memory are lost, system must be "rebooted" contents of database on mass storage device are usually unaffected loss of the storage medium, due to operating or hardware defects ("head crashes"), or interruption during transaction processing these occur much less often, slower recovery is acceptable database is regenerated from most recent backup, plus transaction log if available E. UNAUTHORIZED USE some GIS data is confidential or secret, e.g. 
tax records, customer lists, retail store performance data contemporary system interconnections make unauthorized access difficult to prevent e.g. "virus" infections transmitted through communication networks different levels of security protection may be appropriate to spatial databases: keeping unauthorized users from accessing the database - a function of the operating system limiting access to certain parts of the database e.g. census users can access counts based on the census, but not the individual census questionnaires (note: Sweden allows access to individual returns) restricting users to generalized information only e.g. products from some census systems are subjected to random rounding - randomly changing the last digit of all counts to 0 or 5 - to protect confidentiality Summary flexibility, complexity of many GIS applications often makes it difficult to provide adequate security REFERENCES Standard database texts listed under unit 43 Abel, D.J., 1989. "SIRO-DBMS: a database tool-kit for geographical information systems," International Journal of Geographical Information Systems 3:103-116. An extension of the relational model for spatial data. Frank, A.U., 1984. "Requirements for database systems suitable to manage large spatial databases," Proceedings, International Symposium on Spatial Data Handling, University of Zurich, pp. 38-60. Nyerges, T.L., 1989. "Schema integration analysis for the development of GIS databases," International Journal of Geographical Information Systems 3:153-184. Looks at formal procedures for comparing and merging spatial database schemas. EXAM AND DISCUSSION QUESTIONS 1. In what ways are the database issues of GIS different from those of databases generally? 2. What is meant by data integrity in a spatial database? Give examples. 3. Give examples of the ways in which the integrity of a spatial database can degrade without adequate access controls. 4. 
Examine the database access controls which exist in any GIS to which you have access. Would they be adequate for a large, production-oriented agency application? ACCURACY OF SPATIAL DATABASES A. INTRODUCTION B. DEFINITIONS Accuracy Precision Components of data quality C. POSITIONAL ACCURACY How to test positional accuracy? D. ATTRIBUTE ACCURACY How to test attribute accuracy? How to summarize the matrix? E. LOGICAL CONSISTENCY F. COMPLETENESS G. LINEAGE H. ERROR IN DATABASE CREATION Positional measurement error Attribute errors Compilation errors Processing errors I. DATA QUALITY REPORTS USGS British Ordnance Survey US National standards REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 45 - ACCURACY OF SPATIAL DATABASES Compiled with assistance from Nicholas R. Chrisman, University of Washington and Matt McGranaghan, University of Hawaii A. INTRODUCTION the course thus far has looked at technical issues in: georeferencing, i.e. describing locations data structures - how to create digital representations of spatial data algorithms - how to process these digital representations to generate useful results among other technical issues in GIS, accuracy is perhaps the most important - it covers concerns for data quality, error, uncertainty, scale, resolution and precision in spatial data and affects the ways in which it can be used and interpreted all spatial data is inaccurate to some degree but it is generally represented in the computer to high precision need to consider: how well do these digital structures represent the real world? how well do algorithms compute the true values of products? B. 
DEFINITIONS Accuracy defined as the closeness of results, computations or estimates to true values (or values accepted to be true) since spatial data is usually a generalization of the real world, it is often difficult to identify a true value, and we work instead with values which are accepted to be true e.g., in measuring the accuracy of a contour in a digital database, we compare to the contour as drawn on the source map, since the contour does not exist as a real line on the surface of the earth the accuracy of the database may have little relationship to the accuracy of products computed from the database e.g. the accuracy of a slope, aspect or watershed computed from a DEM is not easily related to the accuracy of the elevations in the DEM itself Precision defined as the number of decimal places or significant digits in a measurement precision is not the same as accuracy - a large number of significant digits doesn't necessarily indicate that the measurement is accurate a GIS works at high precision, mostly much higher than the accuracy of the data itself since all spatial data are of limited accuracy, inaccurate to some degree, the important questions are: how to measure accuracy how to track the way errors are propagated through GIS operations how to ensure that users don't ascribe greater accuracy to data than it deserves Components of data quality recently a National Standard for Digital Cartographic Data (see reference) was developed by a coordinated national effort in the US this is a standard model to be used for describing digital data accuracy similar standards are being adopted in other countries this standard identifies several components of data quality: positional accuracy attribute accuracy logical consistency completeness lineage each of these will now be examined C. 
POSITIONAL ACCURACY defined as the closeness of locational information (usually coordinates) to the true position conventionally, maps are accurate to roughly one line width or 0.5 mm equivalent to 12 m on 1:24,000, or 125 m on 1:250,000 maps within a database, a typical UTM coordinate pair might be: Easting 579124.349m Northing 5194732.247m if the database was digitized from a 1:24,000 sheet, the last four digits in each coordinate (units, tenths, hundredths and thousandths) would be spurious How to test positional accuracy? use an independent source of higher accuracy find a larger scale map use the Global Positioning System (GPS) use raw survey data use internal evidence unclosed polygons, lines which overshoot or undershoot junctions, are indications of inaccuracy - the sizes of gaps, overshoots and undershoots may be used as a measure of positional accuracy compute accuracy from knowledge of the errors introduced by different sources, e.g. 1 mm in source document 0.5 mm in map registration for digitizing 0.2 mm in digitizing if sources combine independently, we can get an estimate of overall accuracy by summing the squares of each component and taking the square root of the sum: √(1² + 0.5² + 0.2²) = 1.14 mm C. ATTRIBUTE ACCURACY defined as the closeness of attribute values to their true value note that while location does not change with time, attributes often do attribute accuracy must be analyzed in different ways depending on the nature of the data for continuous attributes (surfaces) such as on a DEM or TIN: accuracy is expressed as measurement error e.g. elevation accurate to 1 m for categorical attributes such as classified polygons: are the categories appropriate, sufficiently detailed and defined? gross errors, such as a polygon classified as A when it should have been B, are simple but unlikely e.g. land use is shopping center instead of golf course more likely the polygon will be heterogeneous: e.g.
vegetation zones where the area may be 70% A and 30% B worse, A and B may not be well-defined, may not be able to identify the class clearly as A or B e.g. soils classifications are typically fuzzy at the center of the polygon, may be confident that the class is A, but more like B at the edges How to test attribute accuracy? prepare a misclassification matrix as follows: take a number of randomly chosen points determine the class according to the database then determine the class in the field by ground check complete the matrix:

                        Class on ground
   Class in database     A    B    C    D
           A             .    .    .    .
           B             .    .    .    .
           C             .    .    .    .
           D             .    .    .    .

ideally, want all points to lie on the diagonal of the matrix - this indicates that the same class was observed on the ground as is recorded in the database an error of omission occurs when a point's class on the ground is incorrectly recorded in the database the number of class B points incorrectly recorded is the sum of column B row A, column B row C and column B row D, i.e. the number of points that are B on the ground but something else in the database that is, the column sum less the diagonal cell an error of commission occurs when the class recorded in the database does not exist on the ground e.g. the number of errors of commission for class A is the sum of row A column B, row A column C, row A column D, i.e. the points falsely recorded as A in the database that is, the row sum less the diagonal cell How to summarize the matrix?
the percent of cases correctly classified is often used this is the percent of cases located in the diagonal cells of the matrix however, even in the worst case we would expect some cases in the diagonal cells by chance an index kappa (Cohen's kappa) adjusts for this by subtracting the number expected by chance the number expected by chance in each diagonal cell is found by multiplying the appropriate row and column totals and dividing by the total number of cases overhead - Calculating Kappa then: k = (d-q)/(N-q) where d is the number of cases in diagonal cells q is the number of cases expected in diagonal cells by chance N is the total number of cases kappa is 1 for perfectly accurate data (all N cases on the diagonal), zero for accuracy no better than chance compare a map with a few large polygons to one with a large number of smaller polygons is it easier to get a high kappa in the first case? if so, is there a way of adjusting kappa to account for this difference? we expect attribute accuracy to vary over the map, so it would be useful to have an indication of the spatial variation in misclassification probability, not just a summary statistic the remaining aspects of data quality apply to the database as a whole, rather than to the objects, attributes or coordinates within it E. LOGICAL CONSISTENCY refers to the internal consistency of the data structure, particularly applies to topological consistency is the database consistent with its definitions? if there are polygons, do they close? is there exactly one label within each polygon? are there nodes wherever arcs cross, or do arcs sometimes cross without forming nodes? F. COMPLETENESS concerns the degree to which the data exhausts the universe of possible items are all possible objects included within the database? affected by rules of selection, generalization and scale G.
LINEAGE a record of the data sources and of the operations which created the database how was it digitized, from what documents? when was the data collected? what agency collected the data? what steps were used to process the data? precision of computational results is often a useful indicator of accuracy H. ERROR IN DATABASE CREATION error is introduced at almost every step of database creation what are these steps, and what kinds of error are introduced? Positional measurement error Geodetic Control and GPS the most accurate basis of absolute positional data is the geodetic control network, a series of points whose positions are known with high precision however, it is often difficult to tie a dataset to one of these high-quality monuments the Global Positioning System (GPS) is a powerful way of augmenting the geodetic network Aerial Photography and Satellite Imagery most positional data is derived from air photos here accuracy depends on the establishment of good control points data from remote sensing is more difficult to position accurately because of the size of each pixel Text Descriptions some positional data comes from text descriptions old surveys tied in to marks on trees boundary follows watershed, or midline of river this type of source is often of very poor positional accuracy Digitizing digitizers encode manuscript lines as sets of x-y coordinate pairs see Units 7 and 13 for introductions to digitizing resolution of coordinate data is dependent on mode of digitizing: point-mode digitizing operator specifically selects and encodes those points deemed "critical" to represent the geomorphology of the line, or politically-significant coordinate pairs requires intelligence, knowledge about the line representation that will be needed stream-mode digitizing device automatically selects points on a distance or time parameter generally, an unnecessarily high density of coordinate pairs is selected.
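A minimal sketch of how such an over-dense stream of digitized points might be thinned with a distance tolerance; the function name and tolerance value are illustrative assumptions, not part of any particular digitizing package:

```python
import math

def weed_points(points, tolerance):
    """Keep a point only if it is at least `tolerance` map units from
    the last point kept; always keep the first and last points so the
    line's endpoints are preserved."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:-1]:
        if math.dist(p, kept[-1]) >= tolerance:
            kept.append(p)
    if len(points) > 1:
        kept.append(points[-1])  # preserve the final endpoint
    return kept

# a dense stream-mode line along the x-axis, one point per map unit
line = [(i, 0) for i in range(101)]
print(len(weed_points(line, 5)))  # → 21
```

In practice more sophisticated weeding (retaining points by their contribution to line shape, not just spacing) is common, but even this simple filter removes most of the redundant coordinate pairs.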
two types of errors normally occur in stream-mode digitizing: physiological errors are caused by involuntary muscular spasms that tend to parallel the longitudinal axis of the centerline these errors are caused by agitations as the operator's hand twitches and jerks when digitizing three specific types may be identified: spikes, switchbacks and polygonal knots (loops) diagram these are fairly simple to remove automatically software has been developed to clean the initial digital data of duplicate coordinate pairs and simple physiological errors a related problem in point mode digitizing is duplicate coordinate pairs which occur when the button is hit twice psychological errors are caused by psychomotor problems in line-following the digitizing operator either cannot see the line or cannot properly move the crosshairs along the line results in the digitized line being displaced laterally from the intended position may also involve misinterpretation, too much generalization these are not easy to remove automatically in spite of the above, digitizing itself is not a major source of positional error it is not difficult for a digitizer operator to follow a line to an accuracy equal to the line's width typical error is in the 0.5 mm range a common test of digitizing accuracy is to compare the original line with its digitized and plotted version, and to see if daylight can be seen between the two errors in registration and control points affect the entire dataset errors are also introduced because of poor stability of base material paper can shrink and stretch significantly (as much as 3%) with change in humidity Coordinate transformation coordinate transformation introduces error, particularly if the projection of the original document is unknown, or if the source map has poor horizontal control Attribute errors attributes are usually obtained through a combination of field collection and interpretation categories used in interpretation may not be easy to check in the field
concepts of "diversity" and "old growth" used in current forest management practice are highly subjective attributes obtained from air photo interpretation or classified satellite images may have high error rates for social data, the major source of inaccuracy is undercounting e.g. in the Census undercount rates can be very high (>10%) in some areas and in some social groups Compilation errors common practices in map compilation introduce further inaccuracies: generalization aggregation line smoothing separation of features e.g. railroad moved on map so as not to overlap adjacent road however, many of these may also be seen as improving the usefulness and meaning of the data Processing errors processing of data produces error misuse of logic generalization and problems of interpretation mathematical errors accuracy lost due to low precision computations rasterization of vector data e.g., true line position is somewhere in the cell boundary cells may actually contain parts of all adjacent cells I. DATA QUALITY REPORTS because there are so many diverse sources of error it is probably not possible to measure the error introduced at each step independently - the strategy of combining errors arithmetically probably won't work USGS require that no more than 10% of the points tested be in error by more than 1/30 inch, measured at publication scale (scale >1:20,000) questions are "How far out are the 10%?" "Where are the 10%?" e.g.
in a particularly bad case, all of the 10% might be accounted for by one boundary line which is out by several inches British Ordnance Survey carry out an ongoing accuracy assessment and re-survey to verify a survey, a large number of points (typically n = 150 to 500 of a single type) are used to calculate: root mean square displacement: e = √(Σ xi² / n) where xi is the displacement at each point i systematic error: s = Σ xi / n standard error: se = √(e² - s²) if the error is "excessive", then the survey is carefully reviewed see Merchant (1987) for an implementation US National standards National Map Accuracy Standards from the Bureau of the Budget, 1947 not completed current standards developed by the National Committee for Digital Cartographic Data Standards chaired by Hal Moellering purpose: to set standards for compatibility of: definitions of cartographic objects interchange formats DATA QUALITY documentation dates: 1982 Jan NCDCDS formed 1985 Jan Interim Proposed Standard 1988 Jan Proposed Standard 1988 Testing in the field handout - Interim Proposed Standard for Digital Cartographic Data Quality (2 pages) REFERENCES Bureau of the Budget, 1947. National Map Accuracy Standards, Washington DC, GPO, reprinted in M.M. Thompson, 1979, Maps for America, USGS, Reston VA, p. 104. Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. See pp. 103-135. DCDSTF, 1988. "The Proposed Standard for Digital Cartographic Data," The American Cartographer 15(1):entire issue. Federal Geodetic Control Committee, 1974. Classification, Standards of Accuracy, and General Specifications of Geodetic Control Surveys, Washington DC, GPO, 1980-0-333-276 (also NOAA--S/T 81-29). Harley, J. B., 1975. Ordnance Survey Maps: A Descriptive Manual, Ordnance Survey, Southampton, England. Merchant, D.C., 1987.
"Spatial accuracy specification for large scale topographic maps," Photogrammetric Engineering and Remote Sensing 53:958-61. Reports a recent effort by ASPRS to revise the US National Map Accuracy Standard. National Committee for Digital Cartographic Data Standards, Moellering, H., ed, 1985. Digital Cartographic Data Standards: An Interim Proposed Standard, Report #6. DISCUSSION AND EXAM QUESTIONS 1. Explain the difference between accuracy and precision, and show how these ideas apply to GIS. 2. "In manual map analysis, precision and accuracy are similar, but in GIS processing, precision frequently exceeds the accuracy of the data". Discuss 3. Design an experiment to measure the accuracy achieved by an agency in its digitizing operations. How would you measure the accuracy with which lines are being digitized? 4. What is meant by data lineage, and why is it important in understanding the accuracy of spatial databases? MANAGING ERROR A. ERROR PROPAGATION Example application Error analysis Sensitivity analysis B. ARTIFACTS OF ERROR Raster data Vector data Digitizing artifacts Strategies used to avoid problems: Polygon overlay artifacts C. STORING ACCURACY INFORMATION Raster data Vector data Positional uncertainty Attribute uncertainty REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 46 - MANAGING ERROR A. ERROR PROPAGATION in GIS applications we combine data from different sources, with different levels of accuracy What impact does error in each data layer have on the final result? 
Example application Problem: find the best route for a power transmission corridor from a given origin point to a given destination point about 150 km away, across an area of the Midwest with comparatively high densities of agriculture and settlement the study area has been divided up into about 30,000 raster cells, each 500 m on a side have identified about 100 factors likely to influence the choice of route, including: agricultural productivity (dollars per hectare) settlement (presence or absence) existing rights of way for power lines (presence or absence) the 100 factors have been combined, or cascaded, to a single measure of suitability on a scale of 0 through 6 the cascading rules group factors into composites such as "social impact", "agricultural impact" and then weight each group against the others the rules used in cascading include weighted addition: suitability = w1x1 + w2x2 as well as simple conditions: suitability = 0 if settlement = "present" and reclassifications: suitability = 3 if x1 = A and x2 = d suitability = 4 if x1 = B and x2 = d Error analysis the effects of cascading on error will be complex do errors get worse, i.e. multiply? do errors cancel out? are errors in each layer independent or are they related? 
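The cascading rules above (a simple condition, two reclassifications, and weighted addition) can be sketched in code; the factor names, weights, numeric scores and dictionary layout are illustrative placeholders, not the study's actual values:

```python
def cell_suitability(cell, w1=0.6, w2=0.4):
    """Combine factor values for one raster cell into a suitability
    score on the 0-6 scale, using the three kinds of cascading rules
    described in the text."""
    # simple condition: settlement excludes the cell outright
    if cell["settlement"] == "present":
        return 0
    # reclassification: specific factor combinations map to fixed scores
    if cell["x1"] == "A" and cell["x2"] == "d":
        return 3
    if cell["x1"] == "B" and cell["x2"] == "d":
        return 4
    # weighted addition of two numeric factor scores (each assumed 0-6)
    score = w1 * cell["score1"] + w2 * cell["score2"]
    return min(6, round(score))

cell = {"settlement": "absent", "x1": "C", "x2": "e",
        "score1": 5, "score2": 2}
print(cell_suitability(cell))  # 0.6*5 + 0.4*2 = 3.8 → 4
```

Note that rule order matters in such a cascade: an exclusion or reclassification based on a single inaccurate layer passes that layer's error straight through to the composite, which bears directly on the error-analysis questions above.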
suppose two maps, each with percent correctly classified of 0.90, are overlaid studies have shown that the accuracy of the resulting map (percent of points having both of the overlaid classes) is little better than 0.90 × 0.90 = 0.81 when many maps are overlaid the accuracy of the resulting composite can be very poor however we're more interested in the accuracy of the composite suitability index than in the overlaid attributes themselves for some types of operations the accuracy of suitability is determined by the accuracy of the least accurate layer this is true if reclassification and the "and" operator are used extensively, or if simple conditions are used based on inaccurate layers in other cases the accuracy of the result is significantly better than the accuracy of the least accurate layer this is true if weighted addition is used, or if reclassification uses the "or" operator e.g. suitability = 4 if x1 = A or x2 = d Sensitivity analysis how to determine the impact of inaccuracy on the results? two types of answers are needed: the impact of error on the suitability map the impact of error on the best route the answers will likely be very different it will also be useful to ask the question the other way: what accuracy is needed in each layer in order to produce a required level of accuracy in the result? sensitivity is the response of the result (suitability, or the route location) to a unit change in one of the inputs easy to see what a unit change means for agricultural productivity in dollars per acre, but what does it mean for vegetation class? sensitivity can be defined for: 1. the data inputs: how much does result change when data input changes? 2. the weights: how much does result change when the weight given to a factor changes? error in determining weights may be just as important as error in the database may be better to use full observed range to test sensitivity i.e.
response of the result to a change in one of the inputs from its minimum observed value to its maximum e.g. suppose one layer is settlement (present/absent) set the entire layer to settlement=present and recompute suitability and the best route then set the entire layer to settlement=absent and recompute the difference will be a measure of the sensitivity of the analysis to the settlement layer layers which are important but nevertheless do not show geographical variation over the study area will not have high sensitivity in this definition this serves to point up the distinction between sensitivity in principle and in practice a layer may be important in principle, but have no impact in this study area e.g. in principle the agricultural productivity layer may be very important in the decision framework, but if all the land is equally productive, then it will not be important in practice in practice, only a few layers (out of our original 100) will have much impact on the final route it is critical to know which these are in order to defend the methodology effectively (or to attack it!) must examine both the decision rules and the value ranges to determine which layers have the highest impact in the suitability product this information can be used in assessing the level of input accuracy that is needed e.g. if the additional accuracy will not change the results, it may be unnecessary to carry out costly detailed surveys can also use sensitivity analysis to assess the effects of uncertainty in the data compute the impact of values at each end of the uncertainty range and compare the results provides a measure of the "confidence interval" of the results sensitivity may also refer to spatial resolution would increasing resolution give a better result? would cost of additional data collection at higher resolution be justified? can we put a value on spatial resolution? B.
ARTIFACTS OF ERROR artifacts are unwanted effects which result from using a high-precision GIS to process low-accuracy spatial data usually result from positional errors, not attribute errors Raster data since raster data has finite resolution, determined by pixel size as long as pixel size is greater than the positional accuracy of the data, we have no risk of unwanted effects or artifacts Vector data often have precision different from accuracy significant problems occur in two areas: digitizing polygon overlay Digitizing artifacts a digitizer operator will not be able to close polygons exactly, or to form precise junctions between lines a tolerance distance must be established, so that gaps and overshoots can be corrected (lines snapped together) as long as they fall within the tolerance distance most digitizer operators can work easily to a tolerance of 0.02 inches or 0.5 mm problems arise whenever the map has real detail at this resolution or finer e.g. polygon with a narrow isthmus: diagram e.g. two lines close together - which one to snap to? diagram e.g. removing overshoot - must look back along line to form correct topology: diagram Strategies used to avoid problems: essentially, we try to find a balance between: 1. asking the operator to resolve problems, which slows down the digitizing, and 2. having the system resolve problems, which requires good software and lots of CPU usage each system establishes its own ways of avoiding or reducing these problems some are more successful than others 1. require the user to enlarge the map photographically increases the scale of the map while holding tolerance constant, so problem detail is now bigger than the tolerance difficult or impossible to get error-free enlargement cheaply and easily 2. require the user to digitize each arc separately e.g.
if the following is digitized as one arc then there is no intersection diagram program then only needs to check for snaps and overshoots at ends of arcs tedious for the digitizer operator 3. require the user to identify snap points press a different digitizer button when a point needs to be snapped wait for system response indicating successful snap diagram 4. have the system check for snaps continuously during digitizing requires fast, dedicated processor computing load gets higher as database accumulates requires continuous display of results no good for imported datasets 5. use rules to assist CPU in making decisions e.g. two labels in a polygon indicate that it's really two polygons, not one with a narrow isthmus might use expectations about polygon shape puts heavy load on the processor the best current solutions use a combination of strategies 3 and 4 it is almost always useful to keep track of digitizing by marking work done on a transparent overlay a cursor in the form of a pen is a good practical solution Polygon overlay artifacts covered algorithms for dealing with sliver polygons earlier another strategy for avoiding the sliver polygon problem is to allow objects to share primitives this departs from the database model in which every set of polygons is thought of as a different layer e.g.
suppose a woodlot (polygon) shares part of its boundary with a road (line) the shared part becomes a primitive object which is stored only once in the database, and shared by the two higher level features by using shared primitives, can avoid artifacts which might result when comparing or overlaying the two versions of the woodlot/road line, one belonging to the road object and one to the woodlot object to identify shared primitives during digitizing they must be on the same document need an operation which allows two separate primitives to be identified as shared and replaced by one need a converse operation to unshare a primitive if one version of the line must be moved and not the other diagram C. STORING ACCURACY INFORMATION how to store information on accuracy in a database? Raster data uncertainty in each cell's attributes might be stored by giving each cell a set of probability attributes, one for each of the possible classes in classified remote sensing images this information can come directly from the classification procedures uncertainty in elevation in a DEM is more likely constant over the raster and can be stored as part of the descriptive data or metadata for the raster as a whole positional uncertainty is also likely constant for the raster can be stored once for the whole map Vector data there are five potential levels for storage of uncertainty information in a vector database: map class of objects polygon arc point Positional uncertainty positional accuracy at one level may not imply similar accuracy at other levels positional accuracy about a point says little about the positional accuracy of an arc diagram similarly, positional accuracy at the polygon level may cause confusion along shared arcs diagram for lines and polygons, accuracy can be stored as an attribute of: arc (e.g. width of transition zone between two polygons) class of objects (e.g. error in position of railroads) map as a whole (e.g.
all boundaries and lines on the map have been digitized to specified accuracy) for points, can be stored as an attribute of point, class or map Attribute uncertainty uncertainty in an object's attributes can be stored as: an attribute of the object (e.g. polygon is 90% A) an attribute of the entire class of objects (e.g. soil type A has been correctly identified 90% of the time) REFERENCES Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment. Clarendon, Oxford. Chapter 6 on error in GIS. Chrisman, N.R., 1983. "The role of quality information in the long-term functioning of a geographic information system," Cartographica 21:79. Goodchild, M.F. and S. Gopal, editors, 1989. The Accuracy of Spatial Databases, Taylor and Francis, Basingstoke, UK. Edited papers from a conference on error in spatial databases. EXAM AND DISCUSSION QUESTIONS 1. Define the difference between sensitivity to error in principle and in practice. 2. Imagine that you represent a community trying to fight the proposed route of the powerline discussed in this unit. What arguments would you use to attack the power utility company's methods? 3. Compare the methods available in any digitizing system to which you have access, to those discussed in this unit. Does your system offer any significant advantages? 4. Some GIS processes can be very sensitive to small errors in data. Give examples of such processes, and discuss ways in which the effects of errors can be managed. FRACTALS A. INTRODUCTION Why learn about fractals? Length of a cartographic line Where did the ideas originate? B. SOME INTRODUCTORY CONCEPTS Euclidean geometry C. SCALE DEPENDENCE Determining fractal dimension Some questions D. SELF-SIMILARITY AND SCALING Self-similarity Scaling E. ERROR IN LENGTH AND AREA MEASUREMENTS REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 47 - FRACTALS Compiled with assistance from Brian Klinkenberg, University of British Columbia A.
INTRODUCTION Why learn about fractals? fractals are not so much a rigorous set of models as a set of concepts these concepts express ideas which have been around in cartography for a long time they provide a framework for understanding the way cartographic objects change with generalization, or changes in scale they allow questions of scale and resolution to be dealt with in a systematic way Length of a cartographic line if a line is measured at two different scales, the second larger than the first, its length should increase by the ratio of the two scales areas should change by the square of the ratio volumes should change by the cube of the ratio yet because of cartographic generalization, the length of a geographical line will in almost all cases increase by more than the ratio of the two scales new detail will be apparent at the larger scale "the closer you look, the more you see" is true of almost all geographical data in effect the line will behave as if it had the properties of something between a line and an area a fractal is defined, nontechnically, as a geometric set - whether of points, lines, areas or volumes - whose measure behaves in this anomalous manner this concept of the scale-dependent nature of cartographic data will be discussed in more detail later Where did the ideas originate? 
term was introduced by Benoit Mandelbrot to the general public in his 1977 text Fractals: Form, Chance and Dimension a second edition in 1982 is titled The Fractal Geometry of Nature some of Mandelbrot's earliest ideas on fractals came from his work on the lengths of geographic lines in the mid 1960s fractals may well represent one of the most profound changes in the way scientists look at natural phenomena fractal-based papers represent over 50% of the submissions for some physics journals many of the studies of the fractal geometry of nature are still at the early stages (especially those in geomorphology and cartography) the results presented in some fields are very exciting (e.g., see Lovejoy's (1982) early work on the fractal dimensions of rain and cloud areas) B. SOME INTRODUCTORY CONCEPTS Euclidean geometry in traditional Euclidean geometry we work with points, lines, areas and volumes Euclidean dimensions (E) are all positive whole numbers the Euclidean dimension represents the number of coordinates necessary to define a point to specify any point on a profile requires two coordinates, thus a profile has a Euclidean dimension of two to define a point on a surface requires three dimensions, therefore a surface has a Euclidean dimension of three closely allied with Euclidean dimensions are the topological dimensions (DT) of phenomena on a flat piece of paper (which has a Euclidean dimension of 2) you can draw a two-dimensional figure (DT= 2), a one-dimensional line (DT= 1), and a zero-dimensional point (DT= 0) (compare 0-cell, 1-cell and 2-cell notation) in fractal geometry we work with points, lines, areas and volumes, but instead of restricting ourselves to integer dimensions, we allow the fractal dimension (D) to be any real number the limits on this real number are that it must be at least equal to the topological dimension of the phenomenon, and at most equal to the Euclidean dimension (i.e., 0<=DT<=D<=E) a line drawn on a piece of paper can have a
fractal dimension anywhere from one to two the term fractals is derived from the same Latin root [fractus] as fractions; therefore: fractional dimensions the fractal dimension summarizes the degree of complexity of the phenomenon, the degree of its "space-filling capability" overhead - Lines of different fractal dimensions straight line will have equivalent topological and fractal dimensions of 1 slightly curved line will still have a topological dimension of 1, but a fractal dimension slightly greater than 1 highly curved line (DT= 1) will have a much higher fractal dimension line which completely "fills in" the page will have a fractal dimension of 2 many natural cartographic lines have fractal dimensions between 1.15 and 1.30 a surface can have a fractal dimension anywhere from 2 (perfectly flat) to 3 (completely space-filling) fractal dimension indicates how measures of the object change with generalization e.g. a line with a low fractal dimension (straight line) keeps the same length as scale changes a line with fractal dimension 1.5 loses length rapidly if it is generalized topological dimension tells us little about how shapes differ e.g. all coastlines have the same topological dimension however, sections of many coastlines have been found to have very different fractal dimensions fractal dimension quantifies the metric information in lines and surfaces in a new and unique manner C. SCALE DEPENDENCE the scale dependent nature of measurements (especially those made on maps) has been observed by many people e.g. as you measure the length of a natural boundary on maps of larger scales, or make your measurements with more precise instruments, the length appears to increase this is known as the "Steinhaus Paradox" Richardson (1961) made an extensive study of the cartographic representation of international borders suggested overhead - Richardson plot, see Mandelbrot 1982, p.
33 he observed that there was a predictable relationship between the scale at which the measurement was made, and the length of the line even though the length increased when the borders were measured on maps of larger scale, the increase was predictable plots illustrating the relationship between measurement scale and length have since become known as Richardson plots Mandelbrot subsequently placed Richardson's (and others') work within the framework of fractal geometry, and showed that such behavior is predicted in a fractal world Determining fractal dimension an example of how to determine the fractal dimension of a cartographic line: 1. step a pair of dividers (step size s1) along the line; say it takes n1 steps to span the line 2. the length of the line is equal to s1n1 3. repeat the process, but decrease the step size (to s2); it now takes n2 steps to span the line 4. the length of the line is now s2n2 5. the fractal dimension can be calculated as: D = log (n2/n1) / log (s1/s2) worked example: dividers size: 10 m number of steps: 100 dividers size: 5 m number of steps: 220 D = log (220/100) / log (10/5) = log (2.2) / log (2.0) = 0.3424 / 0.3010 = 1.14 here used logs to base 10, but any base could be used the more irregular the line, the greater the increase in length between the two estimates, and the greater the fractal dimension Mandelbrot's texts, the book by Peitgen and Saupe (1988), and the papers by Goodchild and Mark (1987) and Milne (1988) discuss other methods of determining the fractal dimension there are a large number of ways of determining the fractal dimensions of points, lines, areas, and volumes Some questions 1. what is the "true" length of a line? 2. how can you compare curves whose lengths are indeterminate? 3. of what value are indices based on length measurements?
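The divider procedure and worked example above are easy to automate; a minimal sketch (the function name is ours, the formula is the one given in step 5):

```python
import math

def fractal_dimension(s1, n1, s2, n2):
    """Fractal dimension D from two divider walks along the same line:
    step size s1 takes n1 steps, smaller step size s2 takes n2 steps.
    D = log(n2/n1) / log(s1/s2), with any log base."""
    return math.log(n2 / n1) / math.log(s1 / s2)

# worked example from the text: 10 m dividers -> 100 steps,
# 5 m dividers -> 220 steps
D = fractal_dimension(10, 100, 5, 220)
print(round(D, 2))  # → 1.14
```

Because the formula uses a ratio of logs, the choice of base cancels out, exactly as the text notes.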
the perimeter of an area object increases steadily with scale, but the area of an area object deviates up and down by much smaller amounts are analyses based on area less scale-dependent than ones based on perimeter? what does this indicate about measures of shape based on the ratio of perimeter to the square root of area? there is no complete solution to these (and similar types of) problems however, use of fractal geometry (especially the fractal dimension) does allow us to make reasonably meaningful comparisons and indices (as illustrated in Woronow, 1981) these questions are of special interest to cartographers interested in digital representations of cartographic features (e.g. Buttenfield, 1985) there are implications with respect to: 1. digitizing determination of the appropriate sampling interval 2. generalizing lines the best method for generalizing lines may be that method which best retains the fractal dimension of the line 3. displaying lines at a scale greater than that at which the line was collected (introduce additional "information", by adding artificial detail to the line, detail which is a function of the fractal dimension of the original line) 4. incorporating the fractal dimension into traditional cartometry measures see Woronow (1981) D. SELF-SIMILARITY AND SCALING Self-similarity indicates that some aspect of a process or phenomenon is invariant under scale-changing transformations, such as simple zooming in or out can be expressed in two ways: overhead - Self-similarity 1. geometric self-similarity, in which there is strict equality between the large and small scales not found in natural phenomena the Morton order, quadtrees use this idea in replicating the same pattern at every level 2.
statistical self-similarity, in which the equality is expressed in terms of probability distributions this type of (random) self-similarity is the more common, and is the type found in many natural phenomena, such as coastlines, soil pH profiles, river networks (Burrough, 1981; Peitgen and Saupe, 1988; etc.) the simplest test of self-similarity is visual if a phenomenon is self-similar, any part of it, if suitably enlarged, should be indistinguishable from the whole or from any other part if a natural scene is self-similar, it should be impossible to determine its scale e.g. it should be impossible to tell whether a picture of self-similar topography shows a mountain range or a small hill - there are no visual cues as to the picture's scale since many scale cues are cultural, geological or geomorphological, self-similar topographies are most common on lunar or recent volcanic landscapes Scaling not necessarily equivalent to self-similarity, although the two terms are often used interchangeably in the literature consider a landscape, as represented by a surface and a contour map on the contour map (coordinates in 2 dimensions only) the axes can be switched without fundamentally changing the characteristics of the landscape, i.e.
the characteristics of the contour lines contour lines are therefore examples of simple scaling fractals in the case of the surface, with coordinates in 3 dimensions, we cannot interchange the z axis with either the x or y axes without fundamentally altering the characteristics of the landscape since the z axis has a different scaling parameter than the x or y axes, a three-dimensional representation of the Earth's surface is therefore an example of a non-uniform (or multiple) scaling representation shapes that are statistically invariant under transformations that scale different coordinates by different amounts are known as self-affine shapes (Peitgen and Saupe, 1988) the Earth's surface is an example of a self-affine fractal, but it is not an example of a self-similar fractal contour lines, which represent horizontal cross-sections of the land surface, are examples of statistically self-similar scaling phenomena (because the contour has a constant z value) because the land surface is self-affine and not self-similar, those techniques which determine the fractal dimension of the land surface itself produce values which are different from the values produced by those techniques which determine the fractal dimension of the contours derived from that land surface E.
ERROR IN LENGTH AND AREA MEASUREMENTS scale, through its relationships with generalization and resolution, significantly influences length and area measurements problems in estimating line lengths, areas, and point characteristics can be related to the phenomenon's fractal dimension (Goodchild, 1980) estimates of area are frequently based on pixel counts, especially in raster-based systems the error in the area estimate is a function of the number of pixels cut by the boundary of the object boundaries with a fractal dimension greater than one will appear more complex as the pixel size decreases (as the resolution increases) the more contorted the boundary, or the higher its dimension, the less rapid the increase in error with cell size diagram error in a pixel-based area estimate will also be a function of how the phenomenon is distributed about the landscape: the error in area associated with a highly compact phenomenon will be much less than the error in area associated with a widely dispersed, patchy phenomenon Goodchild and Mark (1987, p. 268) show that: the standard error as a percentage of the area estimate is proportional to a^(1-D/4) where a is the area of a pixel and D is the fractal dimension of the boundary standard error will thus depend on a^(1/2) for highly scattered phenomena and a^(3/4) for single, circular patches with smooth boundaries REFERENCES Only a very small portion of the literature is presented here. For further references you should refer to the Goodchild and Mark (1987) paper; recent issues of Water Resources Research and Science also contain relevant papers Burrough, P.A., 1981. "Fractal dimensions of landscapes and other environmental data," Nature 294:240-242. Buttenfield, B., 1985. "Treatment of the cartographic line," Cartographica 22:1-26. Goodchild, M.F., 1980. "Fractals and the accuracy of geographical measures," Mathematical Geology 12:85-98. Goodchild, M.F., and Grandfield, A.W., 1983.
"Optimizing raster storage: An evaluation of four alternatives," Auto-Carto 6(2):400-407. Goodchild, M.F., and Mark, D.M., 1987. "The fractal nature of geographic phenomena," Annals AAG 77(2):265-278. Hakanson, L., 1978. "The length of closed geomorphic lines," Mathematical Geology 10:141-167. Lovejoy, S., 1982. "Area-perimeter relation for rain and cloud areas," Science 216:185-187. Mandelbrot, B.B., 1977. Fractals: Form, Chance and Dimension, Freeman, San Francisco. Mandelbrot, B.B., 1982. The Fractal Geometry of Nature, W.H. Freeman and Co., New York. Milne, B.T., 1988. "Measuring the fractal geometry of landscapes," Applied Mathematics and Computation 27:67-79. Peitgen, H.-O. and D. Saupe (Eds.), 1988. The Science of Fractal Images, Springer-Verlag, New York. Richardson, L.F., 1961. "The problem of contiguity," General Systems Yearbook 6:139-187. Unwin, D., editor, 1989. Special issue on fractals. Computers and Geosciences 15(2). Woronow, A., 1981. "Morphometric consistency with the Hausdorff-Besicovitch dimension," Mathematical Geology 13:201-216. EXAM AND DISCUSSION QUESTIONS 1. Although fractal concepts are important in understanding the error associated with pixel-based area estimates, little has been said about the relationship between fractals and area estimates obtained from vector-based systems. Why? (i.e., would the area of an enclosed figure change significantly? It is expected that the area shouldn't change significantly, as the self-similar detail should increase the area as much as it decreases the area.) 2. Define "fractal". Include in your description terms such as scale dependency, self-similarity and scaling. 3. Discuss some of the ways in which fractals have changed our way of looking at phenomena. Based on your readings, provide examples from a variety of fields. 4. Theoretically, fractal behavior applies to a phenomenon across all scales. Practically, of course, there are limits to the application of self-similarity to natural phenomena.
Where do you think some of these limits occur? (i.e., between what scales do you think portions of coastlines, for example, exhibit self-similar behavior?) What are the implications with respect to the generalization of cartographic lines, if we observe definite limits to the self-similar behavior of cartographic features? LINE GENERALIZATION A. INTRODUCTION B. ELEMENTS OF LINE GENERALIZATION Simplification Smoothing Feature Displacement Enhancement/Texturing Merging C. JUSTIFICATIONS FOR SIMPLIFYING LINEAR DATA Reduced plotting time Reduced storage Problems with plotter resolution when scale is reduced Processing D. LINEAR SIMPLIFICATION ALGORITHMS Independent Point Routines Local processing routines Unconstrained extended local processing routines Constrained extended local processing routines Global routines E. MATHEMATICAL EVALUATION OF SIMPLIFICATION F. LINEAR SMOOTHING REFERENCES DISCUSSION/EXAMINATION QUESTIONS NOTES UNIT 48 - LINE GENERALIZATION Compiled with assistance from Robert McMaster, Syracuse University A. INTRODUCTION generalization is a group of techniques that allow the amount of information to be retained even when the amount of data is reduced e.g. when the number of points on a line is reduced, the points to be retained are chosen so that the line does not change its appearance in some cases generalization actually causes an increase in the amount of information e.g. generalization of a line representing a coastline is done best when knowledge of what a coastline should look like is used this unit looks at line generalization line generalization is only a small part of the problem of generalization in cartography - the larger problem includes e.g. generalization of areas to points the focus of the unit is on line simplification simplification is only one approach to generalization (see below) B.
ELEMENTS OF LINE GENERALIZATION generalization operators geometrically manipulate the strings of x-y coordinate pairs Simplification simplification algorithms weed from the line redundant or unnecessary coordinate pairs based on some geometric criterion, such as distance between points or displacement from a centerline Smoothing smoothing routines relocate or shift coordinate pairs in an attempt to "plane" away small perturbations and capture only the more significant trends of the line Feature Displacement displacement involves the shifting of two features at a reduced scale to prevent coalescence or overlap most computer algorithms for feature displacement in vector mode concentrate on an interactive approach where the cartographer positions displacement vectors in order to initialize the direction for shifting another method uses a smaller-scale version of the feature to drive the displacement process Enhancement/Texturing enhancement allows detail to be regenerated into an already simplified data set e.g. a smooth curve may not look like a coastline so the line will be randomly textured to improve its appearance one technique is to fractalize a line by adding points and maintaining the self-similarity of the original version this produces fake (random) detail Merging merging blends two parallel features at a reduced scale e.g. the two banks of a river or edges of a highway will merge at small scales, an island becomes a dot algorithms for merging fuse the two linear features together C. 
JUSTIFICATIONS FOR SIMPLIFYING LINEAR DATA Reduced plotting time plotting time is often a bottleneck in many GISs as the number of coordinate pairs is reduced through the simplification process, the plotting speed is increased Reduced storage coordinate pairs are the bulk of data in many GISs simplification may reduce a data set by 70% without changing the perceptual characteristics of the line this results in significant savings in memory Problems with plotter resolution when scale is reduced as the scale of a digital map is reduced, the coordinate pairs are shifted closer together with significant scale reduction, the computed resolution could easily exceed the graphic resolution of the output device e.g. a coordinate pair (0.1, 6.3) reduced by 50% to (0.05, 3.15) could not be accurately displayed on a device having an accuracy of 0.1. Simplification would weed out such coordinate pairs before reduction Processing faster vector-to-raster conversion faster vector processing the time needed for many types of vector processing including translation, rotation, rescaling, cartometric analysis will be greatly reduced with a simplified data set many types of symbol-generation techniques will also be speeded up e.g. many shading algorithms calculate intersections between shade lines and polygonal boundaries a simplified polygonal boundary will reduce both the number of boundary segments and also the number of intersection calculations required D. LINEAR SIMPLIFICATION ALGORITHMS overhead - Linear Simplification Algorithms Independent Point Routines these routines are very simple in nature and do not, in any way, account for the topological relationship with the neighboring coordinate pairs 1. nth point routine every nth coordinate pair (e.g. 3rd, 10th) is retained 2.
randomly select 1/nth of the coordinate set Local processing routines these utilize the characteristics of the immediate neighboring points in deciding whether to retain coordinate pairs 1. Euclidean distance between points 2. Angular change between points overhead - Perpendicular distance and angular change 3. Jenks's simplification algorithm overhead - Jenks's simplification algorithm diagram three input parameters: MIN1 = minimum allowable distance from PT 1 to PT 2 MIN2 = minimum allowable distance from PT 1 to PT 3 ANG = maximum allowable angle of change between two vectors connecting the three points algorithm: IF distance from PT 1 to PT 2 < MIN1, OR distance from PT 1 to PT 3 < MIN2 THEN PT 2 is removed ELSE IF angle 123 < ANG THEN PT 2 is removed Unconstrained extended local processing routines these algorithms search beyond the immediate neighboring coordinate pairs and evaluate sections of the line the extent of the search depends on a variety of criteria, including: the complexity of the line the density of the coordinate set the beginning point for the sectional search Reumann-Witkam simplification algorithm overhead - Reumann-Witkam simplification algorithm the algorithm uses two parallel lines to define a search region after calculating the initial slope of the search region, the line is processed sequentially until one of the edges of the search corridor intersects the line Constrained extended local processing routines these algorithms are similar to those in the previous category; however, they are restricted in their search by: 1. coordinate search regions and 2.
distance search regions Opheim simplification algorithm overhead - Opheim simplification algorithm same as the Reumann-Witkam routine, except the algorithm is constrained by a minimum and maximum distance check, much like the Jenks routine after the initial search region is set up which is similar to that of Reumann-Witkam, any points within DMIN are eliminated as soon as the line escapes from the search region on any side, including DMAX at the end, a new search corridor is established and the last point within the region is saved the behavior of this routine around a curve is represented in C and D Lang simplification algorithm Johannsen simplification algorithm Global routines consider the line in its entirety while processing Douglas simplification algorithm overhead - Douglas simplification algorithm I and II select a tolerance band or corridor (shaded area on slide) this corridor is computed as a distance, t1 in length, on either side of a line constructed between the first and last coordinate pairs, in this example 1 and 40 point 1 is the anchor point and point 40 is the floater after the establishment of a corridor, perpendicular distances between the line connecting points 1 and 40 to all intermediate points (coordinate pairs 2-39) are calculated to determine which point is farthest from the line this maximum distance is to point 32, which is positioned well outside the corridor the position of this coordinate pair (pair 32) is now saved in the first position of a stack next, a new corridor is calculated between points 1 and 32 and point 23 is found as the farthest from the centerline here, point 32 is the floater this process continues until all points are within the corridor after the search has backed up to point 4, a new anchor and floater are established between points 4 and 23--the last position saved within the stack in this fashion the Douglas algorithm processes the entire line, backing up when necessary until all intermediate points are
within the corridor and then selecting from the stack the position of the next saved coordinate pair thus eventually the segment of the line between coordinate pairs 23 to 32 will be evaluated and the corridor from coordinate pair 32 to the end of the line will be the final computed segment E. MATHEMATICAL EVALUATION OF SIMPLIFICATION many different types of measures may be used to evaluate the simplification process one type is simple attribute measures another type is displacement measures simple attribute measurements are those which may be applied to a single line, such as line length, angularity, and curvilinearity. these apply to either the base line or a simplification displacement or comparative measurements, on the other hand, evaluate differences between the base line and simplification overhead - Measures for linear simplification overhead - Areal displacement it appears that some of the algorithms are much better than others in maintaining the critical geometric characteristics of the data Douglas, Lang, Reumann-Witkam, and Opheim all appear to be reasonable choices the two best are Douglas and Lang F. LINEAR SMOOTHING smoothing is applied to digital line data in order to improve the aesthetic qualities of the line and to eliminate the effects of the digitizing device in general, it is felt that smoothing improves the quality of these data smoothing increases the number of coordinates needed, so is normally used only for output REFERENCES Buttenfield, B.P., 1985. "Treatment of the Cartographic Line," Cartographica 22(2):1-26. Douglas, D.H. and T.K. Peucker, 1973. "Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or Its Caricature," The Canadian Cartographer 10(2):112-122. McMaster, R.B., 1987. "Automated Line Generalization," Cartographica 24(2):74-111. McMaster, R.B., 1987. "The Geometric Properties of Numerical Generalization," Geographical Analysis 19(4):330-346. McMaster, R.B., 1989.
"The Integration of Simplification and Smoothing Algorithms," Cartographica 26(1). Peucker, T.K., 1975. "A Theory of the Cartographic Line," Proceedings, Second International Symposium on Computer-Assisted Cartography, AUTO-CARTO-II, September 21-25, 1975, U.S. Dept. of Commerce, Bureau of Census and ACSM, pp. 508-518. White, E., 1985. "Assessment of Line-Generalization Algorithms Using Characteristic Points," The American Cartographer 12(1):17-27. DISCUSSION/EXAMINATION QUESTIONS 1. Discuss the differences between sequential and global approaches to line simplification. 2. What are the five generalization operators for digital line data? Discuss each one of these and give examples. 3. Using a series of diagrams, discuss the procedure used by the Douglas algorithm. 4. Discuss the different approaches you might use to evaluate the effectiveness of line simplification procedures and the advantages and disadvantages in each case. VISUALIZATION OF SPATIAL DATA A. INTRODUCTION Maps Computer-generated displays B. CARTOGRAPHIC BACKGROUND Visualization What is the image supposed to show? To whom? Ideal display C. GRAPHIC VARIABLES 1. Location 2. Value 3. Hue 4. Size 5. Shape 6. Spacing 7. Orientation D. PERCEPTUAL AND COGNITIVE LIMITATIONS E. GRAPHIC LIMITS F. REPRESENTING UNCERTAINTY Explicit uncertainty codes Graphic ambiguity Examples G. TEMPORAL DEPENDENCE Basic strategies H. SHOWING A THIRD DIMENSION Contours Hypsometric mapping Simulating oblique views of surface REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 49 - VISUALIZATION OF SPATIAL DATA Compiled with assistance from Matt McGranaghan, University of Hawaii A.
INTRODUCTION Maps are limited to two dimensions must show 3-D data projected onto a flat surface give a distorted impression of spatial distributions on the globe are static, cannot show change through time or animate have difficulty showing interactions or flows between places are limited by the tools used to make maps pens of constant width constant color or tone the airbrush adds flexibility but is difficult to use, control have difficulty showing uncertainty in data give a false impression of accuracy Computer-generated displays include screens, plots, printer output include raster and vector can be animated can show continuous gradations of color, texture, tone can show 3-D using stereoscopic technology and pairs of images the computer is a powerful tool for visualizing spatial information this unit looks at some of the issues involved in combining the knowledge of cartography with the power of digital technology all too often these issues are ignored when output maps and displays are created from GIS although GIS display and mapping has much to learn from principles of cartographic design, it also provides entirely new possibilities B. CARTOGRAPHIC BACKGROUND must consider the objective of display Visualization process for putting (complex) images into minds examples: the shape of a mountain - poorly conveyed by contours pattern of growth of an urban area - may need animation to show changes through time effectively air-flows over a patch of terrain - needs 3-D capabilities plus animation to show true pattern of directions, speeds of flow movements of people in an area - needs ability to generalize individual movements into meaningful aggregate patterns components of visualization system: database containing information hardware device used to generate display human visual system processing of perceived image in the brain correct perception depends on functioning of all of these components What is the image supposed to show?
what impressions does the analyst wish to create in the mind? what relationship do these have with the contents of the database? database contents are abstract version of geographical reality system should create an impression of reality, not of the contents of the database aspects of relationship between database and reality, e.g. accuracy, should be important part of display geography is complex display is a filter removing unwanted complexity to show trends, patterns display must show level of detail required by user, from general overview to detailed insights To whom? effective visualization may require familiarity with symbols on the part of the user some people may never master skills of map-reading, i.e. using maps to visualize geography how much familiarity should be assumed? it may generally be better to assume low familiarity people can learn to work with complex displays, but may lose interest and look for alternative sources of information Ideal display communicates intended message perfectly to all users is not misunderstood offers complete design flexibility put any symbol anywhere, at any size, etc. C. GRAPHIC VARIABLES classes of symbols correspond to classes of objects point line area visual differences among map symbols convey information 1. Location where the symbol is determined primarily by geography the primary means of showing spatial relations the brain computes relations like "is within", "crosses" on the fly from the eye's perceived image of the map compare GISs some compute these relationships on the fly, others store them in the database to avoid the processing required to compute them compared to the brain, current GIS technology is amazingly crude 2.
Value lightness or darkness of a symbol very important visually - the eye tends to be led by patterns of light and dark usually used to represent quantitative differences tradition suggests darker symbols should mean "more" - however this may reverse on dark backgrounds which are common on computer displays - on dark backgrounds, lighter may mean "more" 3. Hue color important aesthetically usually represents qualitative differences - continuous grading of color is difficult and expensive to achieve on printed maps 4. Size how large the symbol is conveys quantitative difference brain has difficulty inferring quantity accurately from the size of a symbol if proportional circles are used to portray city population, doubling the radius of a circle (quadrupling its area) is perceived as indicating more than twice the population, but not four times i.e. the brain infers population from some mixture of the radius and the area of the symbol 5. Shape geometric form of the symbol used to differentiate between object classes used to convey nature of the attribute, e.g. population indicated by images of people, housing by house symbols 6. Spacing arrangement, density of symbols in a pattern used to show quantitative differences, e.g. dot density to show population 7. Orientation of a pattern, to show qualitative differences of a linear symbol, to show quantitative (directional) differences D. PERCEPTUAL AND COGNITIVE LIMITATIONS symbol differences must be perceptible to be of use JND - just noticeable difference - the smallest difference which can be reliably perceived between symbols, sizes, colors, shapes etc. LPD - least practical difference - the smallest difference which can be produced by the cartographic process eye's sensitivity to various graphic codes some codes "get through" better e.g.
use of yellow for fire trucks allows them to stand out better in the visual field sensitivity varies across visual field "peripheral vision" is enhanced by movement, varies among individuals cognitive aspects indications that perception is dependent on cognition - knowledge understanding of phenomena color categories/nameability - certain colors may have associations with names, concepts E. GRAPHIC LIMITS digital devices provide finite resolution spatial - where symbols might be and their shapes display device has a set screen or paper size display pixels have a set size, finite number of spatial locations aliasing - line (or point) mapped onto closest pixel(s) produces stepped (straight) lines color - what colors things might be limit on number of colors available (palette) - plotter may have only 8 pen colors - screen may have millions of possible colors limit on range of luminance and contrast how many colors displayable at one time - 2^n where n is the number of bit planes what the colors are temporal limits data retrieval from mass storage or from core memory? how much data processing needed to compute display? writing to the display device speed limited by communication overhead & bus contention (competition from other activities) these factors may preclude using some types of display image animation requires fast throughput complex images require fast data retrieval acceptable response time people don't like to sense a pause in the system typical goal: maximum of two seconds for complex operations, instantaneous for all others how long should something remain visible to be noticed? F. REPRESENTING UNCERTAINTY have to use SOME graphic code don't want its meaning confused with something else e.g. line drawn wide to represent uncertain position confused with wide highway or braided stream Explicit uncertainty codes mark things which are uncertain with a color e.g.
red or yellow to suggest caution in using the information Graphic ambiguity use graphic ambiguity to create cognitive/visual ambiguity e.g. multiple positions for an uncertainly located item dot density or color could be used to show varying probability, e.g. a cloud with highest density in the center absence of "hard" lines or edges where they are uncertain Examples show uncertain area with a red tint overlay show uncertain lines as multiple lines (like a braided stream) fuzzy line vary the value or saturation of the line across its width blending between adjacent areas to show zones of transition blend the colors choose such that the blend works psychologically red <-> purple <-> blue blue <-> aqua <-> green NOT red <-> yellow <-> orange large set of possible colors are needed to show the appearance of a smooth transition transition can be simulated with a small set of colors by spatially blending pixel colors ("dithering") G. TEMPORAL DEPENDENCE Basic strategies static maps show a single slice of time show several states at once by careful choice of symbols indicate amount or rate of change dynamic maps real time is compressed or scaled into changing display non-moving occurrences - events added and deleted at places through time moving objects - movement is animated on the screen - symbol is deleted at one location, regenerated at adjacent location H. 
SHOWING A THIRD DIMENSION Contours calculated contours (calculated by contouring algorithms) starting with a grid of elevations, thread contours and display the lines visual contours with elevation grid cells (contours are perceived but not computed explicitly) given a sufficiently dense raster of elevations shade pixels according to the elevation value of the central point using specified elevation ranges result is apparently (not analytically) a contour map Hypsometric mapping set each pixel to a color dependent on its height this is easily implemented as table look-up range of colors is conventional - dark green for low elevations, through green, yellow, brown, then white at highest elevations Simulating oblique views of surface each pixel's illumination computed from its slope relative to simulated "sun" sun must be placed at top of image for correct visual perception - if sun is at bottom, eye sees surface inverted requires assumptions about reflectance of surface lakes, ice, some building materials produce highlights single light source makes the surface too "stark" assume light source infinitely far away from surface may assume viewer is also infinitely far away to avoid complex perspective calculations with TINs or coarse grids, edges of plane patches may be visible because of sharp change of slope discontinuities can be eliminated by varying intensity of illumination continuously over facets many 3-D display systems supply this capability - called Gouraud shading REFERENCES Standard texts on map design: Cuff, D.J., and Mattson, M.T., 1982. Thematic Maps: Their Design and Production, Methuen, New York. Dent, B.D., 1985. Principles of Thematic Map Design. Addison-Wesley, New York. Tufte, E.R., 1983. The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT. A fascinating discussion including many cartographic examples. Texts on computer graphics: Durrett, H.J. ed., 1987. Color and the Computer. Academic Press, New York.
Foley, J.D., and Van Dam, A., 1982. Fundamentals of Interactive Computer Graphics. Addison-Wesley, New York. Myers, R.E., 1982. Microcomputer Graphics. Addison-Wesley, Reading, MA. Design for digital maps: Monmonier, M., 1982. Computer-Assisted Cartography: Principles and Prospects. Prentice-Hall, Englewood Cliffs NJ. Techniques for displaying topography: Kennie, T.J.M., and McLaren, R.A., 1988. "Modelling for digital terrain and landscape visualisation," Photogrammetric Record 12(72):711-745. EXAM AND DISCUSSION QUESTIONS 1. Summarize the ways in which digital displays offer greater flexibility for visualizing spatial data. 2. The visual system is not the only way in which spatial information might be conveyed to a user. Discuss the prospects for using other methods of communication, either alone or in combination with visual methods. What kind of user interface would be appropriate for a GIS for visually impaired users, and what applications might such a system have? 3. Review the methods of visualization available in any GIS to which you have access. How limited are they, and how could they be improved? 4. How would you adapt the concept of an atlas to a digital system with capabilities for animation? COLOR A. INTRODUCTION What is color? What gives an object its color? B. COMPONENTS OF COLOR VISION C. COLOR MEASUREMENT D. PHYSICAL COLOR SPECIFICATION SYSTEMS CIE Uniform color spaces E. PERCEPTUAL COLOR SPECIFICATION SYSTEMS Munsell color system F. CRT COLOR SPECIFICATION SYSTEMS RGB system HLS system HVC (hue, value, chroma) system REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES The slide set contains images to illustrate this unit (#31 to 40). UNIT 50 - COLOR Compiled with assistance from Jon Kimerling, Oregon State University A. INTRODUCTION What is color?
a complex eye-brain response to electromagnetic radiation in the visible portion of the electromagnetic spectrum, commonly called "light" the average person perceives solar radiation from approximately 400 nm to 700 nm (1 nm = 10^-9 m) in wavelength this range can be visualized as a series of six "spectral" colors grading from violet through red colors such as red cover a greater proportion of the spectrum than others such as yellow other colors are mixtures of these in varying proportions and a few colors, like fluorescent pink, are "non-spectral" since they cannot be created as a spectral mixture What gives an object its color? colors of most objects we see are a function of: the spectral properties of the illumination source, i.e., the amount of light at each wavelength coming from the source the ability of the object to reflect light at each wavelength, often graphically portrayed as a spectral reflectance curve the sensitivity of the cones in our eyes to each wavelength a CRT generates color by selectively exciting dots of three different phosphors - red, green and blue the spectral emittance characteristics of the phosphors and our sensitivity to light emitted by them determine the colors we see the gamut of a device is the range of colors which it is capable of generating generally, it is difficult to match the gamuts of different devices or media (e.g. CRT and paper), so colors tend to change when an image is displayed on a different device or medium B. 
COMPONENTS OF COLOR VISION differences in spectral sensitivities of receptors in the eye's retina give us color vision Maxwell trichromatic theory of color vision is based on the fact that cone cells in our retinas, termed β, γ and ρ, are primarily sensitive to blue, green, and red light, respectively color seen is a function of the relative amount of blue, green and red light striking the closely packed mixture of cone cells that, along with rod cells sensitive only to light intensity, form the retina visual signal transmission from the rod and cone cells appears not to be carried out by four different types of nerve fibre, but rather by nerve cells connected so as to produce only three different signals in the fibre the Opponent Process theory of color vision postulates that these signals interact to produce four perceptually unique "pole" colors: blue, green, yellow, and red all other colors will be seen as mixtures of these maximally discriminable "poles" color constancy refers to the ability of our visual system to adapt to light sources of different intensities and colors so that object colors remain the same e.g. skier's experience of again seeing snow and trees as white and green after wearing goggles of a different color for a few seconds e.g. colors on CRT monitors appear the same as the screen brightness is lowered the Retinex theory, proposed by Edwin Land of Polaroid fame, explains color constancy by saying that our eyes do not function as cameras, since we do not perceive color by wavelength alone our mind does not determine the color of an object in isolation, but by comparing the object with its surround and continually adjusting to light source differences so that the object color appears the same perceptual dimensions of color describe the three basic ways in which we see variation in color, that is, color varies by: 1. 
hue - the attribute of color whereby an area appears similar to an opponent process "pole" color (red, yellow, green, or blue) or a mixture of any two "pole" colors 2. lightness - the brightness of an area relative to the brightness of a similar area that appears white in color 3. chroma - the colorfulness of an area relative to the brightness of a similar area that appears white in color - the strength or weakness of a color C. COLOR MEASUREMENT light measuring devices called spectrophotometers are employed to measure the light reflected from or emitted by an object, giving data needed for physical color specification slide 31 - spectrophotometer for CRT screen a spectrophotometer is a device that detects visible light either reflected from a surface lit by a "standard illuminant" or emitted by a CRT screen with a known "white point", disperses the light into a spectrum, and measures the amount of light at small wavelength intervals along the spectrum relative to the standard light source amount of light per wavelength interval is recorded in graphical or digital form suitable for subsequent use in color specification systems D. 
PHYSICAL COLOR SPECIFICATION SYSTEMS methods of specifying color used in optics CIE Commission Internationale de l'Eclairage color system widely used allows precise numerical specification of color, based on spectrophotometric measurements a numerical way to match colors to a standard and to determine color differences colors are defined by (x,y,Y) coordinates that give a location on a chromaticity diagram slide 32 - CIE chromaticity diagram plotting the (x,y) coordinates of the color and the illumination source, one finds that a straight line drawn from the light source to the color and continued to the edge of the diagram gives the color's dominant wavelength, a numerical description of hue the straight line distance on the chromaticity diagram between the light source and color, divided by the distance from the light source, through the color, and to the diagram edge gives the color's purity, which is similar in concept to chroma the Y coordinate gives the color's luminosity, this being a mathematical counterpart to lightness the true nature of the CIE system is best illustrated by a three dimensional figure, that for process color printing and CRT monitors resembles a six-sided crystal with black and white tips slide 33 - 3-D perspective diagram of CIE color gamut for process color printing all colors that can be created by a display device, such as a plotter or CRT monitor, will fall within the boundaries of this type of solid figure, which normally encompasses only part of the entire CIE color space. slide 34 - the vertical dimension of the CIE color space Uniform color spaces equal differences in coordinates signify equal perceptual differences desirable when color progressions are to be determined based on physical color measurements the CIE (x,y,Y) system is not a uniform color space, but the related CIE (L*,u*,v*) color space is (L*,u*,v*) is a non-linear transformation of (x,y,Y) coordinates E. 
PERCEPTUAL COLOR SPECIFICATION SYSTEMS Munsell color system differs from CIE by being based on perceptual experiments to determine equal appearing steps of hue, value (perceived lightness), and chroma color of a surface is determined by comparing it visually to a set of painted color chips colors specified by 0-100 hue range, 0-10 value range, and 0-20+ chroma range complex mathematical procedures exist to convert CIE to Munsell colors, based on a table look-up approach slide 35 - the Munsell color system color progressions for quantitative areal data displayed using the choropleth or dasymetric mapping method are often based on Munsell value and/or chroma steps, whereas qualitative data often are portrayed with a series of Munsell hues F. CRT COLOR SPECIFICATION SYSTEMS color CRT displays are fundamentally different from color printers and plotters electrons in red, green, and blue (RGB) phosphor atoms are excited to higher energy levels by a moving electron beam, only to give off photons of the corresponding wavelengths upon return to their normal state after the beam has passed the monitor screen is made up of hundreds of thousands of tiny red, green, and blue phosphors arranged as rows and columns of triads RGB system the RGB color system is closest to the physical design of monitors, since colors are specified by amounts of red, green, and blue which can be directly translated into the electron beam strengths to be delivered to each phosphor in a triad system can be viewed as a cube with red, green, and blue axes slide 36 - RGB cube cube corners are white, black, red, yellow, green, cyan, blue, and magenta all possible RGB combinations are within the cube number of colors within the cube which are actually displayable depends upon the number of bit planes in the display driver or color monitor adaptor card e.g. 
many adaptors (EGA, VGA) have 4 bit planes in normal modes, use 3 for colors, 1 for lightness - 3 bits give the eight corners of the RGB cube a 24 bit plane driver gives 2^24, or over 16 million, different colors per pixel, organized so that there are 2^8, or 256, levels of red, green, and blue, plus all combinations thereof slide 37 - RGB cube diagram (0,0,0) gives black, (255,255,255) gives white, and the 254 intermediate triplets form a progression of grey tones running diagonally through the cube colors along the white-yellow [(255,255,255)-(255,255,0)], white-magenta and white-cyan cube edges, as well as diagonal rows from white to red, green, and blue form "tint" progressions opponent process "pole" colors and mixtures thereof can be easily specified, since RGB components change smoothly e.g. 254 gradations between blue (0,0,255) and green (0,255,0) can be created by holding red at 0, incrementing green by 1, and decrementing blue by 1 HLS system Tektronix developed the HLS system to simplify the selection of color progressions similar to tints and shades slide 38 - HLS color solid a double cone with the central axis forming a lightness progression identical to the black to white diagonal line through the RGB cube hues are specified by angle, starting with blue at 0° and progressing around the perimeter in the same order as found in the CIE chromaticity diagram when its boundary is traversed counterclockwise lightness and saturation vary from 0 to 1 the triangular slice for each hue can also be viewed as a plane cut from the RGB cube and deformed into the HLS triangle slide 39 - RGB - HLS deformation the transformation is linear, and hence simple equations can be used to transform HLS specifications into RGB values, and vice-versa. 
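The HLS-to-RGB conversion mentioned above can be sketched in a few lines. The following is a minimal illustration (not Tektronix's published algorithm), using Python's standard colorsys module; colorsys places red at a hue of 0°, so a 240° offset is applied here to match the convention described above of blue at 0° - that offset is an assumption of this sketch.

```python
import colorsys

def tek_hls_to_rgb(hue_deg, lightness, saturation):
    """Convert an HLS triple (Tektronix convention, blue at 0 degrees)
    to an 8-bit RGB triple.

    Python's colorsys puts red at hue 0, so the hue is re-origined by
    240 degrees (an assumption matching blue-at-0 as described above).
    Lightness and saturation both run 0..1, as in the HLS double cone.
    """
    h = ((hue_deg + 240.0) % 360.0) / 360.0   # shift origin, scale to 0..1
    r, g, b = colorsys.hls_to_rgb(h, lightness, saturation)
    return tuple(round(255 * c) for c in (r, g, b))

# The axis endpoints of the double cone map to the black and white
# corners of the RGB cube; a fully saturated mid-lightness hue lands
# on a cube corner or edge:
print(tek_hls_to_rgb(0, 0.5, 1.0))    # pure blue -> (0, 0, 255)
print(tek_hls_to_rgb(120, 0.5, 1.0))  # pure red  -> (255, 0, 0)
print(tek_hls_to_rgb(0, 0.0, 1.0))    # lightness 0 -> black (0, 0, 0)
print(tek_hls_to_rgb(0, 1.0, 1.0))    # lightness 1 -> white (255, 255, 255)
```

Note how lightness 0 and 1 collapse to black and white regardless of hue and saturation, which is exactly the double-cone geometry: the two apexes of the solid coincide with the black and white corners of the RGB cube.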
HVC (hue, value, chroma) system slide 40 - HVC system Tektronix has worked for several years to develop a color specification system essentially identical to the Munsell, the HVC system being the end product created by making spectrophotometric measurements of thousands of RGB combinations, determining the CIE chromaticity coordinates for each, transforming all (x,y,Y) coordinates to their (L*,u*,v*) counterparts, and determining equal increments of hue, value, and chroma in this uniform color space closely resembles the Munsell system - an irregular solid with vertical axis forming the value scale hues progress from 0° to 360° around the axis, with red at 0° each vertical slice into the solid exposes a page of value-chroma combinations for a particular hue. the HVC-RGB transformation is far more difficult than the HLS-RGB, requiring a computer program of several hundred statements REFERENCES Dent, B.D., 1985. Principles of Thematic Map Design, Addison-Wesley, Reading, MA, pp. 353-357. Eastman, J.R., 1986. "Opponent Process Theory and Syntax for Qualitative Relationships in Quantitative Series," The American Cartographer 13(4):324-333. Hunt, R.W.G., 1987. Measuring Color, John Wiley & Sons, New York, pp. 1-102. Murch, G.M. and J.M. Taylor, 1988. "Sensible Color," Computer Graphics World, July 1988:69-72. Niblack, Wayne, 1986. An Introduction to Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ. Robinson, A.H., R.D. Sale, J.L. Morrison, and P.C. Muehrcke, 1984. Elements of Cartography, 5th edition, John Wiley & Sons, New York, pp. 170-177. EXAM AND DISCUSSION QUESTIONS 1. How has the Munsell color system been adapted for display screen color specification? 2. What is the relationship between bit planes and the number of colors possible on a CRT monitor? 3. How is it that we see objects as the same color under different sources of illumination? 4. 
Explain the relationship between physical, perceptual and CRT color specification schemes, and give examples of each. 5. Explain the meaning of the term "gamut", and the problems which occur because of differences in gamuts between different display devices and media. GIS APPLICATION AREAS A. INTRODUCTION Functional classification GIS as a decision support tool Core groups of GIS activity B. CARTOGRAPHY Computers in cartography Organizations Adoption C. SURVEYING AND ENGINEERING Recent advances in technology Characteristics of application area Organizations D. REMOTE SENSING Characteristics of application area Organizations E. SCIENCE AND RESEARCH Analogy to statistical packages Characteristics of application area Organizations REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This begins a 6 part section which reviews the spectrum of different applications of GIS. We have tried to include examples from all the areas in which GIS is currently actively employed. You may want to rearrange, enhance or revise major portions of these units to suit the needs and interests of your students. UNIT 51 - GIS APPLICATION AREAS Compiled with assistance from David Cowen, University of South Carolina and Warren Ferguson, Ferguson Cartotech A. 
INTRODUCTION GIS technology, data structures and analytical techniques are gradually being incorporated into a wide range of management and decision-making operations numerous examples of applications of GIS are available in many different journals and are frequent topics of presentations at conferences in the natural and social sciences in order to understand the range of applicability of GIS it is necessary to characterize the multitude of applications in some logical way so that similarities and differences between approaches and needs can be examined an understanding of this range of needs is critical for those who will be dealing with the procurement and management of a GIS Functional classification one way to classify GIS applications is by functional characteristics of the systems this would include a consideration of: 1. characteristics of the data such as: themes precision required data model 2. GIS functions which of the range of possible GIS functions does the application rely on? e.g. address matching, overlay? 3. products e.g. does the application support queries, one-time video maps and/or hardcopy maps? a classification based on these characteristics quickly becomes fuzzy since GIS is a flexible tool whose great strength is the ability to integrate data themes, functionality and output GIS as a decision support tool another way to classify GIS is by the kinds of decisions that are supported by the GIS several definitions of GIS identify its role in decision- making decision support is an excellent goal for GIS, however: decisions range from major (which foreign aid project to support with limited budget?) to minor (which way to turn at next intersection?) 
difficult to know when GIS was used to make decisions except in cases of major decisions decision support is a good basis for definition of GIS, but not for differentiating between applications since individual GIS systems are generally used to make several different kinds of decisions Core groups of GIS activity GIS field is a loose coalescence of groups of users, managers, academics and professionals all working with spatial information each group has a distinct educational and "cultural" background each has associated societies, magazines and journals, conferences, traditions as a result, each identifies itself with particular ways of approaching particular sets of problems interactions occur between groups through joint memberships, joint conferences, umbrella organizations these groups or cultures, then, are another basis for characterizing application areas the core groups of GIS activity can be seen to be comprised of: 1. mature technologies which interact with GIS, sharing its technology and creating data for it surveying and engineering cartography remote sensing 2. management and decision-making groups resource inventory and management urban planning (Urban Information Systems) land records for taxation and ownership control (Land Information Systems) facilities management (AM/FM) marketing and retail planning vehicle routing and scheduling 3. science and research activities at universities and government labs this and the next 5 units (Units 52-56) examine each of these groups of GIS activity seeking to find distinctions and similarities between them begin in this unit with a quick review of the relationship between the mature technologies and GIS and finish with a look at the role of GIS in science B. CARTOGRAPHY there are two areas of GIS application in cartography: 1. automation of the map-making process 2. 
production of new forms of maps resulting from analysis, manipulation of data the second is closer to the concept of GIS although both use similar technology Computers in cartography first efforts to automate the map-making process occurred in early 1960s major advantage of automation is in ease of editing objects can be moved around digital map without redrafting scale and projection change are relatively easy differences between automated mapping and GIS are frequently emphasized mapping requires: knowledge of positions of objects, limited number of attributes GIS requires: knowledge of positions of objects, attributes, relationships between objects hence distinction between "cartographic" and "topological" databases "analytical" cartography involves analysis of mapped data has much in common with some aspects of GIS analysis cartography plays a vital role in the success of GIS supplies principles of design of map output products - how to make them easy to read and interpret? see: Units 17 and 49 represents centuries of development of expertise in compiling, handling, displaying geographical data widespread feeling that conversion to digital technology: is inevitable will revolutionize the field through new techniques Organizations both professional and academic organizations in most countries International Cartographic Association (ICA) well-developed training and education programs, journals, continuing research Adoption now is some use of digital technology in almost all aspects of the map production process the term "desktop mapping" emphasizes the accessibility of one form of automated cartography in the same way that page formatting programs have led to the success of "desktop publishing" C. 
SURVEYING AND ENGINEERING surveying is concerned with the measurement of locations of objects on the Earth's surface, particularly property boundaries all 3 dimensions are important - vertical as well as horizontal positions accuracy below 0.1 m is necessary the locations of a limited number of sites are fixed extremely accurately through precision instruments and measurements these sites are monuments or benchmarks - the geodetic control network this is the function of geodesy or geodetic science using these accurate benchmarks for reference, large numbers of locations can then be accurately determined relative to the fixed monuments surveying is an important supplier of data to GIS however, it is not directly concerned with the role of GIS as a decision-making tool some civil engineers now use GIS technology, especially digital elevation models and associated functionality, to assist in planning construction e.g. to make calculations of quantities of earth to be moved in construction projects such as building highways e.g. to visualize the effects of major construction projects such as dams Recent advances in technology instruments: locations captured by measuring device in digital form, downloaded to database - the "total station" new GPS (global positioning system) instruments determine location from satellites, supplementing the geodetic control network direct linkage of surveying instruments to spatial databases thus suppliers of surveying equipment have entered the GIS field as vendors Characteristics of application area scale: large - surveying often accurate to mm engineering calculations require high DEM resolution data model: survey data is exclusively vector lineage: for legal reasons the source of survey data is important e.g. 
instruments, benchmarks used, name of surveyor, date most systems do not yet allow such lineage information to be stored directly with the data Organizations surveying and engineering are mature professional fields based on scientific methods, with organizations, conferences, courses, journals, systems of accreditation introduction of GIS technology has not radically altered the profession D. REMOTE SENSING like surveying, is a data producing field acquires knowledge about the Earth''s surface from airborne or space platforms elaborate, well-developed technology and techniques instruments for data capture - high spatial and spectral resolution transmission of data, processing, archiving interpreting and classifying images two major roles for GIS concepts: quality and value of product is enhanced by use of additional ("ancillary") data to improve accuracy of classification e.g. knowledge of ground elevation from a DEM allows shadows to be removed from images to be useful in decision-making, product needs to be combined with other layers less readily observed from space e.g. 
political boundaries remote sensing continues to be an active research area new instruments need to be evaluated for applications in different fields careful research is needed to realize the enormous potential of the technology volume of accumulated data is increasing rapidly Characteristics of application area scale: a full range of spatial resolutions, depending on altitude, characteristics of instrument data model: data is captured exclusively in raster form (pixels) classified images may be converted to vector form for output, or for input to GIS systems interfacing with GIS is a current development direction both areas have developed extensive software systems in remote sensing, systems include image processing functionality interfacing is not difficult technically - however, there may be substantial incompatibilities in data models, format standards and spatial resolution many GIS vendors include functions to convert data from remote sensing systems and to display vector data on satellite image backdrops true integration of vector GIS and raster image processing systems is not yet available Organizations because of continuing emphasis on research, there is heavy representation from government and academic research the growth curve of remote sensing occurred about a decade earlier than GIS E. 
SCIENCE AND RESEARCH growing interest in using GIS technology to support scientific research to support investigations of global environment - global science to search for factors causing patterns of disease - epidemiology to understand changes in patterns of settlement, distributions of population groups within cities - anthropology, demography, social geography to understand relationships between species distribution and habitats - landscape ecology GIS has been called an enabling technology for science because of the breadth of potential uses as a tool Ron Abler (Pennsylvania State University) has compared GIS to tools like microscopes, Xerox machines, telescopes in its potential for support of research Analogy to statistical packages major statistical packages - SAS, SPSS, BMD, S etc. - developed over past 20 years primarily developed to apply statistical tools in scientific research subsequent applications in consulting, business recent introduction of graphics, mapping capabilities for display of results, e.g. 
SAS/GRAPH unlike statistical packages, GIS development has been driven by applications other than scientific research lack of tools for spatial analysis has meant that the role of location in explaining phenomena has been difficult to evaluate locational information has been available in map libraries but hard to interface with other information, not part of digital research environment potential for GIS to play an important role in scientific research GIS supports spatial analysis as statistical packages support statistical analysis Characteristics of application area scale: very large (archaeology) to very small (global science) functionality: overlay to combine, correlate different variables ability to interface GIS with complex modeling packages, statistical packages interpolation visualization of data potential for 3D, time-dependent applications Organizations no forum for exclusive discussion of role of GIS in science (similar problems in statistics) particularly in the non-technical fields in the social sciences discussion confined to individual disciplines geography is the only discipline with a general concern for spatial analysis and supporting tools however, in most US universities geography is a small, relatively weak and unknown discipline in other countries, (e.g. UK) geography is recognized as a strong traditional discipline, with distinguished roots in social and physical science research REFERENCES Abler, R.F., 1987. "Awards, rewards and excellence: keeping geography alive and well," Professional Geographer 40:135-40. Source of the reference in Section E. Bylinsky, Gene, 1989. "Managing with electronic maps," Fortune, April, 1989. Important popular review of GIS as a decision tool. EXAM AND DISCUSSION QUESTIONS 1. Some have argued that the best way to classify GIS applications is through the data they use. How would the results differ from the taxonomy proposed in this Unit? 2. 
What significant groups are missing from this taxonomy of GIS applications? What areas of application might develop in the future? 3. Do you accept the analogy between GIS and statistical packages presented in this Unit? In the long term, which would you expect to have the more significant role in supporting scientific activity? Why? 4. Which branches of science would have most use for a GIS as an enabling technology? Which would have least use for it? 5. It has been argued that GIS is an extremely dangerous tool in epidemiology, because of its potential for identifying all sorts of spurious correlations between environmental factors and the occurrence of disease. Do you agree, and if so, what steps would you recommend to reduce the potential for misuse? RESOURCE MANAGEMENT APPLICATIONS A. INTRODUCTION Characteristics of applications Functionality Adoption Organizations B. EXAMPLE - BIG DARBY CREEK PROJECT Big Darby Creek characteristics AGNPS - Agricultural Nonpoint Source Pollution Model The GIS GIS-Model Link C. DATABASE Slides D. SAMPLE RESULTS Management strategies tested Example of output E. ASSESSMENT OF SYSTEM EXAM AND DISCUSSION QUESTIONS NOTES The slide set contains twelve slides (#41 to 52) to illustrate this unit. As in many of these practical applications, widely accessible documentation is not available. UNIT 52 - RESOURCE MANAGEMENT APPLICATIONS Compiled with assistance from John Bossler, Ohio State University A. 
INTRODUCTION resource inventory and management was one of the earliest uses of GIS these applications dominated sales by vendors in the early 1980s many systems installed by state and federal governments and resource industries, particularly forestry, oil and gas most successful resource applications: forestry - timber inventory, watershed management, development of infrastructure (roads), forest regeneration agriculture - studies of agricultural pollution, inventories of land capability, productivity studies land use - planning use of land, zoning, evaluating impacts wildlife - management of habitat, evaluation of impact less successful subsurface resources - requires 3D approach, technology is predominantly 2D oceans - requires 3D, problems are time-dependent, lack of suitable data sources water resources - good for integration over watersheds, but 2D approaches are not ideal for linear surface watercourses or 3D groundwater Characteristics of applications layers: typically requires many coverages of an area - resources and relevant management factors are multi- dimensional mixture of data models - raster and vector with vector model, heavy use of polygons to represent homogeneous areas scale: varied but uncommon above 1:10,000 data quality: many layers are result of interpretation, classification quality is variable, often unevaluated Functionality simple map analysis: overlay, measurement of area, buffer zone generation, calculation of viewshed modeling: many include the use of external models based on multiple variables obtained from different layers e.g. models to simulate drainage basin runoff, fire spread Adoption most forest management agencies by mid 1980s most resource management agencies by late 1980s Organizations numerous conferences sponsored by federal and state agencies no major organization clearly devoted to GIS applications in resource management discipline-based organizations focus applications, e.g. forestry, ecology B. 
EXAMPLE - BIG DARBY CREEK PROJECT demonstrates an application of GIS to natural resource management illustrates the role of a GIS in linking with an existing analytical package GIS provides data input, storage, output and some analytic capabilities existing package provides specialized modeling, interfaced with the GIS funded by Nature Conservancy, NASA, Ohio EPA, Ohio Department of Natural Resources 2 year project combines a GIS (ERDAS) with a nonpoint source pollution model (AGNPS) additional software was developed to link the two existing packages goal to provide a low-cost, user-friendly system and database to support land use planning and management for the basin purpose of this project is to evaluate effects of changes in management practice model with GIS provides capability to evaluate "what-if" scenarios - observe and quantify effects of changes role of model is to simulate effects of natural processes, e.g. if x changes by an amount a, what is the corresponding effect on y? model is only useful if it predicts such effects accurately an additional role of the GIS in this case is to integrate spatially if changes are made in certain parts of a drainage basin, GIS can be used to integrate results of changes over the whole basin and give user the total Big Darby Creek characteristics Watershed contains 370,000 acres (580 mi^2, 1,500 km^2) in central Ohio includes parts of 7 counties State Scenic River one of the region's last remaining free flowing streams not dammed for flood control or water supply over 60 of 100 Ohio freshwater fish species "exceptional water quality" (Ohio Environmental Protection Agency) Heritage elements 107 "heritage element" occurrences heritage elements are rare plant and animal species, champion trees protected by state and federal laws Sediment production however, is "highest sediment yielding watershed in Ohio" (Soil Conservation Service) percentages of land use - 71% cropland, 9% forest, 9% pasture, 9% fallow, 1% urban Typical 
management questions what would be the water quality effects of a 10 m conservation easement along the river? which soil types or fields are contributing the most siltation to the river and should be targeted for some kind of conservation action? which combination of crop/field management practices yields the most benefit to water quality? effective management requires quick and accurate answers to these and other questions AGNPS - Agricultural Nonpoint Source Pollution Model developed by US Department of Agriculture simulates impact of agricultural land use on water quality calculates for watershed as a whole, or for 40 acre units, the erosion and siltation and the nitrogen, phosphorus and chemical-oxygen demand generated by a storm results provided in tabular form The GIS low cost, microcomputer-based uses the GIS module marketed by ERDAS ERDAS product is normally associated with image processing thus these capabilities are also available provides: easy data entry and manipulation flexible graphics for output report generation GIS-Model Link GIS provides data entry and manipulation interface for the AGNPS program once the database has been created by the GIS it is reformatted and fed to the AGNPS model by a simple series of user commands after the model tabulates the results, output is fed back to the GIS to be displayed in map form C. 
C. DATABASE

- 21 variables required by AGNPS were entered through GIS
- overhead - Variables used in AGNPS model; variables include:
  - current management practices, obtained by survey of 200 farmers
  - soil type, slope from Soil Conservation Service surveys
  - land cover from remote sensing (Thematic Mapper)
- 40 acre raster cells, 400 m on each side
  - 210 rows and 148 columns

Slides

- slide 41 - Regional setting
- slide 42 - Landsat scene
  - Columbus is light blue area in lower right; Darby Creek watershed is centered on greenish area to left of Columbus
  - note proximity to major metropolitan area, population 1.4 million
- slide 43 - surface hydrology
  - 1 = Big Darby, 2 = Little Darby, 3 = major streams
- slide 44 - photograph of Big Darby Creek
- slide 45 - land use
  - 1 = cropland, 2 = fallow, 3 = pasture, 4 = forest, 6 = urban, 7 = water
  - note 88% of watershed is in agricultural use, only 9% is forest and 1% developed
- slide 46 - photograph of cropland in the watershed
- another layer identifies slope
  - 50% of watershed is <2% slope, only 3% has slope >12%
  - note that estimation of slope depends on size of raster cells - the mean slope in a cell 400 m by 400 m is not the same as the maximum slope; the definition of slope used is unclear
- despite low slopes, much of basin has high soil erodibility according to SCS's rating system
  - 28% of basin qualifies for SCS Conservation Reserve Program (CRP)
  - evidence of critical need for soil conservation practices
- slide 47 - distribution of CRP soils
  - clustered in areas of higher slopes and along watercourses
- slide 48 - the 107 Heritage Element occurrences in the watershed
  - identifies rare plant and animal species and "champion" trees protected by state and federal laws
  - large number of occurrences indicates watershed's ecological diversity and significance
- slide 49 - subwatersheds
  - subsequent results will be for subwatershed 1 in northern extremity

D. SAMPLE RESULTS

- slide 50 - nitrogen levels predicted by AGNPS model using data from GIS and displayed by GIS
  1. (upper left) historical baseline - complete forest cover, virtually no erosion
  2. (lower left) assuming complete compliance with CRP for eligible soils (28% of basin) - low levels of erosion, only in limited areas
  3. (upper right) current conditions - red indicates extremely high soil erosion; several areas of very high erosion
  4. (lower right) assumes implementation of a conservation easement on both sides of river, with forest cover - erosion is reduced within the easement, but not outside
- number of raster cells in lowest category of erosion is increased from 459 under current conditions to 531

Management strategies tested

- overhead - Management strategies tested
  - conservation easements of various widths on both sides of river
  - use of no-till or conservation tillage practices on critical areas
  - conversion of critical areas to non-agricultural (forested) use
  - various combinations of the above, determined by likely acceptability to local farmers and government agencies

Example of output

- given limited resources for erosion abatement, where should effort be concentrated?
- model can identify areas where greatest reduction in erosion rate can occur for given change in management practice
- slide 51 - critical areas for sediment reduction
  - shows where change in management practice will produce greatest reduction
  - 12% reduction in sediment yield can be achieved by changing management of these cells - these are only 3% of area

E. ASSESSMENT OF SYSTEM

- user-friendly GIS provides easy display of results, colorful graphics, standardized reports, easy input of data
- slide 52 - specialized linkage required between GIS and erosion model (AGNPS)
  - such linkages will become unnecessary if data transfer formats can be standardized
- about 30 minutes required to test a scenario fully and obtain results
- system runs on readily available PC hardware under DOS
  - system is comparatively portable and could be used for decision support in local planning meetings
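The per-category cell counts quoted above (459 versus 531 cells in the lowest erosion category) come from reclassifying a continuous result raster into categories and tallying cells. A minimal tally sketch; the breakpoints used here are hypothetical, since the study's actual class limits are not given in the text.

```python
# Classify raster values into categories and count cells per category.

def classify(value, breaks):
    """Return the index of the first class whose upper bound holds value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks)  # values above the last break fall in the top class

def count_by_class(raster, breaks):
    """Tally the number of cells falling in each class."""
    counts = {}
    for row in raster:
        for v in row:
            k = classify(v, breaks)
            counts[k] = counts.get(k, 0) + 1
    return counts

raster = [[0.2, 1.5],
          [3.0, 0.1]]
breaks = [0.5, 2.0, 5.0, 10.0]           # four breakpoints -> five classes
counts = count_by_class(raster, breaks)  # class 0 holds the two lowest cells
```

Comparing `counts[0]` between two scenario rasters gives exactly the kind of before/after figure reported for the conservation easement scenario.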
EXAM AND DISCUSSION QUESTIONS

1. What types of standards would be useful in interfacing packages such as AGNPS and ERDAS? Who should develop them and how should they be promulgated?
2. Discuss the role of spatial resolution in the Big Darby Creek study and its effects on the results. What arguments might have been used to justify a 40 acre cell?
3. Why was a raster data model used in this study rather than a vector data model?
4. The results quoted in this unit were based on counts of raster cells. Discuss the issue of accuracy in the Big Darby Creek study, and its implications for implementation of the study's results.

URBAN PLANNING AND MANAGEMENT APPLICATIONS

A. INTRODUCTION
Characteristics of applications
Adoption
Organizations
B. EXAMPLE - ASSESSING COMMUNITY HAZARDS
Anticipatory hazard management
Hazard zone geometries
US Superfund Amendments and Reauthorization Act
Case study
C. DATABASE
Hazardous materials
Demographic information
Urban infrastructure
Physiography
D. ANALYSIS
Simple spatial analysis
Cartographic modeling
Risk assessment model
E. POTENTIAL IMPROVEMENTS TO MODEL
REFERENCES
EXAM AND DISCUSSION QUESTIONS
NOTES

The slide set contains eight slides (#53 to 60) to illustrate this unit.

UNIT 53 - URBAN PLANNING AND MANAGEMENT APPLICATIONS

Compiled with assistance from Robert McMaster, Syracuse University

A. INTRODUCTION

- involve the use of computers to carry out functions of urban government
- history of use extends back to first introduction of computers in cities in early 1960s
- major involvement of US Bureau of the Census as provider of data
  - development of DIME files (locations of street centerlines, address ranges for each block, hooks to census reporting zones) for 1970 census
- series of city case studies in late 1960s/early 1970s in US
  - comparable studies in many countries
  - case studies designed to demonstrate simple GIS capabilities for urban government: planning using social statistics for small areas,
    e.g. crime data, and simple record-keeping
  - problems associated with primitive state of hardware and software at that time

Characteristics of applications

- scale:
  - scale of DIME and TIGER (derived from USGS mapping at 1:24,000, 1:50,000, 1:100,000) sufficient to show street center lines but not parcels
    - adequate for transportation planning, vehicle routing, general development strategies
    - at this scale GIS can interface with existing records from census
  - increasing interest in parcel level data for land records, zoning, services, subdivision plans
    - at this scale can interface with assessor's tax records
- functionality:
  - many installed systems used for mapping, e.g. updating subdivision plans
  - limited use for inventory, e.g. identifying parcels impacted by proposal
  - little use for modeling - modeling applications more likely supported by specific software not linked to GIS, e.g. school bus routing packages

Adoption

- early adoption by federally funded case study cities, others with adequate budgets
- now almost all local governments have some level of involvement
- in many states the state government plays a coordinating role

Organizations

- Urban and Regional Information Systems Association (URISA)
  - organized in late 1960s
  - similar organizations in many countries
  - membership drawn from local, state and federal government, consultants, academics
  - sustained interest in GIS, particularly in recent years
- Spatially Oriented Referencing Systems Association (SORSA)
  - provides an international forum
B. EXAMPLE - ASSESSING COMMUNITY HAZARDS

- this example describes modeling of community vulnerability to hazardous materials
- there is increasing concern with the manufacture, storage, transportation and disposal of hazardous materials
- recent EPA study revealed an average of 5 incidents per day over the past 5 years in which hazardous materials were released into the environment from small and large production facilities

Anticipatory hazard management

- crucial component in mitigating potential impacts
- determines exact hazard distribution in an area
  - exact locations of sources and zones of potential impact
- determines what can be done to prevent or reduce serious accident
- identify population distribution, social and economic characteristics
  - needs daytime locations of population as well as residential (night-time) locations
- identify communication resources and transportation
- plan for evacuating area
- this example deals with airborne toxic releases
  - occur rapidly, disperse over large area with immediate health effects
  - evacuation more likely needed than for spills into soil or water
  - population at risk may depend on specific substance released
- needs detailed socioeconomic information,
  e.g. age of population is a factor in evacuation planning and in assessing potential impact, because of possible mobility impairment

Hazard zone geometries

- regions defined by level of risk to population, based on proximity to hazards
- combination of hazard zones produces a potential "contoured risk surface"
- overhead - Hazard zone geometries
- specific geometries include:
  - areas of risk due to production of hazardous materials
  - lines of risk due to hazards of transportation and transmission
  - points of risk produced by consumption

US Superfund Amendments and Reauthorization Act

- US Superfund Amendments and Reauthorization Act (SARA), 1986
- Title III - The Emergency Planning and Community Right-to-Know Act - covers four aspects of hazards mitigation:
  - emergency planning
  - emergency notification
  - community right-to-know and reporting requirements
  - reporting of chemical releases
- third component (community right-to-know) requires companies and organizations to submit emergency and hazardous chemical inventory information, including quantities and general locations

Case study

- Santa Monica, CA selected as case study
  - location is a separate administrative entity within Los Angeles basin
  - city population of 88,300 suited the scale of the prototype study
  - community had initiated a community right-to-know law
    - fire department must be informed of any production or storage of over 50 gallons or 500 pounds or 2,000 sq ft of any hazardous material
    - records stored by Police Department
- explores use of GIS for assessing community vulnerability
- three levels - simple spatial analysis, cartographic modeling and risk assessment modeling
C. DATABASE

- constructed for MAP (Map Analysis Package)
- uses 100 m resolution pixels
  - difficulty of estimating population data for finer resolution because of confidentiality restrictions
  - adequate for airborne toxics
  - soil- or water-borne would require finer resolution and different data models (3D and linear objects respectively)
- database includes:
  - hazardous materials locations and descriptions
  - demographic data
  - infrastructure - transportation, sewer lines, land use
  - physical geography - geologic faults, topography

Hazardous materials

- records maintained by Police Department's Toxic Chemical Coordinator
- hundreds of different types of chemicals reported
- overhead - On-site hazardous materials
  - some sites had only one chemical, e.g. solvent
  - chemical company has many toxic chemicals on site
  - genetic engineering company with assorted radioactive materials
- study used UN Classification of Hazardous Materials
- overhead - UN Classification of hazardous materials
- categories added to UN classification by the city include:
  - PCBs
  - gunshops
- slide 53 - composite map showing presence of hazardous materials by class in 100 m cells

Demographic information

- from 1980 census, includes:
  - age structure - includes classes under 5, 5-15, 15-65, over 65
  - ethnicity - includes classes percent black, white, asian
  - percent non-English speaking
  - population density
- assigned from census tracts to cells assuming uniform density

Urban infrastructure

- includes:
  - locations of all public institutions - schools, colleges, hospitals, theaters, shopping centers
  - major street network
    - traffic flow densities
  - storm sewer network
    - includes numbers of catchbasins per 100 m cell
  - major oil pipeline
  - detailed land use map

Physiography

- terrain model at 100 m resolution from 1:24,000 topographic sheet
- allows:
  - tracing of chemicals flushed into storm sewer network
  - use of wind dispersion model
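The uniform-density assignment of tract populations to 100 m cells described above is a simple form of areal interpolation. A sketch under a simplifying assumption: tract geometries are reduced here to a precomputed count of cells per tract, so the example shows only the allocation arithmetic, not polygon overlay.

```python
# Allocate each census tract's population evenly across its raster cells.

def population_per_cell(tracts):
    """tracts: list of (population, number_of_cells_in_tract) pairs.
    Returns the uniform per-cell population for each tract."""
    return [pop / ncells for pop, ncells in tracts]

# hypothetical tracts: 4000 people over 40 cells, 900 people over 30 cells
densities = population_per_cell([(4000, 40), (900, 30)])
```

The errors noted later in the unit (uniform density within a tract is rarely true) are exactly the cost of this shortcut; finer source data or dasymetric weighting would reduce them.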
D. ANALYSIS

Simple spatial analysis

- slide 54 - create composite map of all hazardous materials, construct 500 m buffer zones (MAP command SPREAD)
- slide 55 - composite map of services
- slide 56 - overlay of 500 m buffers on services to identify those services in close proximity to hazardous materials
- could identify specific services and specific classes of hazardous materials, e.g. schools and radioactive materials

Cartographic modeling

- cartographic modeling was used to model effects of hazardous materials incidents
- for example, consider the event of a liquid spill:
  - control measures by the fire department would likely include washing the effluent into the storm sewer network
  - during similar previous incidents, vapors within the storm sewer network have risen into buildings
- modeling strategy for assessing impact on schools:
  - model flow through storm sewer network using terrain data
  - buffer around network
  - identify impacted schools falling in the buffer
- slide 57 - topography of Santa Monica
- slide 58 - sewers "draped" over topography (MAP command COVER)
- slide 59 - flow forced downhill (under gravity) through storm sewers from assumed origin (beginning of red line in slide) to Santa Monica Bay
  - uses the MAP command STREAM with the constraint DOWNHILL
- slide 60 - buffer zone of 300 m on either side of path

Risk assessment model

- this represents a first step in developing a comprehensive spatial method for evaluating community vulnerability
- overhead - Conceptual risk-assessment model
  - note: GIS functions named in this overhead refer to OSU MAP commands
- risk zones were identified:
  - within 500 m of hazardous material site (HAZZONE)
  - within 500 m of Santa Monica Freeway (FREEZONE)
  - within 300 m of underground storage tank (TANKZONE)
- appropriate distances were determined by consulting toxic chemical information and emergency planning personnel
  - uniform distances assumed
- the remainder of the city not in risk zones was eliminated from further consideration (leaves RISKZONEs)
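The buffer zones built with MAP's SPREAD command can be emulated on a raster by marking every cell within a given distance of a source cell. A brute-force sketch; the grid, source location and distance are illustrative (the study used 500 m buffers on 100 m cells, i.e. a 5-cell radius).

```python
# SPREAD-style raster buffer: flag cells within max_dist of any source.

def spread(sources, nrows, ncols, max_dist):
    """Mark every cell within max_dist (in cell units, Euclidean)
    of any source cell with 1; all other cells get 0."""
    out = [[0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            for (sr, sc) in sources:
                if ((r - sr) ** 2 + (c - sc) ** 2) ** 0.5 <= max_dist:
                    out[r][c] = 1
                    break
    return out

# one source at the center of a 3 x 3 grid, 1-cell buffer radius
buffer_grid = spread([(1, 1)], 3, 3, 1.0)
```

Overlaying such a buffer grid on a services layer (a cell-by-cell AND) reproduces the slide 56 operation of finding services in close proximity to hazardous materials.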
- next, examined two components of risk assessment:
  - human component
  - hazardous materials component
- human component has four variables:
  - average population density within 500 m (HAZDEN)
    - need to give greatest weight to areas of highest density
  - number of residents under 5 or over 65 within 500 m (HAZMOBIL)
    - these age groups will need special attention if evacuation is required
  - percent not speaking English as primary language within 500 m (HAZLANG)
    - difficulty of managing evacuation of non-English-speaking minorities
  - adjacency to school site (HAZSCHOL)
- the first three human component variables were weighted based on the original classed data
  - e.g. census classification for percent Hispanic ("non-English speaking" group for this analysis) assigns classes of:
    - 0 - outside database
    - 1 - 1-4%
    - 2 - 5-8%
    - etc.
  - these class values were used as the weights for each human component
- these four human component variables were then added to create a human hazard potential map (HAZHUMAN)
- problem with lack of adequate basis for weighting
  - e.g. little research on relative difficulty of evacuating schools, elderly and non-English-speaking populations
- hazardous materials component has four variables; for each cell it is determined:
  - number of hazardous materials within 500 m
  - diversity of materials within 500 m
  - number of underground storage tanks within 500 m
  - maximum traffic flow within 500 m
    - used as a surrogate for transportation hazard of hazardous materials
- variables weighted and added to create composite (HAZSCORE)
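Composites such as HAZHUMAN and HAZSCORE are weighted, cell-by-cell sums of co-registered layers. A sketch of that overlay arithmetic; the layer values and equal weights below are invented for illustration (the study used the class values themselves as weights).

```python
# Weighted overlay: add co-registered raster layers cell by cell,
# each layer scaled by its weight.

def weighted_sum(layers, weights):
    """layers: list of equally sized 2D rasters; weights: one per layer."""
    nrows, ncols = len(layers[0]), len(layers[0][0])
    out = [[0.0] * ncols for _ in range(nrows)]
    for layer, w in zip(layers, weights):
        for r in range(nrows):
            for c in range(ncols):
                out[r][c] += w * layer[r][c]
    return out

# hypothetical 1 x 2 class-value layers for three human-component variables
density  = [[1, 3]]
mobility = [[2, 1]]
language = [[0, 2]]
hazhuman = weighted_sum([density, mobility, language], [1, 1, 1])
```

The final SCOREMAP step is the same operation applied to the two component composites, followed by a reclassification of the summed range into a handful of display categories.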
- first three variables weighted directly by value, e.g. if a cell had 16 occurrences of hazardous materials within 500 m it had a weight of 16 on the first variable
- traffic flow weighted by class
- finally, human and hazardous materials components added to create composite risk map (SCOREMAP), reclassified from an original range of 1-75 into five categories
- highest risks along major traffic arteries, due to concentration of industrial sites as well as transportation risk
- note: this analysis was not intended for use in evacuation planning; it was designed only as a planning tool

E. POTENTIAL IMPROVEMENTS TO MODEL

- relative weighting of components in human risk score should be based on research into relative difficulty of evacuating different groups, also relative susceptibility to materials
- relative weighting of components in hazardous materials score should be based on history of previous incidents involving each material, also toxicity of material
- needs plume dispersion model
  - score assumes impact within 500 m in all directions
  - actual impact will depend on wind dispersion of plume
  - need for model to assess likely dispersion based on atmospheric conditions, nature of incident
  - materials have different dispersion characteristics based on e.g. density of vapor
- socio-economic data was based on census tract level
  - errors introduced by assuming uniform density within tract
  - needs finer resolution data for human component
- needs evacuation model which incorporates actual road network, assigns traffic to it and estimates congestion
- areas should be prioritized based on difficulty of evacuation, size of population and level of risk
- many of these capabilities are available in CAMEO, developed by NOAA for the Macintosh and now widely implemented in US emergency response organizations

CADASTRAL RECORDS AND LIS

A. LAND SURVEYS AND LAND RECORDS
Public need for accurate land information
The cadaster
B. GEOMETRY OF CADASTRAL MAPS
Plane surveys and geodetic control
Absolute versus relative accuracy
Coordinate geometry (COGO)
C. THE TAX ASSESSOR AND CADASTRAL SURVEYS
Assessor's parcel maps
Parcel numbers and Tax Roll
D. EXAMPLES OF THE NEED FOR MPC/LIS
Prince William County, Virginia
Louisville/Jefferson County, Kentucky
Los Angeles County, CA
E. ADDING MULTIPURPOSE LAND INFORMATION LAYERS
Geographic layers
Role of CAD systems in early LIS development
Non-geographic land attributes
F. GIS AND THE MULTIPURPOSE CADASTER
Integration of graphic and non-graphic information
Spatial operations for LIS applications
REFERENCES
EXAM AND DISCUSSION QUESTIONS
NOTES

Since subdivision and other parcel maps are often hand drafted they do not reproduce well. Try to get an example from your local land record office to show in class, replacing the overhead provided here.

UNIT 54 - CADASTRAL RECORDS AND LIS

Compiled with assistance from Frank Gossette, California State University, Long Beach

A. LAND SURVEYS AND LAND RECORDS

Public need for accurate land information

- governments, land developers, and property owners need and use land information daily
- land information is the basis of property rights in most countries
  - land information must be used to resolve disputes
  - must be accessed when property changes hands
- most of the information that a municipal government stores is tied to specific geographic locations within its jurisdiction: property lines, easements, utility and sewer lines, and many categories of spatial data
- the ability to store, retrieve, analyze, report and display this public land information efficiently and accurately is of great importance
  - requests for information from a land information database can number thousands per day
- land information is of variable quality
  - the legal description of land properties relies on accurate survey measurements and monuments with accurately known location, but also on problematic descriptions such as "middle of river" (river may change course), marks on trees (tree may have died), etc.
- in resolving disputes, the source of land information and its accuracy may be as important as the information itself
  - a land information database may need to include more than just coordinates
- in the UK:
  - base mapping at 1:1,250 scale exists for all urban and many rural areas - over 250,000 sheets
  - regular program of maintenance and update
  - currently being converted to digital form
- in the US:
  - largest scale base mapping is 1:24,000 or 1:50,000, too small for property boundaries
  - approximately 108 million parcels of taxable real property
  - records on these are maintained by 83,216 state and local government agencies
  - in local governments, 75% of daily transactions involve land information, e.g. address verification, parcel identification, ownership, budget summaries, delivery of services
  - records are held in unrelated formats, e.g. property record books, paper files, microfiche, maps, charts, computer databases
  - methods of information management are often as old as the system of land rights itself - which dates to before the Constitution
  - land data held by one agency are frequently unavailable to another - not because of jurisdiction, but because of the method of record keeping
  - leads to unnecessary confusion, cost and duplication

The cadaster

- the cadaster is an official register of the ownership, extent and assessed value of land for a given area
- cadastral refers to the map or survey showing administrative boundaries and property lines
- cadastral information is usually the largest-scale (most detailed) land information available for an area
- as such, cadastral information can provide a large-scale base to which other layers of data can be added for specific purposes
  - this is the concept of the multipurpose cadaster or MPC
  - the ideas of integration of spatial data inherent in the MPC are found in many other areas of GIS application
- the MPC is an ideal - the actual state of cadastral information varies widely within the US and from country to country,
  despite wide acceptance that the arguments for MPC are very persuasive
- LIS is a generic term for information systems that deal with land records

B. GEOMETRY OF CADASTRAL MAPS

Plane surveys and geodetic control

- most cadasters are based on plane surveys
  - surveyors have measured the boundaries and property lines as planar distances from known locations or benchmarks or monuments
- many, but not all, benchmarks are tied to actual geodetic control points (longitude/latitude or State Plane Coordinates)
- conflicts occur when boundaries plotted from survey data overlap or fail to meet

Absolute versus relative accuracy

- absolute accuracy refers to the relationship of a point on the map to its actual location on the globe
- relative accuracy refers to the relationship of one point on the map to another point on the same map
- e.g. a property line may be 400 feet from a USGS marker which has been globally positioned to be at 112 degrees West Longitude and 34 degrees North Latitude
  - either or both of these measurements could be inaccurate
  - the property line might only be 398 feet away, and the benchmark might be shown to be several hundred feet off when measured by GPS or adjusted to the new North American datum

Coordinate geometry (COGO)

- overhead - Portion of a parcel map
- land surveyors record subdivisions in terms of geometric distances and angles from control points (benchmarks)
- legal descriptions are made up of distances and bearings that trace the boundaries of the land unit
- special computer programs have been devised which accept this coordinate geometry (COGO) and translate the instructions into X-Y coordinates on the plane
- this gives the maps created by this process better "relative accuracy," in most cases, than maps created by digitizing the boundaries from existing basemaps
- overhead - Coordinate geometry vs digitizing
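The COGO translation described above reduces to trigonometry: each call in a traverse is a bearing and a distance, advanced from the previous point. A minimal sketch, assuming bearings expressed as degrees clockwise from north (real COGO packages also accept quadrant bearings like "N 45 E", curves, and closure adjustments, none of which are shown here).

```python
# Minimal COGO step: advance from (x, y) along a bearing and distance.

import math

def cogo_point(x, y, bearing_deg_from_north, distance):
    """Bearing is measured clockwise from north, so east is 90 degrees."""
    rad = math.radians(bearing_deg_from_north)
    return (x + distance * math.sin(rad),
            y + distance * math.cos(rad))

# toy traverse: from a benchmark at the origin, go due east 100 ft,
# then due north 50 ft
p1 = cogo_point(0.0, 0.0, 90.0, 100.0)
p2 = cogo_point(p1[0], p1[1], 0.0, 50.0)
```

Because every coordinate is derived from measured distances and angles rather than traced from a paper map, the resulting points inherit the survey's relative accuracy, which is the advantage over digitizing noted above.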
C. THE TAX ASSESSOR AND CADASTRAL SURVEYS

- originally, cadastral maps and surveys were used exclusively to develop parcel maps for taxation purposes
  - based on the Original Surveys of the land area (county, city, sub-division, etc.)
- however, these maps are not necessarily the legal authority for taxation or ownership
  - the actual surveyor's notes and legal description provide this authority

Assessor's parcel maps

- basic unit of land is the parcel
- parcels are usually contiguous and are owned by a single entity (family, individual, corporation, etc.)
- Tax Assessor (usually a county official in the US) assigns a number (identifier) to each parcel on the map

Parcel numbers and Tax Roll

- working from the Parcel Maps, the Tax Assessor makes a list of parcels and their taxable value
- the value of land depends on many things, including the size of the property (area) and the actual or permitted uses (agriculture, industry, residential, etc.) of the land
- tax rolls may also include the legal ownership, the size of the parcel and the improvements made to the property
- important to note that tax rolls and parcel maps contain significant amounts of data that can be used for many purposes beyond tax assessment
- however, many problems arise when they are used for other purposes, since they were compiled at the accuracy and detail required for tax purposes only
  - e.g. boundaries shown may not be accurate enough for city planning purposes
D. EXAMPLES OF THE NEED FOR MPC/LIS

- the following examples illustrate the need for geographic information systems to handle this type of information
- (this section quotes from and relies heavily on materials prepared for the US Department of the Interior, Bureau of Land Management's Study of Land Information, mandated under Public Law 100-409, 1989)

Prince William County, Virginia

- a mid-size Virginia county
- land deeds are filed with the Clerk of the Court office, and microfilmed
  - a copy of the microfilm is given to the Real Estate Assessment Office
  - certain information is abstracted from the deed and becomes an assessment record on the county's mainframe computer, accessible to all departments
  - a copy of the deed is used to update a parcel database
- new parcels and subdivisions are entered into an automated mapping system using COGO
  - digital and mylar maps are updated weekly
- since the Assessment Office defines parcels in its own way for tax purposes, there is not a one-to-one correspondence between the parcel database and the assessment records
  - the geographic data in the parcel database cannot be linked effectively with the non-geographic assessment records
- the county is developing a LIS which will implement a single database with no duplication of data elements

Louisville/Jefferson County, Kentucky

- 26 governmental units and local utilities produce or modify 111 sets of maps at annual costs of $3.2 million
  - of the 111 sets, 59 are used by more than one organizational unit and 20 by more than five
  - parcels and subdivisions are routinely mapped at least six times by government and utilities, often at different scales and levels of accuracy
- area agencies maintain some 95 automated geographic databases and 110 manual databases
  - wide divergence in types and capabilities of computers
  - communication of data is complicated
- replacing current practices with an automated system will save as much as $5.7 million over a 10 year period
- conservative estimates are that staff efficiency will increase by at least a third
- plan will include users and data collectors: Metropolitan Sewer District, local government agencies, utilities

Los Angeles County, CA

- government consists of over 40 departments, plus committees, commissions and special districts
- 4084 sq mi area
- approximately 50% of all information is geographically related
- 7 problems common to all departments:
  - lack of structured communication regarding sources, availability of georeferenced information
  - lack of timely and convenient access
  - information is not always current or accurate
  - information is duplicated, independently maintained
  - existing system is time consuming, difficult, labor intensive
  - limited ability to relate geographic and non-geographic records
  - difficulties of different scales, standards, accuracy, coordinate systems, etc.
- LA county presents enormous problems, not only related to size
  - complexity of jurisdiction - many of the incorporated cities within the county provide their own services; county government services the residual area
  - management of elections is a major potential application of LIS - there is one election on average every 2 days in LA county, and each election has its own set of districts with complex definitions
  - a CAD parcel database alone is estimated at 300 Gbytes
- plan to achieve a county-wide LIS by target date of 1997

E. ADDING MULTIPURPOSE LAND INFORMATION LAYERS

- a Land Information System can be seen as the result of adding more "layers" of information (geographic features) and including more attribute data to the cadastral map
  - the base map or cadaster now becomes an MPC (or LIS)
- these data are useful for other, related functions of land management, planning and administration

Geographic layers

- overheads - City map overlays (9 pages)
- additional geographic features can be registered to the parcel basemap,
  e.g. street centerlines, public rights-of-way, "footprints" of public buildings, and other information for which the graphic representation is useful by itself
- other examples include:
  - Infrastructure and Public Facilities
    - infrastructure may include water lines, sewer lines, fire hydrants, power poles or other "utilities"-type information
  - Hydrography and Topography
    - streams, ponds, underground aquifers, and the 50 year floodplain are all geographic features which could be useful adjuncts to basic land information

Role of CAD systems in early LIS development

- early LIS development stressed the cadastral map as the main system product
  - ability to add layers of graphic information to the base map was a major incentive
- because of the availability of Computer-Aided Design and Drafting (CAD) tools, early automation of land information was often done on such systems
- since basic parcel boundaries, street information and some infrastructure information are immediately useable in graphic form, CAD systems provided LIS basemaps which could be easily updated and quickly produced
- the capabilities of these systems do not generally extend beyond simple production of maps - they do not support sophisticated queries or analysis

Non-geographic land attributes

- geographic features may be associated with an infinite number of characteristics
  - a parcel not only has ownership, area, and value, but can be distinguished on the basis of the allowable uses to which it can be put, the school district to which it belongs, or the age of the head-of-household
- typical LIS attribute data include:
  - Land Use and Land Cover
  - Zoning and Administration
  - Demographics
- as the attribute or tabular data become an increasingly important component of the system, the simple "flat-file" databases which are part of CAD systems represent a serious impediment to system growth
  - more powerful data managers and GIS software may be needed
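The step beyond a flat file is keeping geometry and attributes in separate tables joined on the parcel number, so either side can be queried through the other. A sketch using plain dictionaries; the parcel numbers, field names and values below are invented for illustration (a real LIS would use a relational DBMS for the attribute side).

```python
# Parcel geometry and assessment attributes linked by parcel id.

parcels = {  # parcel id -> simplified boundary as coordinate list
    "APN-001": [(0, 0), (0, 50), (40, 50), (40, 0)],
    "APN-002": [(40, 0), (40, 50), (90, 50), (90, 0)],
}

assessments = {  # parcel id -> non-graphic assessment record
    "APN-001": {"owner": "Smith", "zoning": "R1", "value": 120000},
    "APN-002": {"owner": "Jones", "zoning": "C2", "value": 310000},
}

def lookup(parcel_id):
    """Join the graphic and non-graphic records for one parcel,
    the kind of query answered by pointing at a parcel on a map."""
    return {"geometry": parcels[parcel_id], **assessments[parcel_id]}

record = lookup("APN-001")
```

The same join run in the other direction (select parcel ids whose attributes match a condition, then display their geometries) is the thematic-mapping and zoning-update pattern discussed in the next section.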
F. GIS AND THE MULTIPURPOSE CADASTER

- many early LIS were created using CAD systems and relatively simplistic data managers
- as the volume of information increases and more sophisticated applications are attempted, the functionality of full-featured Geographic Information Systems may be required
- powerful relational DBMS and topologically-structured, vector GIS software can handle the types of land-information management tasks which are typical of contemporary LIS
- example areas in which GIS capabilities are essential:

Integration of graphic and non-graphic information

- general queries
  - retrieval of administrative records using geographical keys (pointing at map, using topological relations such as adjacency, outlining query polygon, etc.)
- Urban and Regional Planning: thematic mapping
  - ability to merge geographic boundaries with statistical information - rapid creation of thematic maps in support of planning activities
- Community Development: zoning changes
  - rapid update of zoning records, rapid display in map form using parcel boundaries

Spatial operations for LIS applications

- Urban and Regional Planning: notifications
  - use of buffering operation to identify property owners within fixed distance of proposed project
- Planning: feasibility studies
  - use of overlay, modeling to support spatial search for feasible areas meeting requirements for project
- Public Works: roadwork surface modeling
  - use of 3D capabilities to make engineering calculations
- Utilities: hydrologic modeling
  - use of network modeling capabilities to predict urban runoff, effects of changes in storm water system
- Schools: population models and districting
  - forecasting school populations by small areas based on demographic, migration, housing development models
  - redistricting to achieve balanced school populations
- Fire: optimal routing
  - use of network models for routing emergency vehicles, site selection for stations
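The notification operation listed above (identify property owners within a fixed distance of a proposed project) reduces to a distance query against parcel locations. A sketch under a simplification: each parcel is represented by a single centroid point, and the owners, coordinates and radius are invented for illustration (a production system would buffer against full parcel polygons).

```python
# Buffer-style notification query against parcel centroids.

import math

def owners_to_notify(project_xy, parcels, radius):
    """parcels: list of (owner, (x, y) centroid) pairs.
    Returns the owners whose centroid lies within radius of the project."""
    px, py = project_xy
    return [owner for owner, (x, y) in parcels
            if math.hypot(x - px, y - py) <= radius]

parcels = [("Smith", (100.0, 100.0)),
           ("Jones", (900.0, 900.0))]
notify = owners_to_notify((150.0, 100.0), parcels, 300.0)
```

Joining the resulting owner list back to the tax roll for mailing addresses is the graphic/non-graphic integration this section argues for.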
"Implementing a National Multipurpose Cadaster," ACSM Bulletin 97:17-21. ACSM Geographic Information Management Systems Committee, 1988. "Multi-Purpose Geographic Database Guidelines for Local Governments," ACSM Bulletin 114:19-30. Chrisman, N.R., and B.J. Niemann, 1985. "Alternative Routes to a Multi-Purpose Cadaster," Proceedings Auto-Carto 7, ASPRS/ACSM, Falls Church, VA, pp. 84-94. Donahue, J.G., 1988. "Land Base Accuracy: Is It Worth the Cost?," ACSM Bulletin 117:25-27. Niemann, B.J. and J.G. Sullivan, 1987. "Results of the Dane County land records project: implications for conservation planning," Proceedings AutoCarto 8, ASPRS/ACSM, Falls Church, VA, pp. 445-455. Reports on the Need for Multi-purpose Cadaster National Research Council, 1980. Need for a Multipurpose Cadaster. Washington, DC. National Research Council, 1982. Federal Surveying and Mapping: An Organizational Review, Washington, DC. National Research Council, 1982. Modernization of the Public Land Survey System, Washington, DC. National Research Council, 1983. Procedures and Standards for a Multipurpose Cadaster, Washington, DC. Wisconsin Land Records Committee, 1987. Final Report: Modernizing Wisconsin''s Land Records, Institute of Environmental Studies, University of Wisconsin, Madison, WI. FACILITIES MANAGEMENT (AM/FM) A. INTRODUCTION B. AUTOMATED MAPPING Automated mapping capabilities Automated mapping shortcomings C. FACILITIES MANAGEMENT SYSTEMS Facilities management systems capabilities Facilities management systems shortcomings D. AM/FM AM/FM examples Benefits of AM/FM systems E. CHARACTERISTICS OF AM/FM Functionality Organizations F. EXAMPLE - EASTERN MUNICIPAL WATER DISTRICT Background System development System configuration Map products Applications development REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES The slide set contains a few slides that could be used to illustrate this unit. 
UNIT 55 - FACILITIES MANAGEMENT (AM/FM) Compiled with assistance from Warren Ferguson, Ferguson Cartotech, San Antonio A. INTRODUCTION facilities management is a very influential, well organized GIS application area has major representation from utility companies - telephone, electricity, gas projects tend to be very large, well funded and critical to the efficient operation of the utility umbrella term used by these organizations is AM/FM - Automated Mapping and Facilities Management AM/FM is primarily distinguished by the context of applications: utilities, urban facilities management AM/FM is an information management tool data used for day-to-day decisions only, is not an analytical tool e.g. maintenance crews use information to locate and repair breaks in service e.g. construction drawings are produced and sent to the field for installation AM/FM is the integration of two tools automated mapping produces maps facilities management provides digital inventories of facilities AM/FM links the two to provide geographical access to facility inventories B. AUTOMATED MAPPING with control of different layers of information, provides a variety of ways to output from a single database e.g. 
by turning on or off layers, a street light map or electrical feeder map could be produced from the same database Automated mapping capabilities overhead - Automated mapping better map maintenance is a major benefit of automated mapping productivity increases 2 to 10 times over manual methods no problem with physical or content deterioration of maps since they can be produced as needed or as updated centralized control is a major benefit to major corporations paper documents are replaced by a central digital store copies can be produced and distributed as and when necessary computerization provides easier but better controlled access in the paper world, when a document was checked out no one else could access the information - elaborate systems were set up to ensure return of the document in a digital world we can control who can access and for what purpose (read only, edit etc.) Automated mapping shortcomings provides only graphic output, no means of query e.g. cannot obtain attributes of objects e.g. cannot access objects by their attributes because objects are not connected topologically, cannot carry out sophisticated analysis of networks cannot relate map information to other records C. FACILITIES MANAGEMENT SYSTEMS exist in many organizations Facilities management systems capabilities overhead - Facilities Management consist of computerized inventories of the organization''s facilities capabilities for sorting, maintaining and reporting information e.g. many utilities have pole files containing information on each pole e.g. 
date of installation many types of reports can be generated can maintain a digital representation of the facility network to allow engineering, network analysis in tabular, numeric form (not spatial) Facilities management systems shortcomings no geographic capabilities can generate only alphanumeric reports cannot access records geographically cannot generate geographic reports (maps) redundancy must arise if both automated mapping and facilities management systems are maintained, one for mapping and the other for inventory D. AM/FM overhead - Automated mapping/Facilities management combine automated mapping and facilities management into one system geographic information provides a new window into the facilities database information can be retrieved by pointing to a map image e.g. point to an electrical cable and retrieve kVA (kilovolt-ampere) rating, length, mortality, or list of transformers connected to it AM/FM is a very successful marriage of two traditional concepts AM/FM examples locating pole or facility item by street address generate reports on street lighting - does it meet standards in specified area? generate maps of electrical circuits or feeders at prescribed scale produce continuing reports on property provide reports for tax purposes Benefits of AM/FM systems reduces the cost to maintain information no physical maps to deteriorate, get lost, misfiled data is more accessible and secure impact the organization by integrating operations departments must cooperate because they now share data reduces potential duplication between departments ensures consistency of information base across departments new forms of report available new information provides basis for new forms of management E. CHARACTERISTICS OF AM/FM scale: service maps are needed at a scale of 1" to 100'' general systems planning may require scales down to 1:1,000,000, e.g. 
for electrical utilities data sources: data generally collected during construction or maintenance, using sketches on standard basemaps data quality: high data quality is desirable, e.g. accurate positioning of underground facilities, but not always attainable in practice much urban infrastructure (e.g. water, sewer pipes) may be more than 100 years old and many historical records may be missing Functionality AM/FM systems stress addition of geographical access to existing databases database likely to remain on mainframe geographical access may be from workstation with geographical data maintained locally non-geographical data characterized by frequent transactions - requires access to database from many workstations geographical data input independently using specialized graphics workstation backcloth used for input backcloth is a basemap showing the facility locations to be digitized as well as other geographic details, e.g. streets, parcels digitizing may be done on screen with backcloth displayed in raster form using video technology however basemap itself is not entered into database some vendors supplying the AM/FM market argue that: AM/FM applications are literally "geographic information systems" - providing geographically based access to information systems which provide analysis and modeling functions are better described as "spatial analysis systems" Organizations AM/FM International - mostly utilities with strong representation by vendors, governments little involvement as yet in education, research branches in many countries F. 
EXAMPLE - EASTERN MUNICIPAL WATER DISTRICT Background the Eastern Municipal Water District (EMWD) of Riverside County, California, provides agricultural and domestic water, sewer collection and treatment and water reclamation services to a service area of 534 square miles, population of over 250,000 land use in the service area is a mix of very rapidly growing urban and suburban areas as well as rural farm land, mountains and desert has 50,000 domestic water customers supplied with water imported from the Colorado and California Aqueduct Systems as well as from 54 local ground water wells 33,600 sanitary sewer customers served by 5 regional water reclamation plants treating more than 24 million gallons of sewage per day has an annual operating budget of over $60 million area is developing very rapidly population in the service area is anticipated to reach between 600,000 and 1 million by the year 2010 number of customers is expected to triple in that time number of company employees will increase from 340 to 800 this extremely rapid growth has made it very difficult for the company to keep up-to-date on service maps and to plan properly for the installation of new services System development initially the interest in automation was simply a recognition of the immediate need for automated mapping as a way to deal with the backlog of mapping and record updates however, during the process of system planning, several other potential information and engineering applications were also identified therefore, the purpose of the AM/FM is: in the short term, to map and manage facilities in the high growth environment in the long term, to incorporate planning and sewer and water engineering analysis into the system System configuration with the assistance of a consultant the EMWD developed a plan for implementation of a major AM/FM system based on Intergraph equipment and software overhead - EMWD proposed AM/FM system configuration Map products map products were the initial
purpose of the system and their production is critical to the immediate success of the system overhead - EMWD Map products lists the maps which will be produced once the database is complete Applications development the current Facilities Master Plan identifies and recommends computer programs for engineering analysis in the long range planning of new facility construction and operating procedures therefore, the system is designed to allow storage and interactive access to information for flow analysis of sewer and water models, using existing engineering analysis programs during the development of the AM/FM database, designers needed to identify and incorporate additional attributes that would be used in these models for long-range planning of facilities the system is designed to: provide spatial analysis capabilities to allow projection of future resource requirements based on demographic and economic data provide tools for the generation of construction work orders and detailed mechanical and electrical design drawings system designers also ensured that the final system will support the inclusion of topographic data which can be used in several anticipated applications, including hydraulic network analysis groundwater modeling identifying locations for radio telemetry facilities that will be used to provide real time data on flow and water levels customized report generation will assist the maps and records department in providing inventory and facility asset information for the County Tax Assessor records will be generated by facility type, geographic area or any combination of attributes requested digital tax rolls from the Assessor's office can be quickly checked against the property owner data maintained by EMWD customer service department will use the system to provide integrated access to meter reading, customer billing, facility locating and other inquiry processes REFERENCES Many examples of AM/FM installations are described in publications from AM/FM
International including the annual conference proceedings and their trade journal, The Scribe. Wagner, M.W., 1989. "The Eastern Municipal Water District AM/FM/GIS project," Proceedings, Conference XII, AM/FM International, New Orleans, April 1989, pp. 526-541. Describes in detail the planning and implementation plan for the EMWD system reviewed in this unit. DEMOGRAPHIC AND NETWORK APPLICATIONS A. INTRODUCTION B. MARKETING, RETAILING AND ELECTORAL REDISTRICTING Characteristics of application area Types of applications Organizations C. EXAMPLE - REDISTRICTING Background Objectives Technical requirements Current districts Redistricting Proposals D. VEHICLE ROUTING AND SCHEDULING Technology Databases Functionality Data quality E. EXAMPLE - VEHICLE NAVIGATION SYSTEMS F. HIGHWAYS PLANNING AND MANAGEMENT REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 56 - COMMERCIAL APPLICATIONS Compiled with assistance from David Cowen, University of South Carolina A. INTRODUCTION this unit looks at some of the more specialized applications of GIS demographic analysis spatial information plays a major role in many marketing and retailing decisions which involve decisions about the location of new stores, shopping centers, etc., and for evaluating the demographic characteristics of present and future trade areas similar applications in the government sector include redistricting - changing electoral boundaries in response to changing distributions of population network analysis delivery and emergency vehicles benefit from up-to-date information on the condition of the transportation network as well as real-time route planning B.
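The trade-area evaluation mentioned above (summarizing demographic characteristics in the vicinity of a present or proposed store) reduces, in its simplest form, to aggregating zone populations within a radius of the site. A minimal Python sketch; the zone identifiers, centroid coordinates, populations, and the 2 km radius are all hypothetical:

```python
from math import hypot

# Hypothetical block-group centroids with population counts
# (coordinates in km from an arbitrary origin).
block_groups = [
    {"id": "BG-01", "x": 0.4, "y": 0.2, "pop": 900},
    {"id": "BG-02", "x": 1.8, "y": 0.9, "pop": 1400},
    {"id": "BG-03", "x": 3.5, "y": 2.1, "pop": 1100},
    {"id": "BG-04", "x": 0.9, "y": 1.6, "pop": 600},
]

def trade_area_population(site, radius_km, zones):
    """Sum the population of every zone whose centroid lies
    within radius_km of the candidate site."""
    return sum(z["pop"] for z in zones
               if hypot(z["x"] - site[0], z["y"] - site[1]) <= radius_km)

print(trade_area_population((1.0, 1.0), 2.0, block_groups))  # -> 2900
```

A production system would use the true zone polygons (point-in-polygon or areal interpolation) rather than centroids, but the centroid shortcut is a common approximation at block-group scale.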
MARKETING, RETAILING AND ELECTORAL REDISTRICTING location factors are critical to success of retailing accurate knowledge of spatial distributions is essential for advertising, direct mail campaigns GIS technology useful in designing sales areas, analyzing trade areas of stores similar applications occur in politics design of voting districts (apportionment, gerrymandering) has enormous impact on outcome of elections major interest in reapportionment after 1990 census GIS applications in these areas are still at early stage Characteristics of application area scale: street centerline, census reporting zones - i.e. 1:24,000 and smaller data at block group/enumeration district scale (250 households) is needed for locating smaller commercial operations like gas stations and convenience stores data at census tract scale (2,000 households) is good for the location of larger facilities like supermarkets and fast food outlets data sources: much reliance on existing sources of digital data especially TIGER and DIME similar data available in other countries additional data added to standard datasets by vendors e.g. updating TIGER files by digitizing new roads, correcting errors e.g. adding ZIP code boundaries, locations of existing retailers functionality: dissolve and merge operations, e.g. to build voting districts out of small building blocks modeling, e.g. to predict consumer choices, future population growth overlay operations, e.g. to estimate populations of user-defined districts, correlate ZIP codes with census zones point in polygon operations, e.g. to identify census zone containing customer's residence mapping, particularly choropleth and point maps of consumers geocoding, address matching data quality: more concern with accuracy of statistics, e.g.
population counts, than accuracy of locations Types of applications districting designing districts for sales territories, voting objective is to group areas so that they have a given set of characteristics "geographical spreadsheets" allow interactive grouping and analysis of characteristics e.g. Geospreadsheet program from GDT site selection evaluating potential locations summarizing demographic characteristics in the vicinity e.g. tabulating populations within 1 km rings searching for locations that meet a threshold set of criteria e.g. a minimum number of people in the appropriate age group are within trading distance market penetration analysis analyzing customer profiles by identifying characteristics of neighborhoods within which customers live targeting identifying areas with appropriate demographic characteristics for marketing, political campaigns Organizations many data vendors and consulting companies active in the field, many large retailers no organization unique to the field American Demographics is influential magazine C. EXAMPLE - REDISTRICTING GIS has applications in design of electoral districts, sales territories, school districts each area of application has its own objectives, goals this example looks at designing school districts Background the Catholic school system of London, Ontario, Canada provides elementary schools for Kindergarten through Grade 8 to a city of approx. 250,000 about 25% of school children attend the Catholic system 27 elementary schools were open prior to the study population data is available for polling subdivisions from taxation records approx. 
700 polling subdivisions have average population of 350 each forecasts of school age populations are available for 5, 10, 15 years from the base year (see Taylor et al., 1986) at the polling subdivision level children are bussed to school if their home location is more than 2 miles away, or if the walking route to school involves significant traffic hazard Objectives minimal changes to the existing system of school districts minimal distances between home and school, and minimal need for bussing long-term stability in school district boundaries preservation of the concepts of community and parish - if possible a school should serve an identifiable community, or be associated with a parish church maintenance of a viable minimal enrollment level in each school, defined as 75% of school capacity and > 200 enrollment Technical requirements digitized boundaries of the polling subdivision "building blocks" an attribute file of building blocks giving current and forecast enrollment data for forecasting, we must include developable tracts of land outside the current city limits, plus potential "infill" sites within the limits overhead - London polling subdivisions, development tracts and infill sites 748 polygons development tracts are the isolated areas outside the contiguous polling subdivisions infill sites are shown as points the ability to merge building blocks and dissolve boundaries to create school districts school districts are not required to be contiguous - if necessary a school can serve several unconnected subdistricts a table indicating whether walking or bussing is required for each building-block/school combination Current districts overhead - Current allocation of students "starbursts" show allocations of building blocks to 29 current schools (includes two special education centers) note bussed areas in NW and SW - separate enclaves of recent high-density housing allocated to distant schools this strategy allows an expanding city to deal with dropping
school populations in the core leading to an excess of capacity rising school populations in the periphery but lack of funds for new school construction without constantly adjusting boundaries overhead - Enrollment projections overhead shows projections of enrollment based on current school districts note rapid increase in developing areas e.g. St Joseph's (#3), St Thomas More (#4) NW note decrease in maturing areas of periphery e.g. St Jude's (#8) - SW area note rejuvenation in some inner-city schools due to infilling e.g. St Martin's (#15) - lower center note stagnation in other inner-city schools e.g. St Mary's (#17), decline e.g. St John's (#14) - center Redistricting general strategy - begin with current allocations, shift building blocks between districts in order to satisfy objectives requires interaction between graphic display and tabular output quick response to "what if this block is reassigned to the school over here?" implementation allowed School Board members to make changes during meetings, observe results immediately using map on digitizer tablet, tables on adjacent screen Proposals overhead - Summary statistics for closure plan shows one alternative plan developed note: assumes closure of 6 schools rise in enrollment as percent of capacity stability of projections through time reduction in number of "non-viable" schools (<200 enrollment) increase in percent not assigned to nearest school increase in average distance traveled D. VEHICLE ROUTING AND SCHEDULING includes systems to aid in vehicle navigation, systems for routing emergency vehicles, scheduling delivery vehicles important actors include: automobile industry - vehicle navigation aids parcel services - express, courier emergency services - ambulance, fire rapid development of technology, databases Technology systems in vehicles e.g.
ETAK navigator small processor, database on cassette tape or optical disk (CD ROM), display showing location of vehicle and surrounding streets, also best route to destination similar systems under development in Japan, Europe e.g. Macintosh Hypercard systems installed in fire trucks - Cameo developed by NOAA information on route to fire, layout of buildings, nearby hazardous materials car rental agencies systems at airport checkin counters offering driving instructions to user-defined places vehicle scheduling systems which automate vehicle routing given locations which have to be visited on call, e.g. parcel delivery systems to assign optimum routes to e.g. school buses Databases heavy reliance on TIGER and DIME problems with update these products are geared to the 10-year census cycle problems with completeness DIME for urban areas only, lack of addresses in rural TIGER problems with attributes simple street layout is not sufficient for detailed vehicle routing e.g. TIGER lacks data on one-way streets, no left turns, temporary road construction problems problems with topology e.g. roads which cross but do not intersect growing interest among vendors in adding value to TIGER by dealing with some of these problems lack of standards no organization responsible for developing standards no responsibilities of Bureau of the Census beyond census itself Functionality simple retrieval and display for vehicle navigation systems finding optimum route requires fast, intelligent algorithm address matching essential to identify location from street address Data quality street centerline, i.e. 10-20 m accuracy is adequate attribute accuracy may be important because of risk of lawsuits in cases of accidents E. 
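The "fast, intelligent algorithm" for finding an optimum route noted under Functionality above is a shortest-path computation, for which Dijkstra's algorithm is the standard approach. A minimal sketch on an invented four-intersection street network; the node labels and travel times (minutes) are hypothetical, and a real system would build the graph from TIGER-style centerline files plus turn and one-way restrictions:

```python
import heapq

# Toy street network: node -> list of (neighbor, travel_time_minutes).
network = {
    "A": [("B", 4), ("C", 2)],
    "B": [("A", 4), ("C", 1), ("D", 5)],
    "C": [("A", 2), ("B", 1), ("D", 8)],
    "D": [("B", 5), ("C", 8)],
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: returns (total_time, node_sequence)."""
    queue = [(0, start, [start])]   # priority queue ordered by cost so far
    seen = set()
    while queue:
        time, node, path = heapq.heappop(queue)
        if node == goal:
            return time, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in graph[node]:
            if nxt not in seen:
                heapq.heappush(queue, (time + cost, nxt, path + [nxt]))
    return float("inf"), []

print(shortest_route(network, "A", "D"))  # -> (8, ['A', 'C', 'B', 'D'])
```

Note that the direct leg A-B-D (9 minutes) is beaten by the detour through C, which is exactly the kind of non-obvious result that makes an algorithmic route finder worthwhile.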
EXAMPLE - VEHICLE NAVIGATION SYSTEMS considerable research is currently being conducted to develop vehicle navigation systems overhead - Automatic vehicle location systems these systems require databases that have: topological information methods for determining position in the network street attributes such as width, number of lanes, direction, surface condition, perhaps even usage information keyed to time of day identification information like street names and other local names for special grades, bridges, landmarks these systems need: technology for determining current location, may be: automatic determination from use of GPS and similar technology dead reckoning based on distance travelled in the network and map-matching (snapping location to coordinates of links and intersections) computer hardware and databases, may be: internal to vehicle or at a central location with transmission of data to the vehicle input for identifying starting location and destination output to provide route instructions must be able to generate maps for any location in the network at a speed that is compatible with the rate of movement of the vehicle may be visual or verbal driving instructions F. HIGHWAYS PLANNING AND MANAGEMENT other transportation applications involve the use of network GIS for the planning and management of highways and roads Nyerges and Dueker (1988) outline three levels at which GIS can play a role in State Transportation functions handout - GIS and State Departments of Transport Level I are planning applications that generally relate to the state as a whole the data needed at this level is coarse and spatial accuracy is not important aggregated data are preferred to illustrate major trends Level II are management applications focusing on smaller areas such as a county this is the level at which traffic safety and pavement management activities are conducted e.g. 
pavement data is often collected by taking vertical photographs of the road surface from a moving vehicle every few meters locations can now be determined using GPS photos can be accessed by tying them to a GIS of the road network Level III are engineering applications requiring very large scale data and high accuracy projects at this level would cover small project or corridor areas at this level the GIS would provide input to the preliminary engineering design as-built plans from completed projects could be added to the state highway database at this scale REFERENCES Briggs, D.W., and B.V. Charfield, 1987. "Integrated highway information systems," NCHRP Synthesis 133, Transportation Research Board, National Research Council, Washington, DC. Fletcher, D., 1987. "Modeling GIS Transportation Networks," Proceedings of URISA 1988, Los Angeles, CA, Vol. 2:84-92. Golden, B.L. and L. Bodin, 1986. "Microcomputer-based vehicle routing and scheduling software," Computers and Operations Research 13:277-85. Reviews the availability of network analysis modules for microcomputers. Jones, K. and J.W. Simmons, 1987. Location, Location, Location: Analyzing the Retail Environment, Methuen, New York. A recent volume on spatial analysis techniques in retailing. Krakiwsky, E.J., H.A. Karimi, C. Harris, J. George, 1987. "Research into electronic maps and automatic vehicle location," Proceedings AutoCarto 8, Baltimore, MD, pp. 572-583. McGranaghan, M., D.M. Mark and M.D. Gould, 1987. "Automated provision of navigation assistance to drivers," The American Cartographer 14:121-38. Reviews current technology and examines the issues in design of effective user interfaces. Nyerges, T.L., and K.J. Dueker, 1988. "Geographic Information Systems in Transportation," US Department of Transportation, Washington, DC. Report describes the potential use of GIS in State Transportation offices and the types of data and functionality that would be required. Taylor, H.W., W.R. Code and M.F. Goodchild, 1986.
"A housing stock model for school population forecasting," Pr DECISION MAKING USING MULTIPLE CRITERIA A. INTRODUCTION Goals of this unit B. SPATIAL DECISION MAKING Examples of spatial decision making General steps involved in traditional approach Assumptions involved with this type of analysis Example 1: The fire station location problem Example 2: Land suitability assessment General observations Conclusion C. MULTIPLE CRITERIA AND GIS D. THE CONCEPT OF NONINFERIORITY E. BASIC MULTIPLE CRITERIA SOLUTION TECHNIQUES F. GOAL PROGRAMMING Choose criteria and assign weights Build a concordance matrix Summary G. WEIGHTING METHOD H. NORTH BAY BYPASS EXAMPLE Impact factors Alternative routes Combination of factors Weighting Concordance analysis Results REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit begins a three-part module introducing concepts and techniques of spatial decision-making. Although it is far from a complete coverage of the topic, it will provide students with a sampling of the kinds of decision-making activities GIS will be required to support. UNIT 57 - DECISION MAKING USING MULTIPLE CRITERIA Compiled with assistance from C. Peter Keller, University of Victoria, Canada A. INTRODUCTION an introduction to the topic of multiple criteria analysis deals with the potential integration of quantitative multiple criteria analysis and GIS GIS has the potential to become a very powerful tool to assist in multiple criteria spatial decision making and conflict resolution some GIS have already integrated multiple criteria methods with reasonable success (for example TYDAC's SPANS system) it is anticipated that other vendors will integrate multiple criteria methods in the near future Goals of this unit to introduce students to the concept of multiple criteria decision making to outline some of the simpler strategies developed to solve multiple criteria problems to demonstrate the potential applicability of GIS B.
SPATIAL DECISION MAKING Examples of spatial decision making identify shortest path that connects a specified set of points e.g. for power line route, vehicle scheduling identify optimal location of a facility to maximize accessibility e.g. retail store, school, health facility identify parcel of land for commercial development which maximizes economic efficiency General steps involved in traditional approach 1. identify the issue 2. collect the necessary data 3. define the problem rigorously by stating: objectives assumptions constraints if there is more than one objective: define the relationship between objectives by quantifying them in commensurate terms, i.e. express each objective in the same units, usually in dollars e.g. wish to minimize both cost of construction and impact on environment must express environmental impact in dollars, e.g. cost of averting impact then collapse the objectives into one objective e.g. minimize sum of construction and environmental costs 4. find appropriate solution procedure 5. solve the problem by finding an optimal solution Assumptions involved with this type of analysis the objectives can be expressed in commensurate terms the problem can be collapsed and simplified into a single objective for analysis decision makers agree on the relative importance of the commensurable objectives however, these assumptions don't necessarily hold, consider the following examples: Example 1: The fire station location problem Problem: to locate a new fire station in a city (Schilling, 1976) Objectives: maximize coverage of population maximize coverage of real estate something is "covered" if it is within an established response time of a fire station, e.g.
3 minutes Conflict: most valued real estate is not necessarily located where most people reside most valued real estate in downtown and industrial areas people live in the suburbs objectives are in spatial conflict Solution: traditional approach requires that the two objectives be collapsed into one by defining a relationship between the value of real estate and the value of life but the two objectives are noncommensurate can't place a monetary value on a human life Example 2: Land suitability assessment Problem: suitability evaluation of a number of sites for commercial development Objectives: maximize economic efficiency minimize environmental impact Conflict: decision makers have to express environmental quality in terms of economic efficiency (monetary values) different interest groups will value environment differently no consensus, therefore can't assess environmental quality in monetary terms objectives are again noncommensurate General observations in the real world, decision making problems rarely collapse into a neat single objective diagram in this classification of real world spatial decision-making problems, most fall in the bottom right cell real world problems are inherently multiobjective in nature consensus rarely exists concerning the relationships between the various objectives Conclusion more appropriate to identify and maintain the multiple criteria nature of real world problems for analysis and decision making decision makers are frequently interested in the trade off relationship between the various criteria this allows them to make the final decisions in a political environment e.g.
trading total population covered for total value of real estate covered Example 2: Land suitability assessment Solution: Identify and map the different land uses, land assessments and environmental impacts on separate layers construct several combinations of overlays based on various priorities derive suitability surfaces for the different combinations of priorities let politicians make the ultimate choice C. MULTIPLE CRITERIA AND GIS a GIS is an ideal tool to use to analyze and solve multiple criteria problems GIS databases combine spatial and non-spatial information a GIS generally has ideal data viewing capabilities - it allows for efficient and effective visual examinations of solutions a GIS generally allows users to interactively modify solutions to perform sensitivity analysis a GIS, by definition, should also contain spatial query and analytical capabilities such as measurement of area, distance measurement, overlay capability and corridor analysis D. THE CONCEPT OF NONINFERIORITY overhead - Noninferiority the figure shows the objective space for a two objective problem - the fire station problem two objectives, real estate and population coverage, are represented by the two axes of the graph the shaded area represents the set of all possible feasible locations (subject to constraints of cost, distance etc.) 
P1 represents the solution which optimizes coverage of population alone P2 represents the solution which optimizes coverage of real estate a site is noninferior if there exists no alternative site where a gain could be obtained in one objective without incurring a loss in the other P3 represents a feasible solution which is NOT noninferior P3 can move vertically to improve population coverage without changing real estate coverage solutions exist which are better than P3 on one axis (one objective) without necessarily being worse on the other axis the dark curved line represents the set of noninferior solutions P4 is an example of a noninferior solution to improve on P4 for one objective requires a loss on the other objective the set of noninferior solutions is the set of best compromise solutions or the "trade-off curve" in welfare economics any point on the "trade-off curve" represents a point of Pareto optimality a solution point where no one objective can be improved upon without a sacrifice in another objective P4 cannot move vertically to improve population coverage must slide along trade-off curve movement upwards along the curve will imply a change (loss) in the real estate objective P4 therefore is a Pareto optimal or a noninferior solution point Example 1: Fire station location problem Solution: Identify the set of all possible sites for the new fire station that represent noninferior solutions for each noninferior solution, examine the trade-off between covering more lives relative to more real estate make the final and informed decision in the political environment E. BASIC MULTIPLE CRITERIA SOLUTION TECHNIQUES there are a number of possible approaches to defining the noninferior solution set 1. Preference-oriented approaches: derive a unique solution by specifying goals or preferences this technique assumes the set of possible solutions is known and small an example is goal programming 2.
Noninferior solution set generating techniques: derive the entire set of noninferior solutions and leave the choice to the decision-maker these techniques are used when a very large number of options exist many of these may not be part of the noninferior set, so this allows the number of options to be reduced to a limited set an example is the weighting method F. GOAL PROGRAMMING one of the oldest and best-known multiobjective methods generally utilized where there are a number of competing goals or objectives Example 2: Land suitability assessment given a set of parcels of land, identify which best suits a set of development or search criteria the overall aim is to meet all the criteria or goals to the greatest extent possible, to choose the most desirable plan from a set of possible options Choose criteria and assign weights overhead - Goal programming example - criteria weights handout - Goal programming example (2 pages) suppose there are 4 sites to be evaluated 8 criteria have been identified these likely reflect opinions of different experts, different schools of thought, different objectives e.g. may wish to maximize profit (developer), to minimize cost (engineer) and to minimize environmental impact (environmentalist) weights have been given to each criterion to identify its importance weights must sum to 1 e.g. the developer's criteria may have a weight equal to the engineer's and less than the environmentalist's each site has been ranked on each of the criteria (see overhead) Build a concordance matrix overhead - Goal programming example - Building a concordance matrix take each ordered pair of alternatives - e.g. sites A and B, pair AB for each criterion, assign the pair to one of three sets: where A beats B (concordance set) e.g. criteria 2 (wt=.1), 4 (.2), 6 (.1), 8 (.1) where B beats A (discordance set) e.g. criteria 1 (wt=.1), 3 (.1), 7 (.1) where A and B tie (tie set) e.g.
criteria 5 (wt=.2) add up the weights of the criteria in each set if A always beats B on all criteria, all 8 criteria will be in the concordance set - total weight will be 1 actual weights for pair AB: concordance set: 0.5 discordance set: 0.3 tie set: 0.2 concordance for each pair is determined by summing the weights for criteria assigned to the concordance set plus half the sum of weights for criteria in the tie set for pair AB: 0.5 + 0.1 = 0.6 indicates a slight preference for A over B across all criteria create a matrix of concordance for each pair overhead - Goal programming example - Full concordance matrix row is first in pair, column is second row total yields index of preferability the larger the index, the more preferred the option over all criteria, site D is preferred to site C which is preferred to site A which is preferred to site B note: an example of this process is provided later in this unit Summary decision maker is asked to specify goals and relative weightings for the different criteria use relative weightings to find most preferred site change weighting to assess sensitivity of solution or to reflect different opinions G. WEIGHTING METHOD used when the set of possible solutions is extremely large identifies or reduces the number of solutions that need to be considered solution of multi-criteria problem is easier if the contents of the noninferior set are known this method finds the complete noninferior solution set rather than a single solution final selection is left to decision-makers strategy: combine the criteria using a range of different weightings for each criterion - ranging from 100% on one criterion to 100% on the other find best solutions for each combination due to the number of combinations that must be evaluated, this is not generally practical for more than 2 criteria note the weighting method does not guarantee that all solutions in the noninferior set will be found number found depends on how many combinations of weights are used
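The pairwise concordance arithmetic above is easy to verify in a few lines. The eight criterion weights are the ones given in the example; the per-criterion rankings are hypothetical, constructed only so that the ordered pair AB comes out as in the text (A beats B on criteria 2, 4, 6 and 8, loses on criteria 1, 3 and 7, and ties on criterion 5).

```python
# Concordance score for an ordered pair of sites, following the worked
# example in the text: 8 criteria, weights summing to 1.
# The rankings below are invented, chosen only to reproduce pair AB.

weights = [0.1, 0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.1]   # criteria 1..8

# higher value = better site on that criterion (hypothetical data)
ranks = {
    "A": [1, 2, 1, 2, 1, 2, 1, 2],
    "B": [2, 1, 2, 1, 1, 1, 2, 1],
}

def concordance(first, second):
    """Sum the weights where `first` beats `second`,
    plus half the weights where they tie."""
    total = 0.0
    for w, a, b in zip(weights, ranks[first], ranks[second]):
        if a > b:
            total += w          # concordance set
        elif a == b:
            total += w / 2      # tie set counts half
    return total

print(round(concordance("A", "B"), 2))   # → 0.6, as in the text
print(round(concordance("B", "A"), 2))   # → 0.4
```

Note that the two ordered scores always sum to 1, since each criterion's full weight is split between the pair; a full concordance matrix simply repeats this calculation for every ordered pair of sites.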
NORTH BAY BYPASS EXAMPLE this section is drawn from B.H. Massam's book Spatial Search which includes many examples of complex spatial decision-making a new route is needed for Ontario Highway 11 around the city of North Bay this study, conducted by the Ontario Ministry of Transportation and Communications, is similar in methodology to many highway routing studies many of these studies use GIS or automated mapping systems to analyze multi-layer databases routing studies follow a common strategy: identify factors which are important in evaluating impact of route identify a small number of feasible routes evaluate each route on each of the impact factors reach a decision by combining impact factors on some systematic basis this study is a particularly good example of the general strategy LOCATION-ALLOCATION ON NETWORKS A. INTRODUCTION Network problems Location-allocation problems Objectives Applications B.
EXAMPLE - OIL FIELD BRINE DISPOSAL Brine disposal Disposal options The location-allocation problem C. COSTS Pipe cost Truck cost Disposal well cost D. GIS IMPLEMENTATION E. LOCATION-ALLOCATION ANALYSIS MODULE Sensitivity analysis Problems with link-node models REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 58 - LOCATION-ALLOCATION ON NETWORKS A. INTRODUCTION Network problems a network can be represented digitally by nodes (junctions) and links (connections between nodes) common networks include streets in a city, airline routes, railroads a GIS is a convenient way of storing information about a network a large number of analytical problems have been developed for networks, e.g.: "shortest path problem" - algorithms to find the shortest route through the network between given origin and destination "traveling salesman problem" - algorithms to find the shortest tour through a given set of destinations, beginning and ending at a given origin "transportation problem" - find the pattern of shipments of goods from a number of factories to a number of outlets which will minimize total shipping cost "traffic assignment problems" - given the numbers of trips to be made between origins and destinations, predict how traffic will allocate itself to a network, i.e. how many vehicles will use each route numerous other problems in vehicle routing and scheduling some of these, e.g. shortest path problems, have been incorporated into GIS products, e.g. 
ARC/INFO's NETWORK, Caliper's TRANSCAD others can be used as stand-alone packages in conjunction with a GIS the GIS provides the input, output, display, simple analysis functions the stand-alone package provides the algorithm to solve the particular problem this unit examines an example of network problems Location-allocation problems concern the provision of a service to satisfy a spatially dispersed demand demand for the service exists at a large number of widely dispersed sites impossible to provide the service everywhere e.g. every household needs a source of groceries, but impossible to provide a grocery store at each household for reasons of cost (economies of scale) service must be provided from a few, centralized locations ("sites") sometimes the number of sites is known in advance, e.g. McDonald's wishes to locate 3 restaurants in city x in other cases the optimum number of sites is one aspect of the solution two elements to the problem: 1. Location where to put the central facilities (and possibly how many, how big) 2. Allocation which subsets of the demand should be served from each site ("trade areas", "service areas") Objectives important components: cost of operating the facilities - includes construction, operating costs - may be independent of locations chosen cost of travel to and from facilities - may be absorbed by the consumer or the provider depending on the context quality of service e.g. important in providing emergency fire service which is dependent on the response time of the fire truck different objectives define different versions of the location-allocation problem Applications retailing - locations of stores, restaurants emergency facilities - ambulances, fire stations schools warehouses regional offices of government departments recreation facilities - public pools B.
EXAMPLE - OIL FIELD BRINE DISPOSAL this is an example of both a location-allocation problem and the use of a network model concerns waste disposal for the Petrolia, Ontario oil field which has been producing oil since the 1850s oil extraction from the field generates large quantities of waste fluid waste fluid has been increasing as the field has become depleted waste fluid or "brine" is a salty, smelly fluid brine may be 90%-97% of total volume extracted, only 3%-10% oil 14 active producers in the field each producer may operate up to 30 wells each producer operates an oil collection facility to which all liquids from that producer's wells are piped oil and brine are separated by each producer at the collection facility using simple gravity separation oil is shipped to the refinery by truck Brine disposal brine disposed of by individual producers some of the methods used may lead to violations of provincial pollution standards brine may run onto fields or into surface watercourses thus need a better disposal method only effective method of disposal is by pumping to a geological formation below the oil producing layer alternative methods are too expensive or impractical, e.g. purification by reverse osmosis, evaporation in holding ponds Disposal options options include: 1. a single, central disposal facility minimum capital cost maximum transport cost 2. requiring each producer to install a facility maximum capital cost zero transport cost 3. some intermediate configuration of shared facilities The location-allocation problem find locations for one or more central facilities and allocate producers to them in order to minimize the total of capital and transport costs two alternatives for transport of waste brine to central facilities: pipe and truck assume that both transport routes would follow the same network C.
COSTS handout - Brine disposal study (2 pages) overhead - Brine disposal study costs Pipe cost must pay for pipe over its expected lifetime, plus cost of pumping brine through pipe Truck cost must pay for holding tanks for brine, with sufficient capacity to allow for delays in winter, plus cost of loading and unloading truck, and estimated driving time Disposal well cost includes cost of installing disposal well and running pump porosity of formation varies, so there is a risk of failure in a drilled disposal well new well costs $50,000-$75,000, success rate 60-80% brine contains dense hydrocarbons - waxes - which will build up over time and block the well problem with corrosion of pipes due to high acidity of brine D. GIS IMPLEMENTATION data structure defines network of streets and rights of way - potential routes for trucks/pipes links with attributes of length nodes with attributes of volume produced - nodes include producer sites plus other potential well locations GIS database with nodes and links and associated attributes provides: data input functions (editing) data display - graphics, plots storage of geographic data data to be passed to the analysis module analysis module interacting with GIS database obtains nodes and links from the GIS performs analysis, reports results directly to the user includes several heuristic methods for solving the optimization problem allows the user access to the display/analysis functions of the GIS an analysis module supported in this way by a GIS database provides a primitive spatial decision support system (SDSS) tailored to this specific, advanced form of spatial analysis see Unit 59 for more on spatial decision support systems E. LOCATION-ALLOCATION ANALYSIS MODULE overhead - Location-allocation analysis module 1. Finds shortest paths between points on network (could be a GIS function) 2. Defines and modifies model parameters (e.g. components of pipe and truck cost equations) 3.
Uses shortest paths and parameters to calculate transport costs by each mode 4. Searches for optimum solution using add, drop and swap heuristics add - start with no facilities, at each step place facilities in location which best improves objective drop - start with facilities at every node, at each step drop the facility which produces least deterioration in the objective swap - try to improve the objective by moving facilities from one node to another 5. Evaluates solutions and displays results overhead - Brine disposal options costs Sensitivity analysis many parameter values are uncertain e.g. cost of installing pipe, lifetime of pipe and wells important to know effect of uncertainty on results e.g. if pipe cost doubles, what will be impact on results? in sensitivity analysis, parameter values are changed one at a time to determine the effect of each on the solution overhead - Sensitivity analysis in each case, first line gives value assumed for option d, wells at producer locations subsequent lines give effect of changing the parameter e.g. increasing pipe cost leads to greater number of facilities Problems with link-node models some spatial decisions involving networks do not work well with the standard link-node model may need to put a facility or event anywhere on the network not just at intersections thus need the ability to identify a location along links this may be done by: identifying location by its distance along a link from a node thus network is not a set of links and nodes but an addressing system using link number and distance breaking a link at a given location to form a new node and 2 links e.g. "dynamic segmentation" if the break is temporary REFERENCES Ghosh, A. and G. Rushton, 1987. Spatial Analysis and Location-Allocation Models, Van Nostrand Reinhold, New York. Includes many applications of location-allocation methods. Golden, B.L. and L. Bodin, 1986.
"Microcomputer-based vehicle routing and scheduling software," Computers and Operations Research 13:277-85. Reviews the availability of network analysis modules for microcomputers. Goodchild, M.F. and J.A. Donnan, 1987. "Optimum location of liquid waste disposal facilities: formation fluid in the Petrolia, Ontario oilfield," in M. Chatterji, Editor, Hazardous Materials Disposal: Siting and Management, Gower, Aldershot, UK, pp 263-73. SPATIAL DECISION SUPPORT SYSTEMS A. INTRODUCTION B. DEFINITIONS AND CHARACTERISTICS Decision support systems C. SPATIAL DECISION-MAKING Example: site selection for a retail store D. SDSS ARCHITECTURE Data Base Management System Model Base Management System Graphical and Tabular Report Generators User Interface E. DEVELOPMENT OF DSS Three levels of technology Five functional roles F. CURRENT STATUS OF SDSS REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 59 - SPATIAL DECISION SUPPORT SYSTEMS Compiled with assistance from Paul Densham, State University of New York at Buffalo A. 
INTRODUCTION multiple criteria methods allow for the presence of more than one objective or goal in a complex spatial problem however they assume that the problem is sufficiently precise that the goals and objectives can be defined many problems are ill-structured in the sense that the goals and objectives are not completely defined such problems require a flexible approach the system should assist the user by providing a problem-solving environment spatial decision support systems (SDSS) are designed to help decision-makers solve complex spatial problems GISs fall short of the goals of SDSS for a number of reasons: analytical modeling capabilities often are not part of a GIS many GIS databases have been designed solely for cartographic display of results - SDSS goals require flexibility in the way information is communicated to the user the set of variables or layers in the database may be insufficient for complex modeling data may be at insufficient scale or resolution GIS designs are not flexible enough to accommodate variations in either the context or the process of spatial decision-making SDSS provide a framework for integrating: 1. analytical modeling capabilities 2. database management systems 3. graphical display capabilities 4. tabular reporting capabilities 5. the decision-maker's expert knowledge GISs normally provide 2, 3 and 4 the addition of 1 and 5 creates an SDSS B. DEFINITIONS AND CHARACTERISTICS Decision support systems spatial decision support systems have evolved in parallel with decision support systems (DSS) DSS developed for business applications (corporate strategic planning, scheduling of operations, etc.)
DSS literature contains a substantial body of theory and a large number of applications literature can be used to guide the design, development, implementation and use of SDSS texts on DSS include: Bonczek, Holsapple and Whinston, 1981; Sprague and Carlson, 1982; and House, 1983 many definitions of DSS require the presence of certain characteristics e.g. Geoffrion's definition requires 6 characteristics: 1. designed to solve ill- or semi-structured problems, i.e. where objectives cannot be fully or precisely defined 2. have an interface that is both powerful and easy to use 3. enable the user to combine models and data in a flexible manner 4. help the user explore the solution space (the options available to them) by using the models in the system to generate a series of feasible alternatives 5. support a variety of decision-making styles, and be easily adapted to provide new capabilities as the needs of the user evolve 6. problem solving is an interactive and recursive process in which decision making proceeds by multiple passes, perhaps involving different routes, rather than a single linear path these characteristics also define an SDSS in addition, in order to effectively support decision-making for complex spatial problems, an SDSS will need to: provide for spatial data input allow storage of complex structures common in spatial data include analytical techniques that are unique to spatial analysis provide output in the form of maps and other spatial forms C. SPATIAL DECISION-MAKING many spatial problems are complex and require the use of analysis and models many spatial problems are semi-structured or ill-defined because all of their aspects cannot be measured or modelled Example: site selection for a retail store objective is to pick the site which will maximize economic return to the company return is affected by: number of potential customers within market area accessibility of the site (e.g. is it on a main street? is it possible to turn left into the site?)
visibility, signage, appearance cost of site and construction some of these factors are difficult to evaluate or predict relative impacts of each of these factors on return may be unknown (except the last - direct cost) impossible to structure the problem completely - i.e. define and precisely measure the objective for every possible solution retail site selection problem is ill-structured a system to support retail site selection must be flexible allow new factors to be introduced allow the relative importance of factors to be changed to evaluate sensitivity or to reflect differences of opinion display results of analysis in informative ways solutions to this class of problems often are obtained by generating a set of alternatives and selecting from among those that appear to be viable thus, the decision-making process is iterative, integrative and participative iterative because a set of alternative solutions is generated which the decision-maker evaluates, and insights gained are input to, and used to define, further analyses participative because the decision-maker plays an active role in defining the problem, carrying out analyses and evaluating the outcomes integrative because value judgements that materially affect the final outcome are made by decision-makers who have expert knowledge that must be integrated with the quantitative data in the models D. SDSS ARCHITECTURE Armstrong and Densham (1990) suggest that five key modules are needed in a SDSS: 1. a database management system (DBMS) 2. analysis procedures in a model base management system (MBMS) - defined later 3. a display generator 4. a report generator 5. 
a user interface to the programmer, this modularity facilitates software development to the SDSS user, the system appears to be a seamless entity overhead - SDSS architecture one architecture for an SDSS is shown the five software modules are represented by the boxes on the left of the diagram with the user interface, an expert system shell, encompassing the other modules the arrows between the modules depict flows of data and information the right-hand part of the diagram shows the interaction with the user who receives and evaluates output (alternative solutions) from the system which is either accepted as a solution or used to define new analyses Data Base Management System GIS database management systems are designed to support cartographic display and spatial query database of an SDSS must support cartographic display, spatial query and analytical modelling by integrating three types of data: 1. locational (spatial primitives such as coordinates and chains) 2. topological (attribute-bearing objects, e.g. points, nodes and lines, and relationships between them) 3. thematic (attributes of the topological objects, including population, elevation, and vegetation) database must permit the user to construct and exploit complex spatial relations between all three types of data at a variety of scales, degrees of resolution and levels of aggregation database management systems found in many GIS use the relational data model however, alternative data models have proved effective in applications of DSS e.g. 
the extended network model is an enhanced form of the network model and is effective for representing the links and nodes of transportation networks transportation networks are a popular base for developing SDSS because of the importance of applications for site selection and the abundance of methods of analysis handout- Database for site selection shows the implemented database for a site selection problem locational component consists of COORD (coordinates), NODE and CHAIN topological objects are the records POINT, L.A. NODE (possible site), LINE, STATE and CITY thematic data are the six records on the extreme left of the diagram (LINE DISTANCE, LINE FEATURE, STATE DATA, CITY DATA, POINT FEATURE and NODE DATA) arrows between the records indicate relationships, both spatial and non-spatial, e.g.: the 1:1 relation between NODE and COORD means that each node "owns" one coordinate the 1:N relation between L.A. NODES and NODE DATA indicates that each possible site owns one or more sets of data the N:M relation between CHAIN and COORD means that each chain is made up of many coordinates and that each coordinate can be part of more than one chain multiple relations of a given type are indicated by numbers beside the relevant arrows L.A. NODE owns LINE in two relations, one indicates links to possible sites with lower identifiers, the other to possible sites with higher identifiers the system set is a construct that provides direct access to records so defined - there is no need to traverse intermediate record types as in other data models e.g. 
it is possible to access a coordinate pair record (COORD) directly without accessing any other type of record Model Base Management System one approach to incorporating analytical models in geoprocessing systems is to develop libraries of analytical sub-routines permits large numbers of models to be made accessible very quickly, because existing programs can be patched into a system wasteful in terms of replicated code second approach, used in business applications of DSS, is to develop a model base management system (MBMS) consists of small pieces of code, each of which solves a step in an algorithm as many of these steps are common to several algorithms, this approach saves large amounts of code the system developer only has to modify one piece of code to update a step in several algorithms the MBMS also contains information about how steps are sequenced to execute a given algorithm using an MBMS facilitates rapid development and testing of new algorithms implementation may be achieved simply by adding a new formula to the MBMS in other cases new code for additional steps also may be added to the model-base Graphical and Tabular Report Generators should provide the following capabilities: high-resolution cartographic displays general-purpose statistical graphics, including two and three-dimensional scatter plots and graphs specialized graphics for depicting the results from analytical models and sophisticated statistical techniques the full range of tabular reports normally associated with each of the above User Interface must be easy to use if it is to be effective in decision-making interfaces of many current GIS systems are modelled on those of business systems, using command lines, pull-down menus and dialogue boxes the move to graphical interfaces for operating systems provides an opportunity for system designers to develop more intuitive interfaces for geoprocessing systems by using a graphical display for communication between the decision-maker and
the system: icons can be used to represent system capabilities the user can select parameters, data, output, etc., easily and intuitively the user may be able to more easily visualize the processes represented within the model E. DEVELOPMENT OF DSS Sprague (1980) presents a development framework three levels of technological development five functional roles overhead - DSS development framework depicts the three levels of technology and the five functional roles Three levels of technology DSS technology ranges from simple, specific applications to broadly applicable systems: 1. a specific DSS is a system being used to address a specific problem 2. a DSS generator is a set of mutually compatible hardware and software modules used to implement the specific DSS 3. a DSS toolbox is a set of individual hardware and software items which can be used to build both DSS generators and specific DSS system vendors and consulting houses who must develop many different decision systems of broadly similar nature on a recurring basis will build generators and toolboxes that can be adapted for individual clients with specific problems Five functional roles the decision-maker is responsible for choosing, implementing and managing the solution the intermediary sits at a console and interacts physically with the system the DSS builder configures the specific DSS from the modules in the DSS generator the technical supporter adds capabilities or components to the DSS generator the DSS toolsmith develops new hardware and software tools these five roles may be filled by any number of people, and individuals may have more than one function during the decision-making process, the decision-maker uses output from the system to evaluate interim solutions the result of this evaluation may be a desire to investigate other aspects of the problem which may require new capabilities to be added to the SDSS the system is updated as required by people filling the
technical functional roles using the three levels of technology thus a process of system adaptation and evolution occurs rapidly during the decision-making process itself F. CURRENT STATUS OF SDSS at this point, SDSS as defined here remains a conceptual framework rather than an implemented strategy some systems approach a partial implementation of its concepts several implementations of GIS in forestry have been described as SDSS but do not satisfy the full definitions used in this unit SDSS is an important standard against which to measure spatial decision-making tools REFERENCES Armstrong, M.P. and P.J. Densham, 1990. "Database organization alternatives for spatial decision support systems," International Journal of Geographical Information Systems, Vol 3(1): . Describes the advantages of the extended network model for network-based problems. Bonczek, R.H., C.W. Holsapple, and A.B. Whinston, 1981. Foundations of Decision Support Systems, Academic Press, New York. Basic text on DSS. Densham, P.J. and G. Rushton, 1988. "Decision support systems for locational planning," in R. Golledge and H. Timmermans, editors, Behavioural Modelling in Geography and Planning. Croom-Helm, London, pp 56-90. Geoffrion, A.M., 1983. "Can OR/MS evolve fast enough?" Interfaces 13:10. Source for six essential characteristics of DSS. Hopkins, L., 1984. "Evaluation of methods for exploring ill- defined problems," Environment and Planning B 11:339-48. House, W.C. (ed.), 1983. Decision Support Systems, Petrocelli, New York. Basic DSS text. Sprague, R.H., 1980. "A framework for the development of decision support systems," Management Information Sciences Quarterly 4:1-26. Source for DSS development model. Sprague, R.H., and Carlson, E.D., 1982. Building Effective Decision Support Systems, Prentice-Hall, Englewood Cliffs NJ. Basic DSS text. REFERENCES Ghosh, A. and G. Rushton, 1987. Spatial Analysis and Location- Allocation Models, Van Nostrand, Reinhold, New York. 
Includes many applications of location-allocation methods. Golden, B.L. and L. Bodin, 1986. "Microcomputer-based vehicle routing and scheduling software," Computers and Operations Research 13:277-85. Reviews the availability of network analysis modules for microcomputers. Goodchild, M.F. and J.A. Donnan, 1987. "Optimum location of liquid waste disposal facilities: formation fluid in the Petrolia, Ontario oilfield," in M. Chatterji, Editor, Hazardous Materials Disposal: Siting and Management, Gower, Aldershot, UK, pp 263-73. Impact factors total of 35 criteria grouped into 7 clusters overhead - North Bay bypass study - Criteria clusters "Direct Cost" cluster includes construction and property costs "Traffic Service" cluster evaluates effectiveness of route from a traffic engineering viewpoint, includes number of miles with >2% grade "Community Planning" cluster evaluates routes against common planning criteria, including amount of land for potential development which will have improved access as a result of the highway "Neighborhood and Social Impact" cluster includes many factors measuring impact on local communities Alternative routes 9 alternatives identified each alternative is a complete route, evaluated as such two or more alternatives may share long stretches of common route, differ only in sections Combination of factors factors evaluated by a Technical Advisory Committee all major clusters represented by different members e.g. direct cost cluster represented by engineers, accountants, managers e.g.
neighborhood and social impact cluster by representatives of community groups each member begins by selecting the cluster most easily understood by him/her reviews supporting text, maps, tables documenting evaluation of routes on factors in selected cluster scores each route on each of the factors in the cluster - scale of 0 to 10, 10 is best score, 0 is worst each member moves to a new cluster, scores it, eventually scores all routes on all factors in all clusters scores are totaled for each cluster and each route result is a 7 by 9 matrix for each member of the committee big differences depending on background of committee member now total over all members to get one 7 by 9 matrix implies that all members get equal weight - so membership of committee is crucial Weighting how to combine scores from different clusters to get overall evaluation of each route? overhead - North Bay bypass study - Weighting schemes results in 9 routes, 7 clusters of evaluation factors, 6 weighting schemes Concordance analysis evaluate routes separately for each of the 6 weighting schemes results in a 9x9 concordance matrix for each of the 6 weighting schemes gives a matrix of concordances for all pairs of plans repeat for each weighting scheme Results routes 2,7,9 consistently best over all weighting schemes, 8 consistently worst order of 2,7,9 changes from one scheme to another - 2 is best when cluster 6 is given a high weight this provides the decision-makers with a limited set of routes to consider now can proceed with more formal evaluation and public hearings to assess the significance of other factors REFERENCES General introduction to multicriteria decision-making: Cohon, Jared L., 1978. Multiobjective Programming and Planning, Academic Press, Mathematics in Science and Engineering, Vol. 140 Massam, B.H., 1980. Spatial Search. Pergamon, London. Gives many examples of applications of multicriteria methods, in addition to the North Bay study used in this unit. Rietveld, P. 1980. 
Multiple Objective Decision Methods and Regional Planning, Studies in Regional Science and Urban Economics, Volume 7, North Holland Publishing Company. Goal Programming: Lee, S. M., 1972. Goal Programming for Decision Analysis, Auerbach, Philadelphia. A general introduction to Goal Programming. The following are examples of applications of Goal Programming: Barber, G., 1976. "Land-Use Plan Design via Interactive Multi-Objective Programming," Environment and Planning 8:239-245. Courtney, J. F., Jr., T.D. Klastorin and T.W. Ruefli, 1972. "A Goal Programming Approach to Urban-Suburban Location Preference," Management Science 18:258-268. Dane, C.W., N.C. Meador and J.B. White, 1977. "Goal Programming in Land Use Planning," Journal of Forestry 75:325-329. Weighting Method: discussed in: Cohon, Jared L., 1978. Multiobjective Programming and Planning, Academic Press, Mathematics in Science and Engineering, Vol. 140. EXAM AND DISCUSSION QUESTIONS 1. Compare the goal programming and weighting methods in terms of technique, practicality and effectiveness at reaching solutions to difficult problems. 2. Discuss the North Bay study as an exercise in community decision-making. What are its strengths and weaknesses? In what ways did it succeed or fail in involving the community in the decision-making process? 3. How might the methodology of the North Bay study be manipulated or distorted by an unscrupulous agency with a hidden agenda? What can be done to protect against this possibility? 4. One of the advantages of decision-making using GIS is that the effects of changes in criteria can be seen almost immediately, e.g. in the search for the best site for an activity. Discuss the impact that this capability might have on the decision-making process. Do you regard this impact as positive or negative? 5. Select a current local planning issue and discuss the decision-making criteria being promoted by various interest groups and individuals. SYSTEM PLANNING OVERVIEW A. INTRODUCTION B.
PROBLEM RECOGNITION/TECHNOLOGICAL AWARENESS Problem recognition Technological awareness Supply-push factors Demand-pull factors Collecting information on GIS Project plan C. DEVELOPING MANAGEMENT SUPPORT Example - AM/FM Project Life Cycle Administration of the project D. NEWPORT BEACH GIS PROJECT Needs awareness Management support Administration of the project Establishing the automation priorities Pilot projects REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES The introduction to this unit describes the outline for the next module. Many of the issues outlined in this unit are illustrated in a 20 minute video, GEOBASE - A Better Way, produced by and available from the City of Newport Beach, California. The video was originally intended for viewing by the City Council and other city officials to show the progress and promise of the GEOBASE system. UNIT 60 - SYSTEM PLANNING OVERVIEW Compiled with assistance from Frank Gossette, California State University, Long Beach and Warren Ferguson, Ferguson Cartotech, San Antonio and Ken Dueker, Portland State University A. INTRODUCTION in most cases, the design, purchase and implementation of a GIS is a significant commitment in terms of personnel time and money it is extremely important to understand the issues involved in the development of GISs these issues will ultimately affect the efficiency and value of the installed GIS it is possible to identify several stages in the development of a GIS these can be characterized in several ways the following general outline serves as an organizing framework for the next 6 units: development progresses through the following stages note that these are not necessarily sequential and some may operate concurrently with others 1. Problem recognition and technological awareness a necessary beginning point 2. Developing management support critical to the initiation and success of the project 3. 
Project definition includes identifying the current role of spatial information in the organization, the potential for GIS, determining needs and products, writing the proposal 4. System evaluation includes reviewing hardware and software options, conducting benchmark tests, pilot studies and cost-benefit analysis 5. System implementation includes completion of a strategic plan, system development and startup, design and creation of the database, securing on-going financial and political support this unit looks at the two initial stages, which are the least formal and structured: needs awareness and building management support B. PROBLEM RECOGNITION/TECHNOLOGICAL AWARENESS in order for an organization to become interested in acquiring a GIS, someone or some group within the organization: 1. must perceive that the methods by which they are currently storing, retrieving and using information are creating problems 2. must be aware of the capabilities of GIS technology Problem recognition Aronoff (1989) suggests six problems that prompt GIS interest 1. spatial information is out of date or of poor quality e.g. often land information documents (maps and lists) are seriously outdated and questions regarding the current situation cannot be answered without digging through a stack of "updates" since the last major revisions 2. spatial data is not stored in standard formats e.g. a city's parcel maps will often vary in quality from one area to another one area may have been "flown" and mapped using aerial photography at 1:1000 scale some years ago, but updated by hand drafting other areas may have been mapped by photographically enlarging 1:24,000 topographic maps, or city street maps of unknown quality, and hand drafting parcel boundaries maps may have been reproduced by methods which introduce significant errors, e.g. photocopy 3.
several departments collect and manage similar spatial data this may result in different forms of representation, redundancies and related inefficiencies in the collection and management of the data 4. data is not shared due to confidentiality and legal concerns 5. analysis and output capabilities are inadequate 6. new demands are made on the organization that cannot be met within the data and technological systems currently available. Technological awareness sometimes the "problem" is simply an awareness of newer technologies that offer a "better way" King and Kraemer (1985, p.5) distinguish between supply-push and demand-pull factors in leading to awareness and the eventual acquisition of computing technology Supply-push factors changes in technological infrastructure improvements in technological capability in GIS: improved hardware, software, peripherals; better access to existing digital datasets, e.g. TIGER files declining price-performance ratios in GIS: impact of introduction of 286- and 386-based PCs, workstations, reduction in cost of mainframes and minis improved packaging of technical components to perform useful tasks in GIS: better (more friendly, more versatile) user interfaces, better applications software concerted marketing efforts of suppliers advertising creates an aura of necessity in GIS: hard not to go with the current trend, in spite of the fact that GIS advertising is probably low-key relative to other areas of EDP direct contact of salespeople with potential buyers in GIS: demonstrations at trade shows, presentations at conferences by vendors long-term strategies of technology suppliers selective phase-outs - vendor drops support of existing system to encourage new investment price reductions or outright donations to universities to raise students' familiarity with product low-cost or cost-free pilot studies offered by vendors at potential customer's site interchange - at present, there are high costs to conversion from one GIS
vendor's system to another's - customers are "locked in" Demand-pull factors endemic demand for accomplishing routine tasks need for faster and more accurate data handling in report generation, queries, map production, analysis society's appetite for information is unlimited in GIS, there is no upper limit to need for spatial data for decision-making there is no totally satisfactory minimum level of accuracy for data more accurate data always means better decisions institutionalized demand "keeping current" with technology maintaining systems on which the organization has become dependent affective demand perceived need among organizational actors to exploit the political, entertainment and other potentials of the technology in GIS: GIS technology is impressive in itself - high quality, color map output, 3D displays, scene generation - GIS output may be perceived to have greater credibility than hand-drawn products Collecting information on GIS once the need for GIS is recognized, an individual or group may begin gathering information on GIS in order to develop a management proposal information will need to be collected on: the status of existing GIS projects the direction the GIS industry is moving the potential applications of GIS in the organization sources of information include: personnel within the company "missionaries" or GIS proponents may have familiarity through educational background, external contacts industry consultants, system vendors, conversion service companies will be very willing to provide information industry organizations such as AM/FM International or American Congress on Surveying and Mapping (ACSM) are excellent sources a growing number of newsletters and magazines are being marketed within the GIS industry a useful mechanism is a Request for Information (RFI) sent by the company to all known vendors of GIS software should ask for: general company information system capabilities hardware and software requirements customer references
general functional capabilities example applications customer support - training and maintenance programs general pricing information site visits to operating GIS projects are useful can observe the daily operations of the project gain insight from project personnel about system performance and support Project plan after consulting with industry experts, visiting other sites, considering corporate objectives, the first level of project definition and planning can occur project plan should be dynamic, adaptable, refined as better information becomes available plans will be very general, broad-brush at this stage - a general description of the desire to investigate systems further and a plan for proceeding for those charged with developing a project plan, it is important to discover who or what is the force behind the interest in GIS the individuals involved and the significance of the problem are important in determining how to proceed with selling the idea to the organization SYSTEM PLANNING OVERVIEW C. DEVELOPING MANAGEMENT SUPPORT once the need has been identified it is critical to gain support of the decision-makers who will be required to commit support in the way of funding and staff decision-makers need to be assured that the project will be developed and managed in a sound manner management will need to know: 1. what GIS is and what it can do for the organization 2. 
what the costs and benefits of the system will be a carefully managed development project is critical Example - AM/FM Project Life Cycle AM/FM projects tend to be very large (up to $100 million is not unusual) thus, the process of system planning and implementation must be rigorous in AM/FM because of the size of investment involved in the AM/FM area, this planning process is called the project life cycle overhead - AM/FM Project life cycle is a multi-step approach with well-defined decision points series of stages provides a generic, structured approach to planning this recommended sequence has been devised after reviewing numerous alternative methodologies decision points provide for financial analysis each decision point allows the project team to analyze progress and future risks before proceeding to the next level of commitment need to minimize risks while maximizing benefits Administration of the project with initial support assured, the project requires strong leadership to implement the system quite often, the agency realizes that its own people do not possess the expertise nor have the time to fully explore and evaluate the alternatives in this case, an outside consultant may be brought in to assist in a "needs assessment" the GIS consulting industry is growing rapidly, and now involves several of the "big 8" major international management consultancies D.
NEWPORT BEACH GIS PROJECT Newport Beach, California developed one of the early successful urban GISs the following section reviews the initiation and development of their GEOBASE project this provides a general introduction to the process of GIS system development Needs awareness interest in Geographic Information Systems for multi-purpose cadastral applications arose at about the same time in several major departments of the city data processing professionals were exposed to the technology at trade shows the Utility Department saw innovations in AM/FM at the major utility companies and some larger municipalities city planners were exposed to GIS by attending professional meetings these and other departments were becoming aware of these newer technologies being successfully implemented in other cities with a core of interested individuals, an informal committee was formed to study GIS and see what it could do for them Management support to gain administrative support for a LIS, the GEOBASE Committee set about educating the major departments within the city about the benefits of GIS and recruiting their support this included Data Processing (Finance), Utilities, Planning, Building and Safety, Public Works (Engineering), Fire, Police, and even the Library a series of units and demonstrations was set up to inform departmental personnel of the proposed project the result of these efforts was a proposal to the City Council and City Manager for funding for an integrated Land Information System this proposal had the endorsement of all the departments mentioned above the GEOBASE project was approved Administration of the project in Newport Beach, a GEOBASE Steering Committee, comprised of representatives from five departments (Utilities, Planning, Data Processing, Building, and Fire) was established to guide the project's implementation phases Establishing the automation priorities in Newport Beach, it was recognized that while potential benefits to all departments of
the city might be realized, difficult decisions needed to be made concerning the priorities of data entry and application building land parcel information was the highest priority, with other infrastructure elements (street centerlines, right-of-way, and utility lines) to be entered in the initial conversion effort importantly, because the City wished to have complete control over the accuracy of the data, it was decided to do the map conversion and data entry in-house Pilot projects in the GEOBASE project, two major pilot projects were undertaken during the first year of operation one took a portion of the city and converted the parcel and infrastructure data as a "Prototype" for the eventual city-wide basemap this project was useful to determine the best ways of entering the cadastral information (scanning versus digitizing versus coordinate geometry) and for establishing the ground control and accuracy standards for the database the second project involved digitizing the entire city, block-by-block, from a smaller-scale basemap to be used to revise the City's General Plan in this project, valuable skills were gained in map production, establishing symbolization standards for City maps, and dealing with attribute databases both projects produced useful and highly "visible" results REFERENCES Aronoff, S., 1989. Geographic Information Systems: A Management Perspective. WDL, Ottawa. This excellent text includes lengthy discussion of the GIS acquisition process. Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment. Clarendon, Oxford. Chapter 9 describes the process of choosing a GIS. King, J.L. and K.L. Kraemer, 1985. The Dynamics of Computing, Columbia University Press, New York. Presents a model of adoption of computing within urban governments, and results of testing the model on two samples of cities. Lucas, H.C., 1975. Why Information Systems Fail. Columbia University Press. EXAM AND DISCUSSION QUESTIONS 1.
There are over 3000 counties in the US, each with their own needs for LIS and multipurpose cadaster. What factors would you expect to influence the priorities and plans of each county in this area? Design a questionnaire survey that could be used to verify your answer. 2. Compare the circumstances in Newport Beach to those in your local area. Are they similar? How does the state of LIS development FUNCTIONAL REQUIREMENTS STUDY A. INTRODUCTION B. DEVELOPING AN FRS 1. Identify decisions 2. Determine information products needed 3. Determine frequencies 4. Identify data sets required 5. Determine GIS operations required Scope of the FRS within the organization C. METHODS FOR CONDUCTING AN FRS 1. Fully internalized 2. Focus group 3. Interviews 4. Questionnaire D. COMPONENTS OF THE COMPLETED FRS 1. Definitions of information products 2. List of input data sets 3. List of GIS functions required E. WEAKNESSES OF THE FRS PROCESS Invalid assumptions Awareness of GIS Funding uncertainty Changing needs Value of GIS F. IMPORTANCE OF THE FRS EXAM AND DISCUSSION QUESTIONS NOTES Obtain an FRS from a local government operation to use as an illustration for this unit. Unfortunately, there are no readily available references to support this unit. UNIT 61 - FUNCTIONAL REQUIREMENTS STUDY Compiled with assistance from Warren Ferguson, Ferguson Cartotech, San Antonio, TX A. 
INTRODUCTION once management support has been obtained, the next step is a functional evaluation of the current manual process existing functionality and any new requirements will be used to define the project scope and basic structure of the implemented GIS the result of this phase is the Functional Requirements Study (FRS) the functional requirements study (FRS) is the primary planning document for a GIS installation it lays out what data is needed, how it must be processed in order to make the necessary reports and products it forms the basis for a Request for Proposals (RFP) during installation and system startup, it provides the basic reference guide very structured methodologies for functional requirements studies have been developed by consulting companies these proprietary methods provide the basis for some of the competition in the lucrative GIS consulting market this unit will therefore take a broader viewpoint, not focusing on the mechanics of any one methodology B. DEVELOPING AN FRS an FRS is best created by working in the opposite direction to the GIS's processing 1. Identify decisions begin by identifying the decisions which people in the organization are required to make what is each person's area of management responsibility? what decisions must be made in carrying out that responsibility? 2. Determine information products needed identify the information products needed to support those decisions e.g. to schedule service crews, need a map showing locations of service calls at this point consideration of new methods and products is appropriate what additional products would be important in supporting each user's decision-making responsibilities? how might existing products be modified/improved to support decision-making better?
this process involves users in the project definition process opens communication channels helps increase support for the project allows potential problems to be identified and dealt with prior to commitment to the project users may not be familiar with GIS technology and its capabilities need to stress the irrelevance of technology at this stage - simply assume that the necessary technological capabilities exist, and concentrate on determining the user's needs for reports and products 3. Determine frequencies each information product will have an associated frequency e.g. the service call map must be produced every morning at 8 am 4. Identify data sets required identify the data sets which must be processed to create the required product e.g. the service calls come into my office as completed forms giving street addresses and details of the nature of the service request 5. Determine GIS operations required identify the processes or operations which must be performed on the data to create the products this step is most likely to require some knowledge of GIS operations however, it is possible to refer to operations in a generic way, or by analogy to manual operations, without knowledge of GIS technology Scope of the FRS within the organization a full FRS gives an organization a significant opportunity to examine its own operations the investigators should clearly identify the appropriate level at which to interact with each department of the organization interacting personnel need to be decision-makers and managers, not technical support since the study should focus on the decisions that are made, not on the data and procedures used an effective FRS requires a large commitment of time the organization as a whole must be willing to commit the necessary amount of time on the part of its staff less than full commitment (interruptions, absence from meetings) will destroy the purpose of the FRS C.
METHODS FOR CONDUCTING AN FRS many alternative methods can be used to elicit the necessary information for the FRS methods can be ordered by the level of commitment of the organization's time and the associated cost of the FRS the following begins with the most costly and works through to the least choice made will depend on the amount of time/money the organization is willing to commit to the FRS this depends in turn on the size of the eventual project e.g. a $2 million project may justify a $100,000 FRS, i.e. a 5% investment in good planning 1. Fully internalized Procedure: organization appoints an FRS team from its own staff FRS team trained by GIS consultant FRS team coordinates the definition of information products by organization's staff, acting as facilitators FRS team compiles information and identifies input data sets, functions required to make products under guidance of consultant consultant prepares final FRS Advantages: FRS team combines familiarity with the organization's operations with limited knowledge about GIS and FRS procedure acquired from consultant Disadvantages: cost of high level of organizational commitment 2. Focus group Procedure: consultant acts as leader at a series of group meetings of organization's staff meetings are used to discuss procedures, prepare and edit descriptions of products and define input datasets and system functions Advantages: focus group allows consultant to facilitate but leaves work mostly to organization's personnel excellent tool for building consensus on what is needed Disadvantages: by isolating FRS-related activity to focus group meetings, level of commitment of organization's staff is lower 3. Interviews Procedure: consultant gathers information at interviews, prepares FRS Advantages: minimal commitment of organization's personnel Disadvantages: organization has little or no group involvement in FRS 4.
Questionnaire Procedure: consultant prepares a questionnaire with advice from the organization, circulates it to all appropriate staff Advantages: low cost, appropriate for obtaining limited information from a large number of users Disadvantages: poor quality of information gathered, no opportunity for discussion FUNCTIONAL REQUIREMENTS STUDY D. COMPONENTS OF THE COMPLETED FRS handout - Functional requirements study example (4 pages) 1. Definitions of information products see Unit 68 for handouts of products identified in an FRS products may be maps, reports, lists for each product need: frequencies of production details of input data processing steps required to make the product for maps, need associated scales, legends, symbolization details for lists and reports, need details of formats useful to prepare rough samples of each product a large organization may generate descriptions of tens or hundreds of different products 2. List of input data sets need details of data to estimate input workload volume, e.g. how many map sheets, how many records, how many attributes? format, e.g. paper maps, digital tape, survey documents sources frequency of update data sets may be shared between products e.g. basic street map may be part of many different information products important to know product priorities products cannot be generated until data is input, and input may take a long time some products may be input data for other products, which creates problems in scheduling 3. List of GIS functions required some functions may be needed only for one or two products others (e.g. plotting) may be needed for all also need to include functions for data input, e.g. digitizing list of functions must make sense to staff with no GIS knowledge E. WEAKNESSES OF THE FRS PROCESS Invalid assumptions the assumptions of the method may be invalid it may be impossible to separate issues of technology from requirements, e.g. raster vs. 
vector it may be impossible to anticipate the information needed to make decisions it may be difficult to anticipate the decisions that will need to be made if the roles of personnel in the organization are not adequately defined or vary too frequently can decision-making be reduced to the simple model of analysis of information products? will the products really be adequate and reliable enough? Awareness of GIS varying awareness of GIS in the organization may bias the results staff will define products based on their personal awareness of GIS, not on an abstract need for information e.g. staff may be aware of GIS use in a parallel organization, familiar with some of its products e.g. awareness of 3D perspective views may lead to requests for them, independently of actual value in decision-making process Funding uncertainty FRS assumes continued funding over the projection period can the organization sustain funding over a long implementation period? many organizations find it difficult to commit funds up to 5 years ahead Changing needs will the FRS be sufficiently valid at the end of the implementation period? have to expect changes in the product set long before the system is in full operation need mechanisms for review and update Value of GIS has GIS technology been oversold? will the production schedule be delayed by data input bottlenecks? will the costs of the system overrun estimates? will the technology be obsolete by the time the project is implemented and in full production (up to 5 years may be needed for full database implementation)? F.
IMPORTANCE OF THE FRS despite all the uncertainty, planning, however unreliable, is undoubtedly better than no planning the exercise of a functional requirements study is beneficial to the organization in focusing discussion of its procedures irrespective of the eventual outcome management can conduct an initial financial feasibility study the costs of the existing operation are projected assuming the GIS project is not implemented these are weighed against the estimated costs of implementing the project, including costs of: pilot study (if required) system acquisition system development data conversion duplicate operation during system startup retraining EXAM AND DISCUSSION QUESTIONS 1. Discuss the methods you would adopt to carry out functional requirements studies for: a) a National Forest with a staff of 200 and responsibilities ranging from timber sales to management of historical heritage b) a small consulting firm with a staff of 5 specializing in site selection studies for retailers c) a One-Call operation answering 200 telephone queries per day about the locations of underground utility facilities likely to interfere with construction projects 2. List and review the assumptions made by the FRS process discussed in this unit 3. In what ways is the GIS FRS process different from any other FRS process in information processing? Do the differences justify a separate approach? 4. Define the input data, products and processing needed for your campus student records system. 5. RFPs and functional requirements studies are often public documents, especially when public agencies are involved. Obtain one from an agency in your area, and discuss its contents using the framework described in this unit. SYSTEM EVALUATION A. INTRODUCTION B. STRATEGIC PLAN C. REQUEST FOR PROPOSALS (RFP) Contents of the RFP Distribution of the RFP Vendor proposals D. HARDWARE AND SOFTWARE ISSUES Software Hardware E. 
SYSTEM CHOICE Evaluation factors Two stages of evaluation The winning proposal Risk factors REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 62 - SYSTEM EVALUATION Compiled with assistance from Warren Ferguson, Ferguson Cartotech, San Antonio, TX A. INTRODUCTION once the functional requirements study is complete and management gives the "go-ahead", the next step is to develop the document which will solicit proposals from interested GIS vendors this document is the Request for Proposals (RFP) results from the RFP will produce a number of different GIS options for the organization, each of which will have strong points and weaknesses at this point, difficult decisions will need to be made in an attempt to match needs with products available in the current marketplace management will need assurance that the system chosen is the best option available responses to the RFP will indicate the feasibility of achieving the project's goals an open attitude to the relationship with suppliers and the conduct of tests is essential evaluations must be open to outside scrutiny decisions may be (and frequently are) challenged by vendors and must stand up in court this unit examines these aspects of the system evaluation process: the strategic plan the RFP hardware and software issues system choice and reducing the risks B. STRATEGIC PLAN a strategic plan is essential in defining the limits of the project is important in providing guidance for many later decisions provides a level of planning above that of the FRS, less specific to the system decisions are made regarding the scale of the desired project will it be a small departmental activity or will it be integrated into operations of the whole organization? will it be centralized or distributed? how many people will be using the system at full implementation? need to address which activities need to be automated and which, if any, should remain manual how fast should acquisition of the system proceed?
what are the priorities of data input, software development and output? should the project development be directed by a consultant or by in-house committees? how will the project be funded? C. REQUEST FOR PROPOSALS (RFP) the functional requirements study along with decisions made for the strategic plan form the basis for the request for proposals success of the RFP is directly proportional to the quality of the analysis on which it is based Contents of the RFP handout - Extracts from an RFP (2 pages) the RFP describes in detail the following aspects: nature of the proposed database sources of database contents required functions to create and manipulate the database specification of required products, including frequency specifies the functional requirements for the project, not the specific technical processes underlying the functions must allow vendor to adapt capabilities of system to the organization's specific requirements e.g. must not specify raster vs. vector, or other data structure alternatives, but allow vendor to choose most appropriate the RFP must allow the vendor to determine the best configuration to satisfy the user's requirements what size of CPU how many input devices - digitizers, scanners etc.
how many output devices what software options and enhancements an RFP which is too rigid may exclude potential suppliers the details of the required proposal are made very clear: defines all the requirements outlines the form of response expected and format requirements sets deadlines Distribution of the RFP the RFP starts the formal relationship between organization and suppliers the RFP is sent to all interested suppliers potential suppliers can be identified by polling, or by inviting response to an RFI (request for information) or RFQ (request for qualifications) potential suppliers might be invited to a preliminary meeting to ask questions, reach agreement that it is worth proceeding further cost to vendor in responding to an RFP can be high, need to make sure it is worthwhile conventional approach is to distribute RFP, make first cut of vendors based on proposals received in response, then proceed with more detailed evaluations of the selected systems in the early days of GIS (pre-1984) it was common to receive very few (two or three) responses to an RFP, particularly if the RFP was detailed, because of the poor level of software development in the industry GIS industry has now advanced to the point where six to ten responses might be expected to an RFP for a large (multi-million dollar) project Vendor proposals respond in detail to the customer's requirements include details of proposed system configuration software hardware network and communications workstations and digitizers maintenance and training costs vendors may have relatively poor data on rates of throughput for specific configurations possible that the proposal is either under-configured (cannot meet the required workload) or over-configured (excess capacity) further tests, such as benchmarks (see Unit 63), are often required to reduce these uncertainties as much as possible D.
HARDWARE AND SOFTWARE ISSUES Software the proliferation of GIS software available makes the choice of a single system difficult (see Unit 24) however, there is no single best software for any particular application or organization mandates, decision-making processes and data and product requirements make each installation unique software choices that will need to be considered by an organization include: sophisticated applications-specific modifications of standard packages systems with built-in customization options immature systems with great potential for innovation different capabilities with regard to data model, functionality, output, database management system, etc. will each affect the overall operation of the GIS significantly and will need to be individually evaluated and compared Hardware decisions made with respect to hardware issues determine: number of people that can work at one time size of projects that can be handled cost of purchasing and maintaining the equipment need for a computer systems manager start-up effort update potential vendor support and stability many of these issues will be addressed by the technical requirements laid out in the FRS and RFP however, there will be several trade-offs required in the final decision E. SYSTEM CHOICE evaluation requires balancing many factors Evaluation factors costs of hardware and software - will vary despite identical functionality speed and capacity of hardware quality and costs of support supplier's background in addition to system capabilities, it is also necessary to evaluate suppliers on: financial stability position in the marketplace reports from other users about quality of support references are a useful way of obtaining this information appropriate customer references should be supplied by each vendor Two stages of evaluation does the vendor's proposal live up to the vendor's own claims? how does the vendor's proposal rate against other proposals?
The winning proposal must be good enough to get the project funded winning vendor and customer may need to work together in making final presentations to management justifying selection of supplier is only one part of winning project approval however a well-managed selection process is more likely to lead to a successful project Risk factors each vendor's system has certain risks associated with its implementation the vendor's product may not live up to expectations e.g. the hardware configuration may be insufficient for the planned workload e.g. the software may not carry out the functions as claimed many risks are associated with the project and become part of the final decision-making several of these risks and uncertainties regarding hardware and software issues have already been pointed out other risks are much more subtle e.g. since many vendors are US-based, foreign organizations must consider the stability of the value of the local currency against the US dollar the typical planning horizon for a GIS project is 5 years most factors are very difficult or impossible to forecast this far ahead however good the planning, there is a risk that the system will not satisfy the end-users in fact the winning vendor's system may fall short of requirements in several key areas it may be necessary to modify the system definition because of limited vendor capabilities - some products may have to be dropped in other cases, the final contract should require the vendor to develop software to deal with these problems when additional software development is required, the contract must include deadlines and penalties because success is heavily dependent on the additional software being supplied on time and fully debugged this situation is still common because of the immature state of the GIS industry in view of these risks, an investment of 5% or even 10% of project costs in planning and system evaluation is more than justified organizations wishing to reduce these risks
further may conduct one or more additional sophisticated, though costly, procedures before making the final commitment these include: benchmark tests pilot studies cost-benefit analyses REFERENCES Forrest, E., G.E. Montgomery and G.M. Juhl, 1990. Intelligent Infrastructure Workbook: A Management-Level Primer on GIS, A-E-C Automation Newsletter, P.O. Box 18418, Fountain Hills, AZ 85269-8418. Guptill, S., 1988. "A process for evaluating GIS," USGS Open File Report 88-105. The report of the Federal Interagency Coordinating Committee on Digital Cartography (FICCDC) on GIS evaluation. Smith, D.R., 1982. "Selecting a turn-key geographic information system using decision analysis," Computers, Environment and Urban Systems 7:335-45. EXAM AND DISCUSSION QUESTIONS 1. Review the approach to system selection documented in Smith (1982). What are the arguments for and against the rigorous decision-theoretic approach used in this paper? 2. Discuss the steps in planning and choosing a GIS system. What are the risks associated with a project, and how are these reduced in the project lifecycle approach? 3. "The best-laid plans of mice and men...". Despite the use of a well-defined framework, mistakes inevitably happen in the best-designed projects. Discuss the weaknesses in the approach described in these units. BENCHMARKING A. INTRODUCTION Two types of benchmarking Benchmark script B. QUALITATIVE BENCHMARKS C. QUANTITATIVE BENCHMARKS Performance evaluation (PE) Subtasks for GIS PE Requirements for a quantitative benchmark GIS PE is more difficult D. EXAMPLE MODEL OF RESOURCE UTILIZATION Subtasks Products and data input Frequency required Execution of tasks Prediction Forecast Summary of phases of analysis E. APPLICATION OF MODEL Three phases of benchmark Qualitative benchmark Quantitative benchmark Model F. LIMITATIONS G.
AGT BENCHMARK EXAMPLE Project Background REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit contains far more information than can possibly be covered in a single lecture. The middle sections, D and E, contain a detailed technical review of a benchmark model. Depending on the abilities and interests of your students you may wish to omit these sections and move on to the description of the AGT benchmark in section G, or focus the lecture on the technical aspects and omit the descriptive example. UNIT 63 - BENCHMARKING A. INTRODUCTION benchmarking is a key element in minimizing the risk in system selection often the customer does not have precise plans and needs - these will be determined to some extent by what the GIS industry currently has to offer no vendor's product yet meets the requirements of an ideal GIS customer needs reassurance - real, live demonstration - that the system can deliver the vendor's claims under real conditions the GIS industry is still young, there are too few success stories out there a benchmark allows the vendor's proposed system to be evaluated in a controlled environment customer supplies data sets and a series of tests to be carried out by the vendor and observed by the customer an evaluation team is assembled and visits each vendor, performing the same series of tests on each system tests examine specific capabilities, as well as general responsiveness and user-friendliness reinforces the written response from the vendor by actual demonstration of capabilities demonstration is conducted in an environment over which the customer has some control - not completely at the vendor's mercy as e.g.
at trade show demonstrations equipment is provided by the vendor, data and processes must be defined by the customer a benchmark can be a major cost to a vendor - up to $50,000 for an elaborate benchmark in some cases part of these costs may be met by the customer through a direct cash payment Two types of benchmarking qualitative benchmark asks: are functions actually present? do they live up to expectations? are they easy to use? quantitative benchmark asks: does the proposed configuration have the necessary capacity to handle the planned workload? Benchmark script handout - Benchmark script example (2 pages) benchmark uses a script which details tests for all of the functions required permits both: subjective evaluation by an observer (qualitative) objective evaluation of performance (quantitative) must allow all of the required functionality to be examined failure of one test must not prevent the remainder of the tests from being carried out must be modular customer must be able to separate the results of each test conditions must be realistic real data sets, realistic data volumes B. QUALITATIVE BENCHMARKS in the qualitative part of the benchmark it is necessary to evaluate the way the program handles operations functions cannot be evaluated simply as present or absent overhead - Qualitative assessment functions are not all equally necessary - they may be: necessary before any products can be generated, e.g. digitizing necessary to some products but not others, e.g. buffer zone generation necessary only to low-priority products, i.e. nice to have C. QUANTITATIVE BENCHMARKS in quantitative tests, procedures on problems of known size are executed analysis of results then establishes equations which can be used to predict performance on planned workload e.g. if it takes the vendor 1 hour to digitize 60 polygons during the benchmark, how many digitizers will be needed to digitize the planned 1.5 million polygons to be put into the system in year 1?
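The extrapolation in the example above can be worked through directly. A minimal sketch, using only figures that appear in this unit: the benchmark rate of 60 polygons per hour, the planned year-1 workload of 1.5 million polygons, and the figure of about 120,000 productive minutes per digitizing station per year (single daytime shift) used in the workload analysis later in this unit.

```python
import math

# Scale a benchmark measurement up to the planned production workload.
# Figures are taken from the text of this unit.
polygons_per_minute = 60 / 60          # benchmark rate: 60 polygons/hour = 1 per minute
planned_polygons = 1_500_000           # planned year-1 digitizing workload
minutes_per_station_year = 120_000     # productive minutes per station, one daytime shift

# Total staff minutes needed, then round up to whole digitizing stations.
minutes_needed = planned_polygons / polygons_per_minute
stations_needed = math.ceil(minutes_needed / minutes_per_station_year)

print(minutes_needed)    # 1500000.0
print(stations_needed)   # 13
```

At the benchmark rate, year 1 alone would keep about 13 single-shift digitizing stations busy; this is the kind of back-of-envelope result a quantitative benchmark is designed to produce.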
this is known in computer science as performance evaluation Performance evaluation (PE) developed in the early days of computing because of the need to allocate scarce computing resources carefully a subfield of computer science requires that tasks be broken down into subtasks for which performance is predictable early PE concentrated on the machine instruction as the subtask specific mixes of machine instructions were defined for benchmarking general-purpose mainframes e.g. the "Gibson mix" - a standard mixture of instructions for a general computing environment, e.g. a university mainframe multi-tasking systems are much more difficult to predict because of interaction between jobs time taken to do my job depends on how many other users are on the system it may be easier to predict a level of subtask higher than the individual machine instruction modern operating systems must be "tuned" to perform optimally for different environments e.g. use of memory caching, drivers for input and output systems Subtasks for GIS PE specifying data structures would bias the benchmark toward certain vendors e.g. cannot specify whether raster or vector is to be used, must leave the choice to the vendor similarly, cannot specify programming language, algorithms or data structures a GIS benchmark must use a higher level of subtask an appropriate level of subtask for a GIS benchmark is one that: is understandable without technical knowledge makes no technical specifications e.g. "overlay" is acceptable as long as the vendor is free to choose a raster or vector approach e.g.
"data input" is acceptable, specifying digitizing or scanning is not therefore, a GIS PE can be based on an FRS and its product descriptions, which may have been generated by resource managers with no technical knowledge of GIS Requirements for a quantitative benchmark need a mathematical model which will predict resource utilization (CPU time, staff time, plotter time, storage volume) from quantities which can be forecast with reasonable accuracy numbers of objects - lines, polygons - are relatively easy to forecast technical quantities - numbers of bytes, curviness of lines - are less easy to forecast the mathematical form of the model will be chosen based on expectations about how the system operates e.g. staff time in digitizing a map is expected to depend strongly on the number of objects to be digitized, only weakly on the size of the map (unless large maps always have more objects) requires a proper balance between quantitative statistical analysis and knowledge about how the procedures operate GIS PE is more difficult GIS PE is more difficult than other types of PE because: uncertainties over the approach to be adopted by the vendor (data structure, algorithms) high level at which tasks must be specified difficulty of forecasting workload no chance of high accuracy in predictions however even limited accuracy is sufficient to justify investment in benchmark D. EXAMPLE MODEL OF RESOURCE UTILIZATION this section describes a mathematical model developed for a quantitative benchmark overhead - Model of resource utilization handout - A model of resource utilization Subtasks begin with a library of subtasks L this is the set of all GIS functions defined conceptually e.g. overlay, buffer zone generation, measure area of polygons, digitize Products and data input FRS identified a series of products, denoted R_1, R_2, ..., R_i, ...
each product requires a sequence of subtasks to be executed data input also requires the execution of a series of subtasks for each dataset, e.g. digitize, polygonize, label Frequency required each product is required a known number of times per year Y_ij is the number of times product i is required in year j knowledge extends only to the end of the planning horizon, perhaps year 5 Execution of tasks execution of a subtask uses resources e.g. CPU, staff or plotter time these can be quantitatively measured e.g. CPU time measured in seconds e.g. staff time in minutes note: indications are (Goodchild and Rizzo, 1987) that staff time (human) is more predictable than CPU time (machine) because of complications of computer accounting systems, multitasking etc. M_ak is the measure of resource k used by subtask a k is one of the resources used a is one of the subtasks in the library L Prediction in order to predict the amount of resources needed to create a product, need to find a mathematical relationship between the amount of resource that will be needed and measurable indicators of task size e.g. number of polygons, queries, raster cells, lines P_akn is predictor n for measure k, subtask a M_ak = f(P_ak1, P_ak2, ..., P_akn, ...) e.g. the amount of staff time (M_ak) used in digitizing (a) is a function of the number of polygons to be digitized (P_ak1) and the number of points to be digitized (P_ak2) the general form of the prediction function f will be chosen based on expert insight into the nature of the process or statistical procedures such as regression analysis e.g.
use the results of the benchmark to provide "points on the curve" with which to determine the precise form of f Forecast given a prediction function, we can then forecast resource use during production with useful, though not perfect, accuracy W_kit is the use of resource k by the t-th subtask required for a single generation of product i W_ki = sum of W_kit over all t is the amount of the resource k used by all subtasks in making product i once V_kj = sum of (W_ki x Y_ij) over all i is the amount of resource k used to make the required numbers of all products in year j Summary of phases of analysis overhead - Summary of phases of analysis 1. Define the products and subtasks required to make them 2. Evaluate each subtask from the results of the qualitative benchmark 3. Analyze the system's ability to make the products from the qualitative evaluations in (2) above 4. Obtain performance measures for known workloads from the results of the quantitative benchmark 5. Build suitable models of performance from the data in (4) above 6. Determine future workloads 7. Predict future resource utilization from future workloads and performance models, and compare to resources available, e.g. how does CPU utilization compare to time available? E. APPLICATION OF MODEL this section describes the application of this model of resource use in a benchmark conducted for a government forest management agency with responsibilities for managing many millions of acres/hectares of forest land FRS was produced using the "fully internalized" methodology described in Unit 61 FRS identified 33 products 50 different GIS functions required to make them out of a total library of 75 GIS acquisition anticipated to exceed $2 million Three phases of benchmark 1. data input - includes digitizing plus some conversion of existing digital files 2. specific tests of functions, observed by benchmark team 3.
generation of 4 selected products from FRS these three phases provided at least one test of every required function for functions which are heavy users of resources, many tests were conducted under different workloads e.g. 12 different tests of digitizing ranging from less than 10 to over 700 polygons Qualitative benchmark each function was scored subjectively on a 10-point scale ranging from 0 = "very fast, elegant, user-friendly, best in the industry" to 9 = "impossible to implement without major system modification" score provides a subjective measure of the degree to which the function inhibits generation of a product maximum score obtained in the set of all subtasks of a product is a measure of the difficulty of making the product Quantitative benchmark since this was an extensive study, consider for example the quantitative analysis for a single function - digitizing digitizing is a heavy user of staff time in many systems delays in digitizing will prevent system reaching operational status digitizing of complete database must be phased carefully over 5 year planning horizon to allow limited production as early as possible as stated above, benchmark included 12 different digitizing tasks resource measure of digitizing is staff time in minutes predictors are number of polygons and number of line arcs line arcs are topological arcs (edges, 1-cells) not connected into polygons, e.g. streams, roads other predictors might be more successful - e.g. 
number of polygons does not distinguish between straight and wiggly lines though the latter are more time-consuming to digitize - however predictors must be readily accessible and easy to forecast sample of results of quantitative benchmark:

   polygons   line arcs   staff time (mins)
      766         0             930
      129         0             136
        0        95             120

benchmark digitizing was done by vendor's staff - well-trained in use of software, so speeds are likely optimistic Model overhead - Models of time resources required expect time to be proportional to both predictors, but constants may be different m = k_1 p_1 + k_2 p_2 m is measure of resource used p is a predictor - p_1 is polygons, p_2 is line arcs k_1, k_2 are constants to be determined Results the equation which fits the data best (least squares) is: m = 1.21 p_1 + 0.97 p_2 i.e. it took 1.21 minutes to digitize the average polygon, 0.97 minutes to digitize the average line arc to predict CPU use in seconds for the digitizing operation: m = 2.36 p_1 + 2.63 p_2 i.e. it took 2.36 CPU seconds to process the average polygon uncertainties in the prediction were calculated to be 34% for staff time, 44% for CPU time suggests that humans are more predictable than machines adding together staff time required to digitize the forecasted workload led to the following totals:

   Year   Time required (minutes)
     1        185,962
     2        302,859
     3        472,035
     4        567,823
     5        571,880
     6        760,395

the average working year has about 120,000 productive minutes in the daytime shift by year 6 the system will require more than 6 digitizing stations, or 3 stations working 2 shifts each, or 2 stations working 3 shifts each this was significantly higher than the vendor's own estimate of the number of digitizing stations required, despite the bias in using the vendor's own staff in the digitizing benchmark F.
LIMITATIONS difficult to predict computer performance even under ideal circumstances GIS workload forecasting is more difficult because of the need to specify workload at a high level of generalization the predictors available, e.g. polygon counts, are crude the model is best for comparing system performance against the vendor's own claims, as implied by the configuration developed in response to the RFP it is less appropriate for comparing one system to another it assumes that the production configuration will be the one used in the benchmark staff will have equal levels of training hardware and software will be identical it is difficult to generalize from one configuration to another - e.g. claims that one CPU is "twice as powerful" as another do not work out in practice however, any prediction, even with high levels of uncertainty, is better than none after a quantitative benchmark the analyst probably has better knowledge of system performance than the vendor G. AGT BENCHMARK EXAMPLE Project Background in 1983, Alberta Government Telephones (AGT) had been operating a mechanized drawing system for 5 years however, lack of "intelligence" in the automated mapping system was increasingly hard to justify given growing capabilities of GIS management was showing interest in updating the record-keeping system an FRS and RFP for an AM/FM system were developed by a consultant in cooperation with staff three companies were identified as potential suppliers and a benchmark test was designed tests included placement and modification of "plant" (facilities), mapping, report generation, engineering calculations, work order generation tests were designed to be progressively more difficult vendors were not expected to complete all tests data and functional requirements analysis were sent in advance to all vendors for examination actual benchmark script and evaluation criteria were not sent in advance vendors were asked to load supplied data in advance of benchmark methods
chosen to load and structure data were part of the evaluation visits were made to each vendor 5 weeks before the actual benchmark to clarify any issues providing the data before the script is typical of benchmarks for systems that are primarily query oriented prevents planning for the queries that are presented in the script on the other hand, benchmarks for systems that are product oriented will normally provide the script in advance in the AGT case, actual benchmarks were conducted by a team of 3, spending one full working week at each vendor during the benchmark the vendor's staff were responsible for interacting with the system, typing commands, etc. the benchmark team acted as observers and timekeepers, and issued verbal instructions as appropriate must recognize that the vendor's staff are more familiar with the system than the typical employee will be during production thus the benchmark is biased in favor of the vendor in its evaluation of user interaction - the vendor's staff are presumed to be better than average digitizer operators etc. during the benchmark, the intent of each phase of testing was explained to the vendor positive and negative evaluations were communicated immediately to the vendor the project team met each evening to compare notes a wrap-up session at the end of the benchmark identified major difficulties to the vendor, who was invited to respond when the three benchmarks were completed the results were assessed and evaluated and became part of the final decision-making stages REFERENCES Goodchild, M.F., 1987. "Application of a GIS benchmarking and workload estimation model," Papers and Proceedings of Applied Geography Conferences 10:1-6. Goodchild, M.F. and B.R. Rizzo, 1987. "Performance evaluation and workload estimation for geographic information systems," International Journal of Geographical Information Systems 1:67-76. Also appears in D.F.
Marble, Editor, Proceedings of the Second International Symposium on Spatial Data Handling, Seattle, 497-509 (1986). Marble, D.F. and L. Sen, 1986. "The development of standardized benchmarks for spatial database systems," in D.F. Marble, Editor, Proceedings of the Second International Symposium on Spatial Data Handling, Seattle, 488-496. EXAM AND DISCUSSION QUESTIONS 1. Discuss the Marble and Sen paper listed in the references, and the differences between its approach and that presented in this unit. 2. How would you try to predict CPU utilization in the polygon overlay operation? What predictors would be suitable? How well would you expect them to perform based on your knowledge of algorithms for polygon overlay? 3. Since a computer is a mechanical device, it should be perfectly predictable. Why, then, is it so difficult to forecast the resources used by a GIS task? 4. Compare the approach to GIS applications benchmarking described in this unit with a standard description of computer performance evaluation, for example D. Ferrari, 1978, Computer Systems Performance Evaluation. Prentice Hall, Englewood Cliffs, NJ. 5. In some parts of the computing industry, the need for benchmarks has been avoided through the development of standardized tests. For example such tests are used to compare the speed and throughput rates of numerically intensive supercomputers, and of general-purpose mainframes. Are such tests possible or appropriate in the GIS industry? 6. GIS product definition exercise - 2 following pages. PILOT PROJECT A. INTRODUCTION Formats for pilot projects B. MANAGEMENT OF A PILOT PROJECT Objectives Issues in pilot design Results of the pilot C. EXAMPLE PILOTS - AM/FM SYSTEMS Pilot projects in AM/FM Salt River Project Pilot comparisons REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES There are no widely available references for this unit. We have included a long handout if you want to give your students some background.
UNIT 64 - PILOT PROJECT Compiled with assistance from Warren Ferguson, Ferguson Cartotech, San Antonio, TX A. INTRODUCTION pilot project provides the first physical results from a GIS project is usually the last major milestone prior to corporate and technical commitment recognizes the difference between reading about the system and actually experiencing how it operates a pilot is part of the effort to "sell" the system within the organization the results of pilot projects can be shown to decision-makers as evidence of the system's immediate value provide a tangible way of communicating the potential of the system to skeptics within the organization some organizations may go to full production without a pilot in this case may need to rework the first deliverables as the system is "run in" this risks alienating users pilots are useful for verifying estimates of costs and benefits evaluating hardware, software, system and database design, procedures and alternatives in summary, pilots reduce a range of risks associated with the project before final commitment to full production is made Formats for pilot projects 1. demonstration of concepts a chance for the organization to see a similar system running in production, evaluate its products will be a demonstration of limited facilities on a small area, using a system which may not be part of the final production system, mainly for development and hands-on experience in some cases even the data may not be part of the organization's operation provides early visibility of the system to management and users 2.
prototype a full-scale model of the future system designed to identify any problems not foreseen by the FRS and benchmarks, to finalize design and the conversion process may be: a "Development and Technical Prototype" to test code and learn the system an "Applications Prototype" demonstrating potential applications development generally convert an entire region or operating division of the organization from existing procedures to the new system B. MANAGEMENT OF A PILOT PROJECT the pilot project should be defined and managed as effectively as the major project of which it is part objectives must be defined clearly Objectives evaluate system design hardware and software system performance database design updating of cost estimates for development test alternatives ways of generating products formats for products evaluate map input and conversion procedures evaluate whether or not to use outside suppliers for data input and conversion improve estimates of input and conversion schedules and costs improve information on data sources test management procedures training for staff production scheduling system management maintenance schedules market system to end-users and management Issues in pilot design enthusiasm and support of management if support is minimal, the pilot must be oriented to building a sound business case for the system funds available an effective pilot will have a substantial cost to be successful the pilot must justify this cost and the subsequent, larger cost of the production system geographical area of pilot if the pilot covers a region within the organization's service area, this region must be a significant proportion of the total area level of staff experience pilot project design must consider the current level of experience of the project staff must allow sufficient training and experience for those involved to permit realistic evaluation of the potential of the system the corporate environment success of the project depends on corporate climate
- how conservative, how risk-averse Results of the pilot at a bare minimum: experience in implementing a GIS project management approval to proceed with major project ideally, it will: reduce risk in all areas increase the effectiveness of the major project improve efficiency in the early stages of the major project a pilot can result in: trained staff and users well-developed technical, managerial and production procedures near-production computer code an improved implementation plan enthusiastic support of management and users C. EXAMPLE PILOTS - AM/FM SYSTEMS all pilots are unique to their corporate and technical context because of the major investments involved in AM/FM projects, AM/FM installations provide good examples of carefully planned pilot projects Pilot projects in AM/FM first in the late 1960s in Cheyenne, Wyoming, by Public Service Company of Colorado showed that technology and software cost and performance were not sufficiently advanced to support a large AM/FM project some pilots today use consultants and hardware/software environments that can produce results in 4 months these are generally for small municipalities and utilities, fewer than 100,000 customers larger projects requiring investments in the $10 million to $100 million range may require 1 to 2 year pilots to meet design objectives Salt River Project is a water management system in Arizona active in AM/FM since 1979 overhead - Salt River Project (2 pages) Pilot comparisons overhead - Comparison of several AM/FM pilots table summarizes 11 AM/FM pilots by size and schedule length of time used and size are functions of: scope of pilot definition resource commitment corporate experience in AM/FM type of service area (urban/rural) system purchased contents of database range of applications demonstrated system requirements REFERENCES "Pacific Gas and Electric project history", see handout following (7 pages). EXAM AND DISCUSSION QUESTIONS 1.
Review and discuss the handout provided on Pacific Gas and Electric project history. 2. Summarize the arguments for and against the use of a pilot project as part of the planning process for a major GIS project. COSTS AND BENEFITS A. INTRODUCTION What is benefit/cost analysis? Why do it? Accrual B. DEFINING COSTS One-time vs recurring costs C. BENEFITS OF A GIS Classifying benefits Examples of benefits D. COMPARING COSTS AND BENEFITS E. EXAMPLE - WASHINGTON STATE Background Installed system Data Costs Benefits Benefits vs Costs Intangible benefits - Orphan roads project REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 65 - COSTS AND BENEFITS Compiled with assistance from Holly J. Dickinson, State University of New York at Buffalo A. INTRODUCTION What is benefit/cost analysis? assessment of benefits of a GIS installation - what is the value of its products? assessment of costs (initial and recurring) comparison of benefits and costs project should go ahead only if benefits exceed the costs for comparison, benefits and costs must be comparable - measured in same units, over same period of time Why do it? a major GIS implementation is a large monetary investment and upper management wants to know the expected benefits of the system before they agree to the purchase overhead - Costs of 3 example systems three uses of benefit/cost analysis of computer systems: 1. planning tool for choosing among alternatives select the system which meets minimal benefit requirements and offers the highest benefit/cost ratio 2. quantitative support to politically influence a decision a major factor in influencing the decision to proceed 3. an audit tool for existing projects future planning for the system can be based on the outcome benefit/cost analysis is a standard procedure in many areas, including the information processing industry (see King and Schrems, 1978, p. 20) Accrual an organization will want to know the costs and benefits that accrue to the organization (i.e. 
must be borne by and benefit the organization respectively) these are not necessarily all of the costs some costs may be borne by government through cost-sharing arrangements some costs may be borne by the vendor the benefits which accrue to the organization are not necessarily all of the benefits of the system some government organizations may wish to make decisions based on costs and benefits to society as a whole, not to the organization alone B. DEFINING COSTS the most important aspect of reporting costs is to include all costs, not just the acquisition of the hardware and GIS software overhead - Possible cost categories not all of these categories will be relevant to all GIS implementations whether or not to include certain costs leads to questioning the purpose of the agency as well as the purpose of the GIS for example: should the cost of data collection be included in the total costs for a GIS implementation? No - if the data would have been collected whether or not a GIS was to be implemented Yes - if the data were collected specifically to create the GIS database Partially - if the data would have been collected, but not at the higher level of precision preferred for the GIS database One-time vs recurring costs one-time costs are incurred for hardware, software, possibly data, staff training recurring costs are incurred for maintenance contracts, staff salaries, rent, utilities, etc. one-time and recurring costs and benefits must be adjusted to identical time periods for purposes of comparison e.g. sum the one-time and recurring costs and benefits over the entire period of the project, e.g. 5 years e.g. express recurring costs and benefits on an annual basis, and apportion one-time costs appropriately e.g. assign 1/5 of one-time costs to each year of project - may have to add interest charges on initial investment, allowances for inflation, etc. C.
BENEFITS OF A GIS benefits are much more difficult to quantify than costs costs can be expressed in dollars benefits are often intangible, difficult or impossible to quantify are generally tied to the expected products products may be: the same products as before but created by using the GIS instead of the previous manual or CAD/CAM (i.e., non-GIS) methods generally the same amount can be produced for less cost, or more can be produced at the same cost new products that could not be produced without the GIS types of products 1. simple map output of the database or subsets thereof 2. map products requiring the spatial analysis functions of a GIS 3. products which may not be end products, but input to a decision-making process benefit/cost analysis based solely on map output is different from an analysis involving the spatial analysis and decision support system functions of a GIS the latter type is much more complex there is a need to understand how decision makers use information, specifically geographical information, and how they value that information difficult to define some "products" e.g.
the concept is clear enough in the case of a map or report but less so when the GIS is used to browse a database there is still much to be understood about supply and demand for GIS products Classifying benefits tangible benefits: cost reductions decreased operating costs staff time savings cost avoidances increased revenue intangible benefits: improved decision making decreasing uncertainty improving corporate or organizational image Examples of benefits total cost of producing maps by manual means was greater than total cost of making identical maps using GIS tangible benefit use of GIS allows garbage collection company to reduce staff through better scheduling of workload and collection routes tangible, possible to quantify emergency vehicles reduce average arrival time by using GIS-supplied information on road conditions tangible if we can quantify the increased cost resulting from delayed arrival (fire has longer to burn, heart attack victim less likely to survive, etc.) timber company reduced costs of logging because GIS could be used to avoid costly mistakes in locating roads and other logging infrastructure tangible but hard to quantify, implies we can predict the mistakes which would have been made in the absence of GIS information from GIS was used to avoid costly litigation in land ownership case tangible but hard to quantify, implies we can predict the outcome of the case if GIS information had not been available Forest Service finds a better location for a campsite through use of GIS intangible, implies we can predict the decision which would have been made in the absence of GIS some of the problems with measuring benefits might be subject to research e.g. take two managers, supply one with GIS information, compare resulting decisions - but the results would be hard to generalize D.
COMPARING COSTS AND BENEFITS those benefits easily quantified can be compared directly to costs however, it may be wrong to look at the problem as a matter of predicting costs and benefits as static, simple quantities realistically, a system is likely to change substantially over any extended planning horizon the ability to expand the system easily without major structural change may be a hidden benefit Dickinson and Calkins (1988) discuss a model of cost-effectiveness under varying levels of investment overhead - Cost-effectiveness curve the manual system produces good performance for low levels of investment, but performance fails to grow rapidly as investment increases the automated system has high initial cost, but expandability ensures that performance continues to increase as investment increases Case A shows the reduction in cost from switching from manual to automated at current levels of performance Case B shows the increase in performance from investing the amount currently spent on the manual system the old (manual) system is replaced not because its costs are currently high but because additional investment will produce little increase in system performance relative to the new GIS the appropriate point to switch from manual to automated is at the intersection of the two curves this argument assumes that the benefits of the two systems are the same, and makes the decision based on cost the argument is conservative if we believe that the benefits of GIS are at least as high as those of the manual system because of the difficulty of quantifying intangible benefits, one possibility is to document them as completely as possible and leave their evaluation to the final decision-making group (where the buck finally stops) E.
EXAMPLE - WASHINGTON STATE following is a brief analysis of the benefits and costs of a specific GIS implementation (note: the full case study can be found in Dickinson, 1988) Background the organization is Department of Natural Resources, State of Washington, Olympia, WA seven regional offices and one central office in Olympia manages three million acres of state-owned land, two million are forested; the rest are in urban, recreational, or agricultural uses charged with producing revenue, management of the natural resources, and public service involving such activities as: clearcutting, thinning, fire and insect control, stand conversion, market harvesting, replanting, land exchanges, recreation site planning these activities can create up to 200 changes daily in land use and land cover, affecting up to 13,000 ownership parcels pre-1980, activity centered around sustainable harvest forestry two computerized systems were used during this time: GRIDS (Gridded Resource Inventory Data System) - able to calculate sustainable harvest yields and produce forest inventory reports and line printer maps CALMA (Calmagraphics Mapping System) - a computer aided drafting system used to maintain soil maps for the state in the 1980s, the Forest Land Management Program was adopted required Multiple Use Forest Planning, environmental analysis, and overall, more effective analysis of geographic data possible answers to this need were either more staff or a GIS the choice was a GIS, and expected products included: overhead - Washington State study - Examples of Products base maps of land use and land cover data land lease and land exchange maps road and bridge maintenance maps environmental impact analysis potential debris flow hazard maps fire hazard maps timber harvest tracking spatial allocation of workloads Installed system overhead - Washington State study - Description of GIS GIS was installed in November of 1983 system is known as GEOMAPS (GEOgraphic Multiple use Analysis and
Planning System) consists of ARC/INFO software and associated macros (procedures) built around ARC/INFO Equipment: Central Office PRIME 9955 (upgraded as of 4/1/89) 6 Tektronix CRTs 11 CRTs of other types 5 digitizers 2 pen plotters Equipment: Regional Offices workstation consisting of one graphics CRT and one alphanumeric CRT, digitizer, pen plotter, line printer, modem communications Staff: Central Office 1 administrator, 3 user-coordinators (to coordinate needs between regional offices and central office), 4 programmers, 11 production staff Staff: Regional Offices 1 GEOMAPS coordinator Data overhead - Washington State study - Data database is centralized regional offices are responsible for updates to their area, but actual update to the master database is performed in the central office, only after the updates have been checked and verified two main data layers exist: 1. POCA - Public Land Survey Data, State Ownership Parcels, County and Administrative Boundaries 60% of this layer is at a scale of 1:12,000; 40% at 1:24,000 this layer took 3-8 people over an 8-year time period to digitize (40 person-years) 2.
LULC - Land Use and Land Cover Inventory Data; scale: 1:24,000 no records on digitizing time were available updates to this data layer occur approximately 2,000 times per year these two data layers were combined (polygon overlay) to produce the composite layer called POCAL approximately 64,600 polygons, each with 77 attributes; updates occur at a rate of about 35 polygons per week the other major data layer contains all soil data (300,000 polygons, 1:24,000 scale), which existed in digital form before GEOMAPS entry of road and hydrological data was being planned in 1988 Costs overhead - Detailed costs of GEOMAPS shows the detailed costs recorded for Fiscal Years 1984 to 1987 note the percentage of total costs that the different categories of costs cover: hardware and software = 33% maintenance contracts = 9% staff = 43% travel = 1% supplies and services = 14% overhead - Resource management system costs taken from a DNR report and shows costs of all three systems total costs for each system are: GEOMAPS (FY 82-87) = $4,611,000 CALMA (FY 80-86) = $947,302 GRIDS (FY 80-81) = $1,162,613 Benefits overhead - Summary of GEOMAPS benefits shows the summary of tangible benefits from GEOMAPS as estimated by the DNR staff figures appeared in the Post-Implementation Review approved by the DNR executives as well as State data authorities all estimates are considered to be very conservative the categories of tangible benefits are as follows: 1. increased revenue due to the increased net value of timber by optimal thinning choices based on analysis of information about physical parameters of timber stands, location of work camps, and market prices 2.
decreased costs better stewardship by means of better management based on improved calculations, planning tools, and the effective use and storage of data intensive management produced an estimated decrease of $7 per acre for thinning operations due to decreased number of ground visits, automatic preparation of contract maps, and ability to rank sites for priority harvest based on market information 3. staff savings estimated staff time savings by using GEOMAPS (this includes salary only, not benefits) 4. cost reductions DNR also claimed benefits from the cost reductions resulting from the phasing out of the two prior systems Benefits vs Costs there are two ways to treat the cost reductions from phasing out the old system: 1. cost reductions can be added to the benefits of GEOMAPS and compared to the costs of all three systems over the total time period (call this version 1) overhead - Benefits vs costs version one shows there is a positive benefit/cost ratio between total benefits and costs for all three systems for the fiscal years of 1982, 83, 84, 86, and 87 overhead - Benefits/costs - Version one graph 2. 
if we only want to look at the benefits and costs of GEOMAPS, we could subtract the cost reductions from the GEOMAPS costs, and then compare this total to the new tangible benefits of GEOMAPS only (version two on overhead) also shows a positive benefit/cost ratio between the new tangible benefits from GEOMAPS and the costs of GEOMAPS itself for fiscal years of 1984, 86, 87 and 88 Intangible benefits - Orphan roads project a very specific application of GEOMAPS was not entered into the benefit/cost analysis, primarily because the benefit could not be easily quantified however, the benefit is by no means trivial before the 1970 Forest Act, forest road construction was unregulated loggers would build temporary roads and bridges when they moved in to log a new area when the task was finished, the roads were left behind (i.e., orphan roads) since they were only temporary roads, many were constructed on steep gradients without the usual engineering controls this creates a high potential for debris flows where these roads cross streams two disasters, resulting in the loss of lives, were caused by the poor placement of such roads each of these disasters cost the DNR over two million dollars in lawsuits many other orphan roads exist and are still being used across the state GEOMAPS was used to identify potential hazard locations by locating potential debris flow trigger points data used included: road locations, categorized by year of construction (1941, 1947, 1956/62/65, 1969, 1976/78, and 1983) stream locations elevation data in a TIN data structure procedure: ARC/TIN was used to create a contour map from the elevation data this was overlaid with the stream data to trace to the stream heads, calculate gradients, and categorize the streams into those with a gradient of less than 3.6 degrees, between 3.6 and 8 degrees, and greater than 8 degrees ARC/ALLOCATE was used to flag all intersections of roads and streams with a gradient greater than 8 degrees for the allocation model,
the impedance factor was the gradient, and the resource was the debris in the stream these intersections were potential trigger points for debris flow obviously, a benefit exists by using GEOMAPS in this type of analysis but how to quantify the benefit, and how (or if) to include it in benefit/cost analysis? REFERENCES Dickinson, H.J., 1988, "Benefit/Cost Analysis of Geographic Information System Implementation," unpublished Master's Thesis, Department of Geography, State University of New York at Buffalo, NY. Dickinson, H.J., and H.W. Calkins, 1988, "The Economic Evaluation of Implementing a GIS," International Journal of Geographical Information Systems 2:307-327. Epstein, E., and T.D. Duchesneau, 1984, "The Use and Value of a Geodetic Reference System," University of Maine, Orono, Maine. Available from the National Geodetic Information Center (NOAA), Rockville, Maryland, USA. Joint Nordic Project, 1987. Digital Map Data Bases, Economics and User Experiences in North America, Publications Division of the National Board of Survey, Helsinki, Finland. King, John L., and E.L. Schrems, 1978, "Cost-Benefit Analysis in Information Systems Development and Operation," Computing Surveys 10:19-34. Stutheit, J., 1990. "GIS procurements: Weighing the costs", GIS World, April/May 1990:69-70. A general overview of a process conducted by the US Forest Service to determine the costs and benefits of a GIS project. Clapp, J.L., J.D. McLaughlin, J.G. Sullivan and A.P. Vonderohe, 1989. "Toward a method for the evaluation of multipurpose land information systems", URISA Journal, 1(1):39-43. Paper, originally published in 1985, describes a model for evaluating LIS which measures "operational efficiency, operational effectiveness, program effectiveness and contributions to well-being". EXAM AND DISCUSSION QUESTIONS 1.
Summarize the issues involved in assessing costs and benefits when a) a manual system is replaced by a digital system, b) an existing digital system is replaced, and c) a digital system is introduced to an organization which does not have any existing equivalent, manual or digital. 2. Design a series of experiments to determine as far as possible the intangible benefits which accrue from GIS-based decision-making in an organization such as a National Forest. 3. A parcel delivery service plans to install vehicle navigation systems in each of its vehicles. These feature continuous display of maps of the area surrounding the vehicle, and of the location of the vehicle in relation to a specified destination. Design a study to assess the benefits of such a system. 4. Discuss the problems presented by the dimension of time in the evaluation of costs and benefits. DATABASE CREATION A. INTRODUCTION B. DATABASE DESIGN Stages in database design C. ISSUES IN DATABASE CREATION D. KEY HARDWARE PARAMETERS Volume Access speed Network configuration E. DATABASE REDEFINITION F. TILES AND LAYERS Reasons for partitioning "Seamless" databases Organizing data into layers Selecting tile configurations G. DATA CONVERSION Database requirements In-house conversion H. SCHEDULING DATABASE CREATION Scheduling issues I. EXAMPLE - FLATHEAD NATIONAL FOREST DATABASE Background Examples of products Proposed database contents Example dataset characteristics Tiling Database creation plan System specific issues Schedule REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit is the longest one included in the Curriculum. It will be impossible to cover all this material in one lecture, but there is no clear break at which to split this cleanly. Some of the material is technical and some of it management-oriented. You will have to decide what to omit, if you need to, based on your students' interests and educational backgrounds. UNIT 66 - DATABASE CREATION A.
INTRODUCTION an FRS establishes: the products to be generated by the system the data needed to generate the products the functions which must operate on the data working from the outline provided by the FRS, the database design and creation process begins this unit examines the management and planning issues involved in the physical creation of the database note that specific implementation details will not be reviewed as these are highly dependent on the particular GIS used emphasis is on databases for resource management applications databases for facilities management are often extensions of existing non-geographic databases depend too much on specifics of systems the key individual involved at this stage is the Database Manager or Coordinator who is responsible for: definition of database contents and its "external views" see Unit 43 for a discussion of the different "views" of a database maintenance and update control day-to-day operation, particularly if database is distributed over a network B. DATABASE DESIGN provides a comprehensive framework for the database allows the database to be viewed in its entirety so that interaction between elements can be evaluated permits identification of potential problems and design alternatives without a good database design, there may be irrelevant data that will not be used omitted data no update potential inappropriate representation of entities lack of integration between various parts of the database unsupported applications major additional costs to revise the database Stages in database design recall from Unit 10 that the steps in database design are: 1. Conceptual software and hardware independent describes and defines included entities and spatial objects 2. Logical software specific but hardware independent determined by database management system (discussed in Unit 43) 3.
Physical both hardware and software specific related to issues of file structure, memory size and access requirements this unit focuses mainly on this last stage C. ISSUES IN DATABASE CREATION what storage media to use? how large is the database? how much can be stored online? what access speed is required for what parts of the database? how should the database be laid out on the various media? what growth should be allowed for in acquiring storage devices? how will the database change over time? will new attributes be added? will the number of features stored increase? how should the data be partitioned - both geographically and thematically? is source data partitioned? will products be partitioned? what security is needed? who should be able to redefine schema - new attributes, new objects, new object classes? who should be able to edit and update? should the database be distributed or centralized? if distributed, how will it be partitioned between hosts? how should the database be documented? who is responsible for maintaining standards of definition? standards of format? accuracy? should documentation include access to the compiler of the data? how should database creation be scheduled? where will the data come from? who determines product priorities? who is responsible for scheduling data availability? the following sections address some of these questions D. 
KEY HARDWARE PARAMETERS Volume databases for GIS applications range from a few megabytes (a small resource management project) to terabytes a small raster-based project using IDRISI, 100 by 200 cells, 50 layers might require a 10 Mbyte database on a PC/AT a mid-sized vector-based project for a National Forest using ARC/INFO might require 300 Mbytes a national, archival database might reach many hundreds of Gbytes the spatial database represented by the currently accumulated imagery of Landsat is of order 10^13 bytes Access speed overhead - Storage media data which can be accessed in order 1 second is said to be "on-line" to be on-line, data must be stored on fixed or removable disk relative to other forms of permanent storage, disk costs are high, and there is an effective upper limit of order 100 Gbytes for on-line storage when using common magnetic disk technology "archival" data (data which is comparatively stable through time) can be stored off-line until needed only extracts will be on-line for analysis at any one time archival systems incur additional time to mount media on hardware access time to extract subsets from archival data once mounted is order 1 minute archival media: magnetic tape removable disk CD-ROM no ability to edit data once written - this is acceptable for many types of geographical data copies are very cheap optical WORM (Write Once Read Many) "video" tape automatic multiple storage and access systems increase capacity and decrease access time magnetic tape stores can be automated, raising effective capacity to 1 Tbyte (order 10,000 tapes) order 10,000 tapes is also an effective upper limit to the size of a (conventional, manual mount) tape library optical WORM libraries can be automated much more easily using "jukebox" technology - automatic selection and mounting of platters devices to mount cassette tapes automatically are also available Network configuration should database be centralized or distributed? there are two answers: 1.
all departments share one common database, or 2. parts of the database exist on different workstations in an integrated network each department responsible for maintaining its own share of the database optimizes use of expertise with modern technology (e.g. NFS (Network File System)) user may be unaware of actual location of data being used some workstations may be "diskless", owning no part of the database distributed databases require careful attention to responsibilities, standards, scheduling of updates E. DATABASE REDEFINITION in some applications, all files, attributes, objects can be anticipated when the database is defined e.g. systems for facilities management typically do not allow redefinition of the database structure by user other applications, particularly those involving analysis, require ability to define new objects, attributes this capability is generally important in resource management applications important to determine who is allowed to change the database definitions database administrator only? project manager only? any user? F. TILES AND LAYERS many spatial databases are partitioned internally partitions may be defined spatially (like map sheets) or thematically or both the term tile is often used to refer to a geographical partition of a database, and layer to a thematic partition Reasons for partitioning capacity of storage devices may limit the amount of data that the system can handle as one physical unit update easier to update one partition (e.g. map sheet) at a time access speed may be faster if only one partition is accessed at a time distribution easier to copy a partition than to extract part of a larger database e.g. US Bureau of the Census chose to partition its TIGER files by county for distribution based on user needs e.g.
US Geological Survey partitions digital cartographic data by 1:100,000 map sheet user needs users need certain combinations of geographical area and theme more commonly than others illustrated by the conventional arrangement of topographic and thematic map series e.g. soils information is not normally shown on standard topographic maps the best source of usage patterns is conventional cartographic products because their traditions have been established through continual usage and improvement "Seamless" databases despite the presence of partitioning, system designers may choose to hide partitions from the user and present a homogeneous, seamless view of the database e.g. systems are available to automatically mosaic Landsat scenes, so users can work independently of normal scene boundaries in seamless databases, the data must be fully edgematched parts of an object which span geographical partitions must be logically related features which extend across tile boundaries must have identical geographic coordinates and attributes at adjacent edges every object must have an ID which is unique over the whole database the term Map Librarian is commonly applied to systems which remove partitions from the user's view of the database Organizing data into layers the source documents (maps) generally determine the initial thematic division of the data into layers these initial layers need not coincide with the way the data are structured internally e.g. the application may consider lakes and streams as one layer while the data structure may see them as two different objects - polygons and lines several distinct layers may be available from the same map sheet e.g.
topographic maps may provide contours, lakes and streams (hydrography), roads the Database Manager may choose to store these as different thematic partitions in the database when deciding how to partition the data by theme, need to consider: data relationships which types of data have relationships that need to be stored in the database these will need to be on the same layer or stored in such a way that relationships between them can be quickly determined functional requirements what sets of data tend to be used together during the creation of products it may be more efficient to store these on one layer user requirements how diverse will the users' requirements be more diversity may require more layers to allow flexibility updates data which needs to be updated frequently should be isolated common features features which are common to many coverages, such as shorelines and rivers, may be stored separately then used to create other coverages that incorporate these lines as boundaries internal organization of layers depends on the system chosen CAD systems treat each class of object as a separate layer many raster systems treat each attribute as a separate layer, although objects may have many attributes some newer GIS designs avoid the concept of layers entirely, storing all classes of objects and their interrelationships together Selecting tile configurations tiles may cover the same area throughout the database, or they may have variable sizes fixed size tiles are: generally inefficient in terms of storage since some tiles will have lots of data and others very little good when data volume changes through time since it is not necessary to restructure tiles with updates variable size tiles are: efficient in terms of storage difficult to restructure if new data is added boundaries may be: overhead - Tiling Variations regular e.g. based on map sheet boundaries free-form e.g.
based on political or administrative boundaries, watersheds, major features like roads or rivers tile sizes and boundaries can be chosen based on: areal extent of common queries or products scale needed in output balance between getting the largest areal coverage possible and speed of processing practically speaking, in most databases, partitions correspond to conventional map sheet boundaries, e.g. 7.5 minute quadrangles products will likely be created one tile at a time e.g. a forest manager wants maps of timber inventory at a scale of 1:24,000 the size of plots is limited by the plotter itself, and by physical constraints on handling and storage it makes sense to generate timber inventory maps in 7.5 minute quadrangles since data will be input from quadrangles, why not tile the entire database in quadrangles as well? however, a Map Librarian will be needed when small-scale products have to be generated using many tiles at once G. DATA CONVERSION the process of data input to create the database is often called data conversion involves the conversion of data from various sources to the format required by the selected system previously have examined the different ways of inputting data and various data sources consideration of these options is critical in planning for database creation Unit 7 discusses several issues related to integrating data from different sources often there are several alternative sources and input methods available for a single dataset Database requirements need to consider database requirements in terms of: scale accuracy scheduling priorities cost scale FRS specifies the scale required for output will determine the largest scale that is required for datasets may not need to go to added expense and time to input data at larger scales accuracy required accuracy will determine the quality of input necessary and the amount of data that may be created e.g. coarse scanning or digitizing versus very careful and detailed digitizing e.g.
field data collection versus satellite image interpretation scheduling priorities some datasets will be critical in the development of later datasets and early products these may justify expensive input methods or the purchase of existing sources alternatives for creating the database include: obtaining and converting existing digital data manual or automated input from maps and field sources contracting data conversion to consultants In-house conversion data entry is labor intensive and time consuming some GIS vendors assist in the conversion effort, and there are a number of companies which specialize in conversion some agencies do their conversion in-house, but many are reluctant to do so since the added personnel may not be needed once the initial conversion is complete advantages of in-house conversion agency personnel, who are familiar with the "ground truth" and unique situations of the areas of interest, are able to supervise the conversion effort this can be important for unanticipated situations in which general rules cannot be uniformly applied auxiliary maps and data are available if needed for interpretation if the maps are sent out for digitizing, what you send is all you get in-house validity checks can be made more easily disadvantages of in-house conversion additional equipment and personnel need to be added to the project plan long-term commitment to full-time employees can be expensive H. SCHEDULING DATABASE CREATION database creation is a time-consuming and expensive operation which must be phased over several years of operation the total cost of database creation will likely exceed the costs of hardware and software by a factor of four or five e.g.
over a 5 year period, of a total $5 million cost of a typical GIS project for resource management, $4 million went to data collection and entry, only $1 million to hardware, software, administration, application development since the benefits of the system derive from its products database creation must be scheduled so the system can produce a limited number of products as quickly as possible however, benefits will not be realized at the full rate until the database creation is complete need to know the complexity of data on each input source document to forecast data input workload e.g. numbers of points, polygons, lengths of lines, number of characters of attribute data Scheduling issues to generate a tile of a product, the required data layers for the correct tile must have been input to determine the order in which datasets must be input, must rank products based on: 1. perceived benefit 2. cost of necessary input highest ranked are those with high benefit, low cost of necessary input lowest ranked are low benefit, high data input cost some layers may be used by several products - once input, the cost to other products is nil the promotional benefit of a product is highest for a single tile, decreases for subsequent tiles a single tile of a product can be used to "sell" the system, draw attention to its possibilities high priority needs to be given to generating a product which can "sell" the system within each department or to each type of user need to know the relative payoffs of 1. producing a single tile of a new product and 2. producing further tiles of an existing product determining priorities under the constraint of data input capacity is a delicate operation for the Database Manager many layers of data may not exist, may have to be compiled from air photos or field observation the schedule for data input will have to accommodate availability of data as well as product priorities I.
EXAMPLE - FLATHEAD NATIONAL FOREST DATABASE Background Flathead NF located in Northwestern Montana on west slope of Continental Divide adjacent to Glacier National Park headquarters in Kalispell, MT total area within Forest boundary is 2,628,705 acres (1,063,822 ha) Forest area spread over 133 1:24,000 (7.5 minute) quadrangles resource management responsibilities include: timber fisheries wildlife water soils recreation minerals wilderness areas rangeland fire plus maintenance of Forest infrastructure (engineering) substantial investment in use of Landsat imagery for forest inventory and management, using VICAR image processing software FRS conducted in 1984/5, planning period extended to 1991 important to note that this plan considers the needs of Flathead NF in isolation may not be compatible with the national needs of the Forest Service or the national policy developed under the National GIS Plan may conflict with emerging concepts of service-wide Corporate Information (see Unit 71) Examples of products FRS identifies 55 information products handout - Examples of products (2 pages) extracted from a study by Tomlinson Associates, Inc.
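The product-ranking rule described under Scheduling issues above - rank products by perceived benefit against the cost of the input datasets they still need, with already-input layers costing nothing - can be sketched as follows. This is a minimal illustration, not part of the Flathead plan; all product names, layer names, and numbers are invented.

```python
# Hypothetical sketch of the scheduling rule: rank products by
# benefit minus the cost of the input layers they still require;
# once a layer has been input, its cost to later products is nil.

def rank_products(products, layer_costs, layers_done):
    """Order products best-first by benefit minus the cost of
    the layers they need that have not yet been input."""
    def net_score(product):
        pending = [l for l in product["layers"] if l not in layers_done]
        return product["benefit"] - sum(layer_costs[l] for l in pending)
    return sorted(products, key=net_score, reverse=True)

# Invented example: two candidate products sharing the "roads" layer.
layer_costs = {"roads": 10, "streams": 25, "stands": 40}
products = [
    {"name": "timber inventory map", "benefit": 60,
     "layers": ["stands", "roads"]},
    {"name": "road maintenance list", "benefit": 30,
     "layers": ["roads"]},
]

# Before any input the cheap road list ranks first (30 - 10 = 20
# versus 60 - 50 = 10); once "roads" and "stands" are input, the
# high-benefit timber inventory map moves to the top.
before = rank_products(products, layer_costs, layers_done=set())
after = rank_products(products, layer_costs, layers_done={"roads", "stands"})
```

Note how a layer shared by several products ("roads" here) is paid for only once, which is why input order matters so much in the scheduling discussion above.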
Proposed database contents total of 58 input datasets required total database volume estimated at 1 Gbyte 12 already in digital form in VICAR image processing system, running on mainframe outside Forest 3 interpreted from Landsat, available in raster form 9 digitized in vector form, rasterized by VICAR 2 are large attribute files (forest stand attributes, transportation data for roads) maintained as System/2000 database outside Forest all remaining datasets must be derived from non-digital maps or tables map scales range from 1:24,000 to 1:250,000 datasets vary in complexity number of map sheets varies depending on scale Example dataset characteristics see page 2 of handout Tiling 1:24,000 7.5 minute quadrangle dominates both input and output requirements therefore, makes sense to use quadrangles as tiles in the database if it must be tiled (depends on system chosen) could use aggregations of 7.5 minute quadrangles, e.g. 15 minute quadrangles Database creation plan needed to: assign input data to object types, layers determine which relationships to store in the database determine naming conventions for files, attributes scheduled input of data from 3,162 individual map sheets over 6 years need to allow for updates as well as initial data input some layers updated on a regular basis - e.g. timber harvesting some irregularly - e.g.
forest fires System specific issues preferred arrangement is a centralized database at Forest headquarters, access from workstations across the Forest implementation plan is based on scheduled generation of products system design provides little access to the database in query mode therefore product generation can be batched and data need be online only during product generation however 1 Gbyte is easily accommodated online Schedule database creation schedule determines ability to generate products FRS calls for generation of 4,513 individual map products and 3,871 list products in same 6 years digitizing need will be heaviest at beginning of period ability to produce will be highest at end input phasing: roads, PLSS section boundaries - all input in year 1 lakes and streams - phased over years 2-4 forest stands - phased over years 2-4 harvest areas - input to begin in year 4 over the 6 years of database creation there will be increasing output, diminishing input REFERENCES ACSM-ASPRS GIMS Committee, 1989. "Multi-purpose Geographic Database Guidelines for Local Governments," ACSM Bulletin, Number 121:42-50. Provides a general outline for the consideration of scale and content for municipal GIS databases. Calkins, H.W., and D.F. Marble, 1987. "The transition to automated production cartography: design of the master cartographic database," The American Cartographer 14:105-119. Stresses the need for rigorous database design and illustrates the use of the entity-relationship model for spatial databases. Nyerges, T.L., 1989. "Schema integration analysis for the development of GIS databases," International Journal of Geographical Information Systems 3:152-83. Describes methods for analyzing the differences and similarities between two or more databases. Nyerges, T.L., and K.J. Dueker, 1988.
"Geographic Information Systems in Transportation," US Department IMPLEMENTATION STRATEGIES FOR LARGE ORGANIZATIONS INTRODUCTION this unit examines issues that arise when GIS is implemented in large organizations these issues include: where in the organizational structure to locate the GIS operation problems and advantages of multi-participant projects B. LOCATION WITHIN THE ORGANIZATION even though a GIS may be an organization-wide tool and is seen as a decentralized resource within the organization, centralized coordination of the GIS operation is still necessary is needed to ensure efficiency and cost- effectiveness in the operation of the GIS e.g. avoid redundancy in the collection of data e.g. ensure expensive hardware is being used efficiently the location of the GIS manager and the support staff will be seen as the location of the GIS unit within the organization the location of this unit will affect the way the GIS staff interacts with the rest of the organization Somers (1990) suggests there are three basic options for the location of the GIS: 1. operational department location e.g. planning, public works, engineering or assessments GIS often develops from a small system obtained to deal with specific needs which have grown to support activities outside the mandate of the original department advantages: such systems are very responsive to original users needs disadvantages: departmental focus makes it difficult for other users to have their needs and priorities recognized may not have high level management support 2. support department location e.g. data processing, MIS or management services in these locations GIS is seen as a service operation like payroll, personnel and DP and will be supported by the organization as such advantages: objectivity of system design and management disadvantages: remote from the users of the GIS may not be responsive to the needs of users priorities of department may be different than users'' 3. 
executive level location advantages: high level visibility, support and attention objectivity disadvantages: distance from the real operations of the organization users may feel GIS support staff is out of touch with their needs the actual location of the GIS unit within an organization will reflect the circumstance of its introduction, the management structure and the organizational policies and mandates C. MULTIPARTICIPANT PROJECTS increasingly, GISs are being implemented by consortia of agencies with a wide range of legal foundations, including: local government agencies county governments state and federal government agencies public utilities non-profit organizations diverse organizations cooperating in such multi-participant GIS are bound by a common geographic setting and are motivated by the need for fiscal responsibility costs for data collection and management for a common geographic area can be shared among organizations are guided and coordinated through inter-agency committees consisting of representatives from the departments and agencies involved in the use and design of the GIS such committees generally have two structural levels: policy level - senior management technical level - technical and middle management Issues for multiparticipant projects Forrest et al (1990) list several issues that have to be addressed by these inter-agency committees: participation involved agencies need to commit financial and other resources to the project data ownership who owns the data collected? data maintenance which agency or agencies will have the ultimate responsibility of data maintenance and update how will this responsibility be partitioned? hardware and software ownership and maintenance how will the necessary hardware and software be distributed across the agencies? which vendors' products will be supported by the multi-agency agreement? standards what standards will be used for data exchange and communications? financing how will the project be funded?
how will the costs be shared equitably? new business activities GIS may provide the involved agencies the opportunity to venture into new business areas e.g. sale of digital data, maps D. US FOREST SERVICE EXAMPLE the following sections describe the development and implementation of a national GIS strategy within the US Forest Service Forest Service is an agency of the US Department of Agriculture responsible for management of nearly 200 million acres of federal lands organized into 155 National Forests mandate to manage land for multiple uses - timber and pulp production, mineral resources, recreation, wildlife, conservation Organization National Forests grouped into regions each National Forest has a headquarters, several district offices nature of each Forest varies depending on resources those in the Pacific Northwest are heavy timber producers others may have significant oil and gas, e.g. in Rocky Mountains "wealth" (annual budget) of Forest depends on resources, leases pattern of jurisdiction is typically complex area of Forest is not singly bounded many islands of private ownership within boundary complex system of access rights, grazing and timber leases map - a map of a local National Forest would be useful at this point, plus a description of its resources, management activities E. 
EARLY GIS ACTIVITIES many Forests and regional offices acquired assorted types of GIS prior to 1987 determining factors in early acquisition included: availability of funds - "rich" forests were early adopters presence of a "missionary" on Forest staff, able to persuade management that available funds should be spent on this high risk innovation examples of status of GIS circa 1985: San Juan National Forest large Forest in southern Colorado extensive mineral resources, recreation little marketable timber Forest broken into 80,000 irregularly shaped units, often called "integrated terrain units" (ITU) the ITU is an area object which is homogeneous on all attributes in the database i.e. a "smallest common denominator" parcel of land with uniform land use, vegetation, soil in essence, these units are the result of overlaying maps of all relevant themes in practice the map is divided up into areas which are both (a) as large as possible and (b) as homogeneous as possible each unit assigned a unique number attributes assigned to each unit, covering forest cover (species, age, density) administrative unit (county, ranger district) slope and aspect watershed soil type, drainage etc. data matrix of 80,000 units by 600 attributes (close to 50,000,000 individual data items) maintained at Region computing facility using System/2000 hierarchical database benefit: low cost of data entry - no digitizing problems: no geography - just a "flat file" of attributes no way of aggregating units based on spatial adjacency, making spatial queries no point or line objects, no associated operations, e.g. 
buffers around line objects no map products problems with quality control unlike geographical files, cannot make internal consistency checks, every entry must be checked individually - no possibility of using maps for data checking virtually impossible to achieve high quality redundancy if extended to too many attributes, the ITU approach leads to high levels of redundancy in the database e.g. there are only two counties in the Forest, these could be represented accurately as a single layer with two area objects, but using the ITU approach 80,000 entries must be made for county attribute thus while only two possible errors could occur in entering county attribute if county is a separate polygon layer, there are 80,000 chances of error with ITU approach Flathead National Forest large Forest in western Montana adjacent to Glacier National Park much marketable timber, some mineral resources wildlife conservation important because of adjacency to National Park heavy reliance on Landsat imagery as primary data source imagery interpreted with ground checks to provide forest inventory imagery registered to topographic mapping and DEM other layers input by rasterizing vector coverages (e.g. climatic variables) multi-layer raster database at Landsat resolution (80 m) manipulated using remote sensing system (VICAR) benefits: easy to use system for mapping, production of images easy to combine layers for modeling problems: difficult to use system to manage timber resource raster database has no concept of homogeneous stand difficult to link ground checks of timber type/size/density to pixels not easy to handle point or line datasets e.g. campsites, points of historical significance, sightings of endangered species e.g. 
Grizzly Bear, roads, streams difficult to attach extensive lists of attributes to pixels each attribute treated as a separate layer, no easy way of relating objects between layers Summary Flathead and San Juan NFs illustrate the problems of delivering GIS products using image processing and conventional database technology respectively other examples illustrate the problems of CAD systems by 1985 Forest Service had experience of many GISs in different Forests and regions: vector systems: COMARC ARC/INFO (ESRI) Strings (Geo-Based) Intergraph MOSS raster systems: ERDAS VICAR WRIS input methods included digitizing, scanning and interpretation of imagery Other technical issues in the early 1980s the Forest Service began implementation of a nationwide system of networked computing resources to automate office functions functionality includes electronic mail, word processing, limited database and analysis capabilities supplied by Data General, installed in every Forest, region and Washington headquarters compatibility of an eventual GIS with the DG hardware is therefore a major technical issue in GIS planning and acquisition could the GIS run on the (possibly expanded) DG network? of the GISs installed in various parts of the Forest Service, one vector system (MOSS) had been developed largely within the Department of the Interior and appeared to have much of the necessary functionality how should this system be judged relative to the remaining vendor-supplied systems in the acquisition process? F. 
1984/5 FUNCTIONAL REQUIREMENTS STUDIES as a result of pressure from both inside and outside the Service to acquire GISs for their operations, FRSs were conducted for a small sample of forests in order that functional requirements for the entire Forest Service could be determined 6 Forests with a variety of sizes, resource mixes were selected: George Washington (Virginia) Nicolet (Wisconsin) Flathead (Montana) San Juan (Colorado) Siuslaw (Oregon) Shasta/Trinity (California) full Functional Requirements Studies for GIS were carried out rather than a fully internal strategy (see Unit 61), the studies were contracted to a consultant - Tomlinson Associates Inc. - with a contract period of 8 months 30-60 information products identified per Forest, similar numbers of input datasets 60-90% of these were new products not previously generated Siuslaw National Forest FRS 60 information products identified: 10 are simple cartographic products generated by reformatting, rescaling and/or resymbolizing input data 2 require 3D graphics 7 are lists generated from input data 37 require use of GIS functions for simple analysis of input data 8 are the result of sophisticated analysis some are common to most Forests, e.g. timber inventory maps some are specific to local conditions, e.g. map to predict areas suitable for growing marijuana required by law enforcement department database requires input of data from approx. 15,000 map sheets during the 6 year planning period many of these are repeated updates 1200 in year 1 rising to 3500 in year 6 the 1200 maps in year 1 contain approx. 60,000 polygons and 13,000 points, plus 300,000 cm of line objects G.
THE NATIONAL GIS PLAN the circa-1985 situation was clearly uncoordinated duplication of effort, high cost of maintaining expertise in a range of systems no analysis of what was optimal for the Forest Service as a whole there was an awareness that information should be a corporate resource and managed as such corporate information is that information which must be commonly used, understood and shared to meet the agency's mission must be freely exchangeable between different departments, Forests, regions must have compatible formats and definitions - well-developed standards although the software to handle this information need not be standardized, the interfaces, methods of analysis and planning, and data structures and formats should be standard in January 1988 the Forest Service approved a plan for implementing a service-wide GIS by 1991 Objectives of the GIS support the management information needed by the Forest Service to accomplish its mission facilitate understanding and sharing of information horizontally and vertically within the organization, and with other organizations where possible allow access to information by managers through a non-technical, user-friendly interface take full advantage of existing Forest Service hardware and networks be flexible enough to incorporate new technologies in the future H. COMPONENTS OF THE PLAN plan is composed of 5 major components or phases: 1. Information Base and Structure identify the objectives, principles and assumptions of GIS implementation - the "vision" - and convert this into a "blueprint" for structuring resource information assemble information from a survey of 34 Forests to identify the kinds of data being used to characterize resources need to distinguish between "basic" and "interpreted" data "basic" is raw but relatively stable and accurate "interpreted" is more immediately useful for management which is more appropriately stored in the database?
is there a relatively small set of data types common to many Forest management efforts, but complicated by differences in definition and practice? describe the NFS GIS corporate information structure and the database environment describe the characteristics and functionality of the GIS database environment needed to support the information structure develop standards for the corporate information structure define the requirements for the user interface 2. Organizational Readiness improve awareness of the GIS plan develop guidelines for planning local implementations develop strategy for data conversion and acquisition the data currently available IMPLEMENTATION ISSUES A. INTRODUCTION B. STAGE THEORIES OF COMPUTING GROWTH Nolan model of computing growth Incremental model Radical model C. RESISTANCE TO CHANGE D. IMPLEMENTATION PROBLEMS Overemphasis on technology Rigid work patterns Organizational inflexibility Decision-making procedures Assignment of responsibilities System support staffing Integration of information requirements E. STRATEGIES TO FACILITATE SUCCESS Management involvement Training and education Continued promotion Responsiveness Implementation and follow-up plans REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 67 - IMPLEMENTATION ISSUES Compiled with assistance from Ken Dueker, Portland State University A. INTRODUCTION most organizations acquiring GIS technology are relatively sophisticated some level of investment already exists in electronic data processing (EDP) they have experience with database management and mapping systems and some combination of mainframes, minis and micros GIS technology will be moving into an environment with its own institutional structures - departments, areas of responsibility as an integrating technology, GIS is more likely to require organizational changes than other innovations the need for changes - cooperation, breaking down of barriers etc.
- may have been used as arguments for GIS existing structures are already changing - centralized computing services with large staffs are disappearing because new distributed workstation hardware requires less support organizational change is often difficult to achieve and can lead to failure of the GIS project organizational and institutional issues are more often reasons for failure of GIS projects than technical issues B. STAGE THEORIES OF COMPUTING GROWTH several models have been proposed for the growth of computing within organizations growth is divided into a number of stages Nolan model of computing growth the Nolan (1973) model has 4 stages: Stage 1: Initiation computer acquisition use for low profile tasks within a major user department early problems appear Stage 2: Contagion efforts to increase use of computing desire to use inactive resources completely supportive top management fast rise in costs Stage 3: Control efforts to control computing expenditures policy and management board created efforts to centralize computing and control formal systems development policies are introduced rate of increase in cost slows charge-back policies introduced Stage 4: Integration refinement of controls greater maturity in management of computing computing is seen as an organization-wide resource application development continues in a controlled way costs rise slowly and smoothly charge-back policy might be modified or abandoned how does this model fit GIS experience? 
two versions - incremental and radical Incremental model GIS is a limited expansion of existing EDP facilities, no major organizational changes required GIS will be managed by EDP department as a service probably run on EDP's mainframe this model fits AM/FM and LIS applications best - adding geographical access to existing administrative database GIS acquisition will likely be initiated by one or two departments, other departments encouraged to support by management thus it begins at stage 2 of Nolan's model if acquisition is successful, use and costs will grow rapidly, leading to control in stage 3 Radical model GIS is independent of existing EDP facilities, e.g. uses micros instead of EDP mainframe, may be promoted by staff with little or no history of EDP use EDP department may resist acquisition, or attempt to persuade management to adopt an incremental-type strategy instead may be strong pressure to make GIS hardware compatible with main EDP facility to minimize training/maintenance costs this model more likely in GIS applications with strong analytical component, e.g. resource management, planning model assumes that GIS will not require large supporting infrastructure - unlike central EDP facility with staff of operators, programmers, analysts, consultants unlike the incremental model, this begins at stage 1 of Nolan's model few systems have progressed beyond stage 2 - process of contagion is still under way in most organizations - GIS is still new stage 2 is slow in GIS because of the need to educate/train users in new approach - GIS does not replace existing manual procedures in many applications (unlike many EDP applications, e.g. payroll) support by management may evaporate before the contagion period is over - never get to stages 3 and 4 we have little experience of well-controlled (stage 3), well integrated (stage 4) systems at this point in time C.
RESISTANCE TO CHANGE all organizations are conservative resistance to change has always been a problem in technological innovation e.g. early years of the industrial revolution change requires leadership stage 1 requires a "missionary" within an existing department stage 2 requires commitment of top management, similar commitment of individuals within departments despite the economic, operational, political advantages of GIS, the technology is new and outside many senior managers' experience leaders take great personal risk ample evidence of past failure of GIS projects initial "missionary" is an obvious scapegoat for failure Rhind (1988), Chrisman (1988) document the role of various leaders in the early technical development of GIS - similar roles within organizations will likely never be documented GIS innovation is a sufficiently radical change within an organization to be called a "paradigm shift" a paradigm is a set of rules or concepts that provide a framework for conducting an organization's business the role of paradigms in science is discussed by Kuhn (1970) use of GIS to support various scientific disciplines (biology, archaeology, health science) may require a paradigm shift D. IMPLEMENTATION PROBLEMS Foley (1988) reviews the problems commonly encountered in GIS implementation, and common reasons for failure reasons are predominantly non-technical Overemphasis on technology planning teams are made up of technical staff, emphasize technical issues in planning and ignore managerial issues planning teams are forced to deal with short-term issues, have no time to address longer-term management issues Rigid work patterns it is difficult for the planning team to foresee necessary changes in work patterns a formerly stable workforce will be disrupted some jobs will disappear jobs will be redefined, e.g.
drafting staff reassigned to digitizing some staff may find their new jobs too demanding former keyboard operators may now need to do query operations drafting staff now need computing skills people comfortable in their roles will not seek change people must be persuaded of the benefits of change through education, training programs productivity will suffer unless the staff can be persuaded that the new job is more challenging, better paid etc. Organizational inflexibility planning team must foresee necessary changes in reporting structure, organization's "wiring diagram" departments which are expected to interact and exchange data must be willing to do so Decision-making procedures many GIS projects are initiated by an advisory group drawn from different departments this structure is adequate for early phases of acquisition but must be replaced with an organization with well-defined decision-making responsibility for the project to be successful it is usually painful to give a single department authority (funds must often be reassigned to that department), but the rate of success has been higher where this has been done e.g. 
many states have assigned responsibility for GIS operation to a department of natural resources, with mandated consultation with other user departments through committees project may be derailed if any important or influential individuals are left out of the planning process Assignment of responsibilities assignment is a subtle mixture of technical, political and organizational issues typically, assignment will be made on technical grounds, then modified to meet pressing political, organizational issues System support staffing a multi-user GIS requires at minimum: a system manager responsible for day-to-day operation, staffing, financing, meeting user requirements a database manager responsible for database design, planning data input, security, database integrity planning team may not recognize necessity of these positions in addition, the system will require staff for data input, report production applications programming staff for initial development, although these may be supplied by the vendor management may be tempted to fill these positions from existing staff without adequate attention to qualifications personnel departments will be unfamiliar with nature of positions, qualifications required and salaries Integration of information requirements management may see integration as a technical data issue, not recognize the organizational responses which may be needed to make integration work at an institutional level E. 
STRATEGIES TO FACILITATE SUCCESS Management involvement management must take a more active role than just providing money and other resources must become actively involved by supporting: implementation of multi-disciplinary GIS teams development of organizational strategies for crossing internal political boundaries interagency agreements to assist in data sharing and acquisition must be aware that most GIS applications development is a long-term commitment Training and education staff and management must be kept current in the technology and applications Continued promotion the project staff must continue to promote the benefits of the GIS after it has been adopted to ensure continued financial and political support projects should be of high quality and value a high profile project will gain public support an example is the Newport Beach, CA tracking of the 1990 oil spill (see Johansen, 1990) Responsiveness the project must be seen to be responsive to users' needs Implementation and follow-up plans carefully developed implementation plans and plans for checking on progress are necessary to ensure controlled management and continued support follow-up plans must include assessment of progress, including: check points for assessing project progress audits of productivity, costs and benefits REFERENCES Chrisman, N.R., 1988. "The risks of software innovation: a case study of the Harvard lab," The American Cartographer 15:291-300. Foley, M.E., 1988. "Beyond the bits, bytes and black boxes: institutional issues in successful LIS/GIS management," Proceedings, GIS/LIS 88, ASPRS/ACSM, Falls Church, VA, pp. 608-617. Forrest, E., G.E. Montgomery, G.M. Juhl, 1990. Intelligent Infrastructure Workbook: A Management-Level Primer on GIS, A-E-C Automation Newsletter, PO Box 18418, Fountain Hills, AZ 85269-8418. Describes issues in developing management support during project planning and suggests strategies for successful adoption of a project. Johansen, E., 1990. 
"City's GIS tracks the California oil spill," GIS World 3(2):34-7. King, J.L. and K.L. Kraemer, 1985. The Dynamics of Computing, Columbia University Press, New York. Presents a model of adoption of computing within urban governments, and results of testing the model on two samples of cities. Includes discussion of adoption factors and the Nolan stage model. Kuhn, T.S., 1970. The Structure of Scientific Revolutions, University of Chicago Press, Chicago. Nolan, R.L., 1973. "Managing the computer resource: a stage hypothesis," Communications of the ACM 16:399-405. Rhind, D.W., 1988. "Personality as a factor in the development of a discipline: the example of computer-assisted cartography," The American Cartographer 15:277-90. GIS STANDARDS A. INTRODUCTION Reasons for standards Standards organizations related to GIS B. TYPES OF STANDARDS FOR GIS Operating system standards User interface standards Networking standards Database query standards Display and plotting standards Data exchange standards C. IMPLEMENTING STANDARDS Start-up costs Management support Technical tradeoffs Potential for security risks Innovation D. WHAT TO STANDARDIZE? REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 69 - GIS STANDARDS A. INTRODUCTION this unit is based largely on information from Exler (1990) and Tom (1990) standards are needed as GIS users attempt to integrate their operations with other hardware, GISs and data sources challenge is to get industry, government and users to implement and promote the use of standards many standards are set simply through common use, though major attempts are currently being made to develop broad-ranging national and international standards Reasons for standards 1. Portability of applications need the ability to move developed applications to new hardware platforms in order that development efforts are not duplicated and can be shared 2. 
Data networks need ability to access digital data which is distributed through various offices, agencies, states and even countries 3. Common environments if applications use similar operating environments, learning curves are reduced and productivity is increased 4. Cost of program development standards are important to software developers as they reduce the need to develop interfaces for many different data formats, operating systems, plotters, etc. Standards organizations related to GIS overhead - Standards Organizations the following is from Exler (1990) ANSI - American National Standards Institute approves standards for US industrial and commercial sectors DCDSTF - Digital Cartographic Data Standards Task Force combines FICCDC-SWG and NCDCDS for digital cartographic standards FICCDC-SWG - Federal Coordinating Committee on Digital Cartography - Standards Working Group formed by the Interagency Coordinating Committee - Office of Management and Budget to serve as a focal point for the coordination of digital cartographic activities FIPS - Federal Information Processing Standards official source of information processing standards for federal departments and agencies IEEE - Institute of Electrical and Electronics Engineers develops standards for a broad range of subjects, including information processing ISO - International Standards Organization approves standards for the international community through national standards bodies such as ANSI NCDCDS - National Committee for Digital Cartographic Data Standards formed by ACSM (American Congress on Surveying and Mapping) and funded by USGS NIST - National Institute of Standards and Technology formerly the National Bureau of Standards oversees standards activities for the government recently opened a GIS laboratory OSF - Open Software Foundation a vendor consortium of IBM and Digital Equipment Corporation UNIX International - a vendor consortium of AT&T and SUN X/Open - a nonprofit independent consortium of 19 computer 
manufacturers representing 160 software developers from 17 countries attempting to define standards for a complete computing environment B. TYPES OF STANDARDS FOR GIS Operating system standards for micro-computers, most GIS use the DOS operating system, though applications are being written for OS/2 and Macintosh UNIX appears to be the current popular operating system for the powerful workstations and mainframe computers, though there are several other well accepted and newly developing options User interface standards affect the "look and feel" of GIS programs windowing is becoming popular as a standard in GIS as well as in most other applications at the micro-computer level: for PC computers, Presentation Manager available under the OS/2 operating system and as Microsoft Windows in DOS is becoming the standard Macintosh operating system has always been a windowing environment X-Windows is the de facto windowing standard for UNIX and other mainframe and workstation operating systems this allows different vendors' hardware to support a common interface in a networked environment Networking standards are critical to allow communications between remote computers networked environments are increasingly popular for GIS as the technology and data become widely used within organizations Database query standards SQL (Structured Query Language) is emerging as a standard across the data processing spectrum, though in its current form it is limited in its ability to handle spatial queries Display and plotting standards several standards have emerged in this area simply as a result of the popularity of specific hardware devices these include: CalComp and HPGL - line plotter formats PostScript - raster, page-oriented graphics Data exchange standards the largest standardization effort is currently being directed at this area US Federal government has recognized the need to exchange data between different agencies (see Unit 68) and has formed committees to examine 
aspects of this note work done by NCDCDS, FICCDC and DCDSTF current efforts are directed towards the development of the Spatial Data Transfer Specification (SDTS) (see Tom (1990) for more details on SDTS) the Defense Mapping Agency's digital cartographic data standard DIGEST is part of an effort to establish standards within the international defense community, e.g. NATO however, there are several common data exchange formats currently in use (see GIS World, 1989) these include: overhead - Spatial data exchange formats USGS DEM - Digital Elevation Model format used by the USGS since early 1980s for gridded elevation data allows a single attribute per cell USGS DLG - Digital Line Graph all features of the USGS quadrangle map series are supported by this format is the most widely used format for exchange of digital cartographic data used primarily for coordinate information though it does support alphanumeric attributes GBF/DIME - Geographic Base File/Dual Independent Map Encoding (Census Bureau) original Census Bureau digital files developed in early 1970s allows both coordinate and attribute data TIGER - Topologically Integrated Geographic Encoding and Referencing (see Units 8 and 29) IGES - Initial Graphics Exchange Specification (National Bureau of Standards) used extensively for the exchange of CAD and CAM data only one attribute per feature SDDEF - Standard Digital Data Exchange Format (NOAA) primarily used to exchange data between NOAA, Defense Mapping Agency and the Federal Aviation Administration only supports point data SIF - Standard Interchange Format (Intergraph) developed to support exchange of data between Intergraph and other systems popular data exchange format for many GIS packages MOSS - Map Overlay and Statistical System (US Fish and Wildlife) originally developed as part of the MOSS GIS a non-topological format for vector data with translators to several common spatial data formats now used by several federal and local government agencies DXF - 
Drawing eXchange Format (Autodesk) developed by Autodesk, Inc. for AutoCAD like SIF is a popular data exchange format for many GIS packages C. IMPLEMENTING STANDARDS several issues are related to the implementation of GIS standards Start-up costs implementation of a standard can incur substantial costs in terms of money and time will be major short-term costs related to user training and reprogramming of software Management support management needs to recognize the positive impacts of standards on productivity and system costs and be willing to commit short-term resources for retraining and reprogramming Technical tradeoffs adopting standards requires tradeoffs between functionality and performance standards provide for broad functionality e.g. adopting software that uses a standard data exchange format allows access to a broad range of data sources e.g. adopting a standard operating system provides access to a large library of existing applications however, standards, by their very nature, do not allow fine tuning to specific hardware or applications e.g. plotter standards may not make the optimum use of the hardwired capability of your plotter some de facto standards are neither efficient nor the best available many exist simply due to the original popularity of the hardware or software, even though they may no longer be the state-of-the-art Potential for security risks wide availability of common operating systems allows for misuse and exploitation e.g. the spread of computer viruses depends on common operating systems Innovation broadly accepted standards make it very difficult to introduce innovations D. WHAT TO STANDARDIZE? 
the majority of standards effort in GIS to date has concerned data formats standards such as DIGEST provide standard record layouts, coding schemes although formats are standardized, these efforts deal primarily with the structure of the data, and not with its meaning data may be written into a standard format for transfer, and thus be readable by some other system, but it may still be virtually meaningless without extensive documentation the SDTS goes well beyond format standards by defining standard meanings for terms e.g. SDTS attempts to remove the confusion over the use of arc, link, edge, chain, segment in GIS by establishing a standard term for every type of object the USFS effort to establish a corporate database may similarly yield standards of meaning, e.g. standardized definitions of GIS layers, at least within this organization still missing is a standard for data models that would provide standard ways of representing geographic phenomena e.g. for digital elevation data, should the standard include all of contours, DEMs and TINs? should there be standard resolutions for DEMs? should there be standards of vertical accuracy? also missing are standards of data accuracy for GIS map accuracy standards deal only with cartographic features e.g. a GIS standard for digital elevation data might specify the accuracy of elevation for any point in an area, not the accuracy of positioning of a contour such standards would provide the GIS user with expectations about the reliability of the database as a window on the world, rather than a window on source documents, or a window on transferred databases REFERENCES GIS World, 1989. "Spatial data exchange formats", The GIS Sourcebook, GIS World, Fort Collins, CO, pp. 122-23. Exler, R.D., 1990. "Geographic Information Systems Standards: An Industry Perspective", GIS World 3(2):44-47. FICCDC, 1988. "The proposed standard for digital cartographic data", The American Cartographer 15(1). Thorley, G.A., 1987. 
"Standards - Why bother?," USGS Open File Reports, 87-314. Abstracts of papers presented at the GIS Symposium. Tom, H., 1990. "Geographic Information Systems Standards: A Federal Perspective", GIS World 3(2):47-52. EXAM AND DISCUSSION QUESTIONS 1. Standards can be imposed from above, or emerge through consensus. Discuss the pros and cons of top-down and bottom-up approaches to GIS standardization. 2. How successful do you think DXF can be as a GIS exchange standard? What aspects of information exchange does it standardize? 3. Review the approach taken by SDTS to standardizing the use of the term "chain". 4. "SDTS is a standard for cartography, not GIS" - discuss. LEGAL ISSUES A. INTRODUCTION The legal regime is the structure of laws, regulations and practices within which society operates components of the legal regime: statutes - laws, bills, acts passed by legislative bodies administrative regulations - rules established by branches of government, under the authority of statutes (enabling legislation) common law - past decisions of courts and judges have the effect of statute law unless overridden professional and related practices - the system of conventions and traditions which rely on law for their ultimate authority decisions of some bodies, e.g. medical professional associations, may have force of law in some cases Function of law in society to resolve disputes, e.g. over ownership of land, locations of boundaries etc. maintain order establish a framework for common expectations about the events of life secure efficiency, harmony and balance in the functioning of government protect against excessive or unfair power in government or the private sector assure an opportunity to enjoy the minimum decencies of life B. INFORMATION AS A LEGAL AND ECONOMIC ENTITY information is often considered as a commodity is information similar or dissimilar to other traded commodities? 
Quantity value of a traded commodity is generally proportional to quantity difficult to measure quantity of information quantity of geographic information might be based on geographical area covered e.g. US Bureau of the Census charges a flat fee for each county's TIGER file irrespective of the size or population of the county some Western counties with population <100 cost the same as a New York borough with population in the millions can buy specific information, e.g. health information from a doctor, directory information for phone, information in a book however, measuring the amount of information acquired for a given fee is a problem which distinguishes information from other products Property rights who owns information? precondition for operation of a market is that property rights are created and can be protected difficult to assign property rights to information degree of appropriability (extent to which something can be owned) varies for information in the absence or limitation of property rights, suppliers will either find it unprofitable to produce the product or, if it is produced, tend to underproduce it the patent system tries to assign ownership to information to create incentives for its production if the producer cannot prevent use in the absence of payment, then the market mechanism cannot operate properly Public goods information is sometimes a public good in economics, a good is public if its use by one person does not prevent or curtail its use by another in contrast, use of a car by one person denies its use by another a map is a common form of public good produced by governments at all levels other forms of spatial data produced by governments, e.g. 
Census statistics, are also public goods Value of information decision-makers seek data at reasonable cost to: reduce uncertainty in planning, investment, development decisions provide new opportunities information has value because it can be used to make decisions about the allocation of scarce resources lack of information results in decisions being made under uncertainty risk of bad decisions has an associated cost risks can be averted using information thus, value of information may equal the expected cost of the averted risk difficulties that apply to the definition of information as a legal entity also apply to the development of an economic model of the use of information and its value Information as evidence spatial information is a form of scientific evidence, contributes to resolution of conflict maps may be used as evidence in court during litigation all parties in litigation may have different views on maps and their interpretation in fact, the idea that a neutral agency might create a non-contentious set of spatial data for decision-making is inconsistent with the American legal system where individuals have the right to question the facts that are used to determine rights and interests as evidence, it is often assumed that spatial information has been created using scientific measurement, methods the assumption may not be true: some maps rely heavily on subjective interpretation maps contain errors maps can be used in inappropriate ways map designer has no control over map's use C. LIABILITY liability is determined in cases where a person alleges harm from a poorly made decision land management decisions may be later shown to have been based on inaccurate information e.g. 
when spatial information is used, decisions are often made without expert knowledge of the forms and accuracy of spatial information and associated processes of data collection and compilation policy formation process is often based on small-scale, generalized maps policy is often implemented using large-scale, detailed maps the problems of scale change and generalization are rarely understood suppliers and users of spatial information are concerned about liability for such decisions there are three types - contract, negligence, strict product liability Contract liability arises where the terms of an agreement between producer and user allocate responsibilities contract provisions are upheld by courts important for those who make contracts for production of maps, digitizing, data conversion, software maintenance a contract should set standards for products and services difficult to establish standards for data accuracy cost of checking data to see if it meets standards - does every data item have to be checked? 
where computer or product is involved: seller should carefully describe standards that apply buyer should obtain warranty that product is suitable for intended purpose purpose must be clearly laid out in contract however, written contracts between public agencies and users of their maps are rare Negligence arises where a person fails to exercise the standard of reasonable care normally expected of someone in the same situation, and harm results courts and legislatures have defined "reasonable care" for many situations map-producers and users are often covered by this type of standard computer-based information presents additional problems so-called "computer error" is often found to be a failure of system owners or operators to respond to known bugs in these circumstances it seems likely that the basic principles of "reasonable care" will apply failure to select and maintain the hardware and software that executes the required tasks may constitute negligence failure to market capabilities of products accurately may be negligence Strict product liability user is required to show that the product is of an inherently dangerous nature e.g. recent lawsuits over children''s toys does not require that the injured party present evidence that the producer acted improperly D. LIABILITY SCENARIOS errors will occur in information services, data and products the appropriate level of care in design and operation of systems is difficult to establish three types of errors are important in liability: Errors in represented location typically result from measurement and data handling mistakes national map accuracy standards prescribe a reasonable frequency of errors in locations court is likely to consider the process of data entry and whether a reasonable level of care was established and used in design and implementation of the system, and emphasized in training in the case of Indian Towing Co. 
vs United States, federal government was held to have negligently failed to maintain a lighthouse whose location was marked on charts and whose character was described in the official Light List in Reminga vs United States the federal government was held to have inaccurately and negligently depicted the location of a broadcasting tower on an aeronautical chart, contributing to the mistakes and fatalities in an airplane accident Representations of error-free data e.g. Aetna Casualty and Surety Company vs Jeppesen and Company asserted that fatal plane crash resulted from defective aeronautical chart published by Jeppesen and Company chart by Jeppesen and Co. depicted the instrument approach procedure to an airport - information based on tabular data from the Federal Aviation Administration parties did not dispute the accuracy of the data on the chart but rather the graphic depiction of it the chart showed two views of the approach, one from above, one from side two views appeared to have same scale on chart - actually scales differed by factor of 5 court found the crew were misled by representation Unintended and inappropriate uses user lacks expertise in interpreting map, and has no access to map's designers and compilers who could explain it e.g. 
in Zinn vs State: the state owns all land below the Ordinary High Water Mark (OHWM) of a lake evidence from botanists and surveyors at a regulatory hearing had established an elevation of 990 ft for the OHWM for a certain lake the report of the hearing included a 1:24,000 USGS quadrangle showing the OHWM, and thus the extent of the state's land, defined by the 990 ft contour on the map an adjacent landowner sued the state for the harm resulting from temporarily claiming part of her land (temporary in that the agency subsequently rescinded the report) the state was held liable - the 990 ft contour had been drawn for purposes other than definition of property rights but its use to depict the OHWM also implied a specific boundary of the property in question - i.e. the property boundary at the lake is defined by the OHWM, not a line on a map inappropriate uses of data are likely to increase with GIS technology unless safeguards can be built into systems E. ACCESS AND OWNERSHIP the general goal of law in the areas of access and ownership is to make as much publicly held data available as possible, subject to reservations about personal privacy and commercial value Privacy and Confidentiality these laws protect individual and commercial aspects of property from excessive government and private power privacy is a recently recognized concept, largely governed by common law protection is provided through statutes that require data and information gatherers and managers to: provide physical security to personal and property records design systems that prevent inappropriate access to publicly held records experience with computer "viruses" casts doubt on possibility of complete security given present level of interconnection between computer systems privacy rights problems arise when information is converted to digital form i.e. 
property ownership records in paper files do not allow easy searching in digital form it is possible to search these records for any combination of attributes this may produce publicly available information that infringes on privacy rights of individuals balance between public access to information and individual privacy rights changes when publicly held data becomes digital Open Records Laws, Freedom of Information Act Open Records Laws (states) and Freedom of Information Act (federal) designed to provide citizens with reasonable access to publicly held records provides citizens with the basis for understanding government functions and actions which concern them Open Records Laws define what are records those records not open to general scrutiny the conditions under which copies can be made available Copyright designed to protect the commercial, proprietary aspect of creative works in some countries (e.g. UK) data from public agencies can be copyright copyright laws make it easier to establish ownership, and therefore value, of information in the UK public data can be sold by government agencies (e.g. digital data by the Ordnance Survey) at prices which allow full cost recovery in the US this is impossible because data produced by federal agencies cannot be copyright - prices of digital data typically cover only direct costs of copying - no control over resale of data by corporations Conflict of laws guarantees provided by open access can conflict with protections of privacy and copyright concept of public information involves a complex balance between access, ownership and economic factors REFERENCES Aronoff, S., 1989. Geographic Information Systems: A Management Perspective, WDL Publications, Ottawa. Chapter 8 examines responsibilities for accuracy and access to information contained within GIS. Epstein, E.F., 1988a. "Litigation over information: the use and misuse of maps," in Proceedings, IGIS: The Research Agenda, NASA, Washington DC, 1:177-84. 
Good overview of legal issues in the context of conventional and digital mapping. Epstein, E.F., 1988b. "Legal aspects of global databases," in H. Mounsey and R.F. Tomlinson, editors, Building Databases for Global Science, Taylor and Francis, Basingstoke, UK. Introduces international legal issues and reviews the legal problems involved in building global databases. Mackaay, E., 1982. Economics of Information and Law, Kluwer Nijhoff, Boston. Chapter 5 discusses information, law and economics. DEVELOPMENT OF NATIONAL GIS POLICY A. INTRODUCTION GIS is a coalescence of many interests and fields: automation in the surveying and mapping industry automation of facilities management (AM/FM) demand for analysis and modeling to support resource management and planning interest in use of digital databases in marketing, transportation interest in applying the products of remote sensing need for automation of land records, and interest in multipurpose cadaster (MPC) each of these fields has its own societies and institutions, regulatory agencies in government, academic disciplines etc. coalescence leads to pressure for new institutional structures new series of conferences, e.g. GIS/LIS (San Antonio, 1988; Orlando, 1989) - jointly sponsored by surveyors and mappers (ACSM), urban managers and planners (URISA), geographers (AAG), private and public facility companies (AM/FM International) new structures in government - e.g. interdepartmental committees in some states, federal government new magazines, journals, textbooks, courses a clear national strategy could: speed the process of coalescence, e.g. 
by reorganization of government departments avoid duplication, mistakes, false starts provide much needed support for research and development promote training and education programs compare US attempts to develop national policy for MPC (see Unit 54 references) this unit looks at one country's efforts to develop national policy the United Kingdom particularly, the role of the "Chorley Report" (DOE, 1987) B. BACKGROUND Predecessors Ordnance Survey Review Committee reported in 1979 covered role of digital technology within premier mapping agency House of Lords Select Committee on Science and Technology reported in 1984 first recognition of potential role of GIS technology in integrating all forms of geographically referenced data raised awareness of obvious potential for duplication, inconsistency and incompatibility between different forms of geographical data led to formation of Committee of Enquiry (Chorley Committee) Charge to the committee "to advise the Secretary of State for the Environment within two years on the future handling of geographic information in the UK, taking account of modern developments in information technology and market needs" similar to Congress's 1989 charge to Department of the Interior in Public Law 100-409 (see references at end of Unit 53) with reference to land information (more narrowly defined than geographic information) Scope problems with interpretation of term "geographic information" in the charge thus, the committee included all information which can be related to specific locations on the Earth this is very broad - includes indirect as well as direct spatial referencing in fact, committee included: land and property data resource data - land use, ecological, environmental, etc. infrastructure data - utilities, facilities socioeconomic data - census statistics, health, etc. Membership of committee 11 members 65% from the private sector - vendors, utilities, market research companies, etc. 
chair (Lord Chorley) is a member of the House of Lords, accountant with major international management consultancy, familiar with subject, in part from work on previous House of Lords committee Role of committee many systems were in process of rapid development in UK in all these areas many were dependent on government agencies as sources of data, standards, policy committee's charge required it to define the role of government in fostering, coordinating, supporting system development identified the factors which are important in determining the way the technology is adopted and developed: the costs of adoption, particularly in staffing, training, equipment variations in the availability of data need for development of faster, more flexible, easier to use tools variation in awareness among managers of the benefits of GIS technology shortage of skilled personnel needed to define what role government, national policy can play in controlling these factors Comparison with North America evidence presented to the Committee indicated that the UK lagged behind North America in many of these areas lack of training and awareness was more critical much of the technology had been developed in North America these problems are likely even more severe in other countries, e.g. Eastern Europe Relationship to other technologies GIS is a comparatively small market segment many key technical developments originated in other areas e.g. peripherals developed for larger CAD markets other technologies may be less affected by non-technical factors lack of training less of a problem in more mature markets like CAD other technologies may be less innovative, require less reorganization, e.g. word processing C.
RECOMMENDATIONS Digital mapping in the UK, Ordnance Survey has copyright over its products, virtual monopoly over large-scale mapping government policy requires OS to stress cost recovery increasing demand from utilities for digital versions of basemaps accuracy levels required by utilities were substantially below those of OS private sector can produce digital data to utilities' specifications at substantially lower cost than OS OS's monopoly and copyright are under pressure from private sector committee encouraged OS to seek joint agreements with the private sector to relieve pressure Availability of data first comprehensive list of geographically related data holdings in government was prepared for committee evident that data were not sufficiently accessible to users outside government because of real or imagined concerns for privacy and confidentiality because government rules prevented departments from repackaging data and receiving financial benefit from sale Linking data sets difficult because of e.g.
incompatible reporting units for social statistics committee recommended maximizing use of common geographical referencing systems extend postal codes from limited application in mail to general role as reporting units for statistics of all kinds need for further development of data transfer standards Awareness, education and training recommended setting up demonstration projects need for expanded training courses, new teaching packages, greater role in business education Research and development generally, the report stressed the non-technical impediments to GIS adoption need for R&D in both fundamental and applied areas particular stress on the development of Intelligent Knowledge-Based Systems which incorporate rules derived from human experience development of better tools for estimating reliability of information from GIS Role of government government is one of the biggest users of GIS, also the biggest supplier of geographical data its level of commitment is critical to the development of the field potential roles of government in: development and implementation of standards legislation on relevant issues, e.g. copyright funding education programs carrying out or funding R&D increasing accessibility of data many submissions to committee urged establishment of a government organization to coordinate GIS committee recommended a Centre for Geographic Information (CGI) as: promoter of technology advisor on national GIS policy focus for users D.
GENERAL FINDINGS emergence of a discipline through coalescence of common interests usefulness of maps is increased enormously by digitizing, but digital systems allow access to vast stores of non-map data as well geographical data for small areas is very useful in social planning, but government must play an important role in handling such data to prevent invasions of privacy it is impractical to assemble all geographical data in one national archive - the role of government should be to increase access to geographical data through directories, compatibility etc. the commercial opportunities of GIS technology will continue to expand rapidly and internationally change in UK government policy since 1979 has had a profound effect on data collecting agencies because of pressure for cost recovery E. OUTCOMES the key technical recommendations - role of postcodes, production of digital basemaps - were rejected in the official government response government also rejected the recommendation for a Centre for Geographic Information (in effect, rejected the recommendation that it take the lead in organizing and funding the Centre) with no new organizational structure, there is doubt about whether the more far-reaching recommendations can be implemented efforts are under way to form an organization outside government to play at least part of the role intended for the Centre many non-technical recommendations were accepted and many are being implemented by relevant departments e.g. restructuring of legislation to make it easier to share and access data the committee's meetings, background work for submissions, and publicity given to the report may have had more impact than the recommendations themselves possibility of similar exercises in other countries, e.g. BLM report under PL 100-409 F.
RELATED ACTIVITIES IN OTHER COUNTRIES different countries have focused interest in the development of GIS in different ways (the following based on information in Shepherd et al, 1989) several aspects vary from country to country: perception of priorities in GIS scale of funding governmental/institutional context extent of involvement of the private sector emphasis upon applied as opposed to fundamental research other national initiatives include: UK Regional Research Laboratories established before the completion of the Chorley Report by the UK Economic and Social Research Council objectives include carrying out basic and applied GIS research, training, providing data services and promotion of the use of GIS in general U.S. National Center for Geographic Information and Analysis funded by the National Science Foundation created to promote basic research in GIS and to improve the education of GIS professionals The Netherlands research consortium funded by the Netherlands Science Research Council for four years at the University of Utrecht, the Technical University of Delft, the Agricultural University of Wageningen and the International Training Center at Enschede France creation of the Maison de la Geographie in Montpellier providing a network linking 49 research teams in France REFERENCES Department of Environment, 1987. Handling Geographic Information. Her Majesty's Stationery Office, London. The full Chorley Report. Lord Chorley, 1988. "Some reflections on the handling of geographical information," International Journal of Geographical Information Systems 2:3-10. Views from the chair, including a summary of the report's conclusions. Rhind, D. and H. Mounsey, 1988. "The Chorley Committee and 'handling geographic information'," Proceedings, Third International Symposium on Spatial Data Handling, International Geographical Union, Columbus, Ohio, 407-21. Excellent summary of the Chorley Committee and its report.
Shepherd et al., 1989. "The ESRC's Regional Research Laboratories: An Alternative Approach to the NCGIA?," AutoCarto 9, Sydney, Australia. Tomlinson, R.F., 1987. "Current and potential uses of geographical information systems: the North American experience," International Journal of Geographical Information Systems 1:203-18. Based on a background paper for the Chorley Committee which appears in the report's appendices. Ventura, S.J., 1990. "Federal land and geographic information system activities," Photogrammetric Engineering and Remote Sensing 56(5):631-4. A useful review of the need for coordination and standardization in the federal government. GIS AND GLOBAL SCIENCE A. INTRODUCTION B. SOURCES OF GLOBAL DATA Remotely sensed imagery Terrestrial-based sources C. CHALLENGES TO DATA INTEGRATION Multiple sources Data volumes Geometric rectification, geographic referencing Issues of data storage Database model Documentation, access, dissemination, archiving Internal dataset consistency Merging terrestrial and satellite data In summary D. EXAMPLES OF DATABASES AT GLOBAL SCALES CORINE UN Environment Program GRID project Global Change Diskette Project Digital Chart of the World REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 72 - GIS AND GLOBAL SCIENCE Compiled with assistance from Helen Mounsey, Birkbeck College, University of London A. INTRODUCTION why do we need GIS and databases for the globe?
ever-increasing concern over the quality of the earth's environment frequent press reports on issues such as global warming and the greenhouse effect, the ozone hole, deforestation and water pollution these are global issues, but we can also identify disasters, which, although local in origin, have pronounced continental or global scale consequences for example, the Brundtland Report noted that during the 900 days the World Commission on Environment and Development was at work: the African drought put at risk the lives of 35 million people, and probably killed up to 1 million of them the leak at a chemical factory in Bhopal, India, killed 2000 people and injured 200,000 more the explosion at a nuclear power plant at Chernobyl, USSR, caused environmental damage throughout Europe a chemical fire in Switzerland caused toxic materials to be transported by the Rhine as far as the Netherlands at least 60 million people died of diarrhoeal diseases caused by malnutrition and dirty water of these, only the Bhopal incident could be argued to be local in its effects there is clearly an ever-greater need to monitor processes at a global scale in order to gain knowledge of the earth's processes and how these affect and are affected by human activity this knowledge is very sketchy at present two developments contribute to improving the situation: technical development and ever increasing speed and power of digital computing increasing sources of data for use in environmental modeling the ultimate aim is a global database and associated GIS (access and analysis system) at a large enough scale (e.g. > 1:250,000) and with fine enough resolution (e.g. < 250 m) to enable environmental scientists to develop models which replicate, as near as is possible, the earth's processes would assist in data integration and visualization at global scales B.
SOURCES OF GLOBAL DATA global databases are derived from two sources remotely sensed imagery terrestrial-based sources - analog maps, statistics and digital data recording Remotely sensed imagery aircraft and (more usually) satellite-borne sensors provide much information at a global scale for environmental analysis characteristics: usually global (or near-global) coverage repeated coverage over intervals of hours to days (depending on sensor) enables construction of time series spatial resolution of data is improving, e.g. Landsat MSS - 80 m, SPOT - 10 m very many existing sources of remotely sensed imagery, the largest contributor being NASA major new development is the NASA Earth Observing System (EOS) comprehensive information system - includes data processing, access and analysis capabilities as well as hardware aims to be international in system provision, use and benefit will provide consistent, long-running datasets into the 1990s and beyond EOS is based on the collection of data from two proposed NASA polar platforms, one European Space Agency platform and one Japanese polar platform (a polar platform is a satellite in an orbit which passes over both poles) this will generate a massive dataflow (estimated at 10^12 bytes (1 terabyte) per day) Terrestrial-based sources Analog maps and tabular statistics digital data derived from maps are an important contributor to global databases, and, as a data source, complementary to remote sensing usually based on ground survey or checking, digitized cartographic data can provide: human assigned attributes (e.g.
place names or administrative boundaries) a more useful / detailed classification of features a historical (pre 'advent of remote sensing') data source to be useful the maps from which these data are derived should be: part of a series which offers global coverage and is based on common standards of accuracy of source material and common cartographic conventions at scales larger than 1:1 million - smaller scales are too highly generalized to represent reality with any degree of utility and are of use only for general reference maps are a frequent source of data on topography, soils, geology etc. tabular statistics originate from many national organizations (e.g. census gathering agencies) and are collected by international organizations into databanks (e.g. the UN, World Bank, OECD, etc.) mostly this provides a source of socio-economic data on the 'human' element in global modeling Digital data recording this source of data results from automatic data logging mostly in the geophysical and climatological sciences collected mostly on a national basis then assembled into international databases some examples include: the World Data Centers 27 centers worldwide which coordinate the global collection of data determine standards for collection and documentation hold multiple copies of the resultant datasets distribute them freely throughout the world emphasis on physical data geology, geophysics, meteorology, atmospheric physics, oceanography the World Meteorological Organization under the World Weather Watch program collects and supplies members with observational data and processed products for meteorological forecasting there are many other such international organizations, gathered together under the auspices of the International Council of Scientific Unions ICSU has endorsed the establishment of the International Geosphere Biosphere Project (IGBP), which has the long-term aim of describing the various processes which affect the Earth's environment, and the
manner in which they may be changed by human action C. CHALLENGES TO DATA INTEGRATION Multiple sources global modeling and prediction will in most cases demand data from multiple sources often there will be a mixture of remote sensing and analog input remotely sensed data is global in coverage and updated frequently remotely sensed data is most useful when calibrated with ground-based data but, ground-based data often lacks global coverage and is updated infrequently Data volumes possibly the most pressing problem, especially as far as remotely sensed sources of data are concerned volumes are potentially huge surface area of earth is order 10^14 sq m single coverage of SPOT imagery at 10 m resolution is order 10^12 pixels assuming a single value / pixel is stored in 1 byte - then dataset is order 10^12 bytes, or 1 terabyte note that this is for only one coverage! most applications will require more than one coverage in time series, and possibly data from other sources as well note that this is for current SPOT platform - future EOS will generate order one terabyte per day - this is 10^4 conventional magnetic tapes per day, or over a mile of shelves in a conventional tape library per week a number of other problems are a consequence of such massive data volumes: Geometric rectification, geographic referencing global databases must be referenced to a common coordinate system if they are to be merged and manipulated from a number of sources conventionally, latitude / longitude is used the cost of installing a referencing system into remotely sensed datasets may be prohibitively high Issues of data storage simple raster data structures are inadequate if rapid access is required for browsing and retrieval possible solutions include: vector - but spatial relationships in the data must be stored (which increases data volumes further) or computed every time (which increases access times further) hierarchical - structures based on recursive subdivision of
the earth's surface various forms of data compression Database model must be multi-purpose and global scale the number of possible relationships is large the object definition is inexact (what may be a point at one scale is an area at larger scales) Documentation, access, dissemination, archiving there is a not-insignificant administrative problem in devising methods of user access to global databases how to document datasets for international, multidisciplinary use? how to enable the user to access a centralized database, probably over computer networks? how to disseminate data and documentation - in what format and on what physical medium? how to handle the costs of archiving such large databases? are dual copies of every dataset strictly necessary? Internal dataset consistency have all the individual datasets being merged into a global database been collected and classified to consistent and high standards of accuracy, with a common definition of variables? this is less of a problem with remotely-sensed data can be a serious problem with terrestrial-based sources: e.g. there is no consistently produced topographic map series of the world at a scale greater than 1:1 million e.g. for soils, largest scale is 1:1.5 million, with considerable disagreement between soil scientists over a consistent, global classification of soil type e.g. there is no strictly consistent definition of "total population" in the UK Census of Population through time (some years include visitors, etc.) this is a problem within a well established national data source when multiplied to international scales such problems may become insurmountable Merging terrestrial and satellite data what errors may be generated through this process? how are missing data handled?
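the data-volume estimates in section C can be checked with a quick back-of-envelope calculation; the following Python sketch uses rounded order-of-magnitude constants, and the ~100 MB capacity assumed for a conventional magnetic tape is an illustrative figure, not one given in the text

```python
# Back-of-envelope check of the data-volume figures quoted above.
# All constants are rounded, order-of-magnitude values; the tape
# capacity is an assumed figure for a conventional reel of the period.

EARTH_SURFACE_M2 = 5.1e14   # total surface area of the Earth, order 10^14 sq m
PIXEL_SIZE_M = 10.0         # SPOT resolution
BYTES_PER_PIXEL = 1         # one stored value per pixel

pixels = EARTH_SURFACE_M2 / PIXEL_SIZE_M ** 2    # order 10^12 pixels
coverage_tb = pixels * BYTES_PER_PIXEL / 1e12    # a few terabytes per coverage

EOS_BYTES_PER_DAY = 1e12                         # ~1 terabyte/day from EOS
TAPE_CAPACITY_BYTES = 100e6                      # assumed ~100 MB per tape
tapes_per_day = EOS_BYTES_PER_DAY / TAPE_CAPACITY_BYTES   # order 10^4 tapes

print(f"single global coverage: {pixels:.1e} pixels, {coverage_tb:.1f} TB")
print(f"EOS archive load: {tapes_per_day:.0f} tapes per day")
```

even at this crude precision the arithmetic confirms the point of the section: one global coverage at 10 m already runs to terabytes, so time series and multi-source databases quickly dwarf conventional storage media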
In summary there are problems of data acquisition, in particular from terrestrial sources there are problems of spatial and temporal inconsistency both within and between datasets we have limited experience in handling very large databases, with consequent issues of structure, access and administrative support the cost of all this may at least in the short term limit the development of global databases the increasing application of GIS is, however, critical to enable users to: merge datasets from widely disparate sources handle, analyze and map the results model environmental processes at a global scale D. EXAMPLES OF DATABASES AT GLOBAL SCALES very few truly global environmental databases at present some are developing at a continental scale, e.g. CORINE CORINE Co-Ordinated Information on the European Environment established as a project in 1985, to build an environmental database covering the 12 Member States of the European Community (2.25 million sq km) has now assembled a large number of consistent datasets into a centralized database these include: topography soils climate nature reserves and other sites of scientific importance water resources atmospheric pollution to be of use to policy makers, these are supported by socio-economic data certain key findings from the project: many datasets are unavailable for reasons of cost, confidentiality, administrative inadequacies or non-collection in certain countries where available, they may mask massive discrepancies in data collection methods and huge internal inconsistencies e.g. in a climatological dataset we find 8 methods of calculation for evapotranspiration, and 5 for maximum monthly temperature enormous problems in merging datasets derived from maps of different scales and projections merging larger scale datasets derived from remotely-sensed sources with smaller scale ones from terrestrial sources involved fundamental decisions on generalization vs.
loss of detail important issues of user access and data use, especially by unskilled users who may not understand the 'fuzzy' nature of some of the datasets, and the likelihood of error propagation through application of GIS techniques nevertheless, the project is expanding both in content and scale likely to be subsumed into the Environmental Agency being established in the European Community for the provision of technical, scientific and economic information for use in environmental monitoring UN Environment Program GRID project GRID = Global Resources Information Database established in 1980, and now based in Nairobi GRID aims to: establish global, regional and, in some cases, national environmental datasets of known quality establish computer systems which can handle these establish regional nodes for the dissemination of local sub-sets of data train scientific staff in the use of this information unlike CORINE, it draws heavily on data from remotely-sensed sources (from NASA), and also from other bodies such as FAO (Food and Agriculture Organization of United Nations), UNESCO (United Nations Educational, Scientific and Cultural Organization) and IUCN (International Union for the Conservation of Nature) much of its work has been at regional or continental scale thus far e.g. projects on sea level rise in the Mediterranean, and the distribution of elephants in Africa moving towards global-scale studies e.g.
global deforestation project Global Change Diskette Project a project of the International Geosphere-Biosphere Program a project designed to create and distribute to research groups, particularly the developing countries, medium-resolution digital data sets on diskettes for microcomputers contains satellite imagery and complementary thematic data Digital Chart of the World sponsored by the Defense Mapping Agency contracted to ESRI source - the Operational Navigational Charts coverage at 1:1 million of all the world's land area show elevation (500 m contours), cultural features, hydrography maintained for air navigation currently being digitized is intended to be a general source of high resolution cartographic data for the globe to be delivered in 1991 on CD-ROM REFERENCES Most of the material in this unit is extracted from various papers in: Mounsey, H.M. (Ed.), 1988. Building Databases for Global Science, Taylor and Francis, London. See in particular papers by Simonett and by Peuquet in Part Two, and by Mooneyhan (on GRID) in Part Three. Additional material: Briggs, D.J. and H.M. Mounsey, 1989. "Integrating land resource data into a European Geographical Information System," Journal of Applied Geography 9:5-20. A good source on the CORINE project. IGBP, 1988. Global change report #4: a plan for action, International Geosphere Biosphere Project, Stockholm. Many other reports on global science are available from IGBP, ICSU and NASA. DISCUSSION AND EXAM QUESTIONS 1. Discuss the relative advantages of the various spatial data models in global database building. Give examples of datasets which might be best suited to each type. 2. The greatest problems in the construction of global databases lie not with the datasets, hardware or software, but with the "liveware" - the human element of use (or abuse!) of the databases. Discuss some of the issues which might lie behind this statement. 3.
Select one of the major disasters mentioned in this unit (or another known to you of similar magnitude). Discuss likely sources of data, and particular GIS techniques, which you would use to address this problem and its associated issues. 4. Some parts of the world are relatively rich in spatial data, and others are relatively poor. Examples of the latter include much of the Third World and Antarctica. Because of gaps in coverage and variable quality it could be argued that the globe as a whole is data-poor. Is spatial data handling technology more or less valuable in data-poor areas? Discuss the arguments on both sides of this issue. GIS AND SPATIAL COGNITION A. INTRODUCTION B. SPATIAL INFORMATION FROM GIS Components of the user interface Fundamental questions C. SPATIAL LEARNING 1. Developmental psychology perspective 2. Cognitive and environmental psychological perspective D. FORM OF SPATIAL REPRESENTATION Images or propositions? Hierarchical or non-hierarchical structures? Frames of reference E. EFFECTS OF INTERNAL REPRESENTATION ON SPATIAL REASONING Causes of errors in spatial reasoning F. HOW DOES NATURAL LANGUAGE STRUCTURE SPACE? Examples Fuzziness G. RELEVANCE TO GIS Design of better user interfaces and query languages Design of universal GIS systems New database models Improved data entry techniques Expert Systems REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 73 - GIS AND SPATIAL COGNITION Compiled with assistance from Suchi Gopal, Boston University A. 
INTRODUCTION the next two units (73 and 74) examine advanced topics: knowledge based techniques spatial cognition both are efforts to deal with the complexity of real GIS applications complexity of real-world problems - the number of goals and issues which have to be dealt with in real problem- solving complexity of the knowledge and rules which can be brought to bear on a problem complexity of the man/machine interaction which ultimately determines the effectiveness of GIS interest in both areas is high progress is still largely in the research domain B. SPATIAL INFORMATION FROM GIS GIS are tools for supporting human decision-making in applications such as car navigation systems, electronic atlases, GIS are tools to help people acquire spatial information, learn about geography e.g. research is under way on the design of a portable GIS to help visually impaired people navigate in complex spaces the information acquired through a GIS is used in this case to make simple route-finding decisions the interface between the GIS and the user is a filter which determines how successfully information can be transferred Components of the user interface physical design - keyboard, mouse, tablet, color or monochrome, screen resolution, sound, speech recognition how can these be combined to maximize transfer of information? a car navigation system might use either a display screen with a map, or spoken instructions (e.g. turn left), or some combination - which is the most effective mode of communication between the GIS and the driver? functionality - what set of operations is allowed? control technique - commands, picking from menus, pointing at icons? Fundamental questions to design effective user interfaces we need to know more about how people learn, reason with spatial information issues that need to be addressed are: 1. spatial learning how is spatial knowledge learned or acquired by people? 2. 
form of internal spatial representation what is the nature of people's internal representation of space? how is spatial information stored in the brain? can this help us design ways of representing spatial data in GIS that lead to better user interfaces? 3. effects on spatial reasoning how does this internal spatial representation affect decision-making and behavior e.g. navigation, search for housing how do people's naive models of space lead to errors in geographic reasoning? how to design GIS user interfaces to minimize these errors? 4. natural language how does the language people use to communicate (natural language) affect their ability to deal effectively with spatial information? would GIS interfaces be more effective if they used natural language to describe spatial relations? 5. relevance to GIS how should the results of research on these fundamental questions be used to improve GIS user interfaces? this unit looks at each of these major issues C. SPATIAL LEARNING how do people learn about space and the objects and routes within it? two disciplinary perspectives 1. Developmental psychology perspective study of the qualitative changes in the cognitive and perceptual development of a child most influential theory of spatial learning is the developmental stage theory proposed by Piaget describes the stages in a child's development of spatial skills 4 stages sensorimotor stage - from birth to about 2 years - locations of all objects with reference to self preoperational stage - 2 to 7 years of age - simple spatial problems are solved - an understanding of spatial relations between objects and self concrete operational stage - 7 to 11 years of age - properties of Euclidean space are understood - more complex spatial problems are solved - e.g.
concept of reversibility - n steps in one direction followed by n steps in the opposite direction returns one to the same place formal operational stage - 11 to adulthood - child masters more abstract spatial problems - self and other objects located in an independent frame of reference e.g. child begins to understand simple spatial relations - "in front of", "left of" in stage 2 - abstract coordinate systems such as UTM are not understandable until stage 4 2. Cognitive and environmental psychological perspective studies the sequence of development of knowledge about a space by an adult many alternatives proposed (see references) - following is a consensus: landmark knowledge - ability to recognize certain features, but no knowledge of their locations or relationships between them procedural knowledge - knowledge of certain routes, and the procedures necessary to navigate from one end to the other topological knowledge - knowledge of how the known routes intersect and form a network - ability to combine parts of known routes into new routes metric knowledge - ability to recall metric relations between locations - distances, angles - this level of knowledge is needed to reason about previously untraveled routes and shortcuts D. FORM OF SPATIAL REPRESENTATION how do our minds construct mental images of the world which somehow capture its basic properties and structure? three major questions: what form of representation? images or propositions? what types of structures are used in the representation of spatial relations? hierarchical or non-hierarchical? what frames of reference are used? Images or propositions? images preserve the visual properties of objects and relations between them the "map in the head" propositions provide abstract representation of both verbal and visual information e.g. 
images of street maps, memory of street names, verbal directions one form can be generated from the other if we assume the mind is capable of simple processing compare the GIS's ability to compute vector from raster impossible to determine if one form is more accurate than the other as a model of the way people store spatial information Hierarchical or non-hierarchical structures? hierarchical structures represent spatial information in a nested fashion local and global are different levels of a tree compare hierarchical data structures, e.g. quadtrees non-hierarchical structures have no clear differentiation of levels Frames of reference 1. egocentric frame moves with the individual - objects are always represented in their relationship to the individual 2. environmental frame uses a local point as reference, moves when the individual moves from one local area to another 3. global frame is constant irrespective of the location of the individual E. EFFECTS OF INTERNAL REPRESENTATION ON SPATIAL REASONING internal representation can be identified by the pattern of errors it produces errors in direction, distance, orientation, judgment of spatial relations have been studied Causes of errors in spatial reasoning lack of explicit representation in memory not all information is perceived or remembered use of incorrect procedures in storing or retrieving information e.g. errors because of incorrect rotation of information to or from internal alignment suppose the "mental map" has North at the top errors can be made in reasoning about which way to turn when approaching a junction from the North natural language used to describe spatial relations may be vague or context-dependent e.g. "is north of" does not indicate how far or how exactly north decay of information processing constraints limits to size of memory storage and type of representation e.g.
Reno, NV is actually to the west of San Diego, CA, however, because CA is largely west of NV and the mind stores a hierarchical relationship between states and cities, we expect Reno to be east of San Diego F. HOW DOES NATURAL LANGUAGE STRUCTURE SPACE? natural language appears to affect the way we think and reason about space basic components of spatial information - objects, relations between objects, motion - are roughly equivalent to nouns, locative expressions and verbs in natural language however correspondence is not exact natural language reflects the human view of the world, is more complex than abstract mathematical structures it may be very difficult to represent the complex human view of the world within a digital system Examples use of prepositions to convey spatial relations is subject to complex, hidden rules "in", "on", "between", "across", "near" convey complex meanings e.g. we say "the car is near the house" but not "the house is near the car" - why? e.g. "across the lake" suggests a different spatial relationship than "along the lake" e.g. in North America we live "in" a city but "on" a street the structure of names has hidden meanings e.g. whether the word "lake" occurs first or second in a name is determined to some extent by its size - "Lake Erie" vs. "Trout Lake" - but "Great Bear Lake" is very large nouns can be chosen to convey spatial relations e.g. "timber" has no spatial meaning by itself, but "stand of timber" suggests a small area occupied by trees - "forest" suggests a large area of trees translation of prepositions from one language to another poses enormous problems a multilingual natural language interface for a GIS would have to deal with these Fuzziness the spatial relationships defined by natural language are fuzzy and context-dependent e.g.
meaning of "near" an object depends on the size of the object and is imprecise a natural language GIS interface would have to know the range of distances conveyed by "near" G. RELEVANCE TO GIS research in the area of spatial cognition can have several benefits for GIS development, including: Design of better user interfaces and query languages given the problems of determining the meaning of natural language, are natural language interfaces worth pursuing? yes, because some applications must use natural language, e.g. GIS for the visually impaired yes, because other forms of interface may be impractical, e.g. car navigation aids must not distract the driver's visual attention to the road yes, because some applications require more than one mode of interaction to maximize effectiveness, e.g. voice can be used in digitizing to augment input from the cursor Design of universal GIS systems such systems should be compatible with cognitive models of the way we perceive and structure space thus would avoid the costly problem of transferring GIS technology between different countries and languages New database models understanding how spatial information is represented internally may provide novel designs for database models permit representations to be transformed from natural language into the GIS database and vice versa Improved data entry techniques natural language is the simplest way of collecting information about the world, but difficult to formalize into precise structures in a digital environment Expert Systems knowledge of how spatial information is stored and processed will provide fertile input to the design of intelligent expert systems for spatial information REFERENCES Herskovits, A., 1987. Spatial Prepositions in English. Cambridge University Press. Interesting book on the use and meaning of spatial prepositions. Kuipers, B., 1978. "Modeling spatial knowledge," Cognitive Science 2:129-53. One of the most influential papers on the classes of spatial knowledge. Piaget, J.
and B. Inhelder, 1967. The Child's Conception of Space. The classic developmental theory. Talmy, L., 1983. "How language structures space," in H. Pick and L. Acredolo, editors, Spatial Orientation: Theory, Research and Application, Plenum Press, New York. Argues that language affects the ways in which we think about spatial relationships. EXAM AND DISCUSSION QUESTIONS 1. Summarize the arguments for believing that an understanding of processes of spatial learning and reasoning is essential if we are to design better GISs, particularly better user interfaces. 2. What would be the desirable functions and other characteristics of a portable GIS for the visually impaired? 3. A paper by Openshaw and Mounsey ("Geographic Information Systems and the BBC Domesday Interactive Videodisk," International Journal of Geographical Information Systems 1:173-180, 1987) describes the design of the BBC Domesday Project, a form of electronic atlas using optical disk technology. What features of the conventional atlas does this system implement? In what ways does it go beyond the capabilities of the conventional atlas? How might principles of human spatial learning and reasoning be combined with the capabilities of GIS to significantly improve the usefulness of the atlas concept? (Note: a number of other atlas-like digital products are available and might be used as similar bases for discussion.) 4. A simple way to illustrate the problems of spatial relations in natural language is to take a formal representation of some spatial data - e.g. a small part of a topographic map or a city street map. One person is asked to describe the contents of the map using only natural language to another person, who must then try to reconstruct the map. Both are aware of the rules governing the map's contents, e.g. contour interval. The participants could be asked to summarize the results, including the role of non-verbal communication, e.g. facial expressions and gestures.
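The hierarchical-reasoning error described under "Causes of errors in spatial reasoning" (the Reno/San Diego example) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are invented for this example, and the longitudes are approximate.

```python
# Sketch of hierarchical spatial memory producing a directional error.
# The "mind" stores cities nested inside states plus a coarse state-level
# relation, and answers city-level questions from the state-level facts.
STATE_OF = {"Reno": "NV", "San Diego": "CA"}
WEST_OF = {("CA", "NV")}  # stored generalization: CA is (largely) west of NV

# Approximate actual longitudes, in degrees west (larger = further west)
LONGITUDE_W = {"Reno": 119.8, "San Diego": 117.2}

def hierarchical_is_west_of(a, b):
    """Answer by falling back on the nested state relation."""
    return (STATE_OF[a], STATE_OF[b]) in WEST_OF

def actual_is_west_of(a, b):
    """Answer from the cities' actual coordinates."""
    return LONGITUDE_W[a] > LONGITUDE_W[b]

print(hierarchical_is_west_of("Reno", "San Diego"))  # False - inferred from states
print(actual_is_west_of("Reno", "San Diego"))        # True - Reno really is west
```

Because the city-level question is answered from the state-level relation, the hierarchical answer reports Reno as east of San Diego, while the coordinates show the opposite - exactly the pattern of error the unit attributes to hierarchical internal representation.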
KNOWLEDGE BASED TECHNIQUES A. INTRODUCTION Example Elements of knowledge based systems Expert system "shells" B. KNOWLEDGE ACQUISITION Example of knowledge base constructed by experts Examples of knowledge inferred from interaction with experts C. KNOWLEDGE REPRESENTATIONS Trees Semantic networks Frames Production rules D. SEARCH MECHANISMS E. INFERENCE F. ISSUES REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 74 - KNOWLEDGE BASED TECHNIQUES Compiled with assistance from David Lanter, University of California, Santa Barbara A. INTRODUCTION many geographical problems are ill-structured an ill-structured problem "lacks a solution algorithm and often even clear goal achievement criteria" goals are poorly defined data may be incomplete, or lack sufficient spatial resolution problem is complex - large volume of knowledge may be relevant to the problem e.g. past experience with similar cases e.g. precise knowledge in certain narrowly defined parts of the problem a DSS (decision support system) is one response to ill-structured problems concentrates on delivering a wide range of functions to the user, rather than one solution leaves the user with the role of expert knowledge based techniques are another concentrate on making use of all available knowledge goal is to emulate the reasoning of an expert - system takes the role of expert the term "artificial intelligence" suggests the role of the machine in emulating the reasoning power of humans Example where to put a label in a polygon?
(the "label placement problem") - important in designing map output from GIS goals are poorly defined - "maximize legibility", "maximize visual impact" cannot turn goals into simple rules one rule might be "draw the label horizontally, centered at the centroid" easy to turn the rule into an algorithm rule is too simple - no good if the centroid lies outside the polygon - not clear how it affects legibility, visual impact an expert system or knowledge based system should know when to use this rule, when not - may be many such rules there have been many attempts to reduce the label placement problem to a set of simple rules and build these into an "expert system" ideally, the expert system could then perform the functions of a cartographer Elements of knowledge based systems techniques for acquiring knowledge ways of representing knowledge internally computers are good at representing numbers, words, even maps, but knowledge is potentially much more difficult search procedures for working with the internally stored knowledge inference mechanisms for deducing solutions to problems from stored knowledge Expert system "shells" are software packages with functions which help the user construct special-purpose expert systems provide a framework for organizing and representing knowledge provide procedures for accessing knowledge in order to respond to queries or make decisions example applications of shells: building a system to make medical diagnoses - emulating the medical expert building a system to emulate the cartographer's knowledge of map projections, to pick the best projection for a particular problem B. KNOWLEDGE ACQUISITION how is a knowledge base constructed? two approaches: by asking experts to break their knowledge down into its individual facts, rules etc.
by deducing rules from the behavior of experts both have been used in a GIS context Example of knowledge base constructed by experts local government agency responsible for regulating land use in a vast, sparsely populated area - small staff must consider many hundreds of applications for land use permits annually, mostly from oil companies with large budgets and armies of lawyers decisions are subject to complex system of regulations, laws, past precedents, guidelines decisions must be defensible in court desirable to know precise regulations, rules etc. which led to each decision decisions must not be held to be arbitrary or capricious basic data - vegetation, soils, wildlife, geology etc. - in GIS knowledge base of all regulations, laws, precedents, guidelines decisions can be generated from knowledge base Examples of knowledge inferred from interaction with experts Knowledge Based GIS (KBGIS) developed by Smith and others system can reduce query time by anticipating queries e.g. certain overlay operations can be done in advance if the results will be needed frequently, redone when updates occur e.g. certain topological relationships might be computed in advance and stored KBGIS analyzes queries received to "learn" about the pattern of queries and organize its database to optimize response examines whether retrieving a stored fact takes longer than deducing it from other facts if deducing it takes longer, the fact will be stored the first time it is deduced - subsequently it will be retrieved rather than deduced systems such as KBGIS learn about important spatial facts through the user's interaction with the system C. KNOWLEDGE REPRESENTATIONS data structures in which knowledge can be stored more general than conventional databases four general methods for representing knowledge - trees, semantic networks, frames, production rules Trees way of organizing objects that are related in a hierarchical fashion tree structures are common in geographical data e.g.
quadtrees and octrees e.g. hierarchical nesting of census reporting zones Semantic networks knowledge is organized as a set of nodes connected by labeled links an algorithm can follow the links e.g. topological data structures for road and river networks, boundaries of polygons (arcs) the GIS operations required to build an information product from input data layers can be visualized as a network of nodes and links the links are GIS processes or functions, the nodes are datasets this is a useful way of tracking the propagation of error through processes (links) new datasets (nodes) inherit the inaccuracies of their predecessor datasets Frames usually consist of the name of a phenomenon and the attributes that describe it attributes are called "slots" increasing availability of frame-based expert system shells Production rules consist of two parts - a situation part and an action part if the situation exists, do the action - by convention the left side is the situation, the right side is the action most popular knowledge representation in geographical applications of the four areas of GIS - input, output, analysis and storage - output is most fully explored production rules used in output for label placement, assignment of class intervals to choropleth maps, choice of projection production rules for GIS analysis used in planning and resource management production rules for GIS input center on scanning - rules for interpreting the image seen by the scanner, and vectorizing the image to create objects D.
SEARCH MECHANISMS need a procedure for accessing knowledge "brute force" procedures test all knowledge contained in the database to obtain the best answer - only practical for small knowledge bases and simple problems "heuristic" search procedures use rules designed to obtain the best answer or one close to it while minimizing search time each knowledge representation has associated search mechanisms rules for searching trees dictate the branch to be taken at each fork semantic networks are searched by examining the links at each node frames - search for relevant frames, then relevant slots for production rules, look for matching conditions on the left side of each rule E. INFERENCE is the creation of new knowledge the solution to any problem is new knowledge which can be stored in the system a knowledge base can continue to grow as more knowledge is inferred from the existing base e.g. a GIS can create new knowledge by computing topological relationships between objects from their geometrical relationships deductive inference: creates new knowledge from existing facts through logical implication, e.g. using production rules e.g. if A=B and B=C, then the system can deduce that A=C inductive inference: produces new generalizations ("laws") which are consistent with existing facts e.g. if the database contains the knowledge that area A is woodland and area B is woodland, and no information on any other area, the system might infer that all areas are woodland F. ISSUES knowledge based systems have been only moderately successful in areas where problems are relatively straightforward e.g. medical diagnosis several factors may impede greater use: high cost of developing system - building the knowledge base uniqueness of every application dynamic nature of knowledge - knowledge base is not static inadequacy of alternatives for knowledge representation - few examples fit precisely within any one form, e.g.
production rules unwillingness to trust the decisions of a machine (no "bedside manner") response time deteriorates rapidly as the knowledge base grows most knowledge is "fuzzy" or uncertain - system must return many possible answers to a problem - few problems have a precise, single answer - technical difficulties of representing and processing fuzzy knowledge poor design of user interface - not "user friendly" user often wants the reasoning behind a decision, not just the decision itself some of the most successful applications have been for instruction e.g. use of medical expert system to develop diagnostic skills - encouraging students to structure knowledge and process it systematically in response to a problem as precise, analytical models of knowledge and the ways in which it is used, expert systems can enhance our understanding of human decision-making processes - e.g. how does a cartographer position labels on a map? REFERENCES Texts on artificial intelligence and expert systems: Luger, G.F. and W.A. Stubblefield, 1989. Artificial Intelligence and the Design of Expert Systems, Benjamin/Cummings Publishing Co, Redwood City, CA. Tanimoto, S.L., 1987. The Elements of Artificial Intelligence, Computer Science Press, Rockville, MD. Winston, P.H., 1980. Artificial Intelligence, Addison-Wesley, Reading, MA. KBGIS: Smith, T.R. and M. Pazner, 1984. "Knowledge-based control of search and learning in a large-scale GIS," Proceedings, International Symposium on Spatial Data Handling, Zurich, 2:498-519. Smith, T.R. et al., 1987. "KBGIS-II: a knowledge-based geographical information system," International Journal of Geographical Information Systems 1:149-72. Other: Freeman, H. and J. Ahn, 1984. "AutoNAP - an expert system to automate map name placement," Proceedings, International Symposium on Spatial Data Handling, Zurich, pp 556-571. Design of an expert system for polygon label placement. Imhof, E., 1975. "Positioning names on maps," The American Cartographer 2.
An analysis of rules for label positioning. Kubo, S., 1986. "The basic scheme of TRINITY: a GIS with intelligence," Proceedings, Second International Symposium on Spatial Data Handling, Seattle. International Geographical Union, Commission on Geographical Data Sensing and Processing, Williamsville, NY, 363-74. Walker, P.A. and D.M. Moore, 1988. "SIMPLE: an inductive modelling and mapping system for spatially oriented data," International Journal of Geographical Information Systems 2:347-63. EXAM AND DISCUSSION QUESTIONS 1. Compare the use of knowledge bases and inference in Smith's KBGIS, Kubo's TRINITY and Walker and Moore's SIMPLE. What general principles of knowledge based systems do they each exploit? Which application do you consider the most successful? 2. Artificial intelligence has often been called the study of a set of unsolved problems. However, once an algorithm has been devised to solve a given problem, it becomes simply a solved problem, no longer meriting the mystique associated with the term "artificial intelligence". Do you agree? 3. What areas of GIS - applications, input techniques, processes etc. - do you consider most suitable for development of expert systems? 4. Discuss the differences between spatial decision support systems and knowledge based systems as alternative approaches to solving poorly structured problems. THE FUTURE OF GIS A. INTRODUCTION GIS originated in the mid-1960s a continuous history since then nevertheless, many see GIS as a phenomenon of the late 1980s major growth phase began in the early 1980s due to combined effects of developments in software, cost-effectiveness of hardware expansion in the late 1980s has been fuelled by: continuing advances in computing technology increasing availability of major digital datasets, e.g. TIGER new application areas, e.g. political districting coalescence of existing application areas, e.g. specific CAD applications, AM/FM, automated mapping, spatial analysis how long can growth continue?
will GIS interests continue to converge, or will splits develop? will GIS software converge on a standard, mature product or diverge into specialized markets? will the term "GIS" eventually disappear, or will associated symbols of maturity emerge - university programs, textbooks, magazines what will GIS look like in 10 years? 20 years? this unit has 3 parts: historical analogy to history of remote sensing discussion of convergence/divergence issues prospects for the future B. THE REMOTE SENSING ANALOGY Remote sensing as precursor to GIS major efforts began in late 1960s origins of GIS and remote sensing at similar point in time remote sensing well funded strong incentive to develop peaceful uses of space technology potential value of a tool for gathering geographical data quickly and cheaply remote sensing systems widely installed in universities, research organizations by late 1970s growth of remote sensing in 1960s and 1970s vastly outpaced growth in GIS GIS virtually unknown until early 1980s GIS often seen as add-on to remote sensing systems potential for sophisticated modeling and analysis ability to merge ancillary information to improve accuracy of classification of images three major lessons can be learned from remote sensing analogy: Need for formal theory danger that GIS will suffer in the same way as remote sensing from lack of formal theory underpinning use much work in remote sensing has been purely empirical, limited to specific times and places impossible to generalize many results to other places or times much work is on a project basis little addition to general pool of knowledge strong theoretical framework would be basis for greater generality difficult to generalize results from one satellite/sensor to another much basic work must be repeated for every new satellite/sensor effects of scale are poorly understood results in unintentional "ecological fallacy" - falsely imputing results from one scale of analysis to another e.g. 
in US plains states, correlation may exist between % of area covered by structures and % tree cover at spatial resolutions down to approx. 200 m, but not below - trees and buildings do not generally occupy the same locations analysis of remote sensing data has not benefited from clear understanding of spatial effects e.g. effects of spatial dependence on statistical significance - frequently lead to overstating true significance many analyses treat each pixel as an independent observation, ignore spatial context the level of theoretical development in these areas is much higher in the 1980s possibility that GIS can avoid some of these mistakes however GIS designers operating in the commercial sector are often not aware of the problems or the available theory will require close liaison between basic research and GIS design Excessive expectations early promise of remote sensing was high e.g. possibility of remote monitoring of agricultural production, forest harvesting in practice, numerous problems degrade accuracy of classification seasonal, diurnal changes in spectral response effects of moisture continuing need for basic research few examples of production applications - i.e.
where a standard product can be developed using a standard processing method post-war Western society has been fascinated with technological solutions to problems remote sensing and GIS are particularly attractive, combining high technology with color graphics difficulty of defining adequate cost/benefit measures at the same time, technological change can be opposed by unconvincibles, confirmed nay-sayers, Luddites technological innovation can produce strong emotions on both sides which confuse rational arguments Potential for new paradigms many have expected remote sensing to produce fundamental changes in the ways people think about geographical information however, even today the magnitudes of its future effects on affected natural sciences are not clear much research still remains to be done minimal position: after 17 years of Landsat, remote sensing is here to stay and cannot be ignored maximal position: remote sensing is significant factor in emergence of Global Science, major technology of Global Monitoring view from space has played major role in encouraging view of planet as an integrated system situation in GIS has similarities: just as remote sensing led to global view, GIS can lead to integrated view - need to integrate many layers of spatial information - need to couple human and physical systems e.g. need to couple human occupation, settlement processes with effects on deforestation and CO2 increase Technical advances both GIS and remote sensing have benefited from developments in: workstation power - PCs, file servers, mass storage - availability of data, software through networking many vendors now offer the capability of integrating both technologies in the same workstation much research and development in remote sensing occurred in government laboratories - NASA, etc.
- funded by government NASA also major source of funds for university R&D GIS context is very different - level of public funding of GIS R&D has never been high GIS R&D has been funded by vendors, driven by strong market forces market forces are not necessarily consistent with needs of scientific research C. CONVERGENCE OR DIVERGENCE? GIS is a loose collection of interests how strong are the linkages between the subcultures of GIS (units 51-56)? are they strong enough for continued convergence? several views of possible divergences in GIS: GIS subcultures each of the groups identified in units 51-56 has its own tribal customs, ways of thinking the ties which currently bind the subcultures - e.g. allow AM/FM people to talk a common language with forest managers - may weaken current "glue" is common technology, terminology Marketplace is specialization emerging in the fast-moving PC GIS market? possible classification of current products: desktop mapping - produce simple thematic maps from input data spatial analysis systems - emphasize ability to overlay, combine layers, build buffers database systems - combine databases with limited geographical functions, e.g. display, data input geographical spreadsheets - generalize the concept of spreadsheets by adding geographical functions, e.g. ability to merge two adjacent areas or two rows of a spreadsheet into one area or one row, e.g. for political districting applications query systems - provide access to e.g. TIGER files, limited ability for geocoding, querying, finding optimum routes for vehicles image processing systems - built to process remotely sensed imagery, now with added GIS functions for data integration are there submarkets within GIS?
resource management applications need high functionality AM/FM applications need high data volumes, access speeds vendors will pursue the most lucrative submarket two alternative strategies for vendors: build a product to satisfy a common denominator market - product can then withstand shifts in the market adapt to the most lucrative submarket - long-term survival requires new adaptation with every shift in the market What does convergence require? institutions and symbols to provide focus e.g. programs, departments, societies, journals, magazines, books, conferences education and training to raise awareness of GIS technology and its applications a market strong enough to support continued vendor R&D, or its replacement by government R&D technology which can simultaneously deliver the requirements of each submarket e.g. must be possible to deliver high functionality required by one submarket without detracting from high access speed required by another submarket in an operating system context this is the idea of "tuning" - one common operating system can satisfy many specialized computing environments D. PROSPECTS FOR THE FUTURE several different "visions" for GIS Automated geography e.g. see Dobson (1983) almost all forms of use of geographical data can now be automated maps and atlases can be queried geographical information can be analyzed, used in models we can use digital spatial data for specific purposes or to develop general theories geographical information becomes much more powerful in a digital environment, e.g. overlay and integration measurement and simple map analysis seamless browse some have even envisioned "the death of cartography" - the "paperless map library" - along similar lines to the "paperless office" Don Cooke (Geographic Data Technologies, Inc) sees three stages in this process: 1. automating the cartographic process the objective is still to produce maps 2. 
the map as database the digital database becomes the archive, with the map as the major product 3. using the map database recognizing the far greater potential of data in digital form - new products, models, analysis - with the map playing a minor role as one form of hard copy display However: geographical information is used infrequently compared to text or numerical information people use maps only in certain limited contexts effective use of spatial information requires much higher levels of training than e.g. word processing e.g. the DIDS system - developed within the Executive Office of the President to display geographical information for decision-making - was discontinued in 1983 because of inadequate use but the potential of automated geography may lead to much greater levels of use - people might use geographical data more frequently if they had better access to it, and if it was easier to use Spatial information science GIS and its allied fields, e.g. remote sensing, add up to the makings of a science of spatial information, which would include: data collection - e.g. remote sensing, surveying, photogrammetry - data compilation - classification, interpretation, cartography data models - data structures, theories of spatial information data display - cartography, computer graphics navigation, spatial information query and access spatial analysis and modeling spatial information is sufficiently distinct, theory and problems are sufficiently basic and difficult to justify unique identity, status of minor discipline or subdiscipline Spatial processes space provides a framework within which to organize objects frame is useful for accessing records, e.g. by street address frame is useful for accounting, e.g. totals by county frame is basis for relating objects, e.g. by proximity, adjacency, connectedness what role does space have as a source of explanation and understanding? spatial coincidence or proximity may suggest explanation, e.g. 
coincidence of cancer cluster and asbestos mining operation spatial proximity may be basis for prediction, e.g. more customers will go to the closer store spatial accounting is used as basis for much analysis, e.g. county-to-county variations in employment, health statistics many processes operate in spatial frames, e.g. atmospheric, ocean dynamics measures of space are variables in many processes, e.g. measures of territory in ecology, measures of market area in retailing significance of GIS as a scientific tool - its value in explaining, understanding the world around us - depends on significance of spatial processes REFERENCES Dobson, J.E., 1983. "Automated geography," Professional Geographer 35:135-43. Pages 339-53 of the same volume include extensive discussion of Dobson's article. Everett, J.E. and D.S. Simonett, 1976. "Principles, concepts and philosophical problems in remote sensing," in J. Lintz and D.S. Simonett, editors, Remote Sensing of Environment, Addison-Wesley, Reading, MA, pp 85-127. A review of remote sensing from the mid-1970s with striking parallels with current debates within GIS. WHAT IS GIS? A. INTRODUCTION Objectives of this unit to examine various definitions of GIS - what factors uniquely differentiate it from other forms of automatic geographical data handling? to determine origins of the field - how does GIS relate to other fields such as statistical analysis, remote sensing, computer cartography? to give a brief overview of the relevant application areas What is a GIS?
a particular form of Information System applied to geographical data a System is a group of connected entities and activities which interact for a common purpose a car is a system in which all the components operate together to provide transportation an Information System is a set of processes, executed on raw data, to produce information which will be useful in decision-making a chain of steps leads from observation and collection of data through analysis an information system must have a full range of functions to achieve its purpose, including observation, measurement, description, explanation, forecasting, decision-making a Geographic Information System uses geographically referenced data as well as non-spatial data and includes operations which support spatial analysis in GIS, the common purpose is decision-making, for managing use of land, resources, transportation, retailing, oceans or any spatially distributed entities the connection between the elements of the system is geography, e.g. location, proximity, spatial distribution in this context GIS can be seen as a system of hardware, software and procedures designed to support the capture, management, manipulation, analysis, modeling and display of spatially-referenced data for solving complex planning and management problems although many other computer programs can use spatial data (e.g. AutoCAD and statistics packages), GISs include the additional ability to perform spatial operations Alternative names alternative names which people have used over the years illustrate the range of applications and emphasis Why is GIS important? "GIS technology is to geographical analysis what the microscope, the telescope, and computers have been to other sciences.... 
(It) could therefore be the catalyst needed to dissolve the regional-systematic and human- physical dichotomies that have long plagued geography" and other disciplines which use spatial information.1 GIS integrates spatial and other kinds of information within a single system - it offers a consistent framework for analyzing geographical data by putting maps and other kinds of spatial information into digital form, GIS allows us to manipulate and display geographical knowledge in new and exciting ways GIS makes connections between activities based on geographic proximity looking at data geographically can often suggest new insights, explanations these connections are often unrecognized without GIS, but can be vital to understanding and managing activities and resources e.g. we can link toxic waste records with school locations through geographic proximity GIS allows access to administrative records - property ownership, tax files, utility cables and pipes - via their geographical positions Why is GIS so hot? high level of interest in new developments in computing ____________________ 1Abler, R.F., 1988. "Awards, rewards and excellence: keeping geography alive and well," Professional Geographer, 40:135-40. GIS gives a "high tech" feel to geographic information maps are fascinating and so are maps in computers there is increasing interest in geography and geographic education GIS is an important tool in understanding and managing the environment Market value of GIS Fortune Magazine, April 24, 1989 published a major, general-interest article on the significance of GIS to business: GIS is described as a geographical equivalent of a spreadsheet, i.e. 
allows answers to "what if" questions with spatial dimensions an example of the value of GIS given in the article is the Potlatch Corporation, Idaho controls 600,000 ac of timberland in Idaho - 4,900 separate timber stands old method of inventory using hand-drawn maps meant that inventory was "hopelessly out of date" $180,000/year now being spent on GIS-based inventory "a bargain" GIS "gives Potlatch up-to-the-minute information on the status of timber.... A forest manager sitting at a terminal can check land ownership changes in a few minutes by zooming in on a map" $650,000 on hardware and software produces more than 27% annual return on investment GIS market Dataquest projected a market of $288 million in 1988, $590 million in 1992 for GIS, growing at 35% per year ESRI of Redlands, CA, developers of ARC/INFO, had 350 employees and sales of $40 million in 1988 and a reported 42% increase in sales in 1989 Intergraph had 1988 sales of $800 million in a more diverse but GIS-dominated market the 1989 edition of GIS Sourcebook listed over 60 different "GIS" programs (though not all of these have complete GIS functionality) and over 100 GIS consultants (US) B. 
CONTRIBUTING DISCIPLINES AND TECHNOLOGIES GIS is a convergence of technological fields and traditional disciplines GIS has been called an "enabling technology" because of the potential it offers for the wide variety of disciplines which must deal with spatial data each related field provides some of the techniques which make up GIS many of these related fields emphasize data collection - GIS brings them together by emphasizing integration, modeling and analysis as the integrating field, GIS often claims to be the science of spatial information Geography broadly concerned with understanding the world and man's place in it long tradition in spatial analysis provides techniques for conducting spatial analysis and a spatial perspective on research Cartography concerned with the display of spatial information currently the main source of input data for GIS is maps provides long tradition in the design of maps which is an important form of output from GIS computer cartography (also called "digital cartography", "automated cartography") provides methods for digital representation and manipulation of cartographic features and methods of visualization Remote Sensing images from space and the air are major source of geographical data remote sensing includes techniques for data acquisition and processing anywhere on the globe at low cost, consistent update potential many image analysis systems contain sophisticated analytical functions interpreted data from a remote sensing system can be merged with other data layers in a GIS Photogrammetry using aerial photographs and techniques for making accurate measurements from them, photogrammetry is the source of most data on topography (ground surface elevations) used for input to GIS Surveying provides high quality data on positions of land boundaries, buildings, etc.
Geodesy source of high accuracy positional control for GIS Statistics many models built using GIS are statistical in nature, many statistical techniques used for analysis statistics is important in understanding issues of error and uncertainty in GIS data Operations Research many applications of GIS require use of optimizing techniques for decision-making Computer Science computer-aided design (CAD) provides software, techniques for data input, display and visualization, representation, particularly in 3 dimensions advances in computer graphics provide hardware, software for handling and displaying graphic objects, techniques of visualization database management systems (DBMS) contribute methods for representing data in digital form, procedures for system design and handling large volumes of data, particularly access and update artificial intelligence (AI) uses the computer to make choices based on available data in a way that is seen to emulate human intelligence and decision-making - computer can act as an "expert" in such functions as designing maps, generalizing map features although GIS has yet to take full advantage of AI, AI already provides methods and techniques for system design Mathematics several branches of mathematics, especially geometry and graph theory, are used in GIS system design and analysis of spatial data Civil Engineering GIS has many applications in transportation, urban engineering C. 
MAJOR AREAS OF PRACTICAL APPLICATION Street network-based address matching - finding locations given street addresses vehicle routing and scheduling location analysis, site selection development of evacuation plans Natural resource-based management of wild and scenic rivers, recreation resources, floodplains, wetlands, agricultural lands, aquifers, forests, wildlife Environmental impact analysis (EIA) viewshed analysis hazardous or toxic facility siting groundwater modeling and contamination tracking wildlife habitat analysis, migration routes planning Land parcel-based zoning, subdivision plan review land acquisition environmental impact statements water quality management maintenance of ownership Facilities management locating underground pipes, cables balancing loads in electrical networks planning facility maintenance tracking energy use D. GIS AS A SET OF INTERRELATED SUBSYSTEMS Data Processing Subsystem data acquisition - from maps, images or field surveys data input - data must be input from source material to the digital database data storage - how often is it used, how should it be updated, is it confidential? Data Analysis Subsystem retrieval and analysis - may be simple responses to queries, or complex statistical analyses of large sets of data information output - how to display the results? as maps or tables? Or will the information be fed into some other digital system? Information Use Subsystem users may be researchers, planners, managers interaction needed between GIS group and users to plan analytical procedures and data structures Management Subsystem organizational role - GIS section is often organized as a separate unit within a resource management agency (cf. 
the Computer Center at many universities) offering spatial database and analysis services staff - include System Manager, Database Manager, System Operator, System Analysts, Digitizer Operators - a typical resource management agency GIS center might have a staff of 5-7 procedures - extensive interaction is needed between the GIS group and the rest of the organization if the system is to function effectively MAPS AND MAP ANALYSIS A. INTRODUCTION maps are the main source of data for GIS the traditions of cartography are fundamentally important to GIS GIS has roots in the analysis of information on maps, and overcomes many of the limitations of manual analysis this unit is about cartography and its relationship to GIS - how does GIS differ from cartography, particularly automated cartography, which uses computers to make maps? B. WHAT IS A MAP? Definition according to the International Cartographic Association, a map is: a representation, normally to scale and on a flat medium, of a selection of material or abstract features on, or in relation to, the surface of the Earth Maps show more than the Earth's surface the term "map" is often used in mathematics to convey the notion of transferring information from one form to another, just as cartographers transfer information from the surface of the Earth to a sheet of paper the term "map" is used loosely to refer to any visual display of information, particularly if it is abstract, generalized or schematic Cartographic abstraction production of a map requires: selection of the few features in the real world to include classification of selected features into groups (i.e.
bridges, churches, railways) simplification of jagged lines like coastlines exaggeration of features to be included that are too small to show at the scale of the map symbolization to represent the different classes of features chosen Types of maps in practice we normally think of two types of map: topographic map - a reference tool, showing the outlines of selected natural and man-made features of the Earth often acts as a frame for other information "Topography" refers to the shape of the surface, represented by contours and/or shading, but topographic maps also show roads and other prominent features thematic map - a tool to communicate geographical concepts such as the distribution of population densities, climate, movement of goods, land use etc. Thematic maps in GIS several types of thematic map are important in GIS: a choropleth map uses reporting zones such as counties or census tracts to show data such as average incomes, percent female, or rates of mortality the boundaries of the zones are established independently of the data, and may be used to report many different sets of data an area class map shows zones of constant attributes, such as vegetation, soil type, or forest species the boundaries are different for each map as they are determined by the variation of the attribute being mapped, e.g. breaks of soil type may occur independently of breaks of vegetation an isopleth map shows an imaginary surface by means of lines joining points of equal value, "isolines" (e.g.
contours on a topographic map) used for phenomena which vary smoothly across the map, such as temperature, pressure, rainfall or population density Line maps versus photo maps an important distinction for GIS is between a line map and a photo map a line map shows features by conventional symbols or by boundaries a photo map is derived from a photographic image taken from the air features are interpreted by the eye as it views the map certain features may be identified by overprinting labels photomaps are relatively cheap to make but are rarely completely free of distortions Characteristics of maps maps are often stylized, generalized or abstracted, requiring careful interpretation usually out of date show only a static situation - one slice in time often highly elegant/artistic easy to use to answer certain types of questions: how do I get there from here? what is at this point? difficult or time-consuming to answer other types: what is the area of this lake? what places can I see from this TV tower? what does that thematic map show at the point I'm interested in on this topographic map? The concept of scale the scale of a map is the ratio between distances on the map and corresponding distances in the real world if a map has a scale of 1:50,000, then 1 cm on the map equals 50,000 cm or 0.5 km on the Earth's surface the use of the terms "small scale" and "large scale" is often confused, so it is important to be consistent a large scale map shows great detail, small features representative fraction is large, e.g. 1/10,000 a small scale map shows only large features representative fraction is small, e.g.
1/250,000 the scale controls not only how features are shown, but what features are shown a 1:2,500 map will show individual houses and lamp posts while a 1:100,000 will not different scales are used in different countries in the US, 1:100,000 is the largest scale at which complete coverage of the continental states exists, but there is limited coverage at 1:62,500 and 1:24,000 in the UK, there is complete coverage at much larger scales (1:1,250 to 1:10,000) Map projections the Earth's surface is curved but as it must be shown on a flat sheet, some distortion is inevitable distortion is least when the map shows only small areas, and greatest when a map attempts to show the entire surface of the Earth a projection is a method by which the curved surface of the earth is represented on a flat surface it involves the use of mathematical transformations between the location of places on the earth and their projected locations on the plane numerous projections have been invented, and arguments continue about which is best for which purposes projections can be identified by the distortions which they avoid - in general a projection can belong to only one of these classes: equal area projections preserve the area of features by assigning them an area on the map which is proportional to their area on the earth - these are useful for applications which require measuring area, and are popular in GIS conformal projections preserve the shape of small features, and show directions (bearings) correctly - they are useful for navigation equidistant projections preserve distances to places from one or two points C. WHAT ARE MAPS USED FOR?
traditionally, maps are used as aids to navigation, as reference documents, and as wall decorations maps have four roles today: Data display maps provide useful ways of displaying information in a meaningful way in practice, the cost of making and printing a map is high, so its contents are often a compromise between different needs Data stores as a means of storing data, maps can be very efficient, high density stores a typical 1:50,000 map might have 1,000 place names on it the distances between all possible pairs of these 1,000 places would run to (1,000 x 999 / 2) or 499,500 numbers if stored in a table instead of scaled off the map when needed the information printed on the typical 1:50,000 topographic map sheet in the UK requires 25 million bytes of storage when it is converted to digital form, equivalent to one standard computer tape, or 10 full-length novels the information on all British topographic maps would require 150 gigabytes (150 x 10^9 bytes) Spatial indexes a map can show the boundaries of areas (e.g. land use zones, soil or rock types) and identify each area with a label a separate manual with corresponding entries may provide greater detail about each area Data analysis tool maps are used in analysis to: make or test hypotheses, such as the identification of cancer clusters examine the relationship between two distributions using simple transparent overlays D.
THE USE OF MAPS FOR INVENTORY AND ANALYSIS the following examples demonstrate how maps have been used for sophisticated applications in inventory and analysis, and point out some limitations Measuring land use change for example, two major land use surveys were carried out in the UK, in the late 1930s by Sir Dudley Stamp and in the 1960s by Professor Alice Coleman the results were published as maps in order to compare changes in land use between the 1930s and the 1960s, the area of each land use type was measured using a hand planimeter and counting overlaid dots despite interest in measuring the amount of change of land use through time, particularly from agricultural to urban, few results were produced using this method because the traditional techniques are slow and tedious, and because of the difficulty of overlaying or working from very different map sources Landscape architecture Ian McHarg pioneered the use of transparent map overlays for planning locations of highways, transmission corridors and other facilities in environmentally sensitive areas (McHarg, 1969) despite the popularity of this technique and numerous applications, this method remains cumbersome and imprecise E. AUTOMATED AND COMPUTER-ASSISTED CARTOGRAPHY Changeover to computer mapping personalities were critically important in the 1960s and early 1970s - individual interests determined the direction and focus of research and development in computer cartography (see Rhind, 1988) impetus for change began in two communities: 1. scientists wishing to make maps quickly to see the results of modeling, or to display data from large archives already in digital form, e.g. census tables quality was not a major concern SYMAP was the first significant package for this purpose, released by the Harvard Lab in 1967 2.
cartographers seeking to reduce the cost and time of map production and editing hardware costs limited interest in this technology prior to 1980 to the major mapping agencies the costs of computing have dropped dramatically, by an order of magnitude every six years: what cost $1 to compute in 1989 would have cost $10 in 1983 and $100,000 in 1959 the development of the microcomputer and the launch of the IBM PC in 1981 have had enormous influence an early belief that the entire map-making process could be automated diminished by 1975 because of difficulties of generalization and design has resurfaced in the context of Expert Systems where the computer chooses the proper techniques based on characteristics of the data, scale, map purpose, etc. today, far more maps are made by computer than by hand now few mapmakers are trained cartographers also, it is now clear that once created, digital data can serve purposes other than map-making, so it has additional value Advantages of computer cartography lower cost for simple maps, faster production greater flexibility in output - easy scale or projection change - maps can be tailored to user needs other uses for digital data Disadvantages of computer cartography relatively few full-scale systems have been shown to be truly cost-effective in practice, despite early promise high capital cost, though this is now much reduced computer methods do not ensure production of maps of high quality there is a perceived loss of regard for the "cartographic tradition" with the consequent production of "cartojunk" GIS and Computer Cartography computer cartography has a primary goal of producing maps systems have advanced tools for map layout, placement of labels, large symbol and font libraries, interfaces for expensive, high quality output devices however, it is not an analytical tool therefore, unlike data for GIS, cartographic data does not need to be stored in ways which allow, for example, analysis of relationships between different
themes such as population density and housing prices or the routing of flows along connecting highway or river segments F. GIS COMPARED TO MAPS Data stores spatial data stored in digital format in a GIS allows for rapid access for traditional as well as innovative purposes nature of maps creates difficulties when used as sources for digital data most GIS take no account of differences between datasets derived from maps at different scales idiosyncrasies (e.g. generalization procedures) in maps become "locked in" to the data derived from them such errors often become apparent only during later processing of digital data derived from them however, maps still remain an excellent way of compiling spatial information, e.g. field survey maps can be designed to be easy to convert to digital form, e.g. by the use of different colors which have distinct signatures when scanned by electronic sensors as well maps can be produced by GISs as cheap, high density stores of information for the end user however, consistent, accurate retrieval of data from maps is difficult only limited amounts of data can be shown due to constraints of the paper medium Data indexes this function can be performed much better by a good GIS due to the ability to provide multiple and efficient cross-referencing and searching Data analysis tools GIS is a powerful tool for map analysis traditional impediments to the accurate and rapid measurement of area or to map overlay no longer exist many new techniques in spatial analysis are becoming available Data display tools electronic display offers significant advantages over the paper map ability to browse across an area without interruption by map sheet boundaries ability to zoom and change scale freely potential for the animation of time dependent data display in "3 dimensions" (perspective views), with "real-time" rotation of viewing angle potential for continuous scales of intensity and the use of color and shading independent of the constraints of the 
printing process, ability to change colors as required for interpretation one of a kind, special purpose products are possible and inexpensive THE RASTER GIS A. THE DATA MODEL B. CREATING A RASTER Cell by cell entry Digital data C. CELL VALUES Types of values One value per cell D. MAP LAYERS Resolution Orientation Zones Value Location E. EXAMPLE ANALYSIS USING A RASTER GIS Objective Procedure Result Operations used REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES Although most of the material in this Curriculum is designed to be as independent as possible from specific data models, it is necessary to deal with this basic concept early so that students can start hands-on exercises with a GIS program. Following Unit 5, we return to the more fundamental concepts and do not address specific vector GIS issues until Units 13 and 14. There are several other places these topics could be placed in a course sequence. We have tried to make Units 4 and 5 as independent as possible so that you can move them within the Curriculum relatively easily. UNIT 4 - THE RASTER GIS Compiled with assistance from Dana Tomlin, The Ohio State University A. THE DATA MODEL geographical variation in the real world is infinitely complex the closer you look, the more detail you see, almost without limit it would take an infinitely large database to capture the real world precisely data must somehow be reduced to a finite and manageable quantity by a process of generalization or abstraction geographical variation must be represented in terms of discrete elements or objects the rules used to convert real geographical variation into discrete objects are the data model Tsichritzis and Lochovsky (1977) define a data model as "a set of guidelines for the representation of the logical organization of the data in a database...
(consisting) of named logical units of data and the relationships between them."1 current GISs differ according to the way in which they organize reality through the data model each model tends to fit certain types of data and applications better than others the data model chosen for a particular project or application is also influenced by: the software available the training of the key individuals historical precedent there are two major choices of data model - raster and vector raster model divides the entire study area into a regular grid of cells in specific sequence the conventional sequence is row by row from the top left corner each cell contains a single value is space-filling since every location in the study area corresponds to a cell in the raster one set of cells and associated values is a layer there may be many layers in a database, e.g. soil type, elevation, land use, land cover vector model uses discrete line segments or points to identify locations discrete objects (boundaries, streams, cities) are formed by connecting line segments vector objects do not necessarily fill space, not all locations in space need to be referenced in the model a raster model tells what occurs everywhere - at each place in the area a vector model tells where everything occurs - gives a location to every object conceptually, the raster models are the simplest of the available data models therefore, we begin our examination of GIS data and operations with the raster model and will consider vector models after the fundamental concepts have been introduced. ____________________ 1Tsichritzis, T.C., and F.H. Lochovsky, 1977. Data Base Management Systems, Academic Press, New York. B.
CREATING A RASTER consider laying a grid over a geologic map create a raster by coding each cell with a value that represents the rock type which appears in the majority of that cell's area when finished, every cell will have a coded value in most cases the values that are to be assigned to each cell in the raster are written into a file, often coded in ASCII this file can be created manually by using a word processor, database or spreadsheet program or it can be created automatically then it is normally imported into the GIS so that the program can reformat the data for its specific processing needs there are several methods for creating raster databases Cell by cell entry direct entry of each layer cell by cell is simplest entry may be done within the GIS or into an ASCII file for importing each program will have specific requirements the process is normally tedious and time-consuming a layer can contain millions of cells the average Landsat image is around 7.4 x 10^6 pixels, the average TM scene is about 34.9 x 10^6 pixels run length encoding can be more efficient values often occur in runs across several cells this is a form of spatial autocorrelation - the tendency for nearby things to be more similar than distant things data are entered as pairs, first run length, then value e.g. the array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 0 1 1 1 1 would be entered as 3 0 2 1 2 0 3 1 2 0 3 1 1 0 4 1 this is 16 items to enter, instead of 20 in this case the saving is 20%, but much higher savings occur in practice imagine a database of 10,000,000 cells and a layer which records the county containing each pixel suppose there are only two counties in the area covered by the database each cell can have one of only two values so the runs will be very long only some GISs have the capability to use run length encoded files note: Units 35 and 36 cover run length encoding and other aspects of raster storage in more detail Digital data much raster data is already in digital form, as images, etc.
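The run length encoding just described can be sketched in a few lines of Python. This is an illustrative sketch, not code from any particular GIS package; the function names are invented for the example, and the pairs are written in the (run length, value) order used in the array example above.

```python
def rle_encode(cells):
    """Encode a list of cell values as [length, value, length, value, ...] pairs."""
    encoded = []
    run_value, run_length = cells[0], 1
    for v in cells[1:]:
        if v == run_value:
            run_length += 1            # same value: extend the current run
        else:
            encoded.extend([run_length, run_value])  # emit the finished run
            run_value, run_length = v, 1
    encoded.extend([run_length, run_value])          # emit the final run
    return encoded

def rle_decode(encoded):
    """Expand [length, value, ...] pairs back into the full row of cells."""
    cells = []
    for i in range(0, len(encoded), 2):
        cells.extend([encoded[i + 1]] * encoded[i])
    return cells

# the 20-cell array from the text encodes to the 16 items shown above
row = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
print(rle_encode(row))   # [3, 0, 2, 1, 2, 0, 3, 1, 2, 0, 3, 1, 1, 0, 4, 1]
```

Note that the encoding only pays off when runs are long; for the two-county layer described above, a 10,000,000-cell raster would collapse to a handful of pairs.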
however, resampling will likely be needed in order that pixels coincide in each layer because remote sensing generates images, it is easier to interface with a raster GIS than any other type elevation data is commonly available in digital raster form from agencies such as the US Geological Survey C. CELL VALUES Types of values the type of values contained in cells in a raster depends upon both the reality being coded and the GIS different systems allow different classes of values, including: whole numbers (integers) real (decimal) values alphabetic values many systems allow only integers, others which allow different types restrict each separate raster layer to a single kind of value if systems allow several types of values, e.g. some layers numeric, some non-numeric, they should warn the user against doing unreasonable operations e.g. it is unreasonable to try to multiply the values in a numeric layer with the values in a non-numeric layer integer values often act as code numbers, which "point" to names in an associated table or legend e.g. the first example might have the following legend identifying the name of each soil class: 0 = "no class" 1 = "fine sandy loam" 2 = "coarse sand" 3 = "gravel" One value per cell each pixel or cell is assumed to have only one value this is often inaccurate - the boundary of two soil types may run across the middle of a pixel in such cases the pixel is given the value of the largest fraction of the cell, or the value of the middle point in the cell note, however, a few systems allow a pixel to have multiple values the NARIS system developed at the University of Illinois in the 1970s allowed each pixel to have any number of values and associated percentages e.g. 30% a, 30% b, 40% c D.
MAP LAYERS the data for an area can be visualized as a set of map layers a map layer is a set of data describing a single characteristic for each location within a bounded geographic area only one item of information is available for each location within a single layer - multiple items of information require multiple layers on the other hand, a topographic map can show multiple items of information for each location, within limits e.g. elevation (contours), counties (boundaries), roads, railroads, urbanized areas (grey tint) these would be 5 layers in a raster GIS typical raster databases contain up to a hundred layers each layer (matrix, lattice, raster, array) typically contains hundreds or thousands of cells important characteristics of a layer are its resolution, orientation and zone(s) Resolution in general, resolution can be defined as the minimum linear dimension of the smallest unit of geographic space for which data are recorded in the raster model the smallest units are generally rectangular (occasionally systems have used hexagons or triangles) these smallest units are known as cells, pixels note: high resolution refers to rasters with small cell dimensions high resolution means lots of detail, lots of cells, large rasters, small cells Orientation the angle between true north and the direction defined by the columns of the raster Zones each zone of a map layer is a set of contiguous locations that exhibit the same value these might be: ownership parcels political units such as counties or nations lakes or islands individual patches of the same soil or vegetation type there is considerable confusion over terms here other terms commonly used for this concept are patch, region, polygon each of these terms, however, has different meanings to individual users and different definitions in specific GIS packages in addition, there is a need for a second term which refers to all individual zones that have the same characteristics class is often used for this
concept note that not all map layers will have zones, cell contents may vary continuously over the region making every cell's value unique e.g. satellite sensors record a separate value for reflection from each cell major components of a zone are its value and location(s) Value is the item of information stored in a layer for each pixel or cell cells in the same zone have the same value Location generally location is identified by an ordered pair of coordinates (row and column numbers) that unambiguously identify the location of each unit of geographic space in the raster (cell, pixel, grid cell) usually the true geographic location of one or more of the corners of the raster is also known E. EXAMPLE ANALYSIS USING A RASTER GIS Objective identify areas suitable for logging an area is suitable if it satisfies the following criteria: is Jackpine (Black Spruce are not valuable) is well drained (poorly drained and waterlogged terrain cannot support equipment, logging causes unacceptable environmental damage) is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality) Procedure recode layer 2 as follows, creating layer 4 y if value 2 (Jackpine) n if other value recode layer 3 as follows, creating layer 5 y if value 2 (good) n if other value spread the lake on layer 1 by one cell (500 m), creating layer 6 recode the spread lake on layer 6 as follows, creating layer 7 n if in spread lake y if not overlay layers 4 and 5 to obtain layer 8, coding as follows y if both 4 and 5 are y n otherwise overlay layers 7 and 8 to obtain layer 9, coding as follows y if both 7 and 8 are y n otherwise Result the loggable cells are y on layer 9 Operations used recode overlay spread we could have achieved the same result using the operations in other sequences, or by combining recode and overlay operations e.g. overlay layers 2 and 3, coding as follows y if layer 2 is 2 and layer 3 is 2, n otherwise this would replace two recodes and an overlay e.g.
some systems allow layers to be overlaid 3 or more at a time
the names given to operations vary from system to system, but most of the operations themselves are common across systems

REFERENCES

Star, J.L. and J.E. Estes, 1990. Geographic Information Systems: An Introduction, Prentice Hall, Englewood Cliffs, NJ. An introduction to GIS with a strong raster orientation.

Further references can be found following Unit 5.

EXAM AND DISCUSSION QUESTIONS

1. What types of geographical data fit the raster GIS data model best? What types fit worst?

2. Review the issues involved in selecting a resolution for a raster GIS project.

3. What resolutions would be appropriate for the following problems: (a) determining logging areas in a National Forest, (b) finding suitable locations for backcountry campsites, (c) planning subdivisions to take account of noise from an airport?

4. Review the methods of planning described in Ian McHarg's classic book Design with Nature (1969, Doubleday, New York). In what ways would they (a) benefit and (b) suffer from implementation using raster GIS?

5. Using the documentation for the raster GIS program you have, determine how that program uses (a) the concept of "zone" as a contiguous group of cells of the same value, and (b) the concept of several groups of cells that all have the same value. Is there any ambiguity in the way your program deals with these two concepts?

RASTER GIS CAPABILITIES

A. INTRODUCTION
a raster GIS must have capabilities for:
input of data
various housekeeping functions
operations on layers, like those encountered in the previous unit - recode, overlay and spread
output of data and results
the range of possible functions is enormous; current raster GISs only scratch the surface
because the range is so large, some have tried to organize functions into a consistent scheme, but no scheme has been widely accepted yet
this unit covers a selection of the most useful and common functions
each raster GIS uses different names for the functions

B.
DISPLAYING LAYERS Basic display the simplest type of values to display are integers on a color display each integer value can be assigned a unique color there must be as many colors as integers if the values have a natural order we will want the sequence of colors to make sense e.g. elevation is often shown on a map using the sequence blue-green-yellow-brown-white for increasing elevation there should be a legend explaining the meaning of each color the system should generate the legend automatically based on the descriptions of each value stored with the data layer overhead - Simple display (IDRISI) on a dot matrix printer shades of grey can be generated by varying the density of dots if there are too many values for the number of colors, may have to recode the layer before display Other types of display it may be appropriate to display the data as a surface contours can be "threaded" through the pixels along lines of constant value the searching operation for finding contours is computer-intensive so may be slow the surface can be shown in an oblique, perspective view this can be done by drawing profiles across the raster with each profile offset and hidden lines removed the surface might be colored using the values in a second layer (a second layer can be "draped" over the surface defined by the first layer) the result can be very effective "LA The Movie" was produced by Jet Propulsion Lab by draping a Landsat image of Los Angeles over a layer of elevations, then simulating the view from a moving aircraft these operations are also computer-intensive because of the calculations necessary to simulate perspective and remove hidden lines C. 
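The automatic legend generation described above can be sketched in Python. The landcover values, descriptions and colour names below are invented for illustration; the point is only that the legend is built from the descriptions stored with the layer rather than typed by hand.

```python
import numpy as np

# Toy integer layer; the value -> description table is assumed to be
# stored alongside the layer, as the text suggests, and the colour
# assignments are arbitrary choices for this sketch.
landcover = np.array([[1, 1, 2],
                      [2, 3, 3],
                      [1, 2, 3]])
descriptions = {1: "water", 2: "forest", 3: "urban"}
palette = {1: "blue", 2: "green", 3: "grey"}

# Build the legend automatically from the values actually present.
legend = {palette[v]: descriptions[v] for v in np.unique(landcover)}
print(legend)
```

If a value were absent from the layer it would simply not appear in the legend, which matches the idea of generating the legend from the layer itself.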
LOCAL OPERATIONS
produce a new layer from one or more input layers
the value of each new pixel is defined by the values of the same pixel on the input layer(s)
neighboring or distant pixels have no effect
note: arithmetic operations make no sense unless the values have appropriate scales of measurement (see Unit 6)
you cannot find the "average" of soil types 3 and 5, nor is soil 5 "greater than" soil 3

Recoding
uses only one input layer
examples:
1. assign a new value to each unique value on the input layer
useful when the number of unique input values is small
2. assign new values by assigning pixels to classes or ranges based on their old values
e.g. 0-499 becomes 1, 500-999 becomes 2, 1000 and above becomes 3
useful when the old layer has different values in each cell, e.g. elevation or satellite images
3. sort the unique values found on the input layer and replace each by the rank of the value
e.g. 0, 1, 4, 6 on the input layer become 1, 2, 3, 4 respectively
applications: assigning ranks to computed scores of capability, suitability etc.
some systems allow a full range of mathematical operations
e.g. newvalue = (2*oldvalue + 3)^2

Overlaying layers
an overlay occurs when the output value depends on two or more input layers
many systems restrict overlay to two input layers only
examples:
1. output value equals the arithmetic average of the input values
2. output value equals the greatest (or least) of the input values
3. layers can be combined using arithmetic operations
x and y are the input layers, z is the output; some examples:
z = x + y
z = xy
z = x / y
4. combination using logical conditions
e.g. if y > 0, then z = y, otherwise z = x
note: in many raster packages logical conditions cannot be applied directly to input layers - must first create reclassified input images so that cells have 0 if they do not meet the condition and 1 if they do
5. assign a new value to every unique combination of input values, e.g.:

LAYER 1   LAYER 2   OUTPUT LAYER
   1         A           1
   1         B           2
   2         A           3
   2         B           4

D.
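Two of the local operations above, recoding by ranges and the unique-combination overlay, can be sketched with numpy. This is a hedged illustration: np.digitize, the toy layers and the string-pairing trick are choices made for the sketch, not the commands of any GIS.

```python
import numpy as np

# Recode by ranges: 0-499 -> 1, 500-999 -> 2, 1000 and above -> 3.
elev = np.array([[120, 480, 610],
                 [950, 1000, 1400]])
recoded = np.digitize(elev, bins=[500, 1000]) + 1

# Assign a new value to every unique combination of two input layers
# (the output numbering is arbitrary, as in the table above).
layer_a = np.array([[1, 1],
                    [2, 2]])
layer_b = np.array([["A", "B"],
                    ["A", "B"]])
pairs = np.char.add(layer_a.astype(str), layer_b)    # "1A", "1B", ...
codes = {p: i + 1 for i, p in enumerate(np.unique(pairs))}
out = np.vectorize(codes.get)(pairs)
print(recoded.tolist())   # [[1, 1, 2], [2, 3, 3]]
print(out.tolist())       # [[1, 2], [3, 4]]
```

The unique-combination result reproduces the small table above: each (layer 1, layer 2) pair gets its own output value.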
OPERATIONS ON LOCAL NEIGHBORHOODS
the value of a pixel on the new layer is determined by the local neighborhood of the pixel on the old layer

Filtering
a filter operates by moving a "window" across the entire raster
e.g. many windows are 3x3 cells
the new value for the cell at the middle of the window is a weighted average of the values in the window
by changing the weights we can produce two major effects:
smoothing (a "low pass" filter, removes or reduces local detail)
edge enhancement (a "high pass" filter, exaggerates local detail)
weights should add to 1
example filters:

1.   .11  .11  .11
     .11  .11  .11
     .11  .11  .11

replaces each value by the simple unweighted average of it and its eight neighboring values
severely smooths the spatial variation on the layer

2.   .05  .05  .05
     .05  .60  .05
     .05  .05  .05

gives the pixel's old value 12 times the weight of its neighboring values
slightly smooths the layer

3.   -.1  -.1  -.1
     -.1  1.8  -.1
     -.1  -.1  -.1

slightly enhances local detail by giving neighbors negative weights
filters can be useful in enhancing detail on images for input to GIS, or smoothing layers to expose general trends

Slopes and aspects
if the values in a layer are elevations, we can compute the steepness of slopes by looking at the differences between a pixel's value and those of its adjacent neighbors
the direction of steepest slope, or the direction in which the surface is locally "facing", is called its aspect
aspect can be measured in degrees from North or by compass points - N, NE, E etc.
slope and aspect are useful in analyzing vegetation patterns, computing energy balances and modeling erosion or runoff
aspect determines the direction of runoff
this can be used to sketch drainage paths for runoff

E.
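A 3x3 filter of the kind shown above can be sketched in Python. Leaving edge cells unchanged is only one of several common edge policies and is an assumption of this sketch, as is the single-spike test layer.

```python
import numpy as np

def filter3x3(layer, weights):
    """Weighted 3x3 moving window; edge cells keep their old values."""
    out = layer.astype(float).copy()
    w = np.asarray(weights, dtype=float)
    for i in range(1, layer.shape[0] - 1):
        for j in range(1, layer.shape[1] - 1):
            out[i, j] = (layer[i-1:i+2, j-1:j+2] * w).sum()
    return out

# Low-pass filter 1 from the text: simple unweighted average.
smooth = np.full((3, 3), 1 / 9)
spike = np.zeros((5, 5))
spike[2, 2] = 9.0                  # a single-cell spike of local detail
result = filter3x3(spike, smooth)
# The spike is averaged down to about 1.0 and smeared over its
# neighbours - exactly the smoothing effect described above.
```

Substituting the high-pass weights from filter 3 would instead exaggerate the spike relative to its neighbours.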
OPERATIONS ON EXTENDED NEIGHBORHOODS

Distance
calculate the distance of each cell from a cell or the nearest of several cells
each pixel's value in the new layer is its distance from the given cell(s)

Buffer zones
buffers around objects and features are very useful GIS capabilities
e.g. build a logging buffer 500 m wide around all lakes and watercourses
buffer operations can be visualized as spreading the object spatially by a given distance
the result could be a layer with values:
1 if in original selected object
2 if in buffer
0 if outside object and buffer
applications include noise buffers around roads, safety buffers around hazardous facilities
in many programs the buffer operation requires the user to first do a distance operation, then a reclassification of the distance layer
the rate of spreading may be modified by another layer representing "friction"
e.g. the friction layer could represent varying cost of travel
this will affect the width of the buffer - narrow in areas of high friction, etc.

Visible area or "viewshed"
given a layer of elevations, and one or more viewpoints, compute the area visible from at least one viewpoint
e.g. value = 1 if visible, 0 if not
useful for planning locations of unsightly facilities such as smokestacks, or surveillance facilities such as fire towers, or transmission facilities

F.
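The distance-then-reclassify route to a buffer, as described above, can be sketched with numpy. Straight-line distance between cell centres, a single one-cell source, and a one-cell buffer width are all assumptions of this toy.

```python
import numpy as np

# Distance of every cell from the nearest source cell, in cell units
# (straight-line distance between cell centres).
rows, cols = np.indices((5, 5))
sources = [(0, 0)]                     # e.g. a one-cell lake
dist = np.full((5, 5), np.inf)
for r, c in sources:
    dist = np.minimum(dist, np.sqrt((rows - r) ** 2 + (cols - c) ** 2))

# Reclassify the distance layer into the buffer coding from the text:
# 1 = original object, 2 = within a one-cell buffer, 0 = outside.
buffer_layer = np.where(dist == 0, 1, np.where(dist <= 1, 2, 0))
print(buffer_layer)
```

A friction layer could be introduced by accumulating cost along paths instead of using straight-line distance, narrowing the buffer where friction is high, as the text notes.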
OPERATIONS ON ZONES (GROUPS OF PIXELS)

Identifying zones
by comparing adjacent pixels, identify all patches or zones having the same value
give each such patch or zone a unique number
set each pixel's value to the number of its patch or zone

Areas of zones
measure the area of each zone and assign this value to each pixel instead of the zone's number
alternatively, output may be in the form of a summary table sent to the printer or a file

Perimeter of zones
measure the perimeter of each zone and assign this value to each pixel instead of the zone's number
alternatively, output may be in the form of a summary table sent to the printer or a file
length of perimeter is determined by summing the number of exterior cell edges in each zone
note: the values calculated for both area and perimeter are highly dependent upon the orientation of objects (zones) with respect to the orientation of the grid
however, if boundaries in the study area do not have a dominant orientation such errors may cancel out

Distance from zone boundary
measure the distance from each pixel to the nearest part of its zone boundary, and assign this value to the pixel
the boundary is defined as the pixels which are adjacent to pixels of different values

Shape of zone
measure the shape of the zone and assign this to each pixel in the zone
one of the most common ways to measure shape is by comparing the perimeter length of a zone to the square root of its area
dividing this ratio by 3.54 gives a measure which ranges from 1 for a circle (the most compact shape possible), through 1.13 for a square, to large numbers for long, thin, wiggly zones
commands like this are important in landscape ecology
helpful in studying the effects of geometry and spatial arrangement of habitat
e.g. the size and shape of woodlots affect the animal species they can sustain
e.g. linear park corridors across urban areas have value in allowing migration of animal species

G.
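Zone identification by comparing adjacent pixels can be sketched as a 4-connected flood fill in Python (an illustration, not any package's zone command); zone areas then follow by counting cells.

```python
import numpy as np
from collections import deque

def label_zones(layer):
    """Number each 4-connected patch of equal-valued cells (a 'zone')."""
    zones = np.zeros(layer.shape, dtype=int)
    nzones = 0
    for i in range(layer.shape[0]):
        for j in range(layer.shape[1]):
            if zones[i, j]:
                continue
            nzones += 1
            zones[i, j] = nzones
            q = deque([(i, j)])
            while q:
                r, c = q.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < layer.shape[0] and 0 <= nc < layer.shape[1]
                            and zones[nr, nc] == 0
                            and layer[nr, nc] == layer[r, c]):
                        zones[nr, nc] = nzones
                        q.append((nr, nc))
    return zones

layer = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [3, 3, 3]])
zones = label_zones(layer)
area = {z: int((zones == z).sum()) for z in np.unique(zones)}
print(zones)
print(area)   # {1: 3, 2: 3, 3: 3}
```

Given per-zone perimeters as well, a compactness measure could then be computed as perimeter / (3.54 * sqrt(area)), following the shape measure described above.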
COMMANDS TO DESCRIBE CONTENTS OF LAYERS
it is important to have ways of describing a layer's contents
particularly new layers created by GIS operations
particularly in generating results of analysis

One layer
generate statistics on a layer
e.g. mean, median, most common value, other statistics

More than one layer
compare two maps statistically
e.g. is the pattern on one map related to the pattern on the other?
e.g. chi-square test, regression, analysis of variance

Zones on one layer
generate statistics for the zones on a layer
e.g. largest, smallest, number, mean area

H. ESSENTIAL HOUSEKEEPING
list available layers
input, copy, rename layers
import and export layers to and from other systems
other raster GIS
input of images from remote sensing systems
other types of GIS
identify resolution, orientation
"resample" - changing cell size, orientation, portion of raster to analyze
change colors
provide help to the user
exit from the GIS (the most important command of all!)

SAMPLING THE WORLD

A. INTRODUCTION B. REPRESENTING REALITY Continuous variation C. SPATIAL DATA Location Attributes Time D. SAMPLING REALITY Scales of measurement 1. Nominal 2. Ordinal 3. Interval 4. Ratio Multiple representations E. DATA SOURCES Primary data collection Secondary data sources F. STANDARDS Sharing data Agency standards G. ERRORS AND ACCURACY Original Sin - errors in sources Boundaries Classification errors Data capture errors Accuracy standards REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES

This unit begins the section on data acquisition by looking at how the infinite complexity of the real world can be discretized and sampled.

UNIT 6 - SAMPLING THE WORLD
Compiled with assistance from Charles Parson, Bemidji State University and Timothy Nyerges, University of Washington

A.
INTRODUCTION
the world is infinitely complex
the contents of a spatial database represent a particular view of the world
the user sees the real world through the medium of the database
the measurements and samples contained in the database must present as complete and accurate a view of the world as possible
the contents of the database must be relevant in terms of:
themes and characteristics captured
the time period covered
the study area
this unit looks at techniques for sampling the world, and associated issues of accuracy and standards

B. REPRESENTING REALITY
a database consists of digital representations of discrete objects
the features shown on a map, e.g. lakes, benchmarks, contours, can be thought of as discrete objects
thus the contents of a map can be captured in a database by turning map features into database objects
many of the features shown on a map are fictitious and do not exist in the real world
contours do not really exist, but houses and lakes are real objects
the contents of a spatial database include:
digital versions of real objects, e.g. houses
digital versions of artificial map features, e.g. contours
artificial objects created for the purposes of the database, e.g. pixels

Continuous variation
some characteristics exist everywhere and vary continuously over the earth's surface
e.g. elevation, atmospheric temperature and pressure, natural vegetation or soil type
we can represent such variation in several ways:
by taking measurements at sample points, e.g. weather stations
by taking transects
by dividing the area into patches or zones, and assuming the variable is constant within each zone, e.g. soil mapping
by drawing contours, e.g.
topographic mapping
each of these methods creates discrete objects
the objects in each case are points, lines or areas
a raster can be thought of as:
a special case of a point sample where the points are regularly spaced
a special case of zones where the zones are all the same size
each method is approximate, capturing only part of the real variation
a point sample misses variation between points
transects miss variation not on transects
zones pretend that variation is sudden at boundaries, and that there is no variation within zones
contours miss variation not located on contours
several methods can be used to try to improve the success of each method
e.g. for zones:
map the boundaries as fuzzy instead of sharp lines
describe the zones as mixtures instead of as single classes, e.g. 70% soil type A, 30% soil type B

C. SPATIAL DATA
phenomena in the real world can be observed in three modes: spatial, temporal and thematic
the spatial mode deals with variation from place to place
the temporal mode deals with variation from time to time (one slice to another)
the thematic mode deals with variation from one characteristic to another (one layer to another)
all measurable or describable properties of the world can be considered to fall into one of these modes - place, time and theme
an exhaustive description of all three modes is not possible
when observing real-world phenomena we usually hold one mode fixed, vary one in a "controlled" manner, and measure the third (Sinton, 1978)
e.g.
using a census of population we could fix a time such as 1990, control for location using census tracts, and measure a theme such as the percentage of persons owning automobiles
holding geography fixed and varying time gives longitudinal data
holding time fixed and varying geography gives cross-sectional data
the modes of information stored in a database influence the types of problem solving that can be accomplished

Location
the spatial mode of information is generally called location

Attributes
attributes capture the thematic mode by defining different characteristics of objects
a table showing the attributes of objects is called an attribute table
each object corresponds to a row of the table
each characteristic or theme corresponds to a column of the table
thus the table shows the thematic and some of the spatial modes

Time
the temporal mode can be captured in several ways:
by specifying the interval of time over which an object exists
by capturing information at certain points in time
by specifying the rates of movement of objects
depending on how the temporal mode is captured, it may be included in a single attribute table, or be represented by a series of attribute tables on the same objects through time

D. SAMPLING REALITY

Scales of measurement
numerical values may be defined with respect to nominal, ordinal, interval, or ratio scales of measurement
it is important to recognize the scales of measurement used in GIS data as this determines the kinds of mathematical operations that can be performed on the data
the different scales can be demonstrated using an example of a marathon race:

1. Nominal
on a nominal scale, numbers merely establish identity
e.g. a phone number signifies only the unique identity of the phone
in the race, the numbers issued to racers, which are used to identify individuals, are on a nominal scale
these identity numbers do not indicate any order or relative value in terms of the race outcome

2.
Ordinal
on an ordinal scale, numbers establish order only
phone number 9618224 is not more of anything than 9618049, so phone numbers are not ordinal
in the race, the finishing places of each racer, i.e. 1st place, 2nd place, 3rd place, are measured on an ordinal scale
however, we do not know how much time difference there is between each racer

3. Interval
on interval scales, the difference (interval) between numbers is meaningful, but the numbering scale does not start at 0
subtraction makes sense but division does not
e.g. it makes sense to say that 20°C is 10 degrees warmer than 10°C, so Celsius temperature is an interval scale, but 20°C is not twice as warm as 10°C
e.g. it makes no sense to say that phone number 9680244 is 62195 more than 9618049, so phone numbers are not measurements on an interval scale
in the race, the time of day at which each racer finished is measured on an interval scale
if the racers finished at 9:10 GMT, 9:20 GMT and 9:25 GMT, then racer 1 finished 10 minutes before racer 2, and the difference between racers 1 and 2 is twice the difference between racers 2 and 3
however, the racer finishing at 9:10 GMT did not finish twice as fast as the racer finishing at 9:20 GMT

4. Ratio
on a ratio scale, measurement has an absolute zero and the difference between numbers is significant
division makes sense
e.g. it makes sense to say that a 50 kg person weighs half as much as a 100 kg person, so weight in kg is on a ratio scale
the zero point of weight is absolute, but the zero point of the Celsius scale is not
in our race, the first place finisher finished in a time of 2:30, the second in 2:40, and the 450th place finisher took 5 hours
the 450th finisher took twice as long as the first place finisher (5/2.5 = 2)
note: these distinctions, though important, are not always clearly defined
is elevation interval or ratio?
if the local base level is 750 feet, is a mountain at 2000 feet twice as high as one at 1000 feet when viewed from the valley? many types of geographical data used in GIS applications are nominal or ordinal values establish the order of classes, or their distinct identity, but rarely intervals or ratios thus you cannot: multiply soil type 2 by soil type 3 and get soil type 6 divide urban area by the rank of a city to get a meaningful number subtract suitability class 1 from suitability class 4 to get 3 of anything however, you can: divide population by area (both ratio scales) and get population density subtract elevation at point a from elevation at point b and get difference of elevation Multiple representations a data model is essential to represent geographical data in a digital database there are many different data models the same phenomena may be represented in different ways, at different scales and with different levels of accuracy thus there may be multiple representations of the same geographical phenomena it is difficult to convert from one representation to another e.g. from a small scale (1:250,000) to a large scale (1:10,000) thus it is common to find databases with multiple representations of the same phenomenon this is wasteful, but techniques to avoid it are poorly developed E. DATA SOURCES Primary data collection some of the data in a spatial database may have been measured directly e.g. by field sampling or remote sensing the density of sampling determines the resolution of the data e.g. samples taken every hour will capture hour-to- hour variation, but miss shorter-term variation e.g. samples taken every 1 km will miss any variation at resolutions less than 1 km a sample is designed to capture the variation present in a larger universe e.g. a sample of places should capture the variation present at all possible places e.g. 
a sample of times will be designed to capture variation at all possible times there are several standard approaches to sampling: in a random sample, every place or time is equally likely to be chosen systematic samples are chosen according to a rule, e.g. every 1 km, but the rule is expected to create no bias in the results of analysis, i.e. the results would have been similar if a truly random sample had been taken in a stratified sample, the researcher knows for some reason that the universe contains significantly different sub-populations, and samples within each sub-population in order to achieve adequate representation of each e.g. we may know that the topography is more rugged in one part of the area, and sample more densely there to ensure adequate representation if a representative sample of the entire universe is required, then the subsamples in each subpopulation will have to be weighted appropriately Secondary data sources some data may have been obtained from existing maps, tables, or other databases such sources are termed secondary to be useful, it is important to obtain information in addition to the data themselves: information on the procedures used to collect and compile the data information on coding schemes, accuracy of instruments unfortunately such information is often not available a user of a spatial database may not know how the data were captured and processed prior to input this often leads to misinterpretation, false expectations about accuracy F. STANDARDS standards may be set to assure uniformity within a single data set across data sets e.g. uniform information about timber types throughout the database allows better fire fighting methods to be used, or better control of insect infestations data capture should be undertaken in standardized ways that will assure the widest possible use of the information Sharing data it is not uncommon for as many as three agencies to create databases with, ostensibly, the same information e.g. 
a planning agency may map landuse, including a forested class
e.g. the state department of forestry also maps forests
e.g. the wildlife division of the department of conservation maps habitat, which includes fields and forest
each may digitize their forest class onto different GIS systems, using different protocols, and with different definitions for the classes of forest cover
this is a waste of time and money
sharing information gives it added value
sharing basic formats with other information providers, such as a department of transportation, might make marketing the database more profitable

Agency standards
state and national agencies have set standards for certain environmental data
the Soil Conservation Service (SCS) has adopted the "seventh approximation" as the national soil taxonomy
the US Geological Survey has set standards for landuse, transportation, and hydrography that are used as guidelines in many states
forest inventories are not standardized; agencies may use different systems while managing a contiguous region of forest land
Unit 69 covers standards for GIS in greater depth

G.
ERRORS AND ACCURACY note: Units 45 and 46 discuss this topic in detail there is a nearly universal tendency to lose sight of errors once the data are in digital form errors: are implanted in databases because of errors in the original sources (source errors) are added during data capture and storage (processing errors) occur when data are extracted from the computer arise when the various layers of data are combined in an analytical exercise Original Sin - errors in sources are extremely common in non-mapped source data, such as locations of wells, or lot descriptions can be caused by doing inventory work from aerial photography and misinterpreting images often occur because base maps are relied on too heavily a recent attempt in Minnesota to overlay Department of Transportation bridge locations on USGS transportation data resulted in bridges lying neither beneath roads, nor over water, and roads lying apparently under rivers until they were compared in this way, it was assumed that each data set was locationally acceptable the ability of GIS to overlay may expose previously unsuspected errors Boundaries boundaries of soil types are actually transition zones, but are mapped by lines less than 0.5 mm wide lakes fluctuate widely in area, yet have permanently recorded shorelines Classification errors are common when tabular data are rendered in map form simple typing errors may be invisible until presented graphically floodplain soils may appear on hilltops pastureland may appear to be misinterpreted marsh more complex classification errors may be due to the sampling strategies that produced the original data timber appraisal is commonly done using a few, randomly selected points to describe large stands information may exist that documents the error of the sampling technique however, such information is seldom included in the GIS database Data capture errors manual data input induces another set of errors eye-hand coordination varies from operator to operator and 
from time to time
data input is a tedious task - it is difficult to maintain quality over long periods of time

Accuracy standards
many agencies have established accuracy standards for geographical data
these are more often concerned with accuracy of locations of objects than with accuracy of attributes
location accuracy standards are commonly decided from the scale of source materials
for natural resource data, 1:24,000 scale accuracy is a common target
at this scale, 0.5 mm line width = 12 m on the ground
USGS topographic information is currently available in digital form at 1:100,000
at this scale, 0.5 mm line width = 50 m on the ground
higher accuracy requires better source materials
is the added cost justified by the objectives of the study?
accuracy standards should be determined by considering both the value of information and the cost of collection

REFERENCES

Berry, B.J.L. and A.M. Baker, 1968. "Geographic sampling." In B.J.L. Berry and D.F. Marble, editors, Spatial Analysis, Prentice Hall, Englewood Cliffs, NJ, 91-100. A classic paper on sampling geographical distributions.

Hopkins, Lewis D., 1977. "Methods for generating land suitability maps: A comparative evaluation," AIP Journal, October 1977:386-400. An excellent discussion of the different measurement scales is given in an appendix.

Sinton, D., 1978. "The inherent structure of information as a constraint to analysis: mapped thematic data as a case study," Harvard Papers on Geographic Information Systems, Vol. 7, G. Dutton (ed.), Addison Wesley, Reading, MA. A classic paper on the relationships between the database and reality.

Standard sampling theory is covered in many texts on scientific measurement.

EXAM AND DISCUSSION QUESTIONS

1. Take an example map showing the observed occurrences of some rare event, and discuss the factors influencing the sampling process. Good examples are maps of tornado sightings and herbarium records of rare plants.

2.
Using a topographic map, discuss the ways in which the contents and design of the map influence the user's view of the real world.

3. Review the accuracy information available for several different scales and types of maps, and spatial databases if available.

4. The Global Positioning System (GPS) will soon be capable of providing latitude and longitude positions to the nearest meter using portable receivers weighing on the order of 1 kg, in no more than one minute. This is significantly more accurate than the best base mapping generally available in the US (1:24,000). Discuss what effect this system might have on map makers and map users.

DATA INPUT

A. INTRODUCTION Modes of data input B. DIGITIZERS Hardware The digitizing operation Problems with digitizing maps Editing errors from digitizing Digitizing costs C. SCANNERS Video scanner Electromechanical scanner Requirements for scanning D. CONVERSION FROM OTHER DIGITAL SOURCES Automated Surveying Global Positioning System (GPS) E. CRITERIA FOR CHOOSING MODES OF INPUT F. RASTERIZATION AND VECTORIZATION Rasterization of digitized data Vectorization of scanned images G. INTEGRATING DIFFERENT DATA SOURCES Formats Projections Scale Resampling rasters REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES

This unit examines the common methods of data input. This may be a good time to take a field trip to a local GIS shop to show students the operation of these various devices. If you can't find local examples, the slide set contains some examples of the hardware items described.

UNIT 7 - DATA INPUT
Compiled with assistance from Jeffrey L. Star, University of California at Santa Barbara, and Holly Dickinson, SUNY Buffalo

A.
INTRODUCTION need to have tools to transform spatial data of various types into digital format data input is a major bottleneck in application of GIS technology costs of input often consume 80% or more of project costs data input is labor intensive, tedious, error-prone there is a danger that construction of the database may become an end in itself and the project may not move on to analysis of the data collected essential to find ways to reduce costs, maximize accuracy need to automate the input process as much as possible, but: automated input often creates bigger editing problems later source documents (maps) may often have to be redrafted to meet rigid quality requirements of automated input because of the costs involved, much research has gone into devising better input methods - however, few reductions in cost have been realized sharing of digital data is one way around the input bottleneck more and more spatial data is becoming available in digital form data input to a GIS involves encoding both the locational and attribute data the locational data is encoded as coordinates on a particular cartesian coordinate system source maps may have different projections, scales several stages of data transformation may be needed to bring all data to a common coordinate system attribute data is often obtained and stored in tables Modes of data input keyboard entry for non-spatial attributes and occasionally locational data manual locating devices user directly manipulates a device whose location is recognized by the computer e.g. digitizing automated devices automatically extract spatial data from maps and photography e.g. scanning conversion directly from other digital sources voice input has been tried, particularly for controlling digitizer operations not very successful - machine needs to be recalibrated for each operator, after coffee breaks, etc. B. 
DIGITIZERS digitizers are the most common device for extracting spatial information from maps and photographs the map, photo, or other document is placed on the flat surface of the digitizing tablet Hardware the position of an indicator as it is moved over the surface of the digitizing tablet is detected by the computer and interpreted as pairs of x,y coordinates the indicator may be a pen-like stylus or a cursor (a small flat plate the size of a hockey puck with a cross-hair) frequently, there are control buttons on the cursor which permit control of the system without having to turn attention from the digitizing tablet to a computer terminal digitizing tablets can be purchased in sizes from 25x25 cm to 200x150 cm, at approximate costs from $500 to $5,000 early digitizers (ca. 1965) were backlit glass tables a magnetic field generated by the cursor was tracked mechanically by an arm located behind the table the arm''s motion was encoded, coordinates computed and sent to a host processor some early low-cost systems had mechanically linked cursors - the free-cursor digitizer was initially much more expensive the first solid-state systems used a spark generated by the cursor and detected by linear microphones problems with errors generated by ambient noise contemporary tablets use a grid of wires embedded in the tablet to generate a magnetic field which is detected by the cursor accuracies are typically better than 0.1 mm this is better than the accuracy with which the average operator can position the cursor functions for transforming coordinates are sometimes built into the tablet and used to process data before it is sent to the host The digitizing operation the map is affixed to a digitizing table three or more control points ("reference points", "tics", etc.) 
are digitized for each map sheet these will be easily identified points (intersections of major streets, major peaks, points on coastline) the coordinates of these points will be known in the coordinate system to be used in the final database, e.g. lat/long, State Plane Coordinates, military grid the control points are used by the system to calculate the necessary mathematical transformations to convert all coordinates to the final system the more control points, the better; digitizing the map contents can be done in two different modes: in point mode, the operator identifies the points to be captured explicitly by pressing a button in stream mode, points are captured at set time intervals (typically 10 per second) or on movement of the cursor by a fixed amount advantages and disadvantages: in point mode the operator selects points subjectively two point mode operators will not code a line in the same way stream mode generates large numbers of points, many of which may be redundant stream mode is more demanding on the user while point mode requires some judgement about how to represent the line most digitizing is currently done in point mode Problems with digitizing maps arise since most maps were not drafted for the purpose of digitizing paper maps are unstable: each time the map is removed from the digitizing table, the reference points must be re-entered when the map is affixed to the table again if the map has stretched or shrunk in the interim, the newly digitized points will be slightly off in their location when compared to previously digitized points errors occur on these maps, and these errors are entered into the GIS database as well the level of error in the GIS database is directly related to the error level of the source maps maps are meant to display information, and do not always accurately record locational information for example, when a railroad, stream and road all go through a narrow mountain pass, the pass may actually be depicted wider than its
actual size to allow for the three symbols to be drafted in the pass discrepancies across map sheet boundaries can cause discrepancies in the total GIS database e.g. roads or streams that do not meet exactly when two map sheets are placed next to each other user error causes overshoots, undershoots (gaps) and spikes at intersection of lines diagram user fatigue and boredom for a complete discussion of the manual digitizing process, see Marble et al., 1984 Editing errors from digitizing some errors can be corrected automatically small gaps at line junctions overshoots and sudden spikes in lines error rates depend on the complexity of the map, are high for small scale, complex maps these topics are explored in greater detail in later Units Unit 13 looks at the process of editing digitized data Units 45 and 46 discuss digitizing error Digitizing costs a common rule of thumb in the industry is one digitized boundary per minute e.g. it would take 99/60 = 1.65 hours to digitize the boundaries of the 99 counties of Iowa C.
SCANNERS Video scanner essentially television cameras, with appropriate interface electronics to create a computer-readable dataset available in either black and white or color extremely fast (scan times of under 1 second) relatively inexpensive ($500 - $10,000) produce a raster array of brightness (or color) values, which are then processed much like any other raster array typical data arrays from video scanners are of the order of 250 to 1000 pixels on a side typically have poor geometric and radiometric characteristics, including various kinds of spatial distortions and uneven sensitivity to brightness across the scanned field video scanners are difficult to use for map input because of problems with distortion and interpretation of features Electromechanical scanner unlike the video scanning systems, electromechanical systems are typically more expensive ($10,000 to $100,000) and slower, but can create better quality products one common class of scanners involves attaching the graphic to a drum as the drum rotates about its axis, a scanner head containing a light source and photodetector reads the reflectivity of the target graphic and, by digitizing this signal, creates a single column of pixels from the graphic the scanner head moves along the axis of the drum to create the next column of pixels, and so on through the entire scan compare the action of a lathe in a machine shop this controls distortion by bringing the single light source and detector to position on a regular grid of locations on the graphic systems may have a scan spot size of as little as 25 micrometers, and be able to scan graphics of the order of 1 meter on a side an alternative mechanism involves an array of photodetectors which extract data from several rows of the raster simultaneously the detector moves across the document in a swath when all the columns have been scanned, the detector moves to a new swath of rows for an in-depth discussion of scanning techniques, see Peuquet and Boyle
(1984) Requirements for scanning documents must be clean (no smudges or extra markings) lines should be at least 0.1 mm wide complex line work provides greater chance of error in scanning text may be accidentally scanned as line features contour lines cannot be broken with text automatic feature recognition is not easy (two contour lines vs. road symbols) diagram special symbols (e.g. marsh symbols) must be recognized and dealt with if good source documents are available, scanning can be an efficient, time-saving mode of data input D. CONVERSION FROM OTHER DIGITAL SOURCES involves transferring data from one system to another by means of a conversion program more and more data is becoming available in magnetic media USGS digital cartographic data (DLGs - Digital Line Graphs) digital elevation models (DEMs) TIGER and other census related data data from CAD/CAM systems (AutoCAD, DXF) data from other GIS these data generally are supplied on digital tapes that must be read into the computer however, CD-ROM is becoming increasingly popular for this purpose: it provides better standards and the hardware is much less expensive - CD-ROM drive $1000, tape drive $14,000 Automated Surveying directly determines the actual horizontal and vertical positions of objects two kinds of measurements are made: distance and direction traditionally, distance measuring involved pacing, chains and tapes of various materials direction measurements were made with transits and theodolites modern surveyors have a number of automated tools to make distance and direction measurements easier electronic systems measure distance using the time of travel of beams of light or radio waves by measuring the round-trip time of travel, from the observing instrument to the object in question and back, we can use the relationship (d = v x t) to determine the distance an instrument based on timing the travel of a pulse of infrared light can measure distances on the order of 10 km with a standard deviation
of +/- 15 mm the total station (cost about $30,000) captures distance and direction data in digital form the data is downloaded to a host computer at the end of each session for direct input to GIS and other programs Global Positioning System (GPS) a new tool for determining accurate positions on the surface of the earth computes positions from signals received from a series of satellites (NAVSTAR) as of April 1990 there are 7 active satellites in orbit; the full constellation of 21 active satellites (24 including spares) should be in place in the early 1990s depends on precise information about the orbits of the satellites a radio receiver with appropriate electronics is connected to a small antenna and, depending on the method used, the system is able to determine its location in 3-D space in anywhere from an hour down to less than 1 second developed and operated by the US armed forces, but access is generally available and civilian interest is high particularly valuable for establishing accurate positional control in remote areas current GPS receivers cost about $5,000 to $15,000 (mid 1990) but costs will decline rapidly railroad companies are using GPS to create the first accurate survey of the US rail network and to track train positions recently, the use of GPS has resulted in corrections to the elevations of many of the world's peaks, including Mont Blanc and K2 current GPS positional accuracies are of the order of 5 to 10 m with standard equipment and as small as 1 cm with "survey grade" receivers accuracy will continue to improve as more satellites are placed in orbit and experts fine tune the software and hardware GPS accuracy is already as good as the largest scale base mapping available for the continental US E.
CRITERIA FOR CHOOSING MODES OF INPUT the type of data source images favor scanning maps can be scanned or digitized the database model of the GIS scanning easier for raster, digitizing for vector the density of data dense linework makes for difficult digitizing expected applications of the GIS implementation F. RASTERIZATION AND VECTORIZATION Rasterization of digitized data for some data, entry in vector form is more efficient, followed by conversion to raster we might digitize the county boundary in vector form by mounting a map on a digitizing table capturing the locations of points along the boundary assuming that the points are connected by straight line segments this may produce an ASCII file of pairs of xy coordinates which must then be processed by the GIS, or the output of the digitizer may go directly into the GIS the vector representation of the boundary as points is then converted to a raster by an operation known as vector-raster conversion the computer calculates which county each cell is in using the vector representation of the boundary and outputs a raster digitizing the boundary is much less work than cell by cell entry most raster GIS have functions such as vector-raster conversion to support vector entry many support digitizing and editing of vector data Vectorization of scanned images for many purposes it is necessary to extract features and objects from a scanned image e.g. a road on the input document will have produced characteristic values in each of a band of pixels if the scanner has pixels of 25 microns = 0.025 mm, a line of width 0.5 mm will create a band 20 pixels across the vectorized version of the line will be a series of coordinate points joined by straight lines, representing the road as an object or feature instead of a collection of contiguous pixels successful vectorization requires a clean line scanned from media free of cluttering labels, coffee stains, dust etc. 
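The vector-raster conversion described above (the computer calculates which cells lie inside the digitized boundary) can be sketched as a point-in-polygon test applied to each cell centre. The following Python sketch uses a simple ray-casting test; the square "county" boundary, grid extent and cell size are made-up illustrative values, not taken from any particular GIS:

```python
# Vector-to-raster conversion: given a polygon boundary captured as a
# list of (x, y) vertices (straight-line segments between digitized
# points), mark each raster cell 1 if its centre falls inside.

def point_in_polygon(x, y, vertices):
    """Ray-casting test: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

def rasterize(vertices, xmin, ymin, ncols, nrows, cell):
    """Return a row-major 0/1 grid (row 0 = top of the map)."""
    grid = []
    for r in range(nrows):
        y = ymin + (nrows - r - 0.5) * cell   # cell-centre y
        row = []
        for c in range(ncols):
            x = xmin + (c + 0.5) * cell       # cell-centre x
            row.append(1 if point_in_polygon(x, y, vertices) else 0)
        grid.append(row)
    return grid

# a hypothetical square "county" from (1,1) to (3,3) on a 4x4 grid of 1-unit cells
county = [(1, 1), (3, 1), (3, 3), (1, 3)]
for row in rasterize(county, 0, 0, 4, 4, 1):
    print(row)
```

production rasterizers use scanline methods for speed and must also decide how to classify cells split by a boundary; the cell-centre rule used here is only the simplest choice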
to create a sufficiently clean line, it is often necessary to redraft input documents e.g. the Canada Geographic Information System redrafted each of its approximately 10,000 input documents since the scanner can be color sensitive, vectorizing may be aided by the use of special inks for certain features although scanning is much less labor intensive, problems with vectorization lead to costs which are often as high as those of manual digitizing two stages of error correction may be necessary: 1. edit the raster image prior to vectorization 2. edit the vectorized features G. INTEGRATING DIFFERENT DATA SOURCES Formats many different format standards exist for geographical data some of these have been established by public agencies e.g. the USGS in cooperation with other federal agencies is developing SDTS (Spatial Data Transfer Standard) for geographical data, will propose it as a national standard in 1990 e.g. the Defense Mapping Agency (DMA) has developed the DIGEST data transfer standard some have been defined by vendors e.g. SIF (Standard Interchange Format) is an Intergraph standard for data transfer see Unit 69 for more on GIS standards a good GIS can accept and generate datasets in a wide range of standard formats Projections there are many ways of representing the curved surface of the earth on a flat map some of these map projections are very common, e.g.
Mercator, Universal Transverse Mercator (UTM), Lambert Conformal Conic each state has a standard SPC (State Plane Coordinate system) based on one or more projections see Unit 27 for more on map projections a good GIS can convert data from one projection to another, or to latitude/longitude input derived from maps by scanning or digitizing retains the map's projection with data from different sources, a GIS database often contains information in more than one projection, and must use conversion routines if data are to be integrated or compared Scale data may be input at a variety of scales although a GIS likely will not store the scale of the input document as an attribute of a dataset, scale is an important indicator of accuracy maps of the same area at different scales will often show the same features differently e.g. features are generalized at smaller scales, enhanced in detail at larger scales variation in scales can be a major problem in integrating data e.g. the scale of most input maps for a GIS project is 1:250,000 (topography, soils, land cover) but the only geological mapping available is 1:7,000,000 if integrated with the other layers, the user may believe the geological layer is equally accurate in fact, it is so generalized as to be virtually useless Resampling rasters raster data from different sources may use different pixel sizes, orientations, positions, projections resampling is the process of interpolating information from one set of pixels to another resampling to larger pixels is comparatively safe, resampling to smaller pixels is very dangerous REFERENCES Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. Chapter 4 reviews alternative methods of data input and editing for GIS. Chrisman, N.R., 1987. "Efficient digitizing through the combination of appropriate hardware and software for error detection and editing," International Journal of Geographical Information Systems 1:265-77.
Discusses ways of reducing the data input bottleneck. Drummond, J., and M. Bosman, 1989. "A review of low-cost scanners," International Journal of Geographical Information Systems 3:83-97. A good review of current scanning technology. Ehlers, M., G. Edwards and Y. Bedard, 1989. "Integration of remote sensing with GIS: a necessary evolution," Photogrammetric Engineering and Remote Sensing 55(11):1619-27. A recent review of the relationship between the two technologies. Goodchild, M.F. and B.R. Rizzo, 1987. "Performance evaluation and work-load estimation for geographic information systems," International Journal of Geographical Information Systems 1:67-76. Statistical analysis of costs of scanning. Lai, Poh-Chin, 1988. "Resource use in manual digitizing. A case study of the Patuxent basin geographical information system database," International Journal of Geographical Information Systems 2(4):329-46. A detailed analysis of the costs of building a practical database. Marble, D.F., J.P. Lauzon, and M. McGranaghan, 1984. "Development of a Conceptual Model of the Manual Digitizing Process," Proceedings of the International Symposium on Spatial Data Handling, Volume 1, August 20-24, 1984, Zurich, Switzerland, Symposium Secretariat, Department of Geography, University of Zurich-Irchel, 8057 Zurich, Switzerland. Conceptual discussion of the digitizing process. Peuquet, D. J., 1981. "An examination of techniques for reformatting digital cartographic data, part I: the raster-to-vector process," Cartographica 18:34-48. Peuquet, D. J., 1981. "An examination of techniques for reformatting digital cartographic data, part II: the vector-to-raster process," Cartographica 18:21-33. Peuquet, D. J., and A. R. Boyle, 1984. Raster Scanning, Processing and Plotting of Cartographic Documents, SPAD Systems, Ltd., P.O. Box 571, Williamsville, New York, 14221, U.S.A. A comprehensive discussion of scanning technology. Tomlinson, R.F., H.W. Calkins and D.F. Marble, 1976. Computer Handling of Geographical Data, UNESCO Press, Paris.
Comparison of input methods and costs of 5 GISs. DISCUSSION AND EXAM QUESTIONS 1. In his book Computers and the Representation of Geographical Data (Wiley, New York, 1987), E.E. Shiryaev argues that maps must be redesigned to be equally readable by humans and computer scanners, and that this would ultimately make scanning much more cost-effective than digitizing. How might this be done, and what advantages would it have? 2. The cost of digitizing has remained remarkably constant over the past 20 years despite dramatic reductions in computer hardware and software cost. Why is this, and what impact has it had on GIS? Do you predict any change in this situation in the future? 3. "Digitizing is a suitable activity for convicted criminals." Discuss. 4. As manager of a GIS operation, you have the task of laying out rules which your staff must follow in digitizing complex geographical lines. What instructions would you give them to ensure a reasonable level of accuracy? Assume they will be using point mode digitizing, and that points will be connected by straight lines for analysis and output. 5. What types of documents are best suited for automatic scanning? 6. After reading the article by Marble, Lauzon and McGranaghan on the conceptual model of digitizing, describe and explain the importance of map pre-processing. SOCIO-ECONOMIC DATA A. INTRODUCTION Socio-economic data Aggregate and disaggregate data Cross-sectional and longitudinal data B. SOCIO-ECONOMIC DATA FOR GIS Sources of socio-economic data "Geography" Issues in using secondary socio-economic data C. SOURCES OF SOCIO-ECONOMIC DATA Population census Economic census Agricultural census Labor force statistics Land records Transportation and infrastructure inventories Administrative records D. US CENSUS OF POPULATION AND HOUSING Process of taking the census Content Processing of returns Geographic referencing Census reporting zones Availability of Census data E.
TIGER Development Content Marketing TIGER files Non-census uses for TIGER F. LAND RECORDS Issues in land records modernization REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES It may be useful to illustrate this unit with several different examples of the data products described, including examples of census products such as summary reports, maps and even digital tapes. UNIT 8 - SOCIO-ECONOMIC DATA Compiled with assistance from Hugh Calkins, State University of New York at Buffalo A. INTRODUCTION Socio-economic data are data about humans, human activities, and the space and/or structures used to conduct human activities specific classes include demographics (age, sex, ethnic and marital status, education) housing (quality, cost) migration transportation economics (personal incomes, employment, occupations, industry, regional growth) retailing (customer locations, store sites, mailing lists) Aggregate and disaggregate data disaggregated data - data about individuals or single entities, for example: a person's age, sex, level of education, income, occupation, etc. gross sales, number of employees, profit, etc. for a retail store registration number and type for a single vehicle aggregated data - describing a group of observations with the grouping made on a defined criterion geographical data are often grouped by spatial units such as a census tract, traffic zone, etc. aggregation can also be by time interval e.g. number of persons leaving area in 5 years also by socio-economic grouping e.g. persons aged 5 through 14 years examples of aggregated data are: number of persons, average income, median housing value for a census tract number of commute trips and average trip length from a suburban traffic zone to the central business district Cross-sectional and longitudinal data recall from Unit 6 cross-sectional data gives information on many areas for the same single slice or interval of time e.g. average income in census tracts of Los Angeles for 1988 e.g.
numbers migrating out of each state in the period 1971-75 longitudinal data gives information on one or more areas for a series of times e.g. average income for State of New York from 1970-1988 by year B. SOCIO-ECONOMIC DATA FOR GIS Sources of socio-economic data field surveys much data used in marketing is gathered by door-to-door or street interview field surveys require careful sampling design how to obtain a representative sample how to avoid bias toward certain groups in street interviews government statistics statistics collected and reported by government as part of required activities, e.g. Bureau of the Census usually based on entire population, except sampling is used for some Census questions government administrative records records are collected by government as part of administrative functions, e.g. tax records, auto registrations, property taxes these are useful sources of data provided confidentiality can be preserved usually available only to government or for research purposes secondary data collected by another group, often for different purposes e.g. the original mandated purpose of the Census was to provide data for congressional districting increasingly socio-economic data is available in digital form from private sector companies retailers and direct-mail companies are major clients for these companies includes data originally from census augmented from other sources and surveys data can be customized for clients (special sets of variables, special geographical coverage or aggregation) customizing justifies costs, which are often higher than for "raw" census data "Geography" for use in GIS, socio-economic statistics are of little use without associated "geography," the term often used to describe locational data e.g. data on census tracts must be supported by digital information on locations of census tract boundaries geography also allows data to be aggregated geographically, e.g.
by merging data on individual cities into metropolitan regions thus, many suppliers of socio-economic data also supply digitized geography of reporting zones boundaries of many standard types of reporting zones change from time to time e.g. changes occur occasionally in county boundaries e.g. census enumeration districts are redefined for each census (see Redistricting in Unit 56) difficult to assemble longitudinal data for such units due to changing geography data is often needed for one set of reporting zones, only available for another set e.g. data available for census tracts, required for school districts which do not follow same boundaries solutions to such cross-area estimation problems are facilitated by GIS technology these problems are often grouped under the modifiable areal unit problem (MAUP) considerable effort has been expended recently to develop statistically sound techniques to deal with these problems (see Openshaw and Taylor, 1981) Issues in using secondary socio-economic data cost usually secondary data is much less expensive than field surveys large expenditures by government agencies on data collection (e.g. US Census) are indirect subsidies to users, who often pay much less than real cost of data documentation quality of documentation, supporting information (e.g. maps) is usually high for data collected by government data quality major difficulty is undercounting - census and other social surveys tend to miss certain groups, leading to bias in results undercounting in US Census may be as high as 25% for certain social groups data conversion conversion steps may be necessary to make data useful in GIS e.g. format, type of data may be incompatible aggregation are data available with suitable level of spatial, temporal aggregation? e.g. study to change elementary school district boundaries will require data at resolution of city blocks or higher e.g.
location for gas station will require city block level data, for regional shopping mall much lower resolution (greater aggregation of data) is adequate currency social data changes rapidly, can be quickly out of date because of births, deaths, migration, changing economy competitive edge in retailing depends on having current data US has a major census only every 10 years, so its data may be 10 years old often have to estimate current or future patterns based on old data accuracy of location census locates people by place of residence - "night-time" census "daytime" data would show locations during the day (place of work, school etc.) but is generally not available from standard sources medical records often locate individuals by place of treatment (hospital), not residence or workplace e.g. consider implications for detecting exposure to cancer-causing agents C. SOURCES OF SOCIO-ECONOMIC DATA Population census questions on age, sex, income, education, ethnicity, migration, housing quality etc. summary statistics used in research, planning, market research, available at high level of geographic resolution in many countries see detailed discussion following for US case (Census of Population and Housing) Economic census enumeration and tabulation of business activity is conducted in the US by the Census Bureau in years ending in 2 and 7 detailed information on classes of industry low level of geographic resolution (i.e. large reporting zones) data collected in many countries through annual, quarterly or monthly returns of information from companies Agricultural census annual data on crops, yields, livestock etc. more extensive periodic surveys of farm economy available in spatially disaggregated form to e.g. county level in US Labor force statistics enumeration of employment, unemployment produced from periodic (e.g. monthly) sample surveys of workforce other special-purpose surveys often combined with regular labor force survey - e.g. 
household expenditures, recreation activities often available for small areas, e.g. parts of city Land records record of land parcel description, ownership and value for taxation purposes updated on a regular basis (e.g. annually) by municipality or county government also used for land use planning source of current demographic information in some countries/states (i.e. local census) see detailed discussion following Transportation and infrastructure inventories planning, management and maintenance of facilities includes roads and streets, power lines, gas lines, water, sewer lines collected by local utilities, responsible government departments valuable to variety of users e.g. construction companies needing information on buried pipes e.g. emergency management departments needing data on hazardous facilities compiling agency often sees a substantial market for such data which can offset costs of collection Administrative records vehicle registrations, tax returns etc. useful for various marketing, research purposes based on 100% sample so can be disaggregated spatially however, disaggregation causes problems over confidentiality of records D. US CENSUS OF POPULATION AND HOUSING Process of taking the census purpose is to enumerate the population for redefining election districts taken every ten years (1960, 1970, etc.)
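As an aside on the cross-area estimation problem raised earlier (data available for census tracts but required for school districts): the simplest technique is area-weighted interpolation, which allocates each source-zone count to target zones in proportion to overlap area, assuming the count is spread uniformly within each source zone. A minimal Python sketch; the tract populations and overlap areas below are hypothetical:

```python
# Area-weighted cross-area estimation: reallocate counts from source
# zones (e.g. census tracts) to target zones (e.g. school districts)
# in proportion to overlap area. Assumes uniform density within each
# source zone; all names and numbers are illustrative only.

tract_pop = {"T1": 4000, "T2": 2000}

# overlap[(tract, district)] = area of intersection, consistent units
overlap = {
    ("T1", "D1"): 6.0, ("T1", "D2"): 2.0,   # tract T1 has total area 8
    ("T2", "D1"): 1.0, ("T2", "D2"): 3.0,   # tract T2 has total area 4
}

def cross_area_estimate(source_values, overlap_areas):
    """Estimate target-zone totals by area-weighted allocation."""
    # total area of each source zone, summed from its overlaps
    src_area = {}
    for (s, _), a in overlap_areas.items():
        src_area[s] = src_area.get(s, 0.0) + a
    targets = {}
    for (s, t), a in overlap_areas.items():
        share = source_values[s] * a / src_area[s]
        targets[t] = targets.get(t, 0.0) + share
    return targets

print(cross_area_estimate(tract_pop, overlap))
# prints {'D1': 3500.0, 'D2': 2500.0}
```

the uniform-density assumption is exactly what the modifiable areal unit literature warns about; more sophisticated methods use ancillary data (e.g. land use) to guide the allocation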
April 1st is census day, although complete enumeration takes a "few" weeks most households receive forms in mail, some require visit by enumerator Content two types of items - those completed by "100%" of the population, those by random sample Processing of returns automated encoding to digital form automated editing to correct obvious inconsistencies some missing items can be assigned automatically using simple rules other missing items are assigned based on probabilities data assembled into master database sample surveys processed to produce statistical summaries Geographic referencing initially returns are identified by street address address is converted into geographic location using a digital referencing system for the 1980 census, DIME (Dual Independent Map Encoding) files were used for digital geographic referencing of urbanized portions of the US for the 1990 census, TIGER files covering every county will be used since TIGER files will have a major impact on GIS databases in the next decade, they are discussed in detail in the next section Census reporting zones range from blocks to states as noted previously, the geographic boundaries and definitions of these areas may change from one census to the next Availability of Census data tabulation of statistics by reporting zones, e.g. population by county, population by age by county crosstabulation, e.g. population by age and sex by county special tabulations, e.g. for unusual combinations of characteristics, or for unusual or custom reporting zones number of possible tabulations and crosstabulations is infinite, volume of census products vastly exceeds volume of data collected alternative formats for products printed reports magnetic media - tapes, disks microfiche, microfilm, now CDs sources of census data state data centers distribute Census data private firms repackage and customize data, produce custom reports (e.g.
tabulation of population by distance from proposed mall location) geography products available base maps showing reporting zones atlases produced for urban areas digital products - boundary files, TIGER E. TIGER reference: beginning of this Unit (TIGER) Development TIGER stands for Topologically Integrated Geographic Encoding and Referencing designed to: support pre-census geographic and cartographic functions in preparation for the 1990 Census to complete and evaluate the data collection operations of the census to assist in the analysis of the data as well as to produce new cartographic products TIGER files were created by the Bureau of the Census with the assistance of the US Geological Survey Content TIGER/Line files are organized by county they contain: map features such as roads, railroads and rivers census statistical area boundaries political boundaries in metropolitan areas, address ranges and ZIP codes for streets Marketing TIGER files Census Bureau 1990 Census versions of TIGER/Line files will be available from the Census Bureau in early 1991 cost for prototype and precensus TIGER/Line files on magnetic tape is $200 (US) for the first county and $25 for each additional county in that state ordered at the same time the 50 states plus DC on tape cost $87,450 precensus files are also available on CD-ROM for $250 per disk, 40 disks are required for coverage of the entire country (all prices as of Jan. 1990) Third party vendors as of December 1989, 25 vendors had notified the Census Bureau that they will market repackaged versions of TIGER/Line files, in many cases with software which will enable users to access this data easily and quickly many of these products are being designed for use on micro-computers Non-census uses for TIGER TIGER files are valuable for other purposes e.g. locating customers from address lists e.g.
planning vehicle routes through city streets, for parcel delivery, cab dispatching for these purposes TIGER files need to be kept current at all times, but Bureau of the Census only requires them to be current every 10 years see Unit 29 for technical details of TIGER files F. LAND RECORDS many systems have been developed by local governments in the US to manage land, particularly in urban areas in other countries there has been more effective coordination at provincial and national levels, e.g. Australia practices in different countries depend on the system of land tenure the basic entity in land records systems is the land parcel, i.e. the basic unit of ownership traditionally, land records have been managed by hand using methods which often date back 200 years land records are the basis of the system of local taxation, administration, as well as transfer of ownership and subdivision Issues in land records modernization accurate land records systems require accurate base mapping at a large enough scale, e.g. 1:1,000 such base mapping is not normally available in the US, only the wealthiest governments can afford to create it, e.g. from air photos the term cadaster is used for mapping of land ownership the cost of building land records systems can often be recovered, at least partially, from sales of data (e.g. to utilities, real estate developers) and use in other departments the term multi-purpose cadaster (MPC) describes the idea of using the cadaster for many purposes because land records systems are being developed independently by many different jurisdictions, there is little standardization of approach, software, etc. see Unit 54 for a discussion of MPC applications REFERENCES The Bureau of the Census, US Department of Commerce produces numerous documents on the Census and its products, including TIGER. Factfinder for the Nation describes data available from the Census Bureau. 
Census '90 Basics describes the content, geographic areas and products of the census. Similar material is available from appropriate organizations in other countries, e.g. Statistics Canada. Marx, R. W., ed., 1990. "The Census Bureau's TIGER System," a special issue of Cartography and Geographic Information Systems Vol. 17(1). Contains several articles providing details on the contents and database structure of TIGER. Kaplan, C.P. and T.L. van Valey, 1980. CENSUS '80: Continuing the Factfinder Tradition, US Department of Commerce, Bureau of the Census. A good review of Census applications. Richards, D. and P.M. Jones, 1984. "General sources of information," in R.L. Davies and D.S. Rogers, eds., Store Location and Store Assessment Research, John Wiley and Sons, New York, Chapter 4. This chapter reviews sources of socio-economic data in both the US and the UK. Marx, R.W., 1986. "The TIGER System: Automating the Geographic Structure of the United States Census," Government Publications Review 13:181-201. Discusses the development of the TIGER system. Openshaw, S., 1977. "A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modelling," Institute of British Geographers, Transactions 2(NS):459-72. Openshaw, S., and P.J. Taylor, 1981. "The modifiable areal unit problem," in N. Wrigley and R.J. Bennett, editors, Quantitative Geography: A British View, Routledge, London. EXAM AND DISCUSSION QUESTIONS 1. Confidentiality is a major issue in the US Census, and the need to preserve privacy conflicts directly with the need for disaggregated data for numerous purposes. What are the factors to be considered in trying to reconcile these conflicting needs? Is the balance affected by use of GIS? 2. Devise a scheme for creating and maintaining a constantly updated digital file of all streets and associated address ranges etc., i.e. a perpetually current TIGER. 
What would be the costs of the scheme, and what advantages would it have over the current situation? 3. "The concept of a decennial census was devised almost two hundred years ago and has become increasingly inappropriate to the modern age". Discuss. 4. A spreadsheet (such as Lotus 1-2-3) allows the user to perform a variety of functions on tabular data. Discuss the possibility of a "geographical spreadsheet" - what would it do, and what applications would it have? ENVIRONMENTAL AND NATURAL RESOURCE DATA A. INTRODUCTION Contents of environmental databases B. CHARACTERISTICS Spatial management units C. SOURCES OF DATA Thematic Topographic Remote sensing D. REMOTE SENSING AND GIS Wavelengths Scale in images Elevation Image interpretation Classification Problems in classification Using remotely sensed data in GIS E. EXAMPLE DATABASE - MLMIS Minnesota Land Management Information System (MLMIS) Example use of MLMIS data layers REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES You may prefer to use a local example of a natural resources database in place of the section on the MLMIS. This section can then serve as an outline for the organization of information about your local example. Examples of different air photos (low level, high level, oblique), satellite (natural color, false color) and radar images would be useful illustrations for this unit. UNIT 9 - ENVIRONMENTAL AND NATURAL RESOURCE DATA Compiled with assistance from Charles Parson, Bemidji State University and Jeffrey L. Star, University of California, Santa Barbara A. 
INTRODUCTION natural resource-based GISs may be used: as an inventory tool to better manage the marketing of the resource to protect the resource from improper development to model the complex interactions between phenomena so that forecasts can be used in decision-making Contents of environmental databases there are several different kinds of information needed in an environmental database many of these are obvious: geology, vegetation, hydrology, soils however, to address a range of issues, the environmental database must include several characteristics that are not generally perceived as "natural" transportation network political boundaries management unit boundaries other data may be needed for modeling, e.g. variables relating to: erosion groundwater flow soil productivity B. CHARACTERISTICS natural resource data in GIS is comparatively static update can be infrequent spatial resolution can be relatively low e.g. grid cells covering large areas historically, natural resource GIS have been raster-based adequate for many planning and management applications can provide comprehensive coverage of a jurisdiction at reasonable cost could often run on existing mainframes - hardware requirements were modest Spatial management units the actual management units of most natural resources in North America are pseudo-rasters square, forty acre parcels are the standard building block for PLSS areas (areas surveyed under the Public Land Survey System) of the Midwest, and Western United States, and much of Canada "forties" are frequently broken into ten acre units, or combined into: quarter sections (160 acres) sections (640 acres, 1 square mile) townships (6x6 miles) farms are managed in rectangular fields and forest resources are sold in similar acreage units however, natural resources do not commonly conform to PLSS grids vector-based systems appear better able to accurately represent them on the other hand, satellite imagery, which is an important source of environmental 
data is raster-based C. SOURCES OF DATA Thematic thematic map series are compiled by various agencies: soil maps (e.g. Soil Conservation Service) land use (e.g. USGS land use series) vegetation (forestry agencies, many state governments) surficial geology (US and state geological surveys) Topographic topographic maps can supply: elevations roads and railroads cultural features streams and lakes political and administrative boundaries public land survey system (PLSS) - "township and range" this type of data from USGS topographic maps is becoming available in digital form as DLG (digital line graph) files elevation data is available from the USGS in the form of DEMs (digital elevation models) at various resolutions US Geological Survey supplies 30 m resolution data for much of US Remote sensing remotely sensed imagery data can be interpreted to yield many layers e.g. urban/rural, vegetation, crops, surface geology, land use LANDSAT and TM (Thematic Mapper) are commonly used sources D. REMOTE SENSING AND GIS definition of remote sensing "In the broadest sense, the measurement or acquisition of information of some property of an object or phenomena, by a recording device that is not in physical or intimate contact with the object or phenomena under study" (Manual of Remote Sensing) aircraft and satellite platforms can be used selection of a platform involves balancing a number of competing goals: ability to schedule the acquisition atmospheric distortions vs. 
platform stability the available suite of sensors for a given application issues of coverage and scale cost data can be captured in analog (photographs) or digital form (data, transmitted to a ground station or recorded onboard) Wavelengths key issue in a remotely sensed observation is the range of wavelengths of energy that will be observed the human eye sees only a limited range of wavelengths photographs capture visible light remotely sensed observations may include information in the infrared portion of the spectrum which is not visible to human eyes infrared sensors allow recording of the thermal characteristics of the earth's surface microwave wavelengths can also be used Radar is a form of microwave system sometimes of particular value due to the ability to penetrate clouds and carry their own source of illumination i.e. radar systems generate and collect radiation - they are active sensors objects with large differences in their electrical properties may be discriminated, and the size of the object compared to the wavelength of the radar system is also important Scale in images key concern is the scale of the images, and how the scale varies within each image due to distortion many sources of distortion focal length of the optical system, viewing geometry, surface topography greatly affect the scale at each location in the image Elevation information on elevation can be obtained by comparing photographs taken from different camera positions, i.e. 
stereographic images the simplest devices for viewing pairs of photographs in stereo, called stereoscopes, effectively recreate the illusion of one's eyes being in the same position as the camera lenses when the photographs were taken produce the impression of 3-D images more complex instruments known as stereoplotters allow operators to use pairs of photographs to develop accurate topographic maps and contours thus, by understanding the geometrical details of the camera system and the Earth's surface, one can determine both horizontal and vertical positions of objects with high accuracy and precision an analytical plotter is a partially automated form of stereoplotter which obtains contours by automatically comparing photographs Image interpretation the identification of objects and determination of their significance involves: Identification - recognizing features on the image Measurement - once features have been identified, can make measurements (i.e., the distance between objects, the number of features per unit area) Interpretation - normally based on a systematic examination of the primitive elements of the photograph, in conjunction with a wide range of ancillary data primitive elements include tone, color, size, shape, texture, pattern, shadow, site, association automated image analysis typically relies on only the first few primitive elements (tone, color, size) ancillary data are often very diverse, may include maps, vegetation phenologies, and many kinds of information about human activities in the general area human experts bring all these elements, plus their acquired skills and knowledge of related disciplines the best photointerpreters have expertise in such related disciplines as physical geography, geology and plant biology and ecology human interpretation also includes a significant perceptual or subjective component Classification the information obtained from a remote sensing instrument consists of reflectance measurements, often in several 
different bands or parts of the electromagnetic spectrum measurements are in discrete units with fixed range, e.g. 0-255 the process of classification, an important part of image interpretation, attempts to assign each pixel to one of a number of classes based on its reflectance in one or more bands e.g. vegetation types or land use classes ("urban", "pasture", "cropland", "water", "forested") many techniques exist for classification supervised classification develops the rules for assigning reflectance measurements to classes using a "training area", based on input from the user, then applies the rules automatically to the remaining image unsupervised classification develops the rules automatically Problems in classification since reflectances vary with time of day, season of the year, etc., classification rules vary from image to image classification is often uncertain or inaccurate also pixels may often contain several classes - mixed pixels despite this, classification assigns a single class to every pixel, ignoring uncertainty there is no best method of classification - successful classification is time-consuming and can be expensive Using remotely sensed data in GIS often difficult or time consuming to develop systematic products of known accuracy complex operations are required to force images to correspond to a known map projection and/or to have a consistent scale difficult to go from image (varying reflectance or emissivity in different wavelength bands) to interpreted features and objects however, since the value of a GIS is directly related to the quality and currency of its internal data remote sensing offers a suite of tools for quickly creating current, consistent datasets for input to a GIS conversely, remotely sensed data is best interpreted when additional spatial datasets (representing other dates, other scales, other sensors, other methods for acquiring data about the earth) are employed such data may be obtained from a GIS thus, strong links 
between remote sensing and GIS can improve both technologies E. EXAMPLE DATABASE - MLMIS Minnesota Land Management Information System (MLMIS) one of the most extensive natural resource databases a statewide inventory of layers for natural resource management and planning list is the result of over fifteen years of involvement in projects that added data to the system referred to as MLMIS40 because the fundamental structure is a raster with 40 acre cells to improve spatial resolution, this is being gradually replaced with vector files at a common scale of 1:24,000 (line-width resolution 12 m) raster files with hectare grid cells Example use of MLMIS data layers how might the database (and a GIS) be used to assist a county to locate a waste disposal incinerator? REFERENCES Marble, D.F. et al., 1983. "Geographic information systems and remote sensing," Manual of Remote Sensing. ASPRS/ACSM, Falls Church, VA, 1:923-58. Reviews the various dimensions of the relationship between the two fields. Niemann, Jr., B.J., et al., 1988. "The CONSOIL project: Conservation of natural resources through the sharing of information layers," Proceedings GIS/LIS '88, San Antonio, TX, pp. 11-25. Reviews a multi-agency project in Wisconsin to design and evaluate an LIS for soil conservation. Radde, G.L., 1987. "Under the Rainbow: GIS and Public Land Management Realities," Proceedings, IGIS '87, Arlington, VA, 3:461-472. A discussion of the MLMIS, describes some projects that have made use of the system and how policy makers' attitudes towards GIS have changed. Star, J.L., and J. Estes, 1990. Geographic Information Systems: An Introduction, Prentice-Hall, Englewood Cliffs, NJ. Chapter 5 reviews data sources. Sullivan, J.G., and B.J. Niemann, Jr., 1987. "Research Implications of eleven natural resource GIS applications," Proceedings, IGIS '87, Arlington, VA, 3:329-341. A short review of several LIS for natural resource applications, discusses common themes, problems and techniques. 
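As a concrete illustration of the supervised classification process described in this unit (rules derived from user-supplied training areas, then applied to the rest of the image), here is a minimal minimum-distance-to-means sketch; the class names, band values and training pixels are all hypothetical, and real classifiers are considerably more elaborate:

```python
# Minimum-distance-to-means supervised classification (sketch).
# Pixels are tuples of reflectances in two bands, 0-255.

def class_means(training):
    """Mean reflectance vector for each class from labelled training pixels."""
    means = {}
    for label, pixels in training.items():
        n = len(pixels)
        means[label] = tuple(sum(p[i] for p in pixels) / n
                             for i in range(len(pixels[0])))
    return means

def classify(pixel, means):
    """Assign the pixel to the class whose mean reflectance is closest."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda c: dist2(pixel, means[c]))

training = {                       # hypothetical "training area" samples
    "water":  [(20, 10), (25, 12), (18, 9)],
    "forest": [(60, 90), (65, 95), (58, 88)],
    "urban":  [(150, 140), (160, 150), (155, 145)],
}
means = class_means(training)
print(classify((62, 91), means))   # a pixel resembling the forest samples
```

Note that this sketch exhibits the weaknesses listed under "Problems in classification": every pixel receives exactly one class, with no expression of uncertainty or mixed-pixel composition.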
EXAM AND DISCUSSION QUESTIONS 1. Review the difficulties inherent in obtaining interpreted features and objects from remotely sensed images. 2. Assume that you have access to remotely sensed images of your city with a resolution of 80 m (roughly the pixel size of Landsat). What functions of city government or local business would be able to make use of this resolution? 3. Discuss the range of errors which may exist in a soils map. 4. Discuss each of the types of data mentioned in this class in terms of required frequency of update. 5. How does a soil map become outdated? 6. What layers might you want for siting a waste incinerator which are not in the MLMIS catalog? SPATIAL DATABASES AS MODELS OF REALITY A. INTRODUCTION the real world is too complex for our immediate and direct understanding we create "models" of reality that are intended to have some similarity with selected aspects of the real world databases are created from these "models" as a fundamental step in coming to know the nature and status of that reality Definition a spatial database is a collection of spatially referenced data that acts as a model of reality a database is a model of reality in the sense that the database represents a selected set or approximation of phenomena these selected phenomena are deemed important enough to represent in digital form the digital representation might be for some past, present or future time period (or contain some combination of several time periods in an organized fashion) Standards many of the definitions in this Unit have been standardized by the proposed US National Digital Cartographic Standard (DCDSTF, 1988) these standards have been developed to provide a nationally uniform means for portraying and exchanging digital cartographic data these cartographic standards will form part of a larger standard being developed for the digital representation of all earth science information B. 
DATABASE CONTENT AND AN ORGANIZATION'S MISSION Organization mandates organizations have mandates to perform certain tasks that carry out their missions mandates are the reasons they exist as organizations organizations have different needs for data depending on their mandates and the activities required to carry out these mandates mandates often help identify and define entities of interest, requiring a certain view of the world what might seem at first glance to be the same data need in two different organizations can actually be quite different when we look at a more detailed level e.g. wildlife and forestry departments both need information on vegetation but the detail needed is different Database contents Example: Transportation highway data from the different points of view of a natural resources organization and a highway transportation organization a natural resource organization might only need logging roads and the connecting access to state highways the transportation organization's main interest is in characterizing highways used by the public the database might also be used to store detailed highway condition and maintenance information we would expect their need for highway data to be more detailed than would the natural resource organization's Example: wetlands wetlands data from the different points of view of an ecological organization and a taxing authority ecological organization might define wetlands as a natural resource to be preserved and restricted from development that perspective might require considerable detail for describing the area's biology and physical resources a taxing authority might define a wetland to be a "wasteland" and of very little value to society that description might require only the boundary of the "wasteland" in the database Database design in each organization only certain phenomena are important enough to collect and represent in a database the data collection process involves a sampling of geographic reality, 
to determine the status of that reality (whether past, present or future) identifying the phenomena and then choosing an appropriate data representation for them is part of a process called database design see Units 11 and 66 for more on database design C. FUNDAMENTAL DATABASE ELEMENTS elements of reality modeled in a GIS database have two identities: 1. the element in reality - entity 2. the element as it is represented in the database - object a third identity that is important in cartographic applications is the symbol that is used to depict the object/entity as a feature on a map or other graphic display these definitions and the following concepts are based on those defined by the DCDSTF, 1988 (see references) handout - Definition of terms Entity an entity is "a phenomenon of interest in reality that is not further subdivided into phenomena of the same kind" e.g. a city could be considered an entity and subdivided into component parts but these parts would not be called cities, they would be districts, neighborhoods or the like e.g. a forest could be subdivided into smaller forests Object an object is "a digital representation of all or part of an entity" the method of digital representation of a phenomenon varies according to scale, purpose and other factors e.g. a city could be represented geographically as a point if the area under consideration were continental in scale the same city could be geographically represented as an area if we are dealing with a geographic database for a state or a county Entity types similar phenomena to be stored in a database are identified as entity types an entity type is any grouping of similar phenomena that should eventually get represented and stored in a uniform way, e.g. 
roads, rivers, elevations, vegetation provides convenient conceptual framework for describing phenomena at a general level organizational perspective influences this interpretation to a large degree precise definitions should be generated for each entity type helps with identifying overlapping categories of information aids in clarifying the content of the database the proposed US National Standard for Digital Cartographic Data Volume 2 (DCDSTF 1988) includes a large number of definitions for entity types handout - Sample entity definitions the first step in database development is the selection and definition of entity types to be included this is guided by the organization's mandate and purpose of the database this framework can be as important as the actual database because it guides the development the second step of database design is to choose an appropriate method of spatial representation for each of the entity types Spatial object type the digital representation of entity types in a spatial database requires the selection of appropriate spatial object types the National Standard for Digital Cartographic Databases specifies a basic list of spatial objects and their characteristics this classification is based on the following definition of spatial dimensions: 0-D - an object that has a position in space, but no length a point 1-D - an object having a length composed of two or more 0-D objects a line 2-D - an object having a length and width bounded by at least three 1-D line segment objects an area 3-D - an object having a length, width and height/depth bounded by at least four 2-D objects a volume overhead - Spatial object types (3 pages) handout (cont) - Spatial object types note very specific definitions for line segment, string, link, chain spatial objects as representations of reality are dealt with in depth in Unit 11 Object classes an object class is the set of objects which represent the set of entities e.g. 
the set of points representing the set of wells Attributes an attribute is a characteristic of an entity selected for representation usually non-spatial though some may be related to the spatial character of the phenomena under study e.g. area, perimeter Attribute value the actual value of the attribute that has been measured (sampled) and stored in the database an entity type is almost always labeled and known by attributes e.g. a road usually has a name and is identified according to its class - e.g. alley, freeway attribute values are often conceptually organized in attribute tables which list individual entities in the rows and attributes in the columns entries in each cell of the table represent the attribute value of a specific attribute for a specific entity note: attribute table is not an official DCDSTF term Database model is a conceptual description of a database defining entity type and associated attributes each entity type is represented by specific spatial objects after the database is constructed, the database model is a view of the database which the system can present to the user other views can be presented, but this one is likely useful because it was important in the conceptual design e.g. the system can model the data in vector form but generate a raster for purposes of display to the user need not be related directly to the way the data are actually stored in the database e.g. census zones may be defined as being represented by polygons, but the program may actually represent the polygon as a series of line segments examples of database models can be grouped by application area e.g. transportation applications require different database models than do natural resource applications Layers spatial objects can be grouped into layers, also called overlays, coverages or themes one layer may represent a single entity type or a group of conceptually related entity types e.g. 
a layer may have only stream segments or may have streams, lakes, coastline and swamps options depend on the system as well as the database model some spatial databases have been built by combining all entities into one layer D. DATABASE DESIGN almost all entities of geographic reality have at least a 3-dimensional spatial character, but not all dimensions may be needed e.g. highway pavement actually has a depth which might be important, but is not as important as the width, which is not as important as the length representation should be based on the types of manipulations that might be undertaken map-scale of the source document is important in constraining the level of detail represented in a database e.g. on a 1:100,000 map individual houses or fields are not visible Steps in database design 1. Conceptual software and hardware independent describes and defines included entities identifies how entities will be represented in the database i.e. selection of spatial objects - points, lines, areas, raster cells requires decisions about how real-world dimensionality and relationships will be represented these can be based on the processing that will be done on these objects e.g. should a building be represented as an area or a point? e.g. should highway segments be explicitly linked in the database? 2. Logical software specific but hardware independent sets out the logical structure of the database elements, determined by the data base management system used by the software this is discussed in greater detail in Unit 43 3. 
Physical both hardware and software specific requires consideration of how files will be structured for access from the disk covered in Unit 66 Desirable database characteristics database should be: contemporaneous - should contain information of the same vintage for all its measured variables as detailed as necessary for the intended applications the categories of information and subcategories within them should contain all of the data needed to analyze or model the behavior of the resource using conventional methods and models positionally accurate exactly compatible with other information that may be overlain with it internally accurate, portraying the nature of phenomena without error - requires clear definitions of phenomena that are included readily updated on a regular schedule accessible to whoever needs it Issues in database design almost all entities of geographic reality have at least 3-dimensional spatial character, but not all dimensions may be needed e.g. highway pavement has a depth which might be important, but is not as important as the width, which is not as important as the length representation should be based on types of manipulations that might be undertaken map-scale of the source document is important in constraining the level of detail represented in a database e.g. on a 1:100,000 map individual houses or fields are not visible REFERENCES Codd, E. F., 1981. "Data Models in Database Management," ACM SIGMOD Record 11(2):112-114. Explains the nature of data models, their role in constructing databases. DCDSTF - Digital Cartographic Data Standards Task Force. 1988. "The proposed standard for digital cartographic data," The American Cartographer 15(1). Summary of the major components of the proposed US National Standard. Robinson, A., R. Sale, J. Morrison, and P. Muehrcke, 1984. The Elements of Cartography, (5th ed.), John Wiley and Sons, New York. Useful survey of cartographic terminology and models. Unwin D., 1981. 
Introductory Spatial Analysis, Methuen, London. A spatial analysis perspective on spatial data models. SPATIAL OBJECTS AND DATABASE MODELS A. INTRODUCTION B. POINT DATA C. LINE DATA Network entities Network characteristics Attributes Networks as linear addressing systems D. AREA DATA 1. Environmental/natural resource zones 2. Socio-economic zones 3. Land records Areal coverage Holes and islands E. REPRESENTATION OF CONTINUOUS SURFACES General nature of surfaces Data structures for representing surfaces Spatial interpolation REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit continues the development of basic concepts about representing reality as spatial data. Here we look at how the representation of reality in the form of entities is handled with the spatial objects points, lines and areas. UNIT 11 - SPATIAL OBJECTS AND DATABASE MODELS Compiled with assistance from Timothy L. Nyerges, University of Washington A. INTRODUCTION the objects in a spatial database are representations of real-world entities with associated attributes the power of a GIS comes from its ability to look at entities in their geographical context and examine relationships between entities thus a GIS database is much more than a collection of objects and attributes in this unit we look at the ways a spatial database can be assembled from simple objects e.g. how are lines linked together to form complex hydrologic or transportation networks e.g. how can points, lines or areas be used to represent more complex entities like surfaces? B. POINT DATA the simplest type of spatial object choice of entities which will be represented as points depends on the scale of the map/study e.g. on a large scale map - encode building structures as point locations e.g. 
on a small scale map - encode cities as point locations the coordinates of each point can be stored as two additional attributes information on a set of points can be viewed as an extended attribute table each row is a point - all information about the point is contained in the row each column is an attribute two of the columns are the coordinates overhead - Point data attribute table here northing and easting represent y and x coordinates each point is independent of every other point, represented as a separate row in the database model C. LINE DATA Network entities infrastructure networks transportation networks - highways and railroads utility networks - gas, electric, telephone, water airline networks - hubs and routes natural networks river channels Network characteristics a network is composed of: nodes - junctions, ends of dangling lines links - chains in the database model diagram valency of a node is the number of links at the node ends of dangling lines are "1-valent" 4-valent nodes are most common in street networks 3-valent nodes are most common in hydrology a tree network has only one path between any pair of nodes, no loops or circuits are possible most river networks are trees Attributes examples of link attributes: direction of traffic, volume of traffic, length, number of lanes, time to travel along link diameter of pipe, direction of gas flow voltage of electrical transmission line, height of towers number of tracks, number of trains, gradient, width of most narrow tunnel, load bearing capacity of weakest bridge examples of node attributes: presence of traffic lights, presence of overpass, names of intersecting streets presence of shutoff valves, transformers note that some attributes (e.g. names of intersecting streets) link one type of entity to another (nodes to links) some attributes are associated with parts of network links e.g. part of a railroad link between two junctions may be inside a tunnel e.g. 
part of a highway link between two junctions may need pavement maintenance many GIS systems require such attributes to be attached to the network by splitting existing links and creating new nodes e.g. split a street link at the house and attach the attributes of the house to the new (2-valent) node e.g. create a new link for the stretch of railroad which lies inside the tunnel, plus 2 new nodes this requirement can lead to impossibly large numbers of links and 2-valent nodes e.g. at a scale of 1:100,000, the US rail network has about 300,000 links the number of links would increase by orders of magnitude if new nodes had to be defined in order to locate bridges on links Networks as linear addressing systems often need to use the network as an addressing system, e.g. street network address matching is the process of locating a house on a street network from its street address e.g. if it is known that the block contains house numbers from 100 to 198, house #124 would probably be 1/4 of the way along that link points can be located on the network by link number and distance from beginning of link this can be more useful than the (x,y) coordinates of points since it links the points to a location on the network this approach provides an answer to the problem of assigning attributes to parts of links keep such entities (houses, tunnels) in separate tables, link them to the network by link number and distance from beginning of link need one distance for point entities, two for extended entities like tunnels (start and end locations) the GIS can then compute the (x,y) coordinates of the entities if needed links need not be permanently split in this scheme D. AREA DATA is represented on area class maps, choropleth maps boundaries may be defined by natural phenomena, e.g. lake, or by man, e.g. forest stands, census zones there are several types of areas that can be represented 1. 
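The address-matching arithmetic just described (house #124 on a 100-198 block falling about 1/4 of the way along its link, then converted to an (x,y) position from the link geometry) can be sketched as follows; the node coordinates are hypothetical and the link is assumed straight for simplicity:

```python
def address_fraction(house_no, low, high):
    """Fractional distance of a house number along its street link,
    interpolated from the link's address range (TIGER-style)."""
    return (house_no - low) / (high - low)

def point_on_link(start, end, fraction):
    """Compute (x, y) from a linear reference (link plus fractional offset),
    assuming a straight link between its two end nodes."""
    (x0, y0), (x1, y1) = start, end
    return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))

f = address_fraction(124, 100, 198)   # about 0.24, i.e. roughly 1/4 of the way
xy = point_on_link((1000.0, 2000.0), (1100.0, 2000.0), f)
```

The same link-plus-offset idea carries extended entities such as tunnels by storing two offsets (start and end) instead of one.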
Environmental/natural resource zones examples include land cover data - forests, wetlands, urban geological data - rock types forestry data - forest "stands", "compartments" soil data - soil types boundaries are defined by the phenomenon itself e.g. changes of soil type almost all junctions are 3-valent 2. Socio-economic zones includes census tracts, ZIP codes, etc. boundaries defined independently of the phenomenon, then attribute values are enumerated boundaries may be culturally defined, e.g. neighborhoods 3. Land records land parcel boundaries, land use, land ownership, tax information Areal coverage overhead - Areal coverage 1. entities are isolated areas, possibly overlapping any place can be within any number of entities, or none e.g. areas burned by forest fires areas do not exhaust the space 2. any place is within exactly one entity areas exhaust the space every boundary line separates exactly two areas, except for the outer boundary of the mapped area areas may not overlap any layer of the first type can be converted to one of the second type each area may now have any number of fire attributes, depending on how many times it has been burned - unburned areas will have none Holes and islands areas often have "holes" or areas of different attributes wholly enclosed within them diagram the database must be able to deal with these correctly this has not always been true of GIS products cases can be complex, for example: Lake Huron is a "hole" in the North American landmass Manitoulin Island is a "hole" in Lake Huron Manitoulin Island has several large lakes, including one which is the largest lake on an island in a lake anywhere some of these lakes have islands in them some systems allow area entities to have islands more than one primitive single-boundary area can be grouped into an area object e.g. the area served by a school or shopping center may have more than one island, but only one set of attributes diagram SPATIAL OBJECTS AND DATABASE MODELS E. 
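The grouped area objects just described - one set of attributes, possibly several islands, possibly holes - can be sketched as a list of closed rings with an even-odd containment test, so that islands and holes are handled uniformly. This is only an illustrative sketch, not a structure prescribed by any particular GIS; all names and coordinates are hypothetical.

```python
# Sketch: an area object stored as a set of closed rings (hypothetical
# structure). Outer boundaries and holes are all rings; a point is inside
# the object if it falls inside an odd number of rings (even-odd rule).

def point_in_ring(pt, ring):
    """Ray-casting test: is pt inside the closed ring of (x, y) vertices?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal through pt
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def point_in_area_object(pt, rings):
    """Even-odd rule over all rings: an odd count means inside the object."""
    return sum(point_in_ring(pt, r) for r in rings) % 2 == 1

# a square "island" containing a square "hole" (a lake on the island)
island = [(0, 0), (10, 0), (10, 10), (0, 10)]
lake = [(4, 4), (6, 4), (6, 6), (4, 6)]
area_object = [island, lake]

print(point_in_area_object((2, 2), area_object))  # on the island -> True
print(point_in_area_object((5, 5), area_object))  # in the lake -> False
```

The same object could carry further rings for islands within the lake; the even-odd count still gives the correct answer at any depth of nesting.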
E. REPRESENTATION OF CONTINUOUS SURFACES

- examples of continuous surfaces are:
  - elevation (as part of topographic data)
  - rainfall, pressure, temperature
  - population density potential
- it must be possible to sample observations everywhere, on an interval/ratio level

General nature of surfaces
- critical points:
  - peaks and pits - highest and lowest points
  - ridge lines, valley bottoms - lines across which slope reverses suddenly
  - passes - convergence of 2 ridges and 2 valleys
- faults - sharp discontinuities of elevation - cliffs
- fronts - sharp discontinuities of slope
- slopes and aspects can be derived from elevations

Data structures for representing surfaces
- traditional data models do not have a method for representing surfaces
- therefore, surfaces are represented by the use of points, lines or areas
- note: the following series of three overheads on the Tiefort Mountains all represent the same area

1. points - grid of elevations
overhead - Elevation represented as points
- DEM or Digital Elevation Model
- based on sampling the elevation surface at regular intervals
- the result is a matrix of points
- much digital elevation data is available in this form

2. lines - digitized contours
overhead - Elevation represented as lines
- from the DLG hypsography layer, identical to the contours on the printed map, plotted directly from stereo photography
- based on the string object type - a line connecting sampled points of equal elevation
- elevation is the attribute
- could be done for rainfall, barometric pressure, etc.

3. areas - TIN (triangulated irregular network)
overhead - Triangulation of a terrain surface
overhead - Elevation represented as areas
- note: the perspective diagram is developed from the triangulated surface (TIN created by M.P. Kumler, USGS)
- sample points are often located at peaks and pits, and along ridges and valleys
- sampling can be varied depending on the ruggedness of the surface - a very efficient way of representing topography
- the result is a TIN composed of nodes, lines and triangular faces

Spatial interpolation
- frequently when using continuous data we wish to estimate values at specific locations which are not part of the point, line or area dataset
- these values must be determined from the surrounding values using techniques of spatial interpolation (see Units 40 and 41)
- e.g. to interpolate contours, a regular grid is often interpolated from an irregular scatter of points, or densified from a sparse grid

REFERENCES

Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. See chapter 2 for a review of database models.

Dueker, K.J., 1987. "Geographic Information Systems and Computer-Aided Mapping," American Planning Association Journal, Summer 1987:383-390. Compares database models in GIS and computer mapping.

Mark, D.M., 1978. "Concepts of Data Structure for Digital Terrain Models," Proceedings of the Digital Terrain Models (DTM) Symposium, ASP and ACSM, pp. 24-31. A comprehensive discussion of DEM database models.

Marx, R.W., 1986. "The TIGER System: Automating the Geographic Structure of the United States Census," Government Publications Review 13:181-201. Issues in the selection of a database model for TIGER.

Nyerges, T.L. and K.J. Dueker, 1988. Geographic Information Systems in Transportation, Federal Highway Administration, Division of Planning, Washington, D.C. Database model concerns in transportation applications of GIS.

Peuquet, D.J., 1984. "A conceptual framework and comparison of spatial data models," Cartographica 21(4):66-113. An excellent review of the various spatial data models used in GIS.

EXAM AND DISCUSSION QUESTIONS

1. How does a natural zone coverage differ from an enumeration zone coverage? Describe the differences in terms of (a) application areas, (b) visual appearance, (c) compilation of data.

2. Compare the various data models for elevation data. Which would you expect to be best for (a) a landscape dominated by fluvial erosion and dendritic drainage patterns, (b) a glaciated landscape, (c) a barometric weather map with fronts, (d) a map of population densities for North America?

3. What data models might be needed in a system to monitor oil spills and potential environmental damage to coastlines? Give examples of appropriate spatial objects and associated attributes.

4. Describe the differences between the data models commonly used in remote sensing, computer-assisted design, and automated cartography.

RELATIONSHIPS AMONG SPATIAL OBJECTS

A. INTRODUCTION
Three types of relationship
B. EXAMPLES OF SPATIAL RELATIONSHIPS
Point-point
Point-line
Point-area
Line-line
Line-area
Area-area
C. CODING RELATIONSHIPS AS ATTRIBUTES
Example - "flows into" relationship
Example - "is contained in" relationship
D. OBJECT PAIRS
E. CARTOGRAPHIC AND TOPOLOGICAL DATABASES
Strict definition of "topological"
Usage of "topological" in GIS
F. PLANAR ENFORCEMENT
Process
Objective
G. RELATIONSHIPS IN RASTER SYSTEMS
REFERENCES
EXAM AND DISCUSSION QUESTIONS
NOTES

This final unit in the spatial databases module looks at the complex issue of relationships and how they can be coded. The important concept of planar enforcement, introduced here, is referred to several times in later units.

UNIT 12 - RELATIONSHIPS AMONG SPATIAL OBJECTS

Compiled with assistance from Gerald White, California State University, Sacramento

A. INTRODUCTION
- there are a vast number of possible relationships in spatial data
- many are important in analysis
  - e.g. the "is contained in" relationship between a point and an area is important in relating objects to their surrounding environment
  - e.g. "intersects" between two lines is important in analyzing routes through networks
- relationships can exist between entities of the same type or of different types
  - e.g. for each shopping center, we can find the nearest shopping center (same type)
  - e.g. for each customer, we can find the nearest shopping center (different types)

Three types of relationship
1. relationships which are used to construct complex objects from simple primitives
  - e.g. the relationship between a line (chain) and the ordered set of points which defines it
  - e.g. the relationship between an area (polygon) and the ordered set of lines which defines it
2. relationships which can be computed from the coordinates of the objects
  - e.g. two lines can be examined to see if they cross - the "crosses" relationship can be computed
  - e.g. areas can be examined to see which one encloses a given point - the "is contained in" relationship can be computed
  - e.g. areas can be examined to see if they overlap - the "overlaps" relationship
3. relationships which cannot be computed from coordinates - these must be coded in the database during input
  - e.g. we can compute whether two lines cross, but not whether the highways they represent intersect (one may be an overpass)
- some databases allow an entity called a "complex object", composed of "simple objects"
  - e.g. objects representing "house", "lot" and "cable", with associated attributes, might be grouped together logically as an "account"

B. EXAMPLES OF SPATIAL RELATIONSHIPS

Point-point
- "is within", e.g. find all of the customer points within 1 km of this retail store point
- "is nearest to", e.g. find the hazardous waste site which is nearest to this groundwater well

Point-line
- "ends at", e.g. find the intersection at the end of this street
- "is nearest to", e.g. find the road nearest to this aircraft crash site

Point-area
- "is contained in", e.g. find all of the customers located within this ZIP code boundary
- "can be seen from", e.g. determine if any of this lake can be seen from this viewpoint

Line-line
- "crosses", e.g. determine if this road crosses this river
- "comes within", e.g. find all of the roads which come within 1 km of this railroad
- "flows into", e.g. find out if this stream flows into this river

Line-area
- "crosses", e.g. find all of the soil types crossed by this railroad
- "borders", e.g. find out if this road forms part of the boundary of this airfield

Area-area
- "overlaps", e.g. identify all overlaps between types of soil on this map and types of land use on this other map
- "is nearest to", e.g. find the nearest lake to this forest fire
- "is adjacent to", e.g. find out if these two areas share a common boundary

C. CODING RELATIONSHIPS AS ATTRIBUTES
- in the database model we can visualize relationships as additional attributes

Example - "flows into" relationship
overhead - Coding relationships as attributes I
- option A: each stream link in a stream network could be given the ID of the downstream link which it flows into
  - flow could be traced from link to link by following pointers
- option B: alternatively, the network could be coded as two sets of entities - links and nodes
  - the links could "point" to their downstream node
  - the nodes could "point" to the next downstream link

Example - "is contained in" relationship
overhead - Coding relationships as attributes II
- given: the locations of 4 wells, with attributes of depth and flow
- the wells lie in two different counties with attributes of population
- we wish to determine how much flow is available in each county:
1. find the containing county of each well (compute the "is contained in" relationship)
  - store the result as a new attribute, County, of each well
2. using this revised attribute table, total flow by county and add the results to the county table

County  Population  Flow
A       20,000      4,500
B       35,000      5,500
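The two-step well/county computation above can be sketched in a few lines. The county shapes (simple rectangles) and the individual well flows below are hypothetical; only the resulting county totals (4,500 and 5,500) come from the example table.

```python
# Sketch of the "is contained in" computation. County boundaries are
# hypothetical axis-aligned rectangles (xmin, ymin, xmax, ymax); well
# flows are hypothetical values chosen so the totals match the example.

counties = {
    "A": {"population": 20000, "bounds": (0, 0, 10, 10)},
    "B": {"population": 35000, "bounds": (10, 0, 20, 10)},
}

wells = [
    {"id": 1, "x": 2,  "y": 3, "depth": 50, "flow": 2000},
    {"id": 2, "x": 7,  "y": 8, "depth": 80, "flow": 2500},
    {"id": 3, "x": 12, "y": 4, "depth": 60, "flow": 1500},
    {"id": 4, "x": 18, "y": 6, "depth": 90, "flow": 4000},
]

# step 1: compute "is contained in" and store it as a new attribute, County
for w in wells:
    for name, c in counties.items():
        xmin, ymin, xmax, ymax = c["bounds"]
        if xmin <= w["x"] < xmax and ymin <= w["y"] < ymax:
            w["county"] = name

# step 2: total flow by county and add the results to the county table
for name, c in counties.items():
    c["flow"] = sum(w["flow"] for w in wells if w["county"] == name)

for name, c in counties.items():
    print(name, c["population"], c["flow"])
# A 20000 4500
# B 35000 5500
```

A real GIS would use a point-in-polygon test against arbitrary county boundaries rather than rectangles, but the relational pattern - compute the relationship once, store it as an attribute, then aggregate - is the same.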
D. OBJECT PAIRS
- distance is an attribute of a pair of objects
- there are other types of information which are similarly attributes of pairs of objects
  - e.g. flow of commuters between a suburb and downtown
  - e.g. trade between two countries
  - e.g. flow of groundwater between a sink and a spring
- in some cases these attributes can be attached to an object linking the origin and destination objects
  - e.g. on a map, trade can be an attribute of an arrow connecting the two countries - thick arrows indicate strong trade
  - however, such maps quickly become impossibly complex
- in general, it is necessary to allow for information which is not an attribute of any one object but of a pair of objects, including:
  - distance
  - connectedness - yes or no
  - flow of goods, trade
  - number of trips
- such attributes cannot necessarily be ascribed to any real object
  - e.g. commuting flows between a suburb and downtown are not necessarily attributes of any set of links in the transport network
  - e.g. flow of groundwater between a sink and a spring does not necessarily follow any aquifer or conduit
  - these are attributes of object pairs
- object pairs are important in various kinds of spatial analysis using GIS
- attributes of object pairs can be thought of as tables which have one set of objects as rows and the other as columns, with the value in each cell representing the interaction between them
- there are many different terms for the implementation of this concept, e.g. interaction matrix, turn table, Cartesian product
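The interaction-matrix idea above can be sketched as a small table with origins as rows and destinations as columns. All zone names and trip counts here are hypothetical illustrations, not data from the unit.

```python
# Sketch of an object-pair attribute stored as an interaction matrix:
# rows are origin zones, columns are destination zones, and each cell
# holds the commuter trips for that pair. All values are hypothetical.

origins = ["suburb_1", "suburb_2"]
destinations = ["downtown", "airport"]

# trips[i][j] = flow from origins[i] to destinations[j]
trips = [
    [5300, 400],   # from suburb_1
    [2100, 150],   # from suburb_2
]

def flow(origin, destination):
    """Look up the pair attribute for the object pair (origin, destination)."""
    return trips[origins.index(origin)][destinations.index(destination)]

print(flow("suburb_1", "downtown"))  # 5300
# note: the flow is an attribute of the *pair*, not of either zone alone,
# and not necessarily of any physical link between the two zones
```

This is why object-pair attributes resist being stored as columns of an ordinary entity table: the natural home for the value is the cell of a matrix indexed by two objects.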
E. CARTOGRAPHIC AND TOPOLOGICAL DATABASES

Strict definition of "topological"
- if a map is stretched and distorted, some of its properties change, including:
  - distances
  - angles
  - relative proximities
- other properties remain constant, including:
  - adjacencies
  - most other relationships, such as "is contained in", "crosses"
  - types of spatial objects - areas remain areas, lines remain lines, points remain points
- strictly, topological properties are those which remain unchanged after distortion

Usage of "topological" in GIS
- a spatial database is often called "topological" if one or more of the following relationships have been computed and stored:
  - connectedness of links at intersections
  - the ordered set of lines (chains) forming each polygon boundary
  - adjacency relationships between areas
- unfortunately the precise meaning of the term has become distorted by use
- in general, "topological" implies that certain relationships are stored, making the data more useful for various kinds of spatial analysis
- by contrast, a database is called "cartographic" if the above conditions are absent
  - objects can be manipulated individually
  - relationships between them are unavailable or are considered unimportant
- cartographic databases are less useful for analysis of spatial data
  - however, they are satisfactory for simple mapping of data
  - many packages designed for mapping only use cartographic database models
- a cartographic database can usually be converted to a topological database by computing relationships - the process of "building topology" through planar enforcement

F. PLANAR ENFORCEMENT
- objects and their attributes must be capable of describing the conditions existing on a map or in reality
- variation of a single property like soil type or elevation over a mapped area is achieved by including appropriate attributes for entity types
  - e.g. elevation described by giving attributes to elevation points
  - e.g. soil type described by giving attributes to areas
- in cases like soil type, the objects used to describe spatial variation must obey certain simple rules
  - e.g. two areas cannot overlap
  - e.g. every place must be within exactly one area, or on a boundary
- these rules are collectively referred to as planar enforcement
  - a set of objects obeying these rules is said to be planar enforced
- planar enforcement is a very important operation in a vector GIS

Process
- begin with a number of unrelated line segments - imagine a number of limp spaghetti noodles lying on a table
- the following elements are now defined (terminology from the US Census Bureau's development of digital spatial database concepts):
overhead - Planar enforcement
- a 0-cell (or node) is identified wherever two noodles cross or a noodle terminates, i.e. all intersections are calculated
- a 1-cell (or link, also "chain", "arc", "edge") is identified for each length of noodle between two consecutive 0-cells (nodes)
- a 2-cell (or area, also "face", "polygon") is defined for each group of consecutive 1-cells forming an enclosed area that does not contain any 1-cells that are not part of the boundary
- note that these definitions relate directly to the ordinary concept of dimensionality
- the results are:
  - 0-cells are either isolated ("points") or adjacent to one or more 1-cells ("nodes")
  - all 1-cells end in exactly two 0-cells
  - each line segment (chain) between adjacent 0-cells is assigned to exactly one 1-cell
  - all 1-cells lie between exactly two 2-cells
  - every place on the "map" between noodles is assigned to a single 2-cell (the rest of the world is a 2-cell as well, often given the ID zero)

Objective
- planar enforcement is used to build objects out of digitized lines (hence the phrase "building topology")
- it is a consistent and precise approach to the problem of making meaningful objects out of groups of lines
- simple rules can be used to correct some digitizing errors:
  - a very short 1-cell terminating in a 1-valent 0-cell indicates an overshoot
diagram
  - a long 1-cell terminating in a 1-valent 0-cell very close to another 1-cell indicates an undershoot
diagram
- planar enforcement is often needed when a set of data is being imported from another system
  - e.g. if the source is a cartographic database and needs to have relationships computed
  - e.g. if the database models of the two systems are incompatible, the data are transferred as unrelated noodles, then the objects are rebuilt
- planar enforcement must be applied one layer at a time
- planar enforcement concepts are built into many systems

G. RELATIONSHIPS IN RASTER SYSTEMS
- in general, it is easier to work with relationships in vector systems
- the concept of an object is not as natural for raster systems, which model the world as composed of pixels
- however, relationships can be handled in raster systems with simple techniques:
overhead - Relationships in raster systems
  - e.g. a map of county boundaries in one layer - each pixel has a county code attribute which is an ID pointing to an entry in a county attribute table
  - in a second layer, each well location is coded by giving the appropriate pixel an ID pointing to a well attribute table
  - the "is contained in" relationship can be computed by an overlay operation and stored as an additional column in the well attribute table
- only a few raster systems contain this type of capability to extract relationships into attribute tables - most do not handle relationships between spatial objects

REFERENCES

Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. Chapter 2 describes objects, attribute tables and relationships.

Goodchild, M.F., 1988. "Towards an enumeration and classification of GIS functions," Proceedings, IGIS '87, NASA, Washington DC, 2:67-77. Defines and discusses object pairs.

Keating, T., W. Phillips and K. Ingram, 1987. "An integrated topologic database design for geographic information systems," Photogrammetric Engineering and Remote Sensing 53. Good discussion of topological and cartographic database models.

EXAM AND DISCUSSION QUESTIONS

1. Discuss the use of planar enforcement for street networks, and the problems presented by overpasses and underpasses. Can you modify the basic rules to maintain consistency but allow for such instances?

2. What additional examples of relationships can you devise in each of the six categories used in section B?

3. Why have designers of raster GIS not commonly devised ways of coding spatial relationships between objects in their systems? Is this likely to change in the future, and if so, why?

4. "Topology is what distinguishes GIS from automated cartography". Discuss.
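As a coda to section G, the raster overlay that computes "is contained in" can be sketched directly: one layer of county IDs, one layer of well IDs, and a pass over the grid that fills in a County column in the well attribute table. All IDs, grid values and attribute values here are hypothetical.

```python
# Sketch of section G's raster overlay. In the county layer every pixel
# carries a county ID; in the well layer, 0 means "no well" and any other
# value is an ID pointing into the well attribute table. All values are
# hypothetical.

county_layer = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
well_layer = [
    [0, 7, 0, 0],
    [0, 0, 0, 9],
    [0, 0, 0, 0],
]

county_table = {1: {"name": "A"}, 2: {"name": "B"}}
well_table = {7: {"flow": 4500}, 9: {"flow": 5500}}

# overlay: wherever a well pixel is non-zero, read the county pixel at the
# same position and store it as a new attribute of that well
for row in range(len(well_layer)):
    for col in range(len(well_layer[row])):
        well_id = well_layer[row][col]
        if well_id != 0:
            well_table[well_id]["county"] = county_layer[row][col]

print(well_table[7]["county"])  # 1  (well 7 is contained in county A)
print(well_table[9]["county"])  # 2  (well 9 is contained in county B)
```

The overlay itself is a simple cell-by-cell comparison of co-registered layers; the relationship only becomes usable for queries once it is written back into the attribute table, which is exactly the capability section G notes is missing from most raster systems.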