DATABASE CONCEPTS I A. INTRODUCTION Two ways to use DBMS within a GIS GIS as a database problem B. CONCEPTS IN DATABASE SYSTEMS Definition Advantages of a database approach Views of the database C. DATABASE MANAGEMENT SYSTEMS Components Types of database systems D. HIERARCHICAL MODEL Summary of features Advantages and disadvantages E. NETWORK MODEL Restrictions Summary F. RELATIONAL MODEL Terminology Examples of relations Keys Normalization Advantages and disadvantages REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 43 - DATABASE CONCEPTS I Compiled with assistance from Gerald White, California State University, Sacramento A. INTRODUCTION very early attempts to build GIS began from scratch, using very limited tools like operating systems and compilers more recently, GIS have been built around existing database management systems (DBMS) purchase or lease of the DBMS is a major part of the system's software cost the DBMS handles many functions which would otherwise have to be programmed into the GIS any DBMS makes assumptions about the data which it handles to make effective use of a DBMS it is necessary to fit those assumptions certain types of DBMS are more suitable for GIS than others because their assumptions fit spatial data better Two ways to use DBMS within a GIS 1. Total DBMS solution all data are accessed through the DBMS, so must fit the assumptions imposed by the DBMS designer 2. 
Mixed solution some data (usually attribute tables and relationships) are accessed through the DBMS because they fit the model well some data (usually locational) are accessed directly because they do not fit the DBMS model GIS as a database problem some areas of application, notably facilities management: deal with very large volumes of data often have a DBMS solution installed before the GIS is considered the GIS adds geographical access to existing methods of search and query such systems require very fast response to a limited number of queries, little analysis in these areas it is often said that GIS is a "database problem" rather than an algorithm, analysis, data input or data display problem B. CONCEPTS IN DATABASE SYSTEMS Definition a database is a collection of non-redundant data which can be shared by different application systems stresses the importance of multiple applications, data sharing the spatial database becomes a common resource for an agency implies separation of physical storage from use of the data by an application program, i.e. program/data independence the user or programmer or application specialist need not know the details of how the data are stored such details are "transparent to the user" changes can be made to data without affecting other components of the system. e.g. change format of data items (real to integer, arithmetic operations) change file structure (reorganize data internally or change mode of access) relocate from one device to another, e.g. from optical to magnetic storage, from tape to disk Advantages of a database approach reduction in data redundancy shared rather than independent databases reduces problem of inconsistencies in stored information, e.g. different addresses in different departments for the same customer maintenance of data integrity and quality data are self-documented or self-descriptive information on the meaning or interpretation of the data can be stored in the database, e.g. 
names of items, metadata avoidance of inconsistencies data must follow prescribed models, rules, standards reduced cost of software development many fundamental operations taken care of, however DBMS software can be expensive to install and maintain security restrictions database includes security tools to control access, particularly for writing Views of the database overhead - Views of the database the database can present different views of itself to users, programmers these are built and maintained by the database administrator (DBA) the internal data representation (internal view) is normally not seen by the user or applications programmer the conceptual view or conceptual schema is the primary means by which the DBA builds and manages the database the DBMS can present multiple views of the conceptual schema to programmers and users, depending on the application these are called external views or schemas overhead - Water district database C. DATABASE MANAGEMENT SYSTEMS Components Data types includes: integer (whole numbers only) real (decimal) character (alphabetic and numeric characters) date more advanced systems may include pictures and images as data types e.g. a database of buildings for the fire department which stores a picture as well as address, number of floors, etc. Standard operations e.g. sort, delete, edit, select records Data definition language (DDL) the language used to describe the contents of the database e.g. attribute names, data types - "metadata" Data manipulation and query language the language used to form commands for input, edit, analysis, output, reformatting etc. some degree of standardization has been achieved with SQL (Structured Query Language) Programming tools besides commands and queries, the database should be accessible directly from application programs through e.g. 
subroutine calls File structures the internal structures used to organize the data Types of database systems several models for databases: tabular ("flat file") - data in a single table hierarchical network relational the hierarchical, network and relational models all try to deal with the same problem with tabular data: inability to deal with more than one type of object, or with relationships between objects e.g. database may need to handle information on aircraft, crew, flights and passengers - four types of records with different attributes, but with relationships between them (e.g. "is booked on" between passenger and flight) database systems originated in the late 1950s and early 1960s largely by research and development of IBM Corporation most developments were responses to needs of business, military, government and educational institutions - complex organizations with complex data and information needs trend through time has been increasing separation between the user and the physical representation of the data - increasing "transparency" D. HIERARCHICAL MODEL early 1960s, IBM saw business world organizing data in the form of a hierarchy rather than one record type (flat file), a business has to deal with several types which are hierarchically related to each other e.g. company has several departments, each with attributes: name of director, number of staff, address each department requires several parts to make its product, with attributes: part number, number in stock each part may have several suppliers, with attributes: address, price diagram certain types of geographical data may fit the hierarchical model well e.g. Census data organized by state, within state by city, within city by census tract diagram the database keeps track of the different record types, their attributes, and the hierarchical relationships between them the attribute which assigns records to levels in the database structure is called the key (e.g. 
is record a department, part or supplier?) Summary of features a set of record "types" e.g. supplier record type, department record type, part record type a set of links connecting all record types in one data structure diagram (tree) at most one link between two record types, hence links need not be named for every record, there is only one parent record at the next level up in the tree e.g. every county has exactly one state, every part has exactly one department no connections between occurrences of the same record type cannot go between records at the same level unless they share the same parent diagram Advantages and disadvantages data must possess a tree structure tree structure is natural for geographical data data access is easy via the key attribute, but difficult for other attributes in the business case, easy to find record given its type (department, part or supplier) in the geographical case, easy to find record given its geographical level (state, county, city, census tract), but difficult to find it given any other attribute e.g. find the records with population 5,000 or less tree structure is inflexible cannot define new linkages between records once the tree is established e.g. in the geographical case, new relationships between objects cannot define linkages laterally or diagonally in the tree, only vertically the only geographical relationships which can be coded easily are "is contained in" or "belongs to" DBMSs based on the hierarchical model (e.g. System 2000) have often been used to store spatial data, but have not been very successful as bases for GIS E. 
NETWORK MODEL developed in mid 1960s as part of work of CODASYL (Conference on Data Systems Languages) which proposed programming language COBOL (1966) and then network model (1971) other aspects of database systems also proposed at this time include database administrator, data security, audit trail objective of network model is to separate data structure from physical storage, eliminate unnecessary duplication of data with associated errors and costs uses concept of a data definition language, data manipulation language uses concept of m:n linkages or relationships an owner record can have many member records a member record can have several owners hierarchical model allows only 1:n example of a network database a hospital database has three record types: patient: name, date of admission, etc. doctor: name, etc. ward: number of beds, name of staff nurse, etc. need to link patients to doctor, also to ward doctor record can own many patient records patient record can be owned by both doctor and ward records network DBMSs include methods for building and redefining linkages, e.g. when patient is assigned to ward Restrictions links between records of the same type are not allowed while a record can be owned by several records of different types, it cannot be owned by more than one record of the same type (patient can have only one doctor, only one ward) Summary the network model has greater flexibility than the hierarchical model for handling complex spatial relationships it has not had widespread use as a basis for GIS because of the greater flexibility of the relational model F. RELATIONAL MODEL the most popular DBMS model for GIS the INFO in ARC/INFO EMPRESS in System/9 several GIS use ORACLE several PC-based GIS use DBase III flexible approach to linkages between records comes closest to modeling the complexity of spatial relationships between objects proposed by IBM researcher E.F. 
Codd in 1970 more of a concept than a data structure internal architecture varies substantially from one RDBMS to another Terminology each record has a set of attributes the range of possible values (domain) is defined for each attribute records of each type form a table or relation each row is a record or tuple each column is an attribute note the potential confusion - a "relation" is a table of records, not a linkage between records the degree of a relation is the number of attributes in the table 1 attribute is a unary relation 2 attributes is a binary relation n attributes is an n-ary relation Examples of relations unary: COURSES(SUBJECT) binary: PERSONS(NAME,ADDRESS) OWNER(PERSON NAME,HOUSE ADDRESS) ternary: HOUSES(ADDRESS,PRICE,SIZE) Keys a key of a relation is a subset of attributes with the following properties: unique identification the value of the key is unique for each tuple nonredundancy no attribute in the key can be discarded without destroying the key's uniqueness e.g. phone number is a unique key in a phone directory in the normal phone directory the key attributes are last name, first name, street address if street address is dropped from this key, the key is no longer unique (many Smith, John's) a prime attribute of a relation is an attribute which participates in at least one key all other attributes are non-prime Normalization concerned with finding the simplest structure for a given set of data deals with dependence between attributes avoids loss of general information when records are inserted or deleted overhead - Normalization consider the first relation (prime attribute underlined): this is not normalized since PRICE is uniquely determined by STYLE problems of insertion and deletion anomalies arise the relationship between ranch and 50000 is lost when the last of the ranch records is deleted a new relationship (triplex costing 75000) must be inserted when the first triplex record occurs consider the second relation: here there are two 
relations instead of one one to establish style for each builder the other price for each style several formal types of normalization have been defined - this example illustrates third normal form (3NF), which removes dependence between non-prime attributes although normalization produces a consistent and logical structure, it has a cost in increased storage requirements some GIS database administrators avoid full normalization for this reason a relational join is the reverse of this normalization process, where the two relations HOMES2 and COST are combined to form HOMES1 Advantages and disadvantages the most flexible of the database models no obvious match of implementation to model - model is the user's view, not the way the data is organized internally is the basis of an area of formal mathematical theory most RDBMS data manipulation languages require the user to know the contents of relations, but allow access from one relation to another through common attributes Example: Given two relations: PROPERTY(ADDRESS,VALUE,COUNTY_ID) COUNTY(COUNTY_ID,NAME,TAX_RATE) to answer the query "what are the taxes on property x" the user would: retrieve the property record link the property and county records through the common attribute COUNTY_ID compute the taxes by multiplying VALUE from the property tuple with TAX_RATE from the linked county tuple REFERENCES Standard database texts: Date, C.J., 1987. An Introduction to Database Systems, Addison-Wesley, Reading, MA. Howe, D.R., 1983. Data Analysis for Data Base Design, Arnold, London. Kent, W., 1983. "A simple guide to five normal forms in relational database theory," Communications of the Association for Computing Machinery 26:120. Tsichritzis, D.C. and F.H. Lochovsky, 1977. Database Management Systems, Academic Press, New York. The relational model in GIS: van Roessel, J.W., 1987. "Design of a spatial data structure using the relational normal forms," International Journal of Geographical Information Systems 1:33-50. 
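The PROPERTY/COUNTY tax query described above can be sketched with an in-memory SQLite database. This is a minimal illustration, not part of the original unit: the table contents (addresses, values, rates) are invented, and only the two relations and the COUNTY_ID linkage come from the text.

```python
import sqlite3

# Build the two relations from the example; sample rows are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PROPERTY (ADDRESS TEXT, VALUE REAL, COUNTY_ID INTEGER)")
con.execute("CREATE TABLE COUNTY (COUNTY_ID INTEGER, NAME TEXT, TAX_RATE REAL)")
con.execute("INSERT INTO PROPERTY VALUES ('12 Oak St', 100000, 1)")
con.execute("INSERT INTO COUNTY VALUES (1, 'Adams', 0.02)")

# "What are the taxes on property x?" -- retrieve the property record,
# link property and county through the common attribute COUNTY_ID,
# then multiply VALUE by TAX_RATE (here 100000 * 0.02).
row = con.execute("""
    SELECT p.ADDRESS, p.VALUE * c.TAX_RATE AS TAXES
    FROM PROPERTY p
    JOIN COUNTY c ON p.COUNTY_ID = c.COUNTY_ID
    WHERE p.ADDRESS = '12 Oak St'
""").fetchone()
print(row)
```

The JOIN clause is the relational join discussed under Normalization: the same mechanism that recombines HOMES2 and COST into HOMES1 links a property tuple to its county tuple here.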
EXAM AND DISCUSSION QUESTIONS 1. Compare the four database models (flat file, hierarchical, network and relational) as bases for GIS. What particular features of the relational model account for its popularity? 2. Polygon overlay has been called a spatial analog of a relational join. Do you agree? 3. Summarize the arguments against organizing spatial databases as flat files. 4. Why do you think the term "relation" was chosen for a table of attributes in the relational model? DATABASE CONCEPTS II A. INTRODUCTION Databases for spatial data The relational model in GIS B. DATA SECURITY Integrity constraints Transactions C. CONCURRENT USERS Three types of concurrent access Checkout/checkin Determining extent of data locking Deadlock D. SECURITY AGAINST DATA LOSS E. UNAUTHORIZED USE Summary REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 44 - DATABASE CONCEPTS II Compiled with assistance from Gerald White, California State University, Sacramento A. INTRODUCTION setting up and maintaining a spatial database requires careful planning, attention to numerous issues many GIS were developed for a research environment of small databases many database issues like security not considered important in many early GIS difficult to grow into an environment of large, production-oriented systems Databases for spatial data many different data types are encountered in geographical data, e.g. pictures, words, coordinates, complex objects very few database systems have been able to handle textual data e.g. descriptions of soils in the legend of a soil map can run to hundreds of words e.g. descriptions are as important as numerical data in defining property lines in surveying - "metes and bounds" descriptions variable length records are needed, often not handled well by standard systems e.g. 
number of coordinates in a line can vary this is the primary reason why some GIS designers have chosen not to use standard database solutions for coordinate data, only for attribute tables standard database systems assume the order of records is not meaningful in geographical data the positions of objects establish an implied order which is important in many operations often need to work with objects that are adjacent in space, thus it helps to have these objects adjacent or close in the database is a problem with standard database systems since they do not allow linkages between objects in the same record type (class) there are so many possible relationships between spatial objects, that not all can be stored explicitly however, some relationships must be stored explicitly as they cannot be computed from the geometry of the objects, e.g. existence of grade separation at street crossing the integrity rules of geographical data are too complex e.g. the arcs forming a polygon must link into a complete boundary e.g. lines cannot cross without forming a node effective use of non-spatial database management solutions requires a high level of knowledge of internal structure on the part of the user e.g. 
user may need to be aware that polygons are composed of arcs, and stored as arc records, cannot treat them simply as objects and let the system take care of the internal structure users are required to have too much knowledge of the database model, cannot concentrate on knowledge of the problem users may have to use complex commands to execute processes which are conceptually simple The relational model in GIS the relational model captures geographical reality through a set of tables (relations) linked by keys (common fields or attributes) each table contains a set of records (tuples) tables are normalized to minimize redundancy of information, maximize integrity in general, the relational model is a convenient way to represent reality each table corresponds to a set of real-world features with common types of attributes the user needs to know which features are stored in which tables however the relational model has certain deficiencies for spatial data many implementations (e.g. ARC/INFO) store only the attribute tables in the relational model, since it is less straightforward to store the geometrical descriptions of objects - such systems have been called "hybrid" most spatial operations are not part of the standard query language of RDBMSs, e.g. find objects within a user-defined polygon, e.g. overlay, e.g. 
buffer zone generation the relational model does not deal easily and efficiently with the concept of complex objects (objects formed by aggregating simple objects) - this concept is more compatible with the hierarchical data model B. DATA SECURITY many systems for small computers, and systems specializing in geometric and geographical data, do not provide functionality necessary to maintain data integrity over long periods of time Integrity constraints integrity constraints are rules which the database must obey in order to be meaningful attribute values must lie within prescribed domains relationships between objects must not conflict, e.g. "flows into" relationship between river segments must agree with "is fed by" relationship locational data must not violate rules of planar enforcement, contours must not cross each other, etc. Transactions transactions may include: modifications to individual data items addition or deletion of entire records addition or deletion of attributes changes in schema (external views of the database) e.g. addition of new tables or relations, redefinition of access keys all of the updates or modifications made by a user are temporary until confirmed system checks integrity before permanently modifying the database ("posting" the changes to the database) updates and changes can be abandoned at any time prior to final confirmation C. CONCURRENT USERS in many cases more than one user will need to access the database at any one time this is a major advantage of multi-user systems and networks however, if the database is being modified by several users at once, it is easy for integrity constraints to be violated unless adequate preventative measures exist changes may interfere and produce loss of integrity e.g. user B may change an object while user A is processing it the results will not be valid for either the old or the new version of the object e.g. 
a dispatching system operator A receives a fire call, sends a request to fire station 1 to dispatch a vehicle, waits for fire station to confirm operator B receives a fire call after A's call but before A confirms the dispatch result may be that both A and B request a dispatch of the same fire truck solution should be to "lock" the first request until confirmed automatic control of concurrent use is based on the transaction concept the database is modified only at the end of a transaction concurrent users never see the effects of an incomplete transaction interference between two concurrent users is resolved at the transaction level Three types of concurrent access unprotected - applications may retrieve and modify concurrently in practice, no system allows this, but if one did, system should provide a warning that other users are accessing the data protected - any application may retrieve data, but only one may modify it e.g. user B should be able to query the status of fire trucks even after user A has placed a "hold" on one exclusive - only one application may access the data Checkout/checkin in GIS applications, digitizing and updating spatial objects may require intensive work on one part of the database for long periods of time e.g. digitizer operator may spend an entire shift working on one map sheet work will likely be done on a workstation operating independently of the main database because of the length of transactions, a different method of operation is needed at beginning of shift, operator "checks out" an area from the database at end of work, the same area is "checked in", modifying and updating the database while an area is checked out, it should be "locked" by the main database this will allow other users to read the data, but not to check it out themselves for modification this resolves problems which might occur e.g. 
user A checks out a sheet at 8:00 am and starts updating user B checks out the same sheet at 9:00 am and starts a different set of updates from the same base if both are subsequently allowed to check the sheet back in, then the second checkin may try to modify an object which no longer exists the area is unlocked when the new version is checked in and modifies the database the amount of time required for checkout and checkin must be no more than a small part of a shift Determining extent of data locking how much data needs to be locked during a transaction? changing one item may require other changes as well, e.g. in indexes in principle all data which may be affected by a transaction should be locked it may be difficult to determine the extent of possible changes e.g. in a GIS user is modifying a map sheet because objects on the sheet are "edgematched" to objects on adjacent sheets, contents of adjacent sheets may be affected as well e.g. if a railroad line which extends to the edge of the mapsheet is deleted, should its continuation on the next sheet be affected? if not, the database will no longer be effectively edgematched should adjacent sheets also be locked during transaction? levels of data locking: entire database level "view" level lock only those parts of the database which are relevant to the application's view record type level lock an entire relation or attribute table record occurrence level lock a single record data item level lock only one data item Deadlock is when a request cannot continue processing normally results from incremental acquisition of resources e.g. request A gets resource 1, request B gets resource 2 request A now asks for resource 2, B asks for resource 1 A and B will wait for each other unless there is intervention e.g. 
user A checks out an area from a spatial database, thereby locking the contents of the area and related contents user B now attempts a checkout - some of the contents of the requested area have already been locked by A therefore, the system must unlock all of B's requests and start again - B will wait until A is finished this allows other users who need the items locked by B to proceed however, this can lead to endless alternating locking attempts by B and another user - the "accordion" effect as they encounter collisions and withdraw it can be very difficult for a DBMS to sense these effects and deal with them D. SECURITY AGAINST DATA LOSS the cost of creating spatial databases is very high, so the investment must be protected against loss loss might occur because of hardware or software failure operations to protect against loss may be expensive, but the cost can be balanced against the value of the database because of the consequences of data loss in some areas (air traffic control, bank accounts) very secure systems have been devised the database must be backed up regularly to some permanent storage medium, e.g. tape all transactions since the last backup must be saved in case the database has to be regenerated unconfirmed transactions may be lost, but confirmed ones must be saved two types of failure: interruption of the database management system because of operating errors, failure of the operating system or hardware, or power failures these interruptions occur frequently - once a day to once a week contents of main memory are lost, system must be "rebooted" contents of database on mass storage device are usually unaffected loss of the storage medium, due to operating or hardware defects ("head crashes"), or interruption during transaction processing these occur much less often, slower recovery is acceptable database is regenerated from most recent backup, plus transaction log if available E. UNAUTHORIZED USE some GIS data is confidential or secret, e.g. 
tax records, customer lists, retail store performance data contemporary system interconnections make unauthorized access difficult to prevent e.g. "virus" infections transmitted through communication networks different levels of security protection may be appropriate to spatial databases: keeping unauthorized users from accessing the database - a function of the operating system limiting access to certain parts of the database e.g. census users can access counts based on the census, but not the individual census questionnaires (note: Sweden allows access to individual returns) restricting users to generalized information only e.g. products from some census systems are subjected to random rounding - randomly changing the last digit of all counts to 0 or 5 - to protect confidentiality Summary flexibility, complexity of many GIS applications often makes it difficult to provide adequate security REFERENCES Standard database texts listed under unit 43 Abel, D.J., 1989. "SIRO-DBMS: a database tool-kit for geographical information systems," International Journal of Geographical Information Systems 3:103-116. An extension of the relational model for spatial data. Frank, A.U., 1984. "Requirements for database systems suitable to manage large spatial databases," Proceedings, International Symposium on Spatial Data Handling, University of Zurich, pp. 38-60. Nyerges, T.L., 1989. "Schema integration analysis for the development of GIS databases," International Journal of Geographical Information Systems 3:153-184. Looks at formal procedures for comparing and merging spatial database schemas. EXAM AND DISCUSSION QUESTIONS 1. In what ways are the database issues of GIS different from those of databases generally? 2. What is meant by data integrity in a spatial database? Give examples. 3. Give examples of the ways in which the integrity of a spatial database can degrade without adequate access controls. 4. 
Examine the database access controls which exist in any GIS to which you have access. Would they be adequate for a large, production-oriented agency application? ACCURACY OF SPATIAL DATABASES A. INTRODUCTION B. DEFINITIONS Accuracy Precision Components of data quality C. POSITIONAL ACCURACY How to test positional accuracy? D. ATTRIBUTE ACCURACY How to test attribute accuracy? How to summarize the matrix? E. LOGICAL CONSISTENCY F. COMPLETENESS G. LINEAGE H. ERROR IN DATABASE CREATION Positional measurement error Attribute errors Compilation errors Processing errors I. DATA QUALITY REPORTS USGS British Ordnance Survey US National standards REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 45 - ACCURACY OF SPATIAL DATABASES Compiled with assistance from Nicholas R. Chrisman, University of Washington and Matt McGranaghan, University of Hawaii A. INTRODUCTION the course thus far has looked at technical issues in: georeferencing, i.e. describing locations data structures - how to create digital representations of spatial data algorithms - how to process these digital representations to generate useful results among other technical issues in GIS, accuracy is perhaps the most important - it covers concerns for data quality, error, uncertainty, scale, resolution and precision in spatial data and affects the ways in which it can be used and interpreted all spatial data is inaccurate to some degree but it is generally represented in the computer to high precision need to consider: how well do these digital structures represent the real world? how well do algorithms compute the true values of products? B. 
DEFINITIONS Accuracy defined as the closeness of results, computations or estimates to true values (or values accepted to be true) since spatial data is usually a generalization of the real world, it is often difficult to identify a true value, and we work instead with values which are accepted to be true e.g., in measuring the accuracy of a contour in a digital database, we compare to the contour as drawn on the source map, since the contour does not exist as a real line on the surface of the earth the accuracy of the database may have little relationship to the accuracy of products computed from the database e.g. the accuracy of a slope, aspect or watershed computed from a DEM is not easily related to the accuracy of the elevations in the DEM itself Precision defined as the number of decimal places or significant digits in a measurement precision is not the same as accuracy - a large number of significant digits doesn't necessarily indicate that the measurement is accurate a GIS works at high precision, mostly much higher than the accuracy of the data itself since all spatial data are of limited accuracy, inaccurate to some degree, the important questions are: how to measure accuracy how to track the way errors are propagated through GIS operations how to ensure that users don't ascribe greater accuracy to data than it deserves Components of data quality recently a National Standard for Digital Cartographic Data (see reference) was developed by a coordinated national effort in the US this is a standard model to be used for describing digital data accuracy similar standards are being adopted in other countries this standard identifies several components of data quality: positional accuracy attribute accuracy logical consistency completeness lineage each of these will now be examined C. 
POSITIONAL ACCURACY defined as the closeness of locational information (usually coordinates) to the true position conventionally, maps are accurate to roughly one line width or 0.5 mm equivalent to 12 m on 1:24,000, or 125 m on 1:250,000 maps within a database, a typical UTM coordinate pair might be: Easting 579124.349m Northing 5194732.247m if the database was digitized from a 1:24,000 sheet, the last four digits in each coordinate (units, tenths, hundredths and thousandths) would be spurious How to test positional accuracy? use an independent source of higher accuracy find a larger scale map use the Global Positioning System (GPS) use raw survey data use internal evidence unclosed polygons, lines which overshoot or undershoot junctions, are indications of inaccuracy - the sizes of gaps, overshoots and undershoots may be used as a measure of positional accuracy compute accuracy from knowledge of the errors introduced by different sources, e.g. 1 mm in source document 0.5 mm in map registration for digitizing 0.2 mm in digitizing if sources combine independently, we can get an estimate of overall accuracy by summing the squares of each component and taking the square root of the sum: √(1² + 0.5² + 0.2²) = 1.14 mm C. ATTRIBUTE ACCURACY defined as the closeness of attribute values to their true value note that while location does not change with time, attributes often do attribute accuracy must be analyzed in different ways depending on the nature of the data for continuous attributes (surfaces) such as on a DEM or TIN: accuracy is expressed as measurement error e.g. elevation accurate to 1 m for categorical attributes such as classified polygons: are the categories appropriate, sufficiently detailed and defined? gross errors, such as a polygon classified as A when it should have been B, are simple but unlikely e.g. land use is shopping center instead of golf course more likely the polygon will be heterogeneous: e.g.
vegetation zones where the area may be 70% A and 30% B worse, A and B may not be well-defined, may not be able to identify the class clearly as A or B e.g. soils classifications are typically fuzzy at the center of the polygon, may be confident that the class is A, but more like B at the edges How to test attribute accuracy? prepare a misclassification matrix as follows: take a number of randomly chosen points determine the class according to the database then determine the class in the field by ground check complete the matrix:

                        Class on ground
   Class in database     A    B    C    D
           A             .    .    .    .
           B             .    .    .    .
           C             .    .    .    .
           D             .    .    .    .

ideally, want all points to lie on the diagonal of the matrix - this indicates that the same class was observed on the ground as is recorded in the database an error of omission occurs when a point's class on the ground is incorrectly recorded in the database the number of class B points incorrectly recorded is the sum of column B row A, column B row C and column B row D, i.e. the number of points that are B on the ground but something else in the database that is, the column sum less the diagonal cell an error of commission occurs when the class recorded in the database does not exist on the ground e.g. the number of errors of commission for class A is the sum of row A column B, row A column C, row A column D, i.e. the points falsely recorded as A in the database that is, the row sum less the diagonal cell How to summarize the matrix?
the percent of cases correctly classified is often used this is the percent of cases located in the diagonal cells of the matrix however, even in the worst case we would expect some cases in the diagonal cells by chance an index kappa (Cohen's kappa) adjusts for this by subtracting the number expected by chance the number expected by chance in each diagonal cell is found by multiplying the appropriate row and column totals and dividing by the total number of cases overhead - Calculating Kappa then: k = (d-q)/(N-q) where d is the number of cases in diagonal cells q is the number of cases expected in diagonal cells by chance N is the total number of cases kappa is 1 for perfectly accurate data (all N cases on the diagonal), zero for accuracy no better than chance compare a map with a few large polygons to one with a large number of smaller polygons is it easier to get a high kappa in the first case? if so, is there a way of adjusting kappa to account for this difference? we expect attribute accuracy to vary over the map, so it would be useful to have an indication of the spatial variation in misclassification probability, not just a summary statistic the remaining aspects of data quality apply to the database as a whole, rather than to the objects, attributes or coordinates within it E. LOGICAL CONSISTENCY refers to the internal consistency of the data structure, particularly applies to topological consistency is the database consistent with its definitions? if there are polygons, do they close? is there exactly one label within each polygon? are there nodes wherever arcs cross, or do arcs sometimes cross without forming nodes? F. COMPLETENESS concerns the degree to which the data exhausts the universe of possible items are all possible objects included within the database? affected by rules of selection, generalization and scale G.
LINEAGE a record of the data sources and of the operations which created the database how was it digitized, from what documents? when was the data collected? what agency collected the data? what steps were used to process the data? precision of computational results is often a useful indicator of accuracy H. ERROR IN DATABASE CREATION error is introduced at almost every step of database creation what are these steps, and what kinds of error are introduced? Positional measurement error Geodetic Control and GPS the most accurate basis of absolute positional data is the geodetic control network, a series of points whose positions are known with high precision however, it is often difficult to tie a dataset to one of these high-quality monuments the Global Positioning System (GPS) is a powerful way of augmenting the geodetic network Aerial Photography and Satellite Imagery most positional data is derived from air photos here accuracy depends on the establishment of good control points data from remote sensing is more difficult to position accurately because of the size of each pixel Text Descriptions some positional data comes from text descriptions old surveys tied in to marks on trees boundary follows watershed, or midline of river this type of source is often of very poor positional accuracy Digitizing digitizers encode manuscript lines as sets of x-y coordinate pairs see Units 7 and 13 for introductions to digitizing resolution of coordinate data is dependent on mode of digitizing: point-mode digitizing operator specifically selects and encodes those points deemed "critical" to represent the geomorphology of the line, or politically-significant coordinate pairs requires intelligence, knowledge about the line representation that will be needed stream-mode digitizing device automatically selects points on a distance or time parameter generally, an unnecessarily high density of coordinate pairs is selected.
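A minimal sketch of how such an over-dense stream of digitized points might be thinned with a distance tolerance; the function name and tolerance value are illustrative assumptions, not part of any particular digitizing package:

```python
import math

def weed_points(points, tolerance):
    """Keep a point only if it is at least `tolerance` map units from
    the last point kept; always keep the first and last points so the
    line's endpoints are preserved."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:-1]:
        if math.dist(p, kept[-1]) >= tolerance:
            kept.append(p)
    if len(points) > 1:
        kept.append(points[-1])  # preserve the final endpoint
    return kept

# a dense stream-mode line along the x-axis, one point per map unit
line = [(i, 0) for i in range(101)]
print(len(weed_points(line, 5)))  # → 21
```

In practice more sophisticated weeding (retaining points by their contribution to line shape, not just spacing) is common, but even this simple filter removes most of the redundant coordinate pairs.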
two types of errors normally occur in stream-mode digitizing: physiological errors are caused by involuntary muscular spasms that tend to parallel the longitudinal axis of the centerline these errors are caused by agitations as the operator's hand twitches and jerks when digitizing three specific types may be identified: spikes, switchbacks and polygonal knots (loops) diagram these are fairly simple to remove automatically software has been developed to clean the initial digital data of duplicate coordinate pairs and simple physiological errors a related problem in point mode digitizing is duplicate coordinate pairs which occur when the button is hit twice psychological errors are caused by psychomotor problems in line-following the digitizing operator either cannot see the line or cannot properly move the crosshairs along the line results in the digitized line being displaced laterally from the intended position may also involve misinterpretation, too much generalization these are not easy to remove automatically in spite of the above, digitizing itself is not a major source of positional error it is not difficult for a digitizer operator to follow a line to an accuracy equal to the line's width typical error is in the 0.5 mm range a common test of digitizing accuracy is to compare the original line with its digitized and plotted version, and to see if daylight can be seen between the two errors in registration and control points affect the entire dataset errors are also introduced because of poor stability of base material paper can shrink and stretch significantly (as much as 3%) with change in humidity Coordinate transformation coordinate transformation introduces error, particularly if the projection of the original document is unknown, or if the source map has poor horizontal control Attribute errors attributes are usually obtained through a combination of field collection and interpretation categories used in interpretation may not be easy to check in the field
concepts of "diversity" and "old growth" used in current forest management practice are highly subjective attributes obtained from air photo interpretation or classified satellite images may have high error rates for social data, the major source of inaccuracy is undercounting e.g. in the Census undercount rates can be very high (>10%) in some areas and in some social groups Compilation errors common practices in map compilation introduce further inaccuracies: generalization aggregation line smoothing separation of features e.g. railroad moved on map so as not to overlap adjacent road however, many of these may also be seen as improving the usefulness and meaning of the data Processing errors processing of data produces error misuse of logic generalization and problems of interpretation mathematical errors accuracy lost due to low precision computations rasterization of vector data e.g., true line position is somewhere in the cell boundary cells may actually contain parts of all adjacent cells I. DATA QUALITY REPORTS because there are so many diverse sources of error it is probably not possible to measure the error introduced at each step independently - the strategy of combining errors arithmetically probably won't work USGS require that no more than 10% of the points tested be in error by more than 1/30 inch, measured at publication scale (scale >1:20,000) questions are "How far out are the 10%?" "Where are the 10%?" e.g.
in a particularly bad case, all of the 10% might be accounted for by one boundary line which is out by several inches British Ordnance Survey carry out an ongoing accuracy assessment and re-survey to verify a survey, a large number of points (typically n = 150 to 500 of a single type) are used to calculate: root mean square displacement: e = √(Σ xi² / n) where xi is the displacement at each point i systematic error: s = Σ xi / n standard error: se = √(e² - s²) if the error is "excessive", then the survey is carefully reviewed see Merchant (1987) for an implementation US National standards National Map Accuracy Standards from the Bureau of the Budget, 1947 not completed current standards developed by the National Committee for Digital Cartographic Data Standards chaired by Hal Moellering purpose: to set standards for compatibility of: definitions of cartographic objects interchange formats DATA QUALITY documentation dates: 1982 Jan NCDCDS formed 1985 Jan Interim Proposed Standard 1988 Jan Proposed Standard 1988 Testing in the field handout - Interim Proposed Standard for Digital Cartographic Data Quality (2 pages) REFERENCES Bureau of the Budget, 1947. National Map Accuracy Standards, Washington DC, GPO, reprinted in M.M. Thompson, 1979, Maps for America, USGS, Reston VA, p. 104. Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. See pp. 103-135. DCDSTF, 1988. "The Proposed Standard for Digital Cartographic Data," The American Cartographer 15(1):entire issue. Federal Geodetic Control Committee, 1974. Classification, Standards of Accuracy, and General Specifications of Geodetic Control Surveys, Washington DC, GPO, 1980-0-333-276 (also NOAA--S/T 81-29). Harley, J. B., 1975. Ordnance Survey Maps: A Descriptive Manual, Ordnance Survey, Southampton, England. Merchant, D.C., 1987.
"Spatial accuracy specification for large scale topographic maps," Photogrammetric Engineering and Remote Sensing 53:958-61. Reports a recent effort by ASPRS to revise the US National Map Accuracy Standard. National Committee for Digital Cartographic Data Standards, Moellering, H., ed, 1985. Digital Cartographic Data Standards: An Interim Proposed Standard, Report #6. DISCUSSION AND EXAM QUESTIONS 1. Explain the difference between accuracy and precision, and show how these ideas apply to GIS. 2. "In manual map analysis, precision and accuracy are similar, but in GIS processing, precision frequently exceeds the accuracy of the data". Discuss 3. Design an experiment to measure the accuracy achieved by an agency in its digitizing operations. How would you measure the accuracy with which lines are being digitized? 4. What is meant by data lineage, and why is it important in understanding the accuracy of spatial databases? MANAGING ERROR A. ERROR PROPAGATION Example application Error analysis Sensitivity analysis B. ARTIFACTS OF ERROR Raster data Vector data Digitizing artifacts Strategies used to avoid problems: Polygon overlay artifacts C. STORING ACCURACY INFORMATION Raster data Vector data Positional uncertainty Attribute uncertainty REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 46 - MANAGING ERROR A. ERROR PROPAGATION in GIS applications we combine data from different sources, with different levels of accuracy What impact does error in each data layer have on the final result? 
Example application Problem: find the best route for a power transmission corridor from a given origin point to a given destination point about 150 km away, across an area of the Midwest with comparatively high densities of agriculture and settlement the study area has been divided up into about 30,000 raster cells, each 500 m on a side have identified about 100 factors likely to influence the choice of route, including: agricultural productivity (dollars per hectare) settlement (presence or absence) existing rights of way for power lines (presence or absence) the 100 factors have been combined, or cascaded, to a single measure of suitability on a scale of 0 through 6 the cascading rules group factors into composites such as "social impact", "agricultural impact" and then weight each group against the others the rules used in cascading include weighted addition: suitability = w1x1 + w2x2 as well as simple conditions: suitability = 0 if settlement = "present" and reclassifications: suitability = 3 if x1 = A and x2 = d suitability = 4 if x1 = B and x2 = d Error analysis the effects of cascading on error will be complex do errors get worse, i.e. multiply? do errors cancel out? are errors in each layer independent or are they related? 
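The cascading rules above (a simple condition, two reclassifications, and weighted addition) can be sketched in code; the factor names, weights, numeric scores and dictionary layout are illustrative placeholders, not the study's actual values:

```python
def cell_suitability(cell, w1=0.6, w2=0.4):
    """Combine factor values for one raster cell into a suitability
    score on the 0-6 scale, using the three kinds of cascading rules
    described in the text."""
    # simple condition: settlement excludes the cell outright
    if cell["settlement"] == "present":
        return 0
    # reclassification: specific factor combinations map to fixed scores
    if cell["x1"] == "A" and cell["x2"] == "d":
        return 3
    if cell["x1"] == "B" and cell["x2"] == "d":
        return 4
    # weighted addition of two numeric factor scores (each assumed 0-6)
    score = w1 * cell["score1"] + w2 * cell["score2"]
    return min(6, round(score))

cell = {"settlement": "absent", "x1": "C", "x2": "e",
        "score1": 5, "score2": 2}
print(cell_suitability(cell))  # 0.6*5 + 0.4*2 = 3.8 → 4
```

Note that rule order matters in such a cascade: an exclusion or reclassification based on a single inaccurate layer passes that layer's error straight through to the composite, which bears directly on the error-analysis questions above.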
suppose two maps, each with percent correctly classified of 0.90, are overlaid studies have shown that the accuracy of the resulting map (percent of points having both of the overlaid classes) is little better than 0.90 × 0.90 = 0.81 when many maps are overlaid the accuracy of the resulting composite can be very poor however we're more interested in the accuracy of the composite suitability index than in the overlaid attributes themselves for some types of operations the accuracy of suitability is determined by the accuracy of the least accurate layer this is true if reclassification and the "and" operator are used extensively, or if simple conditions are used based on inaccurate layers in other cases the accuracy of the result is significantly better than the accuracy of the least accurate layer this is true if weighted addition is used, or if reclassification uses the "or" operator e.g. suitability = 4 if x1 = A or x2 = d Sensitivity analysis how to determine the impact of inaccuracy on the results? two types of answers are needed: the impact of error on the suitability map the impact of error on the best route the answers will likely be very different it will also be useful to ask the question the other way: what accuracy is needed in each layer in order to produce a required level of accuracy in the result? sensitivity is the response of the result (suitability, or the route location) to a unit change in one of the inputs easy to see what a unit change means for agricultural productivity in dollars per acre, but what does it mean for vegetation class? sensitivity can be defined for: 1. the data inputs: how much does result change when data input changes? 2. the weights: how much does result change when the weight given to a factor changes? error in determining weights may be just as important as error in the database may be better to use full observed range to test sensitivity i.e.
response of the result to a change in one of the inputs from its minimum observed value to its maximum e.g. suppose one layer is settlement (present/absent) set the entire layer to settlement=present and recompute suitability and the best route then set the entire layer to settlement=absent and recompute the difference will be a measure of the sensitivity of the analysis to the settlement layer layers which are important but nevertheless do not show geographical variation over the study area will not have high sensitivity in this definition this serves to point up the distinction between sensitivity in principle and in practice a layer may be important in principle, but have no impact in this study area e.g. in principle the agricultural productivity layer may be very important in the decision framework, but if all the land is equally productive, then it will not be important in practice in practice, only a few layers (out of our original 100) will have much impact on the final route it is critical to know which these are in order to defend the methodology effectively (or to attack it!) must examine both the decision rules and the value ranges to determine which layers have the highest impact in the suitability product this information can be used in assessing the level of input accuracy that is needed e.g. if the additional accuracy will not change the results, it may be unnecessary to carry out costly detailed surveys can also use sensitivity analysis to assess the effects of uncertainty in the data compute the impact of values at each end of the uncertainty range and compare the results provides a measure of the "confidence interval" of the results sensitivity may also refer to spatial resolution would increasing resolution give a better result? would cost of additional data collection at higher resolution be justified? can we put a value on spatial resolution? B.
ARTIFACTS OF ERROR artifacts are unwanted effects which result from using a high-precision GIS to process low-accuracy spatial data usually result from positional errors, not attribute errors Raster data since raster data has finite resolution, determined by pixel size as long as pixel size is greater than the positional accuracy of the data, we have no risk of unwanted effects or artifacts Vector data often have precision different from accuracy significant problems occur in two areas: digitizing polygon overlay Digitizing artifacts a digitizer operator will not be able to close polygons exactly, or to form precise junctions between lines a tolerance distance must be established, so that gaps and overshoots can be corrected (lines snapped together) as long as they fall within the tolerance distance most digitizer operators can work easily to a tolerance of 0.02 inches or 0.5 mm problems arise whenever the map has real detail at this resolution or finer e.g. polygon with a narrow isthmus: diagram e.g. two lines close together - which one to snap to? diagram e.g. removing overshoot - must look back along line to form correct topology: diagram Strategies used to avoid problems: essentially, we try to find a balance between: 1. asking the operator to resolve problems, which slows down the digitizing, and 2. having the system resolve problems, which requires good software and lots of CPU usage each system establishes its own ways of avoiding or reducing these problems some are more successful than others 1. require the user to enlarge the map photographically increases the scale of the map while holding tolerance constant, so problem detail is now bigger than the tolerance difficult or impossible to get error-free enlargement cheaply and easily 2. require the user to digitize each arc separately e.g.
if the following is digitized as one arc then there is no intersection diagram program then only needs to check for snaps and overshoots at ends of arcs tedious for the digitizer operator 3. require the user to identify snap points press a different digitizer button when a point needs to be snapped wait for system response indicating successful snap diagram 4. have the system check for snaps continuously during digitizing requires fast, dedicated processor computing load gets higher as database accumulates requires continuous display of results no good for imported datasets 5. use rules to assist CPU in making decisions e.g. two labels in a polygon indicate that it's really two polygons, not one with a narrow isthmus might use expectations about polygon shape puts heavy load on the processor the best current solutions use a combination of strategies 3 and 4 it is almost always useful to keep track of digitizing by marking work done on a transparent overlay a cursor in the form of a pen is a good practical solution Polygon overlay artifacts covered algorithms for dealing with sliver polygons earlier another strategy for avoiding the sliver polygon problem is to allow objects to share primitives this departs from the database model in which every set of polygons is thought of as a different layer e.g.
suppose a woodlot (polygon) shares part of its boundary with a road (line) the shared part becomes a primitive object which is stored only once in the database, and shared by the two higher level features by using shared primitives, can avoid artifacts which might result when comparing or overlaying the two versions of the woodlot/road line, one belonging to the road object and one to the woodlot object to identify shared primitives during digitizing they must be on the same document need an operation which allows two separate primitives to be identified as shared and replaced by one need a converse operation to unshare a primitive if one version of the line must be moved and not the other diagram C. STORING ACCURACY INFORMATION how to store information on accuracy in a database? Raster data uncertainty in each cell's attributes might be stored by giving each cell a set of probability attributes, one for each of the possible classes in classified remote sensing images this information can come directly from the classification procedures uncertainty in elevation in a DEM is more likely constant over the raster and can be stored as part of the descriptive data or metadata for the raster as a whole positional uncertainty is also likely constant for the raster can be stored once for the whole map Vector data there are five potential levels for storage of uncertainty information in a vector database: map class of objects polygon arc point Positional uncertainty positional accuracy at one level may not imply similar accuracy at other levels positional accuracy about a point says little about the positional accuracy of an arc diagram similarly, positional accuracy at the polygon level may cause confusion along shared arcs diagram for lines and polygons, accuracy can be stored as an attribute of: arc (e.g. width of transition zone between two polygons) class of objects (e.g. error in position of railroads) map as a whole (e.g.
all boundaries and lines on the map have been digitized to specified accuracy) for points, can be stored as an attribute of point, class or map Attribute uncertainty uncertainty in an object's attributes can be stored as: an attribute of the object (e.g. polygon is 90% A) an attribute of the entire class of objects (e.g. soil type A has been correctly identified 90% of the time) REFERENCES Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment. Clarendon, Oxford. Chapter 6 on error in GIS. Chrisman, N.R., 1983. "The role of quality information in the long-term functioning of a geographic information system," Cartographica 21:79. Goodchild, M.F. and S. Gopal, editors, 1989. The Accuracy of Spatial Databases, Taylor and Francis, Basingstoke, UK. Edited papers from a conference on error in spatial databases. EXAM AND DISCUSSION QUESTIONS 1. Define the difference between sensitivity to error in principle and in practice. 2. Imagine that you represent a community trying to fight the proposed route of the powerline discussed in this unit. What arguments would you use to attack the power utility company's methods? 3. Compare the methods available in any digitizing system to which you have access, to those discussed in this unit. Does your system offer any significant advantages? 4. Some GIS processes can be very sensitive to small errors in data. Give examples of such processes, and discuss ways in which the effects of errors can be managed. FRACTALS A. INTRODUCTION Why learn about fractals? Length of a cartographic line Where did the ideas originate? B. SOME INTRODUCTORY CONCEPTS Euclidean geometry C. SCALE DEPENDENCE Determining fractal dimension Some questions D. SELF-SIMILARITY AND SCALING Self-similarity Scaling E. ERROR IN LENGTH AND AREA MEASUREMENTS REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 47 - FRACTALS Compiled with assistance from Brian Klinkenberg, University of British Columbia A.
INTRODUCTION Why learn about fractals? fractals are not so much a rigorous set of models as a set of concepts these concepts express ideas which have been around in cartography for a long time they provide a framework for understanding the way cartographic objects change with generalization, or changes in scale they allow questions of scale and resolution to be dealt with in a systematic way Length of a cartographic line if a line is measured at two different scales, the second larger than the first, its length should increase by the ratio of the two scales areas should change by the square of the ratio volumes should change by the cube of the ratio yet because of cartographic generalization, the length of a geographical line will in almost all cases increase by more than the ratio of the two scales new detail will be apparent at the larger scale "the closer you look, the more you see" is true of almost all geographical data in effect the line will behave as if it had the properties of something between a line and an area a fractal is defined, nontechnically, as a geometric set - whether of points, lines, areas or volumes - whose measure behaves in this anomalous manner this concept of the scale-dependent nature of cartographic data will be discussed in more detail later Where did the ideas originate? 
term was introduced by Benoit Mandelbrot to the general public in his 1977 text Fractals: Form, Chance and Dimension a second edition in 1982 is titled The Fractal Geometry of Nature some of Mandelbrot's earliest ideas on fractals came from his work on the lengths of geographic lines in the mid 1960s fractals may well represent one of the most profound changes in the way scientists look at natural phenomena fractal-based papers represent over 50% of the submissions for some physics journals many of the studies of the fractal geometry of nature are still at the early stages (especially those in geomorphology and cartography) the results presented in some fields are very exciting (e.g., see Lovejoy's (1982) early work on the fractal dimensions of rain and cloud areas) B. SOME INTRODUCTORY CONCEPTS Euclidean geometry in traditional Euclidean geometry we work with points, lines, areas and volumes Euclidean dimensions (E) are all positive whole numbers the Euclidean dimension represents the number of coordinates necessary to define a point to specify any point on a profile requires two coordinates, thus a profile has a Euclidean dimension of two to define a point on a surface requires three dimensions, therefore a surface has a Euclidean dimension of three closely allied with Euclidean dimensions are the topological dimensions (DT) of phenomena on a flat piece of paper (which has a Euclidean dimension of 2) you can draw a two-dimensional figure (DT= 2), a one-dimensional line (DT= 1), and a zero-dimensional point (DT= 0) (compare 0-cell, 1-cell and 2-cell notation) in fractal geometry we work with points, lines, areas and volumes, but instead of restricting ourselves to integer dimensions, we allow the fractal dimension (D) to be any real number the limits on this real number are that it must be at least equal to the topological dimension of the phenomenon, and at most equal to the Euclidean dimension (i.e., 0<=DT<=D<=E) a line drawn on a piece of paper can have a
fractal dimension anywhere from one to two the term fractals is derived from the same Latin root [fractus] as fractions; therefore: fractional dimensions the fractal dimension summarizes the degree of complexity of the phenomenon, the degree of its "space-filling capability" overhead - Lines of different fractal dimensions straight line will have equivalent topological and fractal dimensions of 1 slightly curved line will still have a topological dimension of 1, but a fractal dimension slightly greater than 1 highly curved line (DT= 1) will have a much higher fractal dimension line which completely "fills in" the page will have a fractal dimension of 2 many natural cartographic lines have fractal dimensions between 1.15 and 1.30 a surface can have a fractal dimension anywhere from 2 (perfectly flat) to 3 (completely space-filling) fractal dimension indicates how measures of the object change with generalization e.g. a line with a low fractal dimension (straight line) keeps the same length as scale changes a line with fractal dimension 1.5 loses length rapidly if it is generalized topological dimension tells us little about how shapes differ e.g. all coastlines have the same topological dimension however, sections of many coastlines have been found to have very different fractal dimensions fractal dimension quantifies the metric information in lines and surfaces in a new and unique manner C. SCALE DEPENDENCE the scale dependent nature of measurements (especially those made on maps) has been observed by many people e.g. as you measure the length of a natural boundary on maps of larger scales, or make your measurements with more precise instruments, the length appears to increase this is known as the "Steinhaus Paradox" Richardson (1961) made an extensive study of the cartographic representation of international borders suggested overhead - Richardson plot, see Mandelbrot 1982, p.
33 he observed that there was a predictable relationship between the scale at which the measurement was made, and the length of the line even though the length increased when the borders were measured on maps of larger scale, the increase was predictable plots illustrating the relationship between measurement scale and length have since become known as Richardson plots Mandelbrot subsequently placed Richardson's (and others') work within the framework of fractal geometry, and showed that such behavior is predicted in a fractal world Determining fractal dimension an example of how to determine the fractal dimension of a cartographic line: 1. step a pair of dividers (step size s1) along the line; say it takes n1 steps to span the line 2. the length of the line is equal to s1n1 3. repeat the process, but decrease the step size (to s2); it now takes n2 steps to span the line 4. the length of the line is now s2n2 5. the fractal dimension can be calculated as: D = log (n2/n1) / log (s1/s2) worked example: dividers size: 10 m number of steps: 100 dividers size: 5 m number of steps: 220 D = log (220/100) / log (10/5) = log (2.2) / log (2.0) = 0.3424 / 0.3010 = 1.14 here used logs to base 10, but any base could be used the more irregular the line, the greater the increase in length between the two estimates, and the greater the fractal dimension Mandelbrot's texts, the book by Peitgen and Saupe (1988), and the papers by Goodchild and Mark (1987) and Milne (1988) discuss other methods of determining the fractal dimension there are a large number of ways of determining the fractal dimensions of points, lines, areas, and volumes Some questions 1. what is the "true" length of a line? 2. how can you compare curves whose lengths are indeterminate? 3. of what value are indices based on length measurements?
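The divider procedure and worked example above are easy to automate; a minimal sketch (the function name is ours, the formula is the one given in step 5):

```python
import math

def fractal_dimension(s1, n1, s2, n2):
    """Fractal dimension D from two divider walks along the same line:
    step size s1 takes n1 steps, smaller step size s2 takes n2 steps.
    D = log(n2/n1) / log(s1/s2), with any log base."""
    return math.log(n2 / n1) / math.log(s1 / s2)

# worked example from the text: 10 m dividers -> 100 steps,
# 5 m dividers -> 220 steps
D = fractal_dimension(10, 100, 5, 220)
print(round(D, 2))  # → 1.14
```

Because the formula uses a ratio of logs, the choice of base cancels out, exactly as the text notes.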
the perimeter of an area object increases steadily with scale, but the area of an area object deviates up and down by much smaller amounts are analyses based on area less scale-dependent than ones based on perimeter? what does this indicate about measures of shape based on the ratio of perimeter to the square root of area? there is no complete solution to these (and similar types of) problems however, use of fractal geometry (especially the fractal dimension) does allow us to make reasonably meaningful comparisons and indices (as illustrated in Woronow, 1981) these questions are of special interest to cartographers interested in digital representations of cartographic features (e.g. Buttenfield, 1985) there are implications with respect to: 1. digitizing determination of the appropriate sampling interval 2. generalizing lines the best method for generalizing lines may be that method which best retains the fractal dimension of the line 3. displaying lines at a scale greater than that at which the line was collected (introduce additional "information", by adding artificial detail to the line, detail which is a function of the fractal dimension of the original line) 4. incorporating the fractal dimension into traditional cartometry measures see Woronow (1981) D. SELF-SIMILARITY AND SCALING Self-similarity indicates that some aspect of a process or phenomenon is invariant under scale-changing transformations, such as simple zooming in or out can be expressed in two ways: overhead - Self-similarity 1. geometric self-similarity, in which there is strict equality between the large and small scales not found in natural phenomena the Morton order, quadtrees use this idea in replicating the same pattern at every level 2.
statistical self-similarity, in which the equality is expressed in terms of probability distributions this type of (random) self-similarity is the more common, and is the type found in many natural phenomena, such as coastlines, soil pH profiles, river networks (Burrough, 1981; Peitgen and Saupe, 1988; etc.) the simplest test of self-similarity is visual if a phenomenon is self-similar, any part of it, if suitably enlarged, should be indistinguishable from the whole or from any other part if a natural scene is self-similar, it should be impossible to determine its scale e.g. it should be impossible to tell whether a picture of self-similar topography shows a mountain range or a small hill - there are no visual cues as to the picture's scale since many scale cues are cultural, geological or geomorphological, self-similar topographies are most common on lunar or recent volcanic landscapes Scaling not necessarily equivalent to self-similarity, although the two terms are often used interchangeably in the literature consider a landscape, as represented by a surface and a contour map on the contour map (coordinates in 2 dimensions only) the axes can be switched without fundamentally changing the characteristics of the landscape, i.e.
the characteristics of the contour lines contour lines are therefore examples of simple scaling fractals in the case of the surface, with coordinates in 3 dimensions, we cannot interchange the z axis with either the x or y axes without fundamentally altering the characteristics of the landscape since the z axis has a different scaling parameter than the x or y axes, a three-dimensional representation of the Earth's surface is therefore an example of a non-uniform (or multiple) scaling representation shapes that are statistically invariant under transformations that scale different coordinates by different amounts are known as self-affine shapes (Peitgen and Saupe, 1988) the Earth's surface is an example of a self-affine fractal, but it is not an example of a self-similar fractal contour lines, which represent horizontal cross-sections of the land surface, are examples of statistically self-similar scaling phenomena (because the contour has a constant z value) because the land surface is self-affine and not self-similar, those techniques which determine the fractal dimension of the land surface itself produce values which are different from the values produced by those techniques which determine the fractal dimension of the contours derived from that land surface E.
ERROR IN LENGTH AND AREA MEASUREMENTS scale, through its relationships with generalization and resolution, significantly influences length and area measurements problems in estimating line lengths, areas, and point characteristics can be related to the phenomenon's fractal dimension (Goodchild, 1980) estimates of area are frequently based on pixel counts, especially in raster-based systems the error in the area estimate is a function of the number of pixels cut by the boundary of the object boundaries with a fractal dimension greater than one will appear more complex as the pixel size decreases (as the resolution increases) the more contorted the boundary, or the higher its dimension, the less rapid the increase in error with cell size diagram error in a pixel-based area estimate will also be a function of how the phenomenon is distributed about the landscape: the error in area associated with a highly compact phenomenon will be much less than the error in area associated with a widely dispersed, patchy phenomenon Goodchild and Mark (1987, p. 268) show that: the standard error as a percentage of the area estimate is proportional to a^(1-D/4) where a is the area of a pixel and D is the fractal dimension of the boundary standard error will thus depend on a^(1/2) for highly scattered phenomena and a^(3/4) for single, circular patches with smooth boundaries REFERENCES Only a very small portion of the literature is presented here. For further references you should refer to the Goodchild and Mark (1987) paper; recent issues of Water Resources Research and Science also contain relevant papers Burrough, P.A., 1981. "Fractal dimensions of landscapes and other environmental data," Nature 294:240-242. Buttenfield, B., 1985. "Treatment of the cartographic line," Cartographica 22:1-26. Goodchild, M.F., 1980. "Fractals and the accuracy of geographical measures," Mathematical Geology 12:85-98. Goodchild, M.F., and Grandfield, A.W., 1983.
"Optimizing raster storage: An evaluation of four alternatives," Auto-Carto 6(2):400-407. Goodchild, M.F., and Mark, D.M., 1987. "The fractal nature of geographic phenomena," Annals AAG 77(2):265-278. Hakanson, L., 1978. "The length of closed geomorphic lines," Mathematical Geology 10:141-167. Lovejoy, S., 1982. "Area-perimeter relation for rain and cloud areas," Science 216:185-187. Mandelbrot, B.B., 1977. Fractals: Form, Chance and Dimension, Freeman, San Francisco. Mandelbrot, B.B., 1982. The Fractal Geometry of Nature, W.H. Freeman and Co., New York. Milne, B.T., 1988. "Measuring the fractal geometry of landscapes," Applied Mathematics and Computation 27:67-79. Peitgen, H.-O. and D. Saupe (Eds.), 1988. The Science of Fractal Images, Springer-Verlag, New York. Richardson, L.F., 1961. "The problem of contiguity," General Systems Yearbook 6:139-187. Unwin, D., editor, 1989. Special issue on fractals. Computers and Geosciences 15(2). Woronow, A., 1981. "Morphometric consistency with the Hausdorff-Besicovitch dimension," Mathematical Geology 13:201-216. EXAM AND DISCUSSION QUESTIONS 1. Although fractal concepts are important in understanding the error associated with pixel-based area estimates, little has been said about the relationship between fractals and area estimates obtained from vector-based systems. Why? (i.e., would the area of an enclosed figure change significantly? It is expected that the area shouldn't change significantly, as the self-similar detail should increase the area as much as it decreases the area.) 2. Define "fractal". Include in your description terms such as scale dependency, self-similarity and scaling. 3. Discuss some of the ways in which fractals have changed our way of looking at phenomena. Based on your readings, provide examples from a variety of fields. 4. Theoretically, fractal behavior applies to a phenomenon across all scales. Practically, of course, there are limits to the application of self-similarity to natural phenomena.
Where do you think some of these limits occur? (i.e., between what scales do you think portions of coastlines, for example, exhibit self-similar behavior?) What are the implications with respect to the generalization of cartographic lines, if we observe definite limits to the self-similar behavior of cartographic features? LINE GENERALIZATION A. INTRODUCTION B. ELEMENTS OF LINE GENERALIZATION Simplification Smoothing Feature Displacement Enhancement/Texturing Merging C. JUSTIFICATIONS FOR SIMPLIFYING LINEAR DATA Reduced plotting time Reduced storage Problems with plotter resolution when scale is reduced Processing D. LINEAR SIMPLIFICATION ALGORITHMS Independent Point Routines Local processing routines Unconstrained extended local processing routines Constrained extended local processing routines Global routines E. MATHEMATICAL EVALUATION OF SIMPLIFICATION F. LINEAR SMOOTHING REFERENCES DISCUSSION/EXAMINATION QUESTIONS NOTES UNIT 48 - LINE GENERALIZATION Compiled with assistance from Robert McMaster, Syracuse University A. INTRODUCTION generalization is a group of techniques that allow the amount of information to be retained even when the amount of data is reduced e.g. when the number of points on a line is reduced, the points to be retained are chosen so that the line does not change its appearance in some cases generalization actually causes an increase in the amount of information e.g. generalization of a line representing a coastline is done best when knowledge of what a coastline should look like is used this unit looks at line generalization line generalization is only a small part of the problem of generalization in cartography - the larger problem includes e.g. generalization of areas to points the focus of the unit is on line simplification simplification is only one approach to generalization (see below) B.
ELEMENTS OF LINE GENERALIZATION generalization operators geometrically manipulate the strings of x-y coordinate pairs Simplification simplification algorithms weed from the line redundant or unnecessary coordinate pairs based on some geometric criterion, such as distance between points or displacement from a centerline Smoothing smoothing routines relocate or shift coordinate pairs in an attempt to "plane" away small perturbations and capture only the more significant trends of the line Feature Displacement displacement involves the shifting of two features at a reduced scale to prevent coalescence or overlap most computer algorithms for feature displacement in vector mode concentrate on an interactive approach where the cartographer positions displacement vectors in order to initialize the direction for shifting another method uses a smaller-scale version of the feature to drive the displacement process Enhancement/Texturing enhancement allows detail to be regenerated into an already simplified data set e.g. a smooth curve may not look like a coastline so the line will be randomly textured to improve its appearance one technique is to fractalize a line by adding points and maintaining the self-similarity of the original version this produces fake (random) detail Merging merging blends two parallel features at a reduced scale e.g. the two banks of a river or edges of a highway will merge at small scales, an island becomes a dot algorithms for merging fuse the two linear features together C. 
JUSTIFICATIONS FOR SIMPLIFYING LINEAR DATA Reduced plotting time plotting time is often a bottleneck in many GISs as the number of coordinate pairs is reduced through the simplification process, the plotting speed is increased Reduced storage coordinate pairs are the bulk of data in many GISs simplification may reduce a data set by 70% without changing the perceptual characteristics of the line this results in significant savings in memory Problems with plotter resolution when scale is reduced as the scale of a digital map is reduced, the coordinate pairs are shifted closer together with significant scale reduction, the computed resolution could easily exceed the graphic resolution of the output device e.g. a coordinate pair (0.1, 6.3) reduced by 50% to (0.05, 3.15) could not be accurately displayed on a device having an accuracy of 0.1. Simplification would weed out such coordinate pairs before reduction Processing faster vector-to-raster conversion faster vector processing the time needed for many types of vector processing including translation, rotation, rescaling, cartometric analysis will be greatly reduced with a simplified data set many types of symbol-generation techniques will also be speeded up e.g. many shading algorithms calculate intersections between shade lines and polygonal boundaries a simplified polygonal boundary will reduce both the number of boundary segments and also the number of intersection calculations required D. LINEAR SIMPLIFICATION ALGORITHMS overhead - Linear Simplification Algorithms Independent Point Routines these routines are very simple in nature and do not, in any way, account for the topological relationship with the neighboring coordinate pairs 1. nth point routine every nth coordinate pair (e.g. 3rd, 10th) is retained 2.
randomly select 1/nth of the coordinate set Local processing routines these utilize the characteristics of the immediate neighboring points in deciding whether to retain coordinate pairs 1. Euclidean distance between points 2. Angular change between points overhead - Perpendicular distance and angular change 3. Jenks's simplification algorithm overhead - Jenks's simplification algorithm diagram three input parameters: MIN1 = minimum allowable distance from PT 1 to PT 2 MIN2 = minimum allowable distance from PT 1 to PT 3 ANG = maximum allowable angle of change between two vectors connecting the three points algorithm: IF distance from PT 1 to PT 2 < MIN1, OR distance from PT 1 to PT 3 < MIN2 THEN PT 2 is removed ELSE IF angle 123 < ANG THEN PT 2 is removed Unconstrained extended local processing routines these algorithms search beyond the immediate neighboring coordinate pairs and evaluate sections of the line the extent of the search depends on a variety of criteria, including: the complexity of the line the density of the coordinate set the beginning point for the sectional search Reumann-Witkam simplification algorithm overhead - Reumann-Witkam simplification algorithm the algorithm uses two parallel lines to define a search region after calculating the initial slope of the search region, the line is processed sequentially until one of the edges of the search corridor intersects the line Constrained extended local processing routines these algorithms are similar to those in the previous category; however, they are restricted in their search by: 1. coordinate search regions and 2.
distance search regions Opheim simplification algorithm overhead - Opheim simplification algorithm same as the Reumann-Witkam routine, except the algorithm is constrained by a minimum and maximum distance check, much like the Jenks routine after the initial search region is set up which is similar to that of Reumann-Witkam, any points within DMIN are eliminated as soon as the line escapes from the search region on any side, including DMAX at the end, a new search corridor is established and the last point within the region is saved the behavior of this routine around a curve is represented in C and D Lang simplification algorithm Johannsen simplification algorithm Global routines consider the line in its entirety while processing Douglas simplification algorithm overhead - Douglas simplification algorithm I and II select a tolerance band or corridor (shaded area on slide) this corridor is computed as a distance, t1 in length, on either side of a line constructed between the first and last coordinate pairs, in this example 1 and 40 point 1 is the anchor point and point 40 is the floater after the establishment of a corridor, perpendicular distances between the line connecting points 1 and 40 to all intermediate points (coordinate pairs 2-39) are calculated to determine which point is farthest from the line this maximum distance is to point 32, which is positioned well outside the corridor the position of this coordinate pair (pair 32) is now saved in the first position of a stack next, a new corridor is calculated between points 1 and 32 and point 23 is found as the farthest from the centerline here, point 32 is the floater this process continues until all points are within the corridor after the search has backed up to point 4, a new anchor and floater are established between points 4 and 23--the last position saved within the stack in this fashion the Douglas algorithm processes the entire line, backing up when necessary until all intermediate points are
within the corridor and then selecting from the stack the position of the next saved coordinate pair thus eventually the segment of the line between coordinate pairs 23 to 32 will be evaluated and the corridor from coordinate pair 32 to the end of the line will be the final computed segment E. MATHEMATICAL EVALUATION OF SIMPLIFICATION many different types of measures may be used to evaluate the simplification process one type is simple attribute measures another type is displacement measures simple attribute measurements are those which may be applied to a single line, such as line length, angularity, and curvilinearity. these apply to either the base line or a simplification displacement or comparative measurements, on the other hand, evaluate differences between the base line and simplification overhead - Measures for linear simplification overhead - Areal displacement it appears that some of the algorithms are much better than others in maintaining the critical geometric characteristics of the data Douglas, Lang, Reumann-Witkam, and Opheim all appear to be reasonable choices the two best are Douglas and Lang F. LINEAR SMOOTHING smoothing is applied to digital line data in order to improve the aesthetic qualities of the line and to eliminate the effects of the digitizing device in general, it is felt that smoothing improves the quality of these data smoothing increases the number of coordinates needed, so is normally used only for output REFERENCES Buttenfield, B.P., 1985. "Treatment of the Cartographic Line," Cartographica 22(2):1-26. Douglas, D.H. and T.K. Peucker, 1973. "Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or Its Caricature," The Canadian Cartographer 10(2):112-122. McMaster, R.B., 1987. "Automated Line Generalization," Cartographica 24(2):74-111. McMaster, R.B., 1987. "The Geometric Properties of Numerical Generalization," Geographical Analysis 19(4):330-346. McMaster, R.B., 1989.
"The Integration of Simplification and Smoothing Algorithms," Cartographica 26(1). Peucker, T.K., 1975. "A Theory of the Cartographic Line," Proceedings, Second International Symposium on Computer-Assisted Cartography, AUTO-CARTO-II, September 21-25, 1975, U.S. Dept. of Commerce, Bureau of Census and ACSM, pp. 508-518. White, E., 1985. "Assessment of Line-Generalization Algorithms Using Characteristic Points," The American Cartographer 12(1):17-27. DISCUSSION/EXAMINATION QUESTIONS 1. Discuss the differences between sequential and global approaches to line simplification. 2. What are the five generalization operators for digital line data? Discuss each one of these and give examples. 3. Using a series of diagrams, discuss the procedure used by the Douglas algorithm. 4. Discuss the different approaches you might use to evaluate the effectiveness of line simplification procedures and the advantages and disadvantages in each case. VISUALIZATION OF SPATIAL DATA A. INTRODUCTION Maps Computer-generated displays B. CARTOGRAPHIC BACKGROUND Visualization What is the image supposed to show? To whom? Ideal display C. GRAPHIC VARIABLES 1. Location 2. Value 3. Hue 4. Size 5. Shape 6. Spacing 7. Orientation D. PERCEPTUAL AND COGNITIVE LIMITATIONS E. GRAPHIC LIMITS F. REPRESENTING UNCERTAINTY Explicit uncertainty codes Graphic ambiguity Examples G. TEMPORAL DEPENDENCE Basic strategies H. SHOWING A THIRD DIMENSION Contours Hypsometric mapping Simulating oblique views of surface REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 49 - VISUALIZATION OF SPATIAL DATA Compiled with assistance from Matt McGranaghan, University of Hawaii A.
INTRODUCTION Maps are limited to two dimensions must show 3-D data projected onto a flat surface give a distorted impression of spatial distributions on the globe are static, cannot show change through time or animate have difficulty showing interactions or flows between places are limited by the tools used to make maps pens of constant width constant color or tone the airbrush adds flexibility but is difficult to use, control have difficulty showing uncertainty in data give a false impression of accuracy Computer-generated displays include screens, plots, printer output include raster and vector can be animated can show continuous gradations of color, texture, tone can show 3-D using stereoscopic technology and pairs of images the computer is a powerful tool for visualizing spatial information this unit looks at some of the issues involved in combining the knowledge of cartography with the power of digital technology all too often these issues are ignored when output maps and displays are created from GIS although GIS display and mapping has much to learn from principles of cartographic design, it also provides entirely new possibilities B. CARTOGRAPHIC BACKGROUND must consider the objective of display Visualization process for putting (complex) images into minds examples: the shape of a mountain - poorly conveyed by contours pattern of growth of an urban area - may need animation to show changes through time effectively air-flows over a patch of terrain - needs 3-D capabilities plus animation to show true pattern of directions, speeds of flow movements of people in an area - needs ability to generalize individual movements into meaningful aggregate patterns components of visualization system: database containing information hardware device used to generate display human visual system processing of perceived image in the brain correct perception depends on functioning of all of these components What is the image supposed to show?
what impressions does the analyst wish to create in the mind? what relationship do these have with the contents of the database? database contents are abstract version of geographical reality system should create an impression of reality, not of the contents of the database aspects of relationship between database and reality, e.g. accuracy, should be important part of display geography is complex display is a filter removing unwanted complexity to show trends, patterns display must show level of detail required by user, from general overview to detailed insights To whom? effective visualization may require familiarity with symbols on the part of the user some people may never master skills of map-reading, i.e. using maps to visualize geography how much familiarity should be assumed? it may generally be better to assume low familiarity people can learn to work with complex displays, but may lose interest and look for alternative sources of information Ideal display communicates intended message perfectly to all users is not misunderstood offers complete design flexibility put any symbol anywhere, at any size, etc. C. GRAPHIC VARIABLES classes of symbols correspond to classes of objects point line area visual differences among map symbols convey information 1. Location where the symbol is determined primarily by geography the primary means of showing spatial relations the brain computes relations like "is within", "crosses" on the fly from the eye's perceived image of the map compare GISs some compute these relationships on the fly, others store them in the database to avoid the processing required to compute them compared to the brain, current GIS technology is amazingly crude 2.
Value lightness or darkness of a symbol very important visually - the eye tends to be led by patterns of light and dark usually used to represent quantitative differences tradition suggests darker symbols should mean "more" - however this may reverse on dark backgrounds which are common on computer displays - on dark backgrounds, lighter may mean "more" 3. Hue color important aesthetically usually represents qualitative differences - continuous grading of color is difficult and expensive to achieve on printed maps 4. Size how large the symbol is conveys quantitative difference brain has difficulty inferring quantity accurately from the size of a symbol if proportional circles are used to portray city population, doubling the radius of a circle (quadrupling its area) is perceived as indicating more than twice the population, but not four times i.e. the brain infers population from some mixture of the radius and the area of the symbol 5. Shape geometric form of the symbol used to differentiate between object classes used to convey nature of the attribute, e.g. population indicated by images of people, housing by house symbols 6. Spacing arrangement, density of symbols in a pattern used to show quantitative differences, e.g. dot density to show population 7. Orientation of a pattern, to show qualitative differences of a linear symbol, to show quantitative (directional) differences D. PERCEPTUAL AND COGNITIVE LIMITATIONS symbol differences must be perceptible to be of use JND - just noticeable difference - the smallest difference which can be reliably perceived between symbols, sizes, colors, shapes etc. LPD - least practical difference - the smallest difference which can be produced by the cartographic process eye's sensitivity to various graphic codes some codes "get through" better e.g.
use of yellow for fire trucks allows them to stand out better in the visual field sensitivity varies across visual field "peripheral vision" is enhanced by movement, varies among individuals cognitive aspects indications that perception is dependent on cognition - knowledge understanding of phenomena color categories/nameability - certain colors may have associations with names, concepts E. GRAPHIC LIMITS digital devices provide finite resolution spatial - where symbols might be and their shapes display device has a set screen or paper size display pixels have a set size, finite number of spatial locations aliasing - line (or point) mapped onto closest pixel(s) produces stepped (straight) lines color - what colors things might be limit on number of colors available (palette) - plotter may have only 8 pen colors - screen may have millions of possible colors limit on range of luminance and contrast how many colors displayable at one time - 2^n where n is the number of bit planes what the colors are temporal limits data retrieval from mass storage or from core memory? how much data processing needed to compute display? writing to the display device speed limited by communication overhead & bus contention (competition from other activities) these factors may preclude using some types of display image animation requires fast throughput complex images require fast data retrieval acceptable response time people don't like to sense a pause in the system typical goal: maximum of two seconds for complex operations, instantaneous for all others how long should something remain visible to be noticed? F. REPRESENTING UNCERTAINTY have to use SOME graphic code don't want its meaning confused with something else e.g. line drawn wide to represent uncertain position confused with wide highway or braided stream Explicit uncertainty codes mark things which are uncertain with a color e.g.
red or yellow to suggest caution in using the information Graphic ambiguity use graphic ambiguity to create cognitive/visual ambiguity e.g. multiple positions for an uncertainly located item dot density or color could be used to show varying probability, e.g. a cloud with highest density in the center absence of "hard" lines or edges where they are uncertain Examples show uncertain area with a red tint overlay show uncertain lines as multiple lines (like a braided stream) fuzzy line vary the value or saturation of the line across its width blending between adjacent areas to show zones of transition blend the colors choose such that the blend works psychologically red <-> purple <-> blue blue <-> aqua <-> green NOT red <-> yellow <-> orange large set of possible colors are needed to show the appearance of a smooth transition transition can be simulated with a small set of colors by spatially blending pixel colors ("dithering") G. TEMPORAL DEPENDENCE Basic strategies static maps show a single slice of time show several states at once by careful choice of symbols indicate amount or rate of change dynamic maps real time is compressed or scaled into changing display non-moving occurrences - events added and deleted at places through time moving objects - movement is animated on the screen - symbol is deleted at one location, regenerated at adjacent location H. 
SHOWING A THIRD DIMENSION Contours calculated contours (calculated by contouring algorithms) starting with a grid of elevations, thread contours and display the lines visual contours with elevation grid cells (contours are perceived but not computed explicitly) given a sufficiently dense raster of elevations shade pixels according to the elevation value of the central point using specified elevation ranges result is apparently (not analytically) a contour map Hypsometric mapping set each pixel to a color dependent on its height this is easily implemented as table look-up range of colors is conventional - dark green for low elevations, through green, yellow, brown, then white at highest elevations Simulating oblique views of surface each pixel's illumination computed from its slope relative to simulated "sun" sun must be placed at top of image for correct visual perception - if sun is at bottom, eye sees surface inverted requires assumptions about reflectance of surface lakes, ice, some building materials produce highlights single light source makes the surface too "stark" assume light source infinitely far away from surface may assume viewer is also infinitely far away to avoid complex perspective calculations with TINs or coarse grids, edges of plane patches may be visible because of sharp change of slope discontinuities can be eliminated by varying intensity of illumination continuously over facets many 3-D display systems supply this capability - called Gouraud shading REFERENCES Standard texts on map design: Cuff, D.J., and Mattson, M.T., 1982. Thematic Maps: Their Design and Production, Methuen, New York. Dent, B.D., 1985. Principles of Thematic Map Design. Addison-Wesley, New York. Tufte, E.R., 1983. The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT. A fascinating discussion including many cartographic examples. Texts on computer graphics: Durrett, H.J. ed., 1987. Color and the Computer. Academic Press, New York.
Foley, J.D., and Van Dam, A., 1982. Fundamentals of Interactive Computer Graphics. Addison-Wesley, New York. Myers, R.E., 1982. Microcomputer Graphics. Addison-Wesley, Reading, MA. Design for digital maps: Monmonier, M., 1982. Computer-Assisted Cartography: Principles and Prospects. Prentice-Hall, Englewood Cliffs NJ. Techniques for displaying topography: Kennie, T.J.M., and McLaren, R.A., 1988. "Modelling for digital terrain and landscape visualisation," Photogrammetric Record 12(72):711-745. EXAM AND DISCUSSION QUESTIONS 1. Summarize the ways in which digital displays offer greater flexibility for visualizing spatial data. 2. The visual system is not the only way in which spatial information might be conveyed to a user. Discuss the prospects for using other methods of communication, either alone or in combination with visual methods. What kind of user interface would be appropriate for a GIS for visually impaired users, and what applications might such a system have? 3. Review the methods of visualization available in any GIS to which you have access. How limited are they, and how could they be improved? 4. How would you adapt the concept of an atlas to a digital system with capabilities for animation? COLOR A. INTRODUCTION What is color? What gives an object its color? B. COMPONENTS OF COLOR VISION C. COLOR MEASUREMENT D. PHYSICAL COLOR SPECIFICATION SYSTEMS CIE Uniform color spaces E. PERCEPTUAL COLOR SPECIFICATION SYSTEMS Munsell color system F. CRT COLOR SPECIFICATION SYSTEMS RGB system HLS system HVC (hue, value, chroma) system REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES The slide set contains images to illustrate this unit (#31 to 40). UNIT 50 - COLOR Compiled with assistance from Jon Kimerling, Oregon State University A. INTRODUCTION What is color?
a complex eye-brain response to electromagnetic radiation in the visible portion of the electromagnetic spectrum, commonly called "light" the average person perceives solar radiation from approximately 400 nm to 700 nm (1 nm = 10^-9 m) in wavelength this range can be visualized as a series of six "spectral" colors grading from violet through red colors such as red cover a greater proportion of the spectrum than others such as yellow other colors are mixtures of these in varying proportions and a few colors, like fluorescent pink, are "non-spectral" since they cannot be created as a spectral mixture What gives an object its color? colors of most objects we see are a function of: the spectral properties of the illumination source, i.e., the amount of light at each wavelength coming from the source the ability of the object to reflect light at each wavelength, often graphically portrayed as a spectral reflectance curve the sensitivity of the cones in our eyes to each wavelength a CRT generates color by selectively exciting dots of three different phosphors - red, green and blue the spectral emittance characteristics of the phosphors and our sensitivity to light emitted by them determine the colors we see the gamut of a device is the range of colors which it is capable of generating generally, it is difficult to match the gamuts of different devices or media (e.g. CRT and paper), so colors tend to change when an image is displayed on a different device or medium B. 
COMPONENTS OF COLOR VISION differences in spectral sensitivities of receptors in the eye's retina give us color vision Maxwell trichromatic theory of color vision is based on the fact that cone cells in our retinas, termed β, γ and ρ, are primarily sensitive to blue, green, and red light, respectively color seen is a function of the relative amount of blue, green and red light striking the closely packed mixture of cone cells that, along with rod cells sensitive only to light intensity, form the retina visual signal transmission from the rod and cone cells appears not to be carried out by four different types of nerve fibre, but rather by nerve cells connected so as to produce only three different signals in the fibre the Opponent Process theory of color vision postulates that these signals interact to produce four perceptually unique "pole" colors: blue, green, yellow, and red all other colors will be seen as mixtures of these maximally discriminable "poles" color constancy refers to the ability of our visual system to adapt to light sources of different intensities and colors so that object colors remain the same e.g. skier's experience of again seeing snow and trees as white and green after wearing goggles of a different color for a few seconds e.g. colors on CRT monitors appear the same as the screen brightness is lowered the Retinex theory, proposed by Edwin Land of Polaroid fame, explains color constancy by saying that our eyes do not function as cameras, since we do not perceive color by wavelength alone our mind does not determine the color of an object in isolation, but by comparing the object with its surround and continually adjusting to light source differences so that the object color appears the same perceptual dimensions of color describe the three basic ways in which we see variation in color, that is, color varies by: 1. 
hue - the attribute of color whereby an area appears similar to an opponent process "pole" color (red, yellow, green, or blue) or a mixture of any two "pole" colors 2. lightness - the brightness of an area relative to the brightness of a similar area that appears white in color 3. chroma - the colorfulness of an area relative to the brightness of a similar area that appears white in color - the strength or weakness of a color C. COLOR MEASUREMENT light measuring devices called spectrophotometers are employed to measure the light reflected from or emitted by an object, giving data needed for physical color specification slide 31 - spectrophotometer for CRT screen a spectrophotometer is a device that detects visible light either reflected from a surface lit by a "standard illuminant" or emitted by a CRT screen with a known "white point", disperses the light into a spectrum, and measures the amount of light at small wavelength intervals along the spectrum relative to the standard light source amount of light per wavelength interval is recorded in graphical or digital form suitable for subsequent use in color specification systems D. 
PHYSICAL COLOR SPECIFICATION SYSTEMS methods of specifying color used in optics CIE Commission Internationale de l'Eclairage color system widely used allows precise numerical specification of color, based on spectrophotometric measurements a numerical way to match colors to a standard and to determine color differences colors are defined by (x,y,Y) coordinates that give a location on a chromaticity diagram slide 32 - CIE chromaticity diagram plotting the (x,y) coordinates of the color and the illumination source, one finds that a straight line drawn from the light source to the color and continued to the edge of the diagram gives the color's dominant wavelength, a numerical description of hue the straight line distance on the chromaticity diagram between the light source and color, divided by the distance from the light source, through the color, and to the diagram edge gives the color's purity, which is similar in concept to chroma the Y coordinate gives the color's luminosity, this being a mathematical counterpart to lightness the true nature of the CIE system is best illustrated by a three dimensional figure, that for process color printing and CRT monitors resembles a six-sided crystal with black and white tips slide 33 - 3-D perspective diagram of CIE color gamut for process color printing all colors that can be created by a display device, such as a plotter or CRT monitor, will fall within the boundaries of this type of solid figure, which normally encompasses only part of the entire CIE color space. slide 34 - the vertical dimension of the CIE color space Uniform color spaces equal differences in coordinates signify equal perceptual differences desirable when color progressions are to be determined based on physical color measurements the CIE (x,y,Y) system is not a uniform color space, but the related CIE (L*,u*,v*) color space is (L*,u*,v*) is a non-linear transformation of (x,y,Y) coordinates E. 
PERCEPTUAL COLOR SPECIFICATION SYSTEMS Munsell color system differs from CIE by being based on perceptual experiments to determine equal appearing steps of hue, value (perceived lightness), and chroma color of a surface is determined by comparing it visually to a set of painted color chips colors specified by 0-100 hue range, 0-10 value range, and 0-20+ chroma range complex mathematical procedures exist to convert CIE to Munsell colors, based on a table look-up approach slide 35 - the Munsell color system color progressions for quantitative areal data displayed using the choropleth or dasymetric mapping method are often based on Munsell value and/or chroma steps, whereas qualitative data often are portrayed with a series of Munsell hues F. CRT COLOR SPECIFICATION SYSTEMS color CRT displays are fundamentally different from color printers and plotters electrons in red, green, and blue (RGB) phosphor atoms are excited to higher energy levels by a moving electron beam, only to give off photons of the corresponding wavelengths upon return to their normal state after the beam has passed the monitor screen is made up of hundreds of thousands of tiny red, green, and blue phosphors arranged as rows and columns of triads RGB system the RGB color system is closest to the physical design of monitors, since colors are specified by amounts of red, green, and blue which can be directly translated into the electron beam strengths to be delivered to each phosphor in a triad system can be viewed as a cube with red, green, and blue axes slide 36 - RGB cube cube corners are white, black, red, yellow, green, cyan, blue, and magenta all possible RGB combinations are within the cube number of colors within the cube which are actually displayable depends upon the number of bit planes in the display driver or color monitor adaptor card e.g. 
many adaptors (EGA, VGA) have 4 bit planes in normal modes, use 3 for colors, 1 for lightness - 3 bits give the eight corners of the RGB cube a 24 bit plane driver gives 2^24, or over 16 million, different colors per pixel, organized so that there are 2^8, or 256, levels of red, green, and blue, plus all combinations thereof slide 37 - RGB cube diagram (0,0,0) gives black, (255,255,255) gives white, and the 254 intermediate triplets form a progression of grey tones running diagonally through the cube colors along the white-yellow [(255,255,255)-(255,255,0)], white-magenta and white-cyan cube edges, as well as diagonal rows from white to red, green, and blue form "tint" progressions opponent process "pole" colors and mixtures thereof can be easily specified, since RGB components change smoothly e.g. 254 gradations between blue (0,0,255) and green (0,255,0) can be created by holding red at 0, incrementing green by 1, and decrementing blue by 1 HLS system Tektronix developed the HLS system to simplify the selection of color progressions similar to tints and shades slide 38 - HLS color solid a double cone with the central axis forming a lightness progression identical to the black to white diagonal line through the RGB cube hues are specified by angle, starting with blue at 0° and progressing around the perimeter in the same order as found in the CIE chromaticity diagram when its boundary is traversed counterclockwise lightness and saturation vary from 0 to 1 the triangular slice for each hue can also be viewed as a plane cut from the RGB cube and deformed into the HLS triangle slide 39 - RGB - HLS deformation the transformation is linear, and hence simple equations can be used to transform HLS specifications into RGB values, and vice-versa. 
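The HLS-to-RGB conversion mentioned above can be sketched in a few lines. The following is a minimal illustration (not Tektronix's published algorithm), using Python's standard colorsys module; colorsys places red at a hue of 0°, so a 240° offset is applied here to match the convention described above of blue at 0° - that offset is an assumption of this sketch.

```python
import colorsys

def tek_hls_to_rgb(hue_deg, lightness, saturation):
    """Convert an HLS triple (Tektronix convention, blue at 0 degrees)
    to an 8-bit RGB triple.

    Python's colorsys puts red at hue 0, so the hue is re-origined by
    240 degrees (an assumption matching blue-at-0 as described above).
    Lightness and saturation both run 0..1, as in the HLS double cone.
    """
    h = ((hue_deg + 240.0) % 360.0) / 360.0   # shift origin, scale to 0..1
    r, g, b = colorsys.hls_to_rgb(h, lightness, saturation)
    return tuple(round(255 * c) for c in (r, g, b))

# The axis endpoints of the double cone map to the black and white
# corners of the RGB cube; a fully saturated mid-lightness hue lands
# on a cube corner or edge:
print(tek_hls_to_rgb(0, 0.5, 1.0))    # pure blue -> (0, 0, 255)
print(tek_hls_to_rgb(120, 0.5, 1.0))  # pure red  -> (255, 0, 0)
print(tek_hls_to_rgb(0, 0.0, 1.0))    # lightness 0 -> black (0, 0, 0)
print(tek_hls_to_rgb(0, 1.0, 1.0))    # lightness 1 -> white (255, 255, 255)
```

Note how lightness 0 and 1 collapse to black and white regardless of hue and saturation, which is exactly the double-cone geometry: the two apexes of the solid coincide with the black and white corners of the RGB cube.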
HVC (hue, value, chroma) system slide 40 - HVC system Tektronix has worked for several years to develop a color specification system essentially identical to the Munsell, the HVC system being the end product created by making spectrophotometric measurements of thousands of RGB combinations, determining the CIE chromaticity coordinates for each, transforming all (x,y,Y) coordinates to their (L*,u*,v*) counterparts, and determining equal increments of hue, value, and chroma in this uniform color space closely resembles the Munsell system - an irregular solid with vertical axis forming the value scale hues progress from 0° to 360° around the axis, with red at 0° each vertical slice into the solid exposes a page of value-chroma combinations for a particular hue. the HVC-RGB transformation is far more difficult than the HLS-RGB, requiring a computer program of several hundred statements REFERENCES Dent, B.D., 1985. Principles of Thematic Map Design, Addison-Wesley, Reading, MA, pp. 353-357. Eastman, J.R., 1986. "Opponent Process Theory and Syntax for Qualitative Relationships in Quantitative Series," The American Cartographer 13(4):324-333. Hunt, R.W.G., 1987. Measuring Color, John Wiley & Sons, New York, pp. 1-102. Murch, G.M. and J.M. Taylor, 1988. "Sensible Color," Computer Graphics World, July 1988:69-72. Niblack, Wayne, 1986. An Introduction to Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ. Robinson, A.H., R.D. Sale, J.L. Morrison, and P.C. Muehrcke, 1984. Elements of Cartography, 5th edition, John Wiley & Sons, New York, pp. 170-177. EXAM AND DISCUSSION QUESTIONS 1. How has the Munsell color system been adapted for display screen color specification? 2. What is the relationship between bit planes and the number of colors possible on a CRT monitor? 3. How is it that we see objects as the same color under different sources of illumination? 4. 
Explain the relationship between physical, perceptual and CRT color specification schemes, and give examples of each. 5. Explain the meaning of the term "gamut", and the problems which occur because of differences in gamuts between different display devices and media. GIS APPLICATION AREAS A. INTRODUCTION Functional classification GIS as a decision support tool Core groups of GIS activity B. CARTOGRAPHY Computers in cartography Organizations Adoption C. SURVEYING AND ENGINEERING Recent advances in technology Characteristics of application area Organizations D. REMOTE SENSING Characteristics of application area Organizations E. SCIENCE AND RESEARCH Analogy to statistical packages Characteristics of application area Organizations REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This begins a 6 part section which reviews the spectrum of different applications of GIS. We have tried to include examples from all the areas in which GIS is currently actively employed. You may want to rearrange, enhance or revise major portions of these units to suit the needs and interests of your students. UNIT 51 - GIS APPLICATION AREAS Compiled with assistance from David Cowen, University of South Carolina and Warren Ferguson, Ferguson Cartotech A. 
INTRODUCTION GIS technology, data structures and analytical techniques are gradually being incorporated into a wide range of management and decision-making operations numerous examples of applications of GIS are available in many different journals and are frequent topics of presentations at conferences in the natural and social sciences in order to understand the range of applicability of GIS it is necessary to characterize the multitude of applications in some logical way so that similarities and differences between approaches and needs can be examined an understanding of this range of needs is critical for those who will be dealing with the procurement and management of a GIS Functional classification one way to classify GIS applications is by functional characteristics of the systems this would include a consideration of: 1. characteristics of the data such as: themes precision required data model 2. GIS functions which of the range of possible GIS functions does the application rely on? e.g. address matching, overlay? 3. products e.g. does the application support queries, one-time video maps and/or hardcopy maps? a classification based on these characteristics quickly becomes fuzzy since GIS is a flexible tool whose great strength is the ability to integrate data themes, functionality and output GIS as a decision support tool another way to classify GIS is by the kinds of decisions that are supported by the GIS several definitions of GIS identify its role in decision- making decision support is an excellent goal for GIS, however: decisions range from major (which foreign aid project to support with limited budget?) to minor (which way to turn at next intersection?) 
difficult to know when GIS was used to make decisions except in cases of major decisions decision support is a good basis for definition of GIS, but not for differentiating between applications since individual GIS systems are generally used to make several different kinds of decisions Core groups of GIS activity GIS field is a loose coalescence of groups of users, managers, academics and professionals all working with spatial information each group has a distinct educational and "cultural" background each has associated societies, magazines and journals, conferences, traditions as a result, each identifies itself with particular ways of approaching particular sets of problems interactions occur between groups through joint memberships, joint conferences, umbrella organizations these groups or cultures, then, are another basis for characterizing application areas the core groups of GIS activity can be seen to be comprised of: 1. mature technologies which interact with GIS, sharing its technology and creating data for it surveying and engineering cartography remote sensing 2. management and decision-making groups resource inventory and management urban planning (Urban Information Systems) land records for taxation and ownership control (Land Information Systems) facilities management (AM/FM) marketing and retail planning vehicle routing and scheduling 3. science and research activities at universities and government labs this and the next 5 units (Units 52-56) examine each of these groups of GIS activity seeking to find distinctions and similarities between them begin in this unit with a quick review of the relationship between the mature technologies and GIS and finish with a look at the role of GIS in science B. CARTOGRAPHY there are two areas of GIS application in cartography: 1. automation of the map-making process 2. 
production of new forms of maps resulting from analysis, manipulation of data the second is closer to the concept of GIS although both use similar technology Computers in cartography first efforts to automate the map-making process occurred in early 1960s major advantage of automation is in ease of editing objects can be moved around digital map without redrafting scale and projection change are relatively easy differences between automated mapping and GIS are frequently emphasized mapping requires: knowledge of positions of objects, limited number of attributes GIS requires: knowledge of positions of objects, attributes, relationships between objects hence distinction between "cartographic" and "topological" databases "analytical" cartography involves analysis of mapped data has much in common with some aspects of GIS analysis cartography plays a vital role in the success of GIS supplies principles of design of map output products - how to make them easy to read and interpret? see: Units 17 and 49 represents centuries of development of expertise in compiling, handling, displaying geographical data widespread feeling that conversion to digital technology: is inevitable will revolutionize the field through new techniques Organizations both professional and academic organizations in most countries International Cartographic Association (ICA) well-developed training and education programs, journals, continuing research Adoption now is some use of digital technology in almost all aspects of the map production process the term "desktop mapping" emphasizes the accessibility of one form of automated cartography in the same way that page formatting programs have led to the success of "desktop publishing" C. 
SURVEYING AND ENGINEERING surveying is concerned with the measurement of locations of objects on the Earth's surface, particularly property boundaries all 3 dimensions are important - vertical as well as horizontal positions accuracy below 0.1 m is necessary the locations of a limited number of sites are fixed extremely accurately through precision instruments and measurements these sites are monuments or benchmarks - the geodetic control network this is the function of geodesy or geodetic science using these accurate benchmarks for reference, large numbers of locations can then be accurately determined relative to the fixed monuments surveying is an important supplier of data to GIS however, it is not directly concerned with the role of GIS as a decision-making tool some civil engineers now use GIS technology, especially digital elevation models and associated functionality, to assist in planning construction e.g. to make calculations of quantities of earth to be moved in construction projects such as building highways e.g. to visualize the effects of major construction projects such as dams Recent advances in technology instruments: locations captured by measuring device in digital form, downloaded to database - the "total station" new GPS (global positioning system) instruments determine location from satellites, supplementing the geodetic control network direct linkage of surveying instruments to spatial databases thus suppliers of surveying equipment have entered the GIS field as vendors Characteristics of application area scale: large - surveying often accurate to mm engineering calculations require high DEM resolution data model: survey data is exclusively vector lineage: for legal reasons the source of survey data is important e.g. 
instruments, benchmarks used, name of surveyor, date most systems do not yet allow such lineage information to be stored directly with the data Organizations surveying and engineering are mature professional fields based on scientific methods, with organizations, conferences, courses, journals, systems of accreditation introduction of GIS technology has not radically altered the profession D. REMOTE SENSING like surveying, is a data producing field acquires knowledge about the Earth''s surface from airborne or space platforms elaborate, well-developed technology and techniques instruments for data capture - high spatial and spectral resolution transmission of data, processing, archiving interpreting and classifying images two major roles for GIS concepts: quality and value of product is enhanced by use of additional ("ancillary") data to improve accuracy of classification e.g. knowledge of ground elevation from a DEM allows shadows to be removed from images to be useful in decision-making, product needs to be combined with other layers less readily observed from space e.g. 
political boundaries remote sensing continues to be an active research area new instruments need to be evaluated for applications in different fields careful research is needed to realize the enormous potential of the technology volume of accumulated data is increasing rapidly Characteristics of application area scale: a full range of spatial resolutions, depending on altitude, characteristics of instrument data model: data is captured exclusively in raster form (pixels) classified images may be converted to vector form for output, or for input to GIS systems interfacing with GIS is a current development direction both areas have developed extensive software systems in remote sensing, systems include image processing functionality interfacing is not difficult technically - however, there may be substantial incompatibilities in data models, format standards and spatial resolution many GIS vendors include functions to convert data from remote sensing systems and to display vector data on satellite image backdrops true integration of vector GIS and raster image processing systems is not yet available Organizations because of continuing emphasis on research, there is heavy representation from government and academic research the growth curve of remote sensing occurred about a decade earlier than GIS E. 
SCIENCE AND RESEARCH growing interest in using GIS technology to support scientific research to support investigations of global environment - global science to search for factors causing patterns of disease - epidemiology to understand changes in patterns of settlement, distributions of population groups within cities - anthropology, demography, social geography to understand relationships between species distribution and habitats - landscape ecology GIS has been called an enabling technology for science because of the breadth of potential uses as a tool Ron Abler (Pennsylvania State University) has compared GIS to tools like microscopes, Xerox machines, telescopes in its potential for support of research Analogy to statistical packages major statistical packages - SAS, SPSS, BMD, S etc. - developed over past 20 years primarily developed to apply statistical tools in scientific research subsequent applications in consulting, business recent introduction of graphics, mapping capabilities for display of results, e.g. 
SAS/GRAPH unlike statistical packages, GIS development has been driven by applications other than scientific research lack of tools for spatial analysis has meant that the role of location in explaining phenomena has been difficult to evaluate locational information has been available in map libraries but hard to interface with other information, not part of digital research environment potential for GIS to play an important role in scientific research GIS supports spatial analysis as statistical packages support statistical analysis Characteristics of application area scale: very large (archaeology) to very small (global science) functionality: overlay to combine, correlate different variables ability to interface GIS with complex modeling packages, statistical packages interpolation visualization of data potential for 3D, time-dependent applications Organizations no forum for exclusive discussion of role of GIS in science (similar problems in statistics) particularly in the non-technical fields in the social sciences discussion confined to individual disciplines geography is the only discipline with a general concern for spatial analysis and supporting tools however, in most US universities geography is a small, relatively weak and unknown discipline in other countries, (e.g. UK) geography is recognized as a strong traditional discipline, with distinguished roots in social and physical science research REFERENCES Abler, R.F., 1987. "Awards, rewards and excellence: keeping geography alive and well," Professional Geographer 40:135-40. Source of the reference in Section E. Bylinsky, Gene, 1989. "Managing with electronic maps," Fortune, April, 1989. Important popular review of GIS as a decision tool. EXAM AND DISCUSSION QUESTIONS 1. Some have argued that the best way to classify GIS applications is through the data they use. How would the results differ from the taxonomy proposed in this Unit? 2. 
What significant groups are missing from this taxonomy of GIS applications? What areas of application might develop in the future? 3. Do you accept the analogy between GIS and statistical packages presented in this Unit? In the long term, which would you expect to have the more significant role in supporting scientific activity? Why? 4. Which branches of science would have most use for a GIS as an enabling technology? Which would have least use for it? 5. It has been argued that GIS is an extremely dangerous tool in epidemiology, because of its potential for identifying all sorts of spurious correlations between environmental factors and the occurrence of disease. Do you agree, and if so, what steps would you recommend to reduce the potential for misuse? RESOURCE MANAGEMENT APPLICATIONS A. INTRODUCTION Characteristics of applications Functionality Adoption Organizations B. EXAMPLE - BIG DARBY CREEK PROJECT Big Darby Creek characteristics AGNPS - Agricultural Nonpoint Source Pollution Model The GIS GIS-Model Link C. DATABASE Slides D. SAMPLE RESULTS Management strategies tested Example of output E. ASSESSMENT OF SYSTEM EXAM AND DISCUSSION QUESTIONS NOTES The slide set contains twelve slides (#41 to 52) to illustrate this unit. As in many of these practical applications, widely accessible documentation is not available. UNIT 52 - RESOURCE MANAGEMENT APPLICATIONS Compiled with assistance from John Bossler, Ohio State University A. 
INTRODUCTION resource inventory and management was one of the earliest uses of GIS these applications dominated sales by vendors in the early 1980s many systems installed by state and federal governments and resource industries, particularly forestry, oil and gas most successful resource applications: forestry - timber inventory, watershed management, development of infrastructure (roads), forest regeneration agriculture - studies of agricultural pollution, inventories of land capability, productivity studies land use - planning use of land, zoning, evaluating impacts wildlife - management of habitat, evaluation of impact less successful subsurface resources - requires 3D approach, technology is predominantly 2D oceans - requires 3D, problems are time-dependent, lack of suitable data sources water resources - good for integration over watersheds, but 2D approaches are not ideal for linear surface watercourses or 3D groundwater Characteristics of applications layers: typically requires many coverages of an area - resources and relevant management factors are multi- dimensional mixture of data models - raster and vector with vector model, heavy use of polygons to represent homogeneous areas scale: varied but uncommon above 1:10,000 data quality: many layers are result of interpretation, classification quality is variable, often unevaluated Functionality simple map analysis: overlay, measurement of area, buffer zone generation, calculation of viewshed modeling: many include the use of external models based on multiple variables obtained from different layers e.g. models to simulate drainage basin runoff, fire spread Adoption most forest management agencies by mid 1980s most resource management agencies by late 1980s Organizations numerous conferences sponsored by federal and state agencies no major organization clearly devoted to GIS applications in resource management discipline-based organizations focus applications, e.g. forestry, ecology B. 
EXAMPLE - BIG DARBY CREEK PROJECT demonstrates an application of GIS to natural resource management illustrates the role of a GIS in linking with an existing analytical package GIS provides data input, storage, output and some analytic capabilities existing package provides specialized modeling, interfaced with the GIS funded by Nature Conservancy, NASA, Ohio EPA, Ohio Department of Natural Resources 2 year project combines a GIS (ERDAS) with a nonpoint source pollution model (AGNPS) additional software was developed to link the two existing packages goal to provide a low-cost, user-friendly system and database to support land use planning and management for the basin purpose of this project is to evaluate effects of changes in management practice model with GIS provides capability to evaluate "what-if" scenarios - observe and quantify effects of changes role of model is to simulate effects of natural processes, e.g. if x changes by an amount a, what is the corresponding effect on y? model is only useful if it predicts such effects accurately an additional role of the GIS in this case is to integrate spatially if changes are made in certain parts of a drainage basin, GIS can be used to integrate results of changes over the whole basin and give user the total Big Darby Creek characteristics Watershed contains 370,000 acres (580 mi^2, 1,500 km^2) in central Ohio includes parts of 7 counties State Scenic River one of the region's last remaining free flowing streams not dammed for flood control or water supply over 60 of 100 Ohio freshwater fish species "exceptional water quality" (Ohio Environmental Protection Agency) Heritage elements 107 "heritage element" occurrences heritage elements are rare plant and animal species, champion trees protected by state and federal laws Sediment production however, is "highest sediment yielding watershed in Ohio" (Soil Conservation Service) percentages of land use - 71% cropland, 9% forest, 9% pasture, 9% fallow, 1% urban Typical 
management questions what would be the water quality effects of a 10 m conservation easement along the river? which soil types or fields are contributing the most siltation to the river and should be targeted for some kind of conservation action? which combination of crop/field management practices yields the most benefit to water quality? effective management requires quick and accurate answers to these and other questions AGNPS - Agricultural Nonpoint Source Pollution Model developed by US Department of Agriculture simulates impact of agricultural land use on water quality calculates for watershed as a whole, or for 40 acre units, the erosion and siltation and the nitrogen, phosphorus and chemical-oxygen demand generated by a storm results provided in tabular form The GIS low cost, microcomputer-based uses the GIS module marketed by ERDAS ERDAS product is normally associated with image processing thus these capabilities are also available provides: easy data entry and manipulation flexible graphics for output report generation GIS-Model Link GIS provides data entry and manipulation interface for the AGNPS program once the database has been created by the GIS it is reformatted and fed to the AGNPS model by a simple series of user commands after the model tabulates the results, output is fed back to the GIS to be displayed in map form C. 
C. DATABASE

- 21 variables required by AGNPS were entered through GIS
- overhead - Variables used in AGNPS model; variables include:
  - current management practices, obtained by survey of 200 farmers
  - soil type, slope from Soil Conservation Service surveys
  - land cover from remote sensing (Thematic Mapper)
- 40 acre raster cells, 400 m on each side
  - 210 rows and 148 columns

Slides

- slide 41 - Regional setting
- slide 42 - Landsat scene
  - Columbus is light blue area in lower right; Darby Creek watershed is centered on greenish area to left of Columbus
  - note proximity to major metropolitan area, population 1.4 million
- slide 43 - surface hydrology
  - 1 = Big Darby, 2 = Little Darby, 3 = major streams
- slide 44 - photograph of Big Darby Creek
- slide 45 - land use
  - 1 = cropland, 2 = fallow, 3 = pasture, 4 = forest, 6 = urban, 7 = water
  - note 88% of watershed is in agricultural use, only 9% is forest and 1% developed
- slide 46 - photograph of cropland in the watershed
- another layer identifies slope
  - 50% of watershed is <2% slope, only 3% has slope >12%
  - note that estimation of slope depends on size of raster cells - the mean slope in a cell 400 m by 400 m is not the same as the maximum slope; the definition of slope used is unclear
- despite low slopes, much of basin has high soil erodibility according to SCS's rating system
  - 28% of basin qualifies for SCS Conservation Reserve Program (CRP)
  - evidence of critical need for soil conservation practices
- slide 47 - distribution of CRP soils
  - clustered in areas of higher slopes and along watercourses
- slide 48 - the 107 Heritage Element occurrences in the watershed
  - identifies rare plant and animal species and "champion" trees protected by state and federal laws
  - large number of occurrences indicates watershed's ecological diversity and significance
- slide 49 - subwatersheds
  - subsequent results will be for subwatershed 1 in northern extremity

D. SAMPLE RESULTS

- slide 50 - nitrogen levels predicted by AGNPS model using data from GIS and displayed by GIS
  1. (upper left) historical baseline - complete forest cover, virtually no erosion
  2. (lower left) assuming complete compliance with CRP for eligible soils (28% of basin) - low levels of erosion, only in limited areas
  3. (upper right) current conditions - red indicates extremely high soil erosion; several areas of very high erosion
  4. (lower right) assumes implementation of a conservation easement on both sides of river, with forest cover - erosion is reduced within the easement, but not outside
- number of raster cells in lowest category of erosion is increased from 459 under current conditions to 531

Management strategies tested

- overhead - Management strategies tested
  - conservation easements of various widths on both sides of river
  - use of no-till or conservation tillage practices on critical areas
  - conversion of critical areas to non-agricultural (forested) use
  - various combinations of the above, determined by likely acceptability to local farmers and government agencies

Example of output

- given limited resources for erosion abatement, where should effort be concentrated?
- model can identify areas where greatest reduction in erosion rate can occur for given change in management practice
- slide 51 - critical areas for sediment reduction
  - shows where change in management practice will produce greatest reduction
  - 12% reduction in sediment yield can be achieved by changing management of these cells - these are only 3% of area

E. ASSESSMENT OF SYSTEM

- user-friendly GIS provides easy display of results, colorful graphics, standardized reports, easy input of data
- slide 52 - specialized linkage required between GIS and erosion model (AGNPS)
  - such linkages will become unnecessary if data transfer formats can be standardized
- about 30 minutes required to test a scenario fully and obtain results
- system runs on readily available PC hardware under DOS
  - system is comparatively portable and could be used for decision support in local planning meetings
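The per-category cell counts quoted above (459 versus 531 cells in the lowest erosion category) come from reclassifying a continuous result raster into categories and tallying cells. A minimal tally sketch; the breakpoints used here are hypothetical, since the study's actual class limits are not given in the text.

```python
# Classify raster values into categories and count cells per category.

def classify(value, breaks):
    """Return the index of the first class whose upper bound holds value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks)  # values above the last break fall in the top class

def count_by_class(raster, breaks):
    """Tally the number of cells falling in each class."""
    counts = {}
    for row in raster:
        for v in row:
            k = classify(v, breaks)
            counts[k] = counts.get(k, 0) + 1
    return counts

raster = [[0.2, 1.5],
          [3.0, 0.1]]
breaks = [0.5, 2.0, 5.0, 10.0]           # four breakpoints -> five classes
counts = count_by_class(raster, breaks)  # class 0 holds the two lowest cells
```

Comparing `counts[0]` between two scenario rasters gives exactly the kind of before/after figure reported for the conservation easement scenario.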
EXAM AND DISCUSSION QUESTIONS

1. What types of standards would be useful in interfacing packages such as AGNPS and ERDAS? Who should develop them and how should they be promulgated?
2. Discuss the role of spatial resolution in the Big Darby Creek study and its effects on the results. What arguments might have been used to justify a 40 acre cell?
3. Why was a raster data model used in this study rather than a vector data model?
4. The results quoted in this unit were based on counts of raster cells. Discuss the issue of accuracy in the Big Darby Creek study, and its implications for implementation of the study's results.

URBAN PLANNING AND MANAGEMENT APPLICATIONS

A. INTRODUCTION
Characteristics of applications
Adoption
Organizations
B. EXAMPLE - ASSESSING COMMUNITY HAZARDS
Anticipatory hazard management
Hazard zone geometries
US Superfund Amendments and Reauthorization Act
Case study
C. DATABASE
Hazardous materials
Demographic information
Urban infrastructure
Physiography
D. ANALYSIS
Simple spatial analysis
Cartographic modeling
Risk assessment model
E. POTENTIAL IMPROVEMENTS TO MODEL
REFERENCES
EXAM AND DISCUSSION QUESTIONS
NOTES

The slide set contains eight slides (#53 to 60) to illustrate this unit.

UNIT 53 - URBAN PLANNING AND MANAGEMENT APPLICATIONS

Compiled with assistance from Robert McMaster, Syracuse University

A. INTRODUCTION

- involve the use of computers to carry out functions of urban government
- history of use extends back to first introduction of computers in cities in early 1960s
- major involvement of US Bureau of the Census as provider of data
  - development of DIME files (locations of street centerlines, address ranges for each block, hooks to census reporting zones) for 1970 census
- series of city case studies in late 1960s/early 1970s in US
  - comparable studies in many countries
  - case studies designed to demonstrate simple GIS capabilities for urban government: planning using social statistics for small areas,
    e.g. crime data, and simple record-keeping
  - problems associated with primitive state of hardware and software at that time

Characteristics of applications

- scale:
  - scale of DIME and TIGER (derived from USGS mapping at 1:24,000, 1:50,000, 1:100,000) sufficient to show street center lines but not parcels
    - adequate for transportation planning, vehicle routing, general development strategies
    - at this scale GIS can interface with existing records from census
  - increasing interest in parcel level data for land records, zoning, services, subdivision plans
    - at this scale can interface with assessor's tax records
- functionality:
  - many installed systems used for mapping, e.g. updating subdivision plans
  - limited use for inventory, e.g. identifying parcels impacted by proposal
  - little use for modeling - modeling applications more likely supported by specific software not linked to GIS, e.g. school bus routing packages

Adoption

- early adoption by federally funded case study cities, others with adequate budgets
- now almost all local governments have some level of involvement
- in many states the state government plays a coordinating role

Organizations

- Urban and Regional Information Systems Association (URISA)
  - organized in late 1960s
  - similar organizations in many countries
  - membership drawn from local, state and federal government, consultants, academics
  - sustained interest in GIS, particularly in recent years
- Spatially Oriented Referencing Systems Association (SORSA)
  - provides an international forum
B. EXAMPLE - ASSESSING COMMUNITY HAZARDS

- this example describes modeling of community vulnerability to hazardous materials
- there is increasing concern with the manufacture, storage, transportation and disposal of hazardous materials
- recent EPA study revealed an average of 5 incidents per day over the past 5 years in which hazardous materials were released into the environment from small and large production facilities

Anticipatory hazard management

- crucial component in mitigating potential impacts
- determines exact hazard distribution in an area
  - exact locations of sources and zones of potential impact
- determines what can be done to prevent or reduce serious accident
- identify population distribution, social and economic characteristics
  - needs daytime locations of population as well as residential (night-time) locations
- identify communication resources and transportation
- plan for evacuating area
- this example deals with airborne toxic releases
  - occur rapidly, disperse over large area with immediate health effects
  - evacuation more likely needed than for spills into soil or water
  - population at risk may depend on specific substance released
- needs detailed socioeconomic information,
  e.g. age of population is a factor in evacuation planning and in assessing potential impact, because of possible mobility impairment

Hazard zone geometries

- regions defined by level of risk to population, based on proximity to hazards
- combination of hazard zones produces a potential "contoured risk surface"
- overhead - Hazard zone geometries
- specific geometries include:
  - areas of risk due to production of hazardous materials
  - lines of risk due to hazards of transportation and transmission
  - points of risk produced by consumption

US Superfund Amendments and Reauthorization Act

- US Superfund Amendments and Reauthorization Act (SARA), 1986
- Title III - The Emergency Planning and Community Right-to-Know Act - covers four aspects of hazards mitigation:
  - emergency planning
  - emergency notification
  - community right-to-know and reporting requirements
  - reporting of chemical releases
- third component (community right-to-know) requires companies and organizations to submit emergency and hazardous chemical inventory information, including quantities and general locations

Case study

- Santa Monica, CA selected as case study
  - location is a separate administrative entity within Los Angeles basin
  - city population of 88,300 suited the scale of the prototype study
  - community had initiated a community right-to-know law
    - fire department must be informed of any production or storage of over 50 gallons or 500 pounds or 2,000 sq ft of any hazardous material
    - records stored by Police Department
- explores use of GIS for assessing community vulnerability
- three levels - simple spatial analysis, cartographic modeling and risk assessment modeling
C. DATABASE

- constructed for MAP (Map Analysis Package)
- uses 100 m resolution pixels
  - difficulty of estimating population data for finer resolution because of confidentiality restrictions
  - adequate for airborne toxics
  - soil- or water-borne would require finer resolution and different data models (3D and linear objects respectively)
- database includes:
  - hazardous materials locations and descriptions
  - demographic data
  - infrastructure - transportation, sewer lines, land use
  - physical geography - geologic faults, topography

Hazardous materials

- records maintained by Police Department's Toxic Chemical Coordinator
- hundreds of different types of chemicals reported
- overhead - On-site hazardous materials
  - some sites had only one chemical, e.g. solvent
  - chemical company has many toxic chemicals on site
  - genetic engineering company with assorted radioactive materials
- study used UN Classification of Hazardous Materials
- overhead - UN Classification of hazardous materials
- categories added to UN classification by the city include:
  - PCBs
  - gunshops
- slide 53 - composite map showing presence of hazardous materials by class in 100 m cells

Demographic information

- from 1980 census, includes:
  - age structure - includes classes under 5, 5-15, 15-65, over 65
  - ethnicity - includes classes percent black, white, asian
  - percent non-English speaking
  - population density
- assigned from census tracts to cells assuming uniform density

Urban infrastructure

- includes:
  - locations of all public institutions - schools, colleges, hospitals, theaters, shopping centers
  - major street network
    - traffic flow densities
  - storm sewer network
    - includes numbers of catchbasins per 100 m cell
  - major oil pipeline
  - detailed land use map

Physiography

- terrain model at 100 m resolution from 1:24,000 topographic sheet
- allows:
  - tracing of chemicals flushed into storm sewer network
  - use of wind dispersion model
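The uniform-density assignment of tract populations to 100 m cells described above is a simple form of areal interpolation. A sketch under a simplifying assumption: tract geometries are reduced here to a precomputed count of cells per tract, so the example shows only the allocation arithmetic, not polygon overlay.

```python
# Allocate each census tract's population evenly across its raster cells.

def population_per_cell(tracts):
    """tracts: list of (population, number_of_cells_in_tract) pairs.
    Returns the uniform per-cell population for each tract."""
    return [pop / ncells for pop, ncells in tracts]

# hypothetical tracts: 4000 people over 40 cells, 900 people over 30 cells
densities = population_per_cell([(4000, 40), (900, 30)])
```

The errors noted later in the unit (uniform density within a tract is rarely true) are exactly the cost of this shortcut; finer source data or dasymetric weighting would reduce them.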
D. ANALYSIS

Simple spatial analysis

- slide 54 - create composite map of all hazardous materials, construct 500 m buffer zones (MAP command SPREAD)
- slide 55 - composite map of services
- slide 56 - overlay of 500 m buffers on services to identify those services in close proximity to hazardous materials
- could identify specific services and specific classes of hazardous materials, e.g. schools and radioactive materials

Cartographic modeling

- cartographic modeling was used to model effects of hazardous materials incidents
- for example, consider the event of a liquid spill:
  - control measures by the fire department would likely include washing the effluent into the storm sewer network
  - during similar previous incidents, vapors within the storm sewer network have risen into buildings
- modeling strategy for assessing impact on schools:
  - model flow through storm sewer network using terrain data
  - buffer around network
  - identify impacted schools falling in the buffer
- slide 57 - topography of Santa Monica
- slide 58 - sewers "draped" over topography (MAP command COVER)
- slide 59 - flow forced downhill (under gravity) through storm sewers from assumed origin (beginning of red line in slide) to Santa Monica Bay
  - uses the MAP command STREAM with the constraint DOWNHILL
- slide 60 - buffer zone of 300 m on either side of path

Risk assessment model

- this represents a first step in developing a comprehensive spatial method for evaluating community vulnerability
- overhead - Conceptual risk-assessment model
  - note: GIS functions named in this overhead refer to OSU MAP commands
- risk zones were identified:
  - within 500 m of hazardous material site (HAZZONE)
  - within 500 m of Santa Monica Freeway (FREEZONE)
  - within 300 m of underground storage tank (TANKZONE)
- appropriate distances were determined by consulting toxic chemical information and emergency planning personnel
  - uniform distances assumed
- the remainder of the city not in risk zones was eliminated from further consideration (leaves RISKZONEs)
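The buffer zones built with MAP's SPREAD command can be emulated on a raster by marking every cell within a given distance of a source cell. A brute-force sketch; the grid, source location and distance are illustrative (the study used 500 m buffers on 100 m cells, i.e. a 5-cell radius).

```python
# SPREAD-style raster buffer: flag cells within max_dist of any source.

def spread(sources, nrows, ncols, max_dist):
    """Mark every cell within max_dist (in cell units, Euclidean)
    of any source cell with 1; all other cells get 0."""
    out = [[0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            for (sr, sc) in sources:
                if ((r - sr) ** 2 + (c - sc) ** 2) ** 0.5 <= max_dist:
                    out[r][c] = 1
                    break
    return out

# one source at the center of a 3 x 3 grid, 1-cell buffer radius
buffer_grid = spread([(1, 1)], 3, 3, 1.0)
```

Overlaying such a buffer grid on a services layer (a cell-by-cell AND) reproduces the slide 56 operation of finding services in close proximity to hazardous materials.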
- next, examined two components of risk assessment:
  - human component
  - hazardous materials component
- human component has four variables:
  - average population density within 500 m (HAZDEN)
    - need to give greatest weight to areas of highest density
  - number of residents under 5 or over 65 within 500 m (HAZMOBIL)
    - these age groups will need special attention if evacuation is required
  - percent not speaking English as primary language within 500 m (HAZLANG)
    - difficulty of managing evacuation of non-English-speaking minorities
  - adjacency to school site (HAZSCHOL)
- the first three human component variables were weighted based on the original classed data
  - e.g. census classification for percent Hispanic ("non-English speaking" group for this analysis) assigns classes of:
    - 0 - outside database
    - 1 - 1-4%
    - 2 - 5-8%
    - etc.
  - these class values were used as the weights for each human component
- these four human component variables were then added to create a human hazard potential map (HAZHUMAN)
- problem with lack of adequate basis for weighting
  - e.g. little research on relative difficulty of evacuating schools, elderly and non-English-speaking populations
- hazardous materials component has four variables; for each cell it is determined:
  - number of hazardous materials within 500 m
  - diversity of materials within 500 m
  - number of underground storage tanks within 500 m
  - maximum traffic flow within 500 m
    - used as a surrogate for transportation hazard of hazardous materials
- variables weighted and added to create composite (HAZSCORE)
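Composites such as HAZHUMAN and HAZSCORE are weighted, cell-by-cell sums of co-registered layers. A sketch of that overlay arithmetic; the layer values and equal weights below are invented for illustration (the study used the class values themselves as weights).

```python
# Weighted overlay: add co-registered raster layers cell by cell,
# each layer scaled by its weight.

def weighted_sum(layers, weights):
    """layers: list of equally sized 2D rasters; weights: one per layer."""
    nrows, ncols = len(layers[0]), len(layers[0][0])
    out = [[0.0] * ncols for _ in range(nrows)]
    for layer, w in zip(layers, weights):
        for r in range(nrows):
            for c in range(ncols):
                out[r][c] += w * layer[r][c]
    return out

# hypothetical 1 x 2 class-value layers for three human-component variables
density  = [[1, 3]]
mobility = [[2, 1]]
language = [[0, 2]]
hazhuman = weighted_sum([density, mobility, language], [1, 1, 1])
```

The final SCOREMAP step is the same operation applied to the two component composites, followed by a reclassification of the summed range into a handful of display categories.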
- first three variables weighted directly by value, e.g. if a cell had 16 occurrences of hazardous materials within 500 m it had a weight of 16 on the first variable
- traffic flow weighted by class
- finally, human and hazardous materials components added to create composite risk map (SCOREMAP), reclassified from an original range of 1-75 into five categories
- highest risks along major traffic arteries, due to concentration of industrial sites as well as transportation risk
- note: this analysis was not intended for use in evacuation planning; it was designed only as a planning tool

E. POTENTIAL IMPROVEMENTS TO MODEL

- relative weighting of components in human risk score should be based on research into relative difficulty of evacuating different groups, also relative susceptibility to materials
- relative weighting of components in hazardous materials score should be based on history of previous incidents involving each material, also toxicity of material
- needs plume dispersion model
  - score assumes impact within 500 m in all directions
  - actual impact will depend on wind dispersion of plume
  - need for model to assess likely dispersion based on atmospheric conditions, nature of incident
  - materials have different dispersion characteristics based on e.g. density of vapor
- socio-economic data was based on census tract level
  - errors introduced by assuming uniform density within tract
  - needs finer resolution data for human component
- needs evacuation model which incorporates actual road network, assigns traffic to it and estimates congestion
- areas should be prioritized based on difficulty of evacuation, size of population and level of risk
- many of these capabilities are available in CAMEO, developed by NOAA for the Macintosh and now widely implemented in US emergency response organizations

CADASTRAL RECORDS AND LIS

A. LAND SURVEYS AND LAND RECORDS
Public need for accurate land information
The cadaster
B. GEOMETRY OF CADASTRAL MAPS
Plane surveys and geodetic control
Absolute versus relative accuracy
Coordinate geometry (COGO)
C. THE TAX ASSESSOR AND CADASTRAL SURVEYS
Assessor's parcel maps
Parcel numbers and Tax Roll
D. EXAMPLES OF THE NEED FOR MPC/LIS
Prince William County, Virginia
Louisville/Jefferson County, Kentucky
Los Angeles County, CA
E. ADDING MULTIPURPOSE LAND INFORMATION LAYERS
Geographic layers
Role of CAD systems in early LIS development
Non-geographic land attributes
F. GIS AND THE MULTIPURPOSE CADASTER
Integration of graphic and non-graphic information
Spatial operations for LIS applications
REFERENCES
EXAM AND DISCUSSION QUESTIONS
NOTES

Since subdivision and other parcel maps are often hand drafted they do not reproduce well. Try to get an example from your local land record office to show in class, replacing the overhead provided here.

UNIT 54 - CADASTRAL RECORDS AND LIS

Compiled with assistance from Frank Gossette, California State University, Long Beach

A. LAND SURVEYS AND LAND RECORDS

Public need for accurate land information

- governments, land developers, and property owners need and use land information daily
- land information is the basis of property rights in most countries
  - land information must be used to resolve disputes
  - must be accessed when property changes hands
- most of the information that a municipal government stores is tied to specific geographic locations within its jurisdiction: property lines, easements, utility and sewer lines, and many categories of spatial data
- the ability to store, retrieve, analyze, report and display this public land information efficiently and accurately is of great importance
  - requests for information from a land information database can number thousands per day
- land information is of variable quality
  - the legal description of land properties relies on accurate survey measurements and monuments with accurately known location, but also on problematic descriptions such as "middle of river" (river may change course), marks on trees (tree may have died), etc.
- in resolving disputes, the source of land information and its accuracy may be as important as the information itself
  - a land information database may need to include more than just coordinates
- in the UK:
  - base mapping at 1:1,250 scale exists for all urban and many rural areas - over 250,000 sheets
  - regular program of maintenance and update
  - currently being converted to digital form
- in the US:
  - largest scale base mapping is 1:24,000 or 1:50,000, too small for property boundaries
  - approximately 108 million parcels of taxable real property
  - records on these are maintained by 83,216 state and local government agencies
  - in local governments, 75% of daily transactions involve land information, e.g. address verification, parcel identification, ownership, budget summaries, delivery of services
  - records are held in unrelated formats, e.g. property record books, paper files, microfiche, maps, charts, computer databases
  - methods of information management are often as old as the system of land rights itself - which dates to before the Constitution
  - land data held by one agency are frequently unavailable to another - not because of jurisdiction, but because of the method of record keeping
  - leads to unnecessary confusion, cost and duplication

The cadaster

- the cadaster is an official register of the ownership, extent and assessed value of land for a given area
- cadastral refers to the map or survey showing administrative boundaries and property lines
- cadastral information is usually the largest-scale (most detailed) land information available for an area
- as such, cadastral information can provide a large-scale base to which other layers of data can be added for specific purposes
  - this is the concept of the multipurpose cadaster or MPC
  - the ideas of integration of spatial data inherent in the MPC are found in many other areas of GIS application
- the MPC is an ideal - the actual state of cadastral information varies widely within the US and from country to country,
  despite wide acceptance that the arguments for MPC are very persuasive
- LIS is a generic term for information systems that deal with land records

B. GEOMETRY OF CADASTRAL MAPS

Plane surveys and geodetic control

- most cadasters are based on plane surveys
  - surveyors have measured the boundaries and property lines as planar distances from known locations or benchmarks or monuments
- many, but not all, benchmarks are tied to actual geodetic control points (longitude/latitude or State Plane Coordinates)
- conflicts occur when boundaries plotted from survey data overlap or fail to meet

Absolute versus relative accuracy

- absolute accuracy refers to the relationship of a point on the map to its actual location on the globe
- relative accuracy refers to the relationship of one point on the map to another point on the same map
- e.g. a property line may be 400 feet from a USGS marker which has been globally positioned to be at 112 degrees West Longitude and 34 degrees North Latitude
  - either or both of these measurements could be inaccurate
  - the property line might only be 398 feet away, and the benchmark might be shown to be several hundred feet off when measured by GPS or adjusted to the new North American datum

Coordinate geometry (COGO)

- overhead - Portion of a parcel map
- land surveyors record subdivisions in terms of geometric distances and angles from control points (benchmarks)
- legal descriptions are made up of distances and bearings that trace the boundaries of the land unit
- special computer programs have been devised which accept this coordinate geometry (COGO) and translate the instructions into X-Y coordinates on the plane
- this gives the maps created by this process better "relative accuracy," in most cases, than maps created by digitizing the boundaries from existing basemaps
- overhead - Coordinate geometry vs digitizing
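The COGO translation described above reduces to trigonometry: each call in a traverse is a bearing and a distance, advanced from the previous point. A minimal sketch, assuming bearings expressed as degrees clockwise from north (real COGO packages also accept quadrant bearings like "N 45 E", curves, and closure adjustments, none of which are shown here).

```python
# Minimal COGO step: advance from (x, y) along a bearing and distance.

import math

def cogo_point(x, y, bearing_deg_from_north, distance):
    """Bearing is measured clockwise from north, so east is 90 degrees."""
    rad = math.radians(bearing_deg_from_north)
    return (x + distance * math.sin(rad),
            y + distance * math.cos(rad))

# toy traverse: from a benchmark at the origin, go due east 100 ft,
# then due north 50 ft
p1 = cogo_point(0.0, 0.0, 90.0, 100.0)
p2 = cogo_point(p1[0], p1[1], 0.0, 50.0)
```

Because every coordinate is derived from measured distances and angles rather than traced from a paper map, the resulting points inherit the survey's relative accuracy, which is the advantage over digitizing noted above.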
C. THE TAX ASSESSOR AND CADASTRAL SURVEYS

- originally, cadastral maps and surveys were used exclusively to develop parcel maps for taxation purposes
  - based on the Original Surveys of the land area (county, city, sub-division, etc.)
- however, these maps are not necessarily the legal authority for taxation or ownership
  - the actual surveyor's notes and legal description provide this authority

Assessor's parcel maps

- basic unit of land is the parcel
- parcels are usually contiguous and are owned by a single entity (family, individual, corporation, etc.)
- Tax Assessor (usually a county official in the US) assigns a number (identifier) to each parcel on the map

Parcel numbers and Tax Roll

- working from the Parcel Maps, the Tax Assessor makes a list of parcels and their taxable value
- the value of land depends on many things, including the size of the property (area) and the actual or permitted uses (agriculture, industry, residential, etc.) of the land
- tax rolls may also include the legal ownership, the size of the parcel and the improvements made to the property
- important to note that tax rolls and parcel maps contain significant amounts of data that can be used for many purposes beyond tax assessment
- however, many problems arise when they are used for other purposes, since they were compiled at the accuracy and detail required for tax purposes only
  - e.g. boundaries shown may not be accurate enough for city planning purposes
D. EXAMPLES OF THE NEED FOR MPC/LIS

- the following examples illustrate the need for geographic information systems to handle this type of information
- (this section quotes from and relies heavily on materials prepared for the US Department of the Interior, Bureau of Land Management's Study of Land Information, mandated under Public Law 100-409, 1989)

Prince William County, Virginia

- a mid-size Virginia county
- land deeds are filed with the Clerk of the Court office, and microfilmed
  - a copy of the microfilm is given to the Real Estate Assessment Office
  - certain information is abstracted from the deed and becomes an assessment record on the county's mainframe computer, accessible to all departments
  - a copy of the deed is used to update a parcel database
- new parcels and subdivisions are entered into an automated mapping system using COGO
  - digital and mylar maps are updated weekly
- since the Assessment Office defines parcels in its own way for tax purposes, there is not a one-to-one correspondence between the parcel database and the assessment records
  - the geographic data in the parcel database cannot be linked effectively with the non-geographic assessment records
- the county is developing a LIS which will implement a single database with no duplication of data elements

Louisville/Jefferson County, Kentucky

- 26 governmental units and local utilities produce or modify 111 sets of maps at annual costs of $3.2 million
  - of the 111 sets, 59 are used by more than one organizational unit and 20 by more than five
  - parcels and subdivisions are routinely mapped at least six times by government and utilities, often at different scales and levels of accuracy
- area agencies maintain some 95 automated geographic databases and 110 manual databases
  - wide divergence in types and capabilities of computers
  - communication of data is complicated
- replacing current practices with an automated system will save as much as $5.7 million over a 10 year period
- conservative estimates are that staff efficiency will increase by at least a third
- plan will include users and data collectors: Metropolitan Sewer District, local government agencies, utilities

Los Angeles County, CA

- government consists of over 40 departments, plus committees, commissions and special districts
- 4084 sq mi area
- approximately 50% of all information is geographically related
- 7 problems common to all departments:
  - lack of structured communication regarding sources, availability of georeferenced information
  - lack of timely and convenient access
  - information is not always current or accurate
  - information is duplicated, independently maintained
  - existing system is time consuming, difficult, labor intensive
  - limited ability to relate geographic and non-geographic records
  - difficulties of different scales, standards, accuracy, coordinate systems, etc.
- LA county presents enormous problems, not only related to size
  - complexity of jurisdiction - many of the incorporated cities within the county provide their own services; county government services the residual area
  - management of elections is a major potential application of LIS - there is one election on average every 2 days in LA county, and each election has its own set of districts with complex definitions
  - a CAD parcel database alone is estimated at 300 Gbytes
- plan to achieve a county-wide LIS by target date of 1997

E. ADDING MULTIPURPOSE LAND INFORMATION LAYERS

- a Land Information System can be seen as the result of adding more "layers" of information (geographic features) and including more attribute data to the cadastral map
  - the base map or cadaster now becomes an MPC (or LIS)
- these data are useful for other, related functions of land management, planning and administration

Geographic layers

- overheads - City map overlays (9 pages)
- additional geographic features can be registered to the parcel basemap,
  e.g. street centerlines, public rights-of-way, "footprints" of public buildings, and other information for which the graphic representation is useful by itself
- other examples include:
  - Infrastructure and Public Facilities
    - infrastructure may include water lines, sewer lines, fire hydrants, power poles or other "utilities"-type information
  - Hydrography and Topography
    - streams, ponds, underground aquifers, and the 50 year floodplain are all geographic features which could be useful adjuncts to basic land information

Role of CAD systems in early LIS development

- early LIS development stressed the cadastral map as the main system product
  - ability to add layers of graphic information to the base map was a major incentive
- because of the availability of Computer-Aided Design and Drafting (CAD) tools, early automation of land information was often done on such systems
- since basic parcel boundaries, street information and some infrastructure information are immediately useable in graphic form, CAD systems provided LIS basemaps which could be easily updated and quickly produced
- the capabilities of these systems do not generally extend beyond simple production of maps - they do not support sophisticated queries or analysis

Non-geographic land attributes

- geographic features may be associated with an infinite number of characteristics
  - a parcel not only has ownership, area, and value, but can be distinguished on the basis of the allowable uses to which it can be put, the school district to which it belongs, or the age of the head-of-household
- typical LIS attribute data include:
  - Land Use and Land Cover
  - Zoning and Administration
  - Demographics
- as the attribute or tabular data become an increasingly important component of the system, the simple "flat-file" databases which are part of CAD systems represent a serious impediment to system growth
  - more powerful data managers and GIS software may be needed
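The step beyond a flat file is keeping geometry and attributes in separate tables joined on the parcel number, so either side can be queried through the other. A sketch using plain dictionaries; the parcel numbers, field names and values below are invented for illustration (a real LIS would use a relational DBMS for the attribute side).

```python
# Parcel geometry and assessment attributes linked by parcel id.

parcels = {  # parcel id -> simplified boundary as coordinate list
    "APN-001": [(0, 0), (0, 50), (40, 50), (40, 0)],
    "APN-002": [(40, 0), (40, 50), (90, 50), (90, 0)],
}

assessments = {  # parcel id -> non-graphic assessment record
    "APN-001": {"owner": "Smith", "zoning": "R1", "value": 120000},
    "APN-002": {"owner": "Jones", "zoning": "C2", "value": 310000},
}

def lookup(parcel_id):
    """Join the graphic and non-graphic records for one parcel,
    the kind of query answered by pointing at a parcel on a map."""
    return {"geometry": parcels[parcel_id], **assessments[parcel_id]}

record = lookup("APN-001")
```

The same join run in the other direction (select parcel ids whose attributes match a condition, then display their geometries) is the thematic-mapping and zoning-update pattern discussed in the next section.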
F. GIS AND THE MULTIPURPOSE CADASTER

- many early LIS were created using CAD systems and relatively simplistic data managers
- as the volume of information increases and more sophisticated applications are attempted, the functionality of full-featured Geographic Information Systems may be required
- powerful relational DBMS and topologically-structured, vector GIS software can handle the types of land-information management tasks which are typical of contemporary LIS
- example areas in which GIS capabilities are essential:

Integration of graphic and non-graphic information

- general queries
  - retrieval of administrative records using geographical keys (pointing at map, using topological relations such as adjacency, outlining query polygon, etc.)
- Urban and Regional Planning: thematic mapping
  - ability to merge geographic boundaries with statistical information - rapid creation of thematic maps in support of planning activities
- Community Development: zoning changes
  - rapid update of zoning records, rapid display in map form using parcel boundaries

Spatial operations for LIS applications

- Urban and Regional Planning: notifications
  - use of buffering operation to identify property owners within fixed distance of proposed project
- Planning: feasibility studies
  - use of overlay, modeling to support spatial search for feasible areas meeting requirements for project
- Public Works: roadwork surface modeling
  - use of 3D capabilities to make engineering calculations
- Utilities: hydrologic modeling
  - use of network modeling capabilities to predict urban runoff, effects of changes in storm water system
- Schools: population models and districting
  - forecasting school populations by small areas based on demographic, migration, housing development models
  - redistricting to achieve balanced school populations
- Fire: optimal routing
  - use of network models for routing emergency vehicles, site selection for stations
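The notification operation listed above (identify property owners within a fixed distance of a proposed project) reduces to a distance query against parcel locations. A sketch under a simplification: each parcel is represented by a single centroid point, and the owners, coordinates and radius are invented for illustration (a production system would buffer against full parcel polygons).

```python
# Buffer-style notification query against parcel centroids.

import math

def owners_to_notify(project_xy, parcels, radius):
    """parcels: list of (owner, (x, y) centroid) pairs.
    Returns the owners whose centroid lies within radius of the project."""
    px, py = project_xy
    return [owner for owner, (x, y) in parcels
            if math.hypot(x - px, y - py) <= radius]

parcels = [("Smith", (100.0, 100.0)),
           ("Jones", (900.0, 900.0))]
notify = owners_to_notify((150.0, 100.0), parcels, 300.0)
```

Joining the resulting owner list back to the tax roll for mailing addresses is the graphic/non-graphic integration this section argues for.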
"Implementing a National Multipurpose Cadaster," ACSM Bulletin 97:17-21. ACSM Geographic Information Management Systems Committee, 1988. "Multi-Purpose Geographic Database Guidelines for Local Governments," ACSM Bulletin 114:19-30. Chrisman, N.R., and B.J. Niemann, 1985. "Alternative Routes to a Multi-Purpose Cadaster," Proceedings Auto-Carto 7, ASPRS/ACSM, Falls Church, VA, pp. 84-94. Donahue, J.G., 1988. "Land Base Accuracy: Is It Worth the Cost?," ACSM Bulletin 117:25-27. Niemann, B.J. and J.G. Sullivan, 1987. "Results of the Dane County land records project: implications for conservation planning," Proceedings AutoCarto 8, ASPRS/ACSM, Falls Church, VA, pp. 445-455. Reports on the Need for Multi-purpose Cadaster National Research Council, 1980. Need for a Multipurpose Cadaster. Washington, DC. National Research Council, 1982. Federal Surveying and Mapping: An Organizational Review, Washington, DC. National Research Council, 1982. Modernization of the Public Land Survey System, Washington, DC. National Research Council, 1983. Procedures and Standards for a Multipurpose Cadaster, Washington, DC. Wisconsin Land Records Committee, 1987. Final Report: Modernizing Wisconsin''s Land Records, Institute of Environmental Studies, University of Wisconsin, Madison, WI. FACILITIES MANAGEMENT (AM/FM) A. INTRODUCTION B. AUTOMATED MAPPING Automated mapping capabilities Automated mapping shortcomings C. FACILITIES MANAGEMENT SYSTEMS Facilities management systems capabilities Facilities management systems shortcomings D. AM/FM AM/FM examples Benefits of AM/FM systems E. CHARACTERISTICS OF AM/FM Functionality Organizations F. EXAMPLE - EASTERN MUNICIPAL WATER DISTRICT Background System development System configuration Map products Applications development REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES The slide set contains a few slides that could be used to illustrate this unit. 
UNIT 55 - FACILITIES MANAGEMENT (AM/FM) Compiled with assistance from Warren Ferguson, Ferguson Cartotech, San Antonio A. INTRODUCTION facilities management is a very influential, well organized GIS application area has major representation from utility companies - telephone, electricity, gas projects tend to be very large, well funded and critical to the efficient operation of the utility umbrella term used by these organizations is AM/FM - Automated Mapping and Facilities Management AM/FM is primarily distinguished by the context of applications: utilities, urban facilities management AM/FM is an information management tool data used for day-to-day decisions only, is not an analytical tool e.g. maintenance crews use information to locate and repair breaks in service e.g. construction drawings are produced and sent to the field for installation AM/FM is the integration of two tools automated mapping produces maps facilities management provides digital inventories of facilities AM/FM links the two to provide geographical access to facility inventories B. AUTOMATED MAPPING with control of different layers of information, provides a variety of ways to output from a single database e.g. 
by turning on or off layers, a street light map or electrical feeder map could be produced from the same database Automated mapping capabilities overhead - Automated mapping better map maintenance is a major benefit of automated mapping productivity increases 2 to 10 times over manual methods no problem with physical or content deterioration of maps since they can be produced as needed or as updated centralized control is a major benefit to major corporations paper documents are replaced by a central digital store copies can be produced and distributed as and when necessary computerization provides easier but better controlled access in the paper world, when a document was checked out no one else could access the information - elaborate systems were set up to ensure return of the document in a digital world we can control who can access and for what purpose (read only, edit etc.) Automated mapping shortcomings provides only graphic output, no means of query e.g. cannot obtain attributes of objects e.g. cannot access objects by their attributes because objects are not connected topologically, cannot carry out sophisticated analysis of networks cannot relate map information to other records C. FACILITIES MANAGEMENT SYSTEMS exist in many organizations Facilities management systems capabilities overhead - Facilities Management consist of computerized inventories of the organization''s facilities capabilities for sorting, maintaining and reporting information e.g. many utilities have pole files containing information on each pole e.g. 
date of installation many types of reports can be generated can maintain a digital representation of the facility network to allow engineering, network analysis in tabular, numeric form (not spatial) Facilities management systems shortcomings no geographic capabilities can generate only alphanumeric reports cannot access records geographically cannot generate geographic reports (maps) redundancy must arise if both automated mapping and facilities management systems are maintained, one for mapping and the other for inventory D. AM/FM overhead - Automated mapping/Facilities management combine automated mapping and facilities management into one system geographic information provides a new window into the facilities database information can be retrieved by pointing to a map image e.g. point to an electrical cable and retrieve kVA (kilovolt-ampere) rating, length, mortality, or list of transformers connected to it AM/FM is a very successful marriage of two traditional concepts AM/FM examples locating pole or facility item by street address generate reports on street lighting - does it meet standards in specified area? generate maps of electrical circuits or feeders at prescribed scale produce continuing reports on property provide reports for tax purposes Benefits of AM/FM systems reduces the cost to maintain information no physical maps to deteriorate, get lost, misfiled data is more accessible and secure impact the organization by integrating operations departments must cooperate because they now share data reduces potential duplication between departments ensures consistency of information base across departments new forms of report available new information provides basis for new forms of management E. CHARACTERISTICS OF AM/FM scale: service maps are needed at a scale of 1" to 100'' general systems planning may require scales down to 1:1,000,000, e.g. 
for electrical utilities data sources: data generally collected during construction or maintenance, using sketches on standard basemaps data quality: high data quality is desirable, e.g. accurate positioning of underground facilities, but not always attainable in practice much urban infrastructure (e.g. water, sewer pipes) may be more than 100 years old and many historical records may be missing Functionality AM/FM systems stress addition of geographical access to existing databases database likely to remain on mainframe geographical access may be from workstation with geographical data maintained locally non-geographical data characterized by frequent transactions - requires access to database from many workstations geographical data input independently using specialized graphics workstation backcloth used for input backcloth is a basemap showing the facility locations to be digitized as well as other geographic details, e.g. streets, parcels digitizing may be done on screen with backcloth displayed in raster form using video technology however basemap itself is not entered into database some vendors supplying the AM/FM market argue that: AM/FM applications are literally "geographic information systems" - providing geographically based access to information systems which provide analysis and modeling functions are better described as "spatial analysis systems" Organizations AM/FM International - mostly utilities with strong representation by vendors, governments little involvement as yet in education, research branches in many countries F. 
EXAMPLE - EASTERN MUNICIPAL WATER DISTRICT Background the Eastern Municipal Water District (EMWD) of Riverside County, California, provides agricultural and domestic water, sewer collection and treatment and water reclamation services to a service area of 534 square miles, population of over 250,000 land use in the service area is a mix of very rapidly growing urban and suburban areas as well as rural farm land, mountains and desert has 50,000 domestic water customers supplied with water imported from the Colorado and California Aqueduct Systems as well as from 54 local ground water wells 33,600 sanitary sewer customers served by 5 regional water reclamation plants treating more than 24 million gallons of sewage per day has an annual operating budget of over $60 million area is developing very rapidly population in the service area is anticipated to reach between 600,000 and 1 million by the year 2010 number of customers is expected to triple in that time number of company employees will increase from 340 to 800 this extremely rapid growth has made it very difficult for the company to keep up-to-date on service maps and to plan properly for the installation of new services System development initially the interest in automation was simply a recognition of the immediate need for automated mapping as a way to deal with the backlog of mapping and record updates however, during the process of system planning, several other potential information and engineering applications were also identified therefore, the purpose of the AM/FM is: in the short term, to map and manage facilities in the high growth environment in the long term, to incorporate planning and sewer and water engineering analysis into the system System configuration with the assistance of a consultant the EMWD developed a plan for implementation of a major AM/FM system based on Intergraph equipment and software overhead - EMWD proposed AM/FM system configuration Map products map products were the initial
purpose of the system and their production is critical to the immediate success of the system overhead - EMWD Map products lists the maps which will be produced once the database is complete Applications development the current Facilities Master Plan identifies and recommends computer programs for engineering analysis in the long range planning of new facility construction and operating procedures therefore, the system is designed to allow storage and interactive access to information for flow analysis of sewer and water models, using existing engineering analysis programs during the development of the AM/FM database, designers needed to identify and incorporate additional attributes that would be used in these models for long-range planning of facilities the system is designed to: provide spatial analysis capabilities to allow projection of future resource requirements based on demographic and economic data provide tools for the generation of construction work orders and detailed mechanical and electrical design drawings system designers also ensured that the final system will support the inclusion of topographic data which can be used in several anticipated applications, including hydraulic network analysis groundwater modeling identifying locations for radio telemetry facilities that will be used to provide real time data on flow and water levels customized report generation will assist the maps and records department in providing inventory and facility asset information for the County Tax Assessor records will be generated by facility type, geographic area or any combination of attributes requested digital tax rolls from the Assessor's office can be quickly checked against the property owner data maintained by EMWD customer service department will use the system to provide integrated access to meter reading, customer billing, facility locating and other inquiry processes REFERENCES Many examples of AM/FM installations are described in publications from AM/FM
International including the annual conference proceedings and their trade journal, The Scribe. Wagner, M.W., 1989. "The Eastern Municipal Water District AM/FM/GIS project," Proceedings, Conference XII, AM/FM International, New Orleans, April 1989, pp. 526-541. Describes in detail the planning and implementation plan for the EMWD system reviewed in this unit. DEMOGRAPHIC AND NETWORK APPLICATIONS A. INTRODUCTION B. MARKETING, RETAILING AND ELECTORAL REDISTRICTING Characteristics of application area Types of applications Organizations C. EXAMPLE - REDISTRICTING Background Objectives Technical requirements Current districts Redistricting Proposals D. VEHICLE ROUTING AND SCHEDULING Technology Databases Functionality Data quality E. EXAMPLE - VEHICLE NAVIGATION SYSTEMS F. HIGHWAYS PLANNING AND MANAGEMENT REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 56 - COMMERCIAL APPLICATIONS Compiled with assistance from David Cowen, University of South Carolina A. INTRODUCTION this unit looks at some of the more specialized applications of GIS demographic analysis spatial information plays a major role in many marketing and retailing decisions which involve decisions about the location of new stores, shopping centers, etc., and for evaluating the demographic characteristics of present and future trade areas similar applications in the government sector include redistricting - changing electoral boundaries in response to changing distributions of population network analysis delivery and emergency vehicles benefit from up-to-date information on the condition of the transportation network as well as real-time route planning B.
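The trade-area evaluation mentioned above (summarizing demographic characteristics in the vicinity of a present or proposed store) reduces, in its simplest form, to aggregating zone populations within a radius of the site. A minimal Python sketch; the zone identifiers, centroid coordinates, populations, and the 2 km radius are all hypothetical:

```python
from math import hypot

# Hypothetical block-group centroids with population counts
# (coordinates in km from an arbitrary origin).
block_groups = [
    {"id": "BG-01", "x": 0.4, "y": 0.2, "pop": 900},
    {"id": "BG-02", "x": 1.8, "y": 0.9, "pop": 1400},
    {"id": "BG-03", "x": 3.5, "y": 2.1, "pop": 1100},
    {"id": "BG-04", "x": 0.9, "y": 1.6, "pop": 600},
]

def trade_area_population(site, radius_km, zones):
    """Sum the population of every zone whose centroid lies
    within radius_km of the candidate site."""
    return sum(z["pop"] for z in zones
               if hypot(z["x"] - site[0], z["y"] - site[1]) <= radius_km)

print(trade_area_population((1.0, 1.0), 2.0, block_groups))  # -> 2900
```

A production system would use the true zone polygons (point-in-polygon or areal interpolation) rather than centroids, but the centroid shortcut is a common approximation at block-group scale.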
MARKETING, RETAILING AND ELECTORAL REDISTRICTING location factors are critical to success of retailing accurate knowledge of spatial distributions is essential for advertising, direct mail campaigns GIS technology useful in designing sales areas, analyzing trade areas of stores similar applications occur in politics design of voting districts (apportionment, gerrymandering) has enormous impact on outcome of elections major interest in reapportionment after 1990 census GIS applications in these areas are still at early stage Characteristics of application area scale: street centerline, census reporting zones - i.e. 1:24,000 and smaller data at block group/enumeration district scale (250 households) is needed for locating smaller commercial operations like gas stations and convenience stores data at census tract scale (2,000 households) is good for the location of larger facilities like supermarkets and fast food outlets data sources: much reliance on existing sources of digital data especially TIGER and DIME similar data available in other countries additional data added to standard datasets by vendors e.g. updating TIGER files by digitizing new roads, correcting errors e.g. adding ZIP code boundaries, locations of existing retailers functionality: dissolve and merge operations, e.g. to build voting districts out of small building blocks modeling, e.g. to predict consumer choices, future population growth overlay operations, e.g. to estimate populations of user-defined districts, correlate ZIP codes with census zones point in polygon operations, e.g. to identify census zone containing customer's residence mapping, particularly choropleth and point maps of consumers geocoding, address matching data quality: more concern with accuracy of statistics, e.g.
population counts, than accuracy of locations Types of applications districting designing districts for sales territories, voting objective is to group areas so that they have a given set of characteristics "geographical spreadsheets" allow interactive grouping and analysis of characteristics e.g. Geospreadsheet program from GDT site selection evaluating potential locations summarizing demographic characteristics in the vicinity e.g. tabulating populations within 1 km rings searching for locations that meet a threshold set of criteria e.g. a minimum number of people in the appropriate age group are within trading distance market penetration analysis analyzing customer profiles by identifying characteristics of neighborhoods within which customers live targeting identifying areas with appropriate demographic characteristics for marketing, political campaigns Organizations many data vendors and consulting companies active in the field, many large retailers no organization unique to the field American Demographics is influential magazine C. EXAMPLE - REDISTRICTING GIS has applications in design of electoral districts, sales territories, school districts each area of application has its own objectives, goals this example looks at designing school districts Background the Catholic school system of London, Ontario, Canada provides elementary schools for Kindergarten through Grade 8 to a city of approx. 250,000 about 25% of school children attend the Catholic system 27 elementary schools were open prior to the study population data is available for polling subdivisions from taxation records approx. 
700 polling subdivisions have average population of 350 each forecasts of school age populations are available for 5, 10, 15 years from the base year (see Taylor et al., 1986) at the polling subdivision level children are bussed to school if their home location is more than 2 miles away, or if the walking route to school involves significant traffic hazard Objectives minimal changes to the existing system of school districts minimal distances between home and school, and minimal need for bussing long-term stability in school district boundaries preservation of the concepts of community and parish - if possible a school should serve an identifiable community, or be associated with a parish church maintenance of a viable minimal enrollment level in each school, defined as 75% of school capacity and > 200 enrollment Technical requirements digitized boundaries of the polling subdivision "building blocks" an attribute file of building blocks giving current and forecast enrollment data for forecasting, we must include developable tracts of land outside the current city limits, plus potential "infill" sites within the limits overhead - London polling subdivisions, development tracts and infill sites 748 polygons development tracts are the isolated areas outside the contiguous polling subdivisions infill sites are shown as points the ability to merge building blocks and dissolve boundaries to create school districts school districts are not required to be contiguous - if necessary a school can serve several unconnected subdistricts a table indicating whether walking or bussing is required for each building-block/school combination Current districts overhead - Current allocation of students "starbursts" show allocations of building blocks to 29 current schools (includes two special education centers) note bussed areas in NW and SW - separate enclaves of recent high-density housing allocated to distant schools this strategy allows an expanding city to deal with dropping
school populations in the core leading to an excess of capacity rising school populations in the periphery but lack of funds for new school construction without constantly adjusting boundaries overhead - Enrollment projections overhead shows projections of enrollment based on current school districts note rapid increase in developing areas e.g. St Joseph's (#3), St Thomas More (#4) NW note decrease in maturing areas of periphery e.g. St Jude's (#8) - SW area note rejuvenation in some inner-city schools due to infilling e.g. St Martin's (#15) - lower center note stagnation in other inner-city schools e.g. St Mary's (#17), decline e.g. St John's (#14) - center Redistricting general strategy - begin with current allocations, shift building blocks between districts in order to satisfy objectives requires interaction between graphic display and tabular output quick response to "what if this block is reassigned to the school over here?" implementation allowed School Board members to make changes during meetings, observe results immediately using map on digitizer tablet, tables on adjacent screen Proposals overhead - Summary statistics for closure plan shows one alternative plan developed note: assumes closure of 6 schools rise in enrollment as percent of capacity stability of projections through time reduction in number of "non-viable" schools (<200 enrollment) increase in percent not assigned to nearest school increase in average distance traveled D. VEHICLE ROUTING AND SCHEDULING includes systems to aid in vehicle navigation, systems for routing emergency vehicles, scheduling delivery vehicles important actors include: automobile industry - vehicle navigation aids parcel services - express, courier emergency services - ambulance, fire rapid development of technology, databases Technology systems in vehicles e.g.
ETAK navigator small processor, database on cassette tape or optical disk (CD ROM), display showing location of vehicle and surrounding streets, also best route to destination similar systems under development in Japan, Europe e.g. Macintosh Hypercard systems installed in fire trucks - Cameo developed by NOAA information on route to fire, layout of buildings, nearby hazardous materials car rental agencies systems at airport checkin counters offering driving instructions to user-defined places vehicle scheduling systems which automate vehicle routing given locations which have to be visited on call, e.g. parcel delivery systems to assign optimum routes to e.g. school buses Databases heavy reliance on TIGER and DIME problems with update these products are geared to the 10-year census cycle problems with completeness DIME for urban areas only, lack of addresses in rural TIGER problems with attributes simple street layout is not sufficient for detailed vehicle routing e.g. TIGER lacks data on one-way streets, no left turns, temporary road construction problems problems with topology e.g. roads which cross but do not intersect growing interest among vendors in adding value to TIGER by dealing with some of these problems lack of standards no organization responsible for developing standards no responsibilities of Bureau of the Census beyond census itself Functionality simple retrieval and display for vehicle navigation systems finding optimum route requires fast, intelligent algorithm address matching essential to identify location from street address Data quality street centerline, i.e. 10-20 m accuracy is adequate attribute accuracy may be important because of risk of lawsuits in cases of accidents E. 
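The "fast, intelligent algorithm" for finding an optimum route noted under Functionality above is a shortest-path computation, for which Dijkstra's algorithm is the standard approach. A minimal sketch on an invented four-intersection street network; the node labels and travel times (minutes) are hypothetical, and a real system would build the graph from TIGER-style centerline files plus turn and one-way restrictions:

```python
import heapq

# Toy street network: node -> list of (neighbor, travel_time_minutes).
network = {
    "A": [("B", 4), ("C", 2)],
    "B": [("A", 4), ("C", 1), ("D", 5)],
    "C": [("A", 2), ("B", 1), ("D", 8)],
    "D": [("B", 5), ("C", 8)],
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: returns (total_time, node_sequence)."""
    queue = [(0, start, [start])]   # priority queue ordered by cost so far
    seen = set()
    while queue:
        time, node, path = heapq.heappop(queue)
        if node == goal:
            return time, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in graph[node]:
            if nxt not in seen:
                heapq.heappush(queue, (time + cost, nxt, path + [nxt]))
    return float("inf"), []

print(shortest_route(network, "A", "D"))  # -> (8, ['A', 'C', 'B', 'D'])
```

Note that the direct leg A-B-D (9 minutes) is beaten by the detour through C, which is exactly the kind of non-obvious result that makes an algorithmic route finder worthwhile.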
EXAMPLE - VEHICLE NAVIGATION SYSTEMS considerable research is currently being conducted to develop vehicle navigation systems overhead - Automatic vehicle location systems these systems require databases that have: topological information methods for determining position in the network street attributes such as width, number of lanes, direction, surface condition, perhaps even usage information keyed to time of day identification information like street names and other local names for special grades, bridges, landmarks these systems need: technology for determining current location, may be: automatic determination from use of GPS and similar technology dead reckoning based on distance travelled in the network and map-matching (snapping location to coordinates of links and intersections) computer hardware and databases, may be: internal to vehicle or at a central location with transmission of data to the vehicle input for identifying starting location and destination output to provide route instructions must be able to generate maps for any location in the network at a speed that is compatible with the rate of movement of the vehicle may be visual or verbal driving instructions F. HIGHWAYS PLANNING AND MANAGEMENT other transportation applications involve the use of network GIS for the planning and management of highways and roads Nyerges and Dueker (1988) outline three levels at which GIS can play a role in State Transportation functions handout - GIS and State Departments of Transport Level I are planning applications that generally relate to the state as a whole the data needed at this level is coarse and spatial accuracy is not important aggregated data are preferred to illustrate major trends Level II are management applications focusing on smaller areas such as a county this is the level at which traffic safety and pavement management activities are conducted e.g. 
pavement data is often collected by taking vertical photographs of the road surface from a moving vehicle every few meters locations can now be determined using GPS photos can be accessed by tying them to a GIS of the road network Level III are engineering applications requiring very large scale data and high accuracy projects at this level would cover small project or corridor areas at this level the GIS would provide input to the preliminary engineering design as-built plans from completed projects could be added to the state highway database at this scale REFERENCES Briggs, D.W., and B.V. Charfield, 1987. "Integrated highway information systems," NCHRP Synthesis 133, Transportation Research Board, National Research Council, Washington, DC. Fletcher, D., 1987. "Modeling GIS Transportation Networks," Proceedings of URISA 1988, Los Angeles, CA, Vol. 2:84-92. Golden, B.L. and L. Bodin, 1986. "Microcomputer-based vehicle routing and scheduling software," Computers and Operations Research 13:277-85. Reviews the availability of network analysis modules for microcomputers. Jones, K. and J.W. Simmons, 1987. Location, Location, Location: Analyzing the Retail Environment, Methuen, New York. A recent volume on spatial analysis techniques in retailing. Krakiwsky, E.J., H.A. Karimi, C. Harris, J. George, 1987. "Research into electronic maps and automatic vehicle location," Proceedings AutoCarto 8, Baltimore, MD, pp. 572-583. McGranaghan, M., D.M. Mark and M.D. Gould, 1987. "Automated provision of navigation assistance to drivers," The American Cartographer 14:121-38. Reviews current technology and examines the issues in design of effective user interfaces. Nyerges, T.L., and K.J. Dueker, 1988. "Geographic Information Systems in Transportation," US Department of Transportation, Washington, DC. Report describes the potential use of GIS in State Transportation offices and the types of data and functionality that would be required. Taylor, H.W., W.R. Code and M.F. Goodchild, 1986.
"A housing stock model for school population forecasting," Pr DECISION MAKING USING MULTIPLE CRITERIA A. INTRODUCTION Goals of this unit B. SPATIAL DECISION MAKING Examples of spatial decision making General steps involved in traditional approach Assumptions involved with this type of analysis Example 1: The fire station location problem Example 2: Land suitability assessment General observations Conclusion C. MULTIPLE CRITERIA AND GIS D. THE CONCEPT OF NONINFERIORITY E. BASIC MULTIPLE CRITERIA SOLUTION TECHNIQUES F. GOAL PROGRAMMING Choose criteria and assign weights Build a concordance matrix Summary G. WEIGHTING METHOD H. NORTH BAY BYPASS EXAMPLE Impact factors Alternative routes Combination of factors Weighting Concordance analysis Results REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit begins a three-part module introducing concepts and techniques of spatial decision-making. Although it is far from a complete coverage of the topic, it will provide students with a sampling of the kinds of decision-making activities GIS will be required to support. UNIT 57 - DECISION MAKING USING MULTIPLE CRITERIA Compiled with assistance from C. Peter Keller, University of Victoria, Canada A. INTRODUCTION an introduction to the topic of multiple criteria analysis deals with the potential integration of quantitative multiple criteria analysis and GIS GIS has the potential to become a very powerful tool to assist in multiple criteria spatial decision making and conflict resolution some GIS have already integrated multiple criteria methods with reasonable success (for example TYDAC's SPANS system) it is anticipated that other vendors will integrate multiple criteria methods in the near future Goals of this unit to introduce students to the concept of multiple criteria decision making to outline some of the simpler strategies developed to solve multiple criteria problems to demonstrate the potential applicability of GIS B.
SPATIAL DECISION MAKING Examples of spatial decision making identify shortest path that connects a specified set of points e.g. for power line route, vehicle scheduling identify optimal location of a facility to maximize accessibility e.g. retail store, school, health facility identify parcel of land for commercial development which maximizes economic efficiency General steps involved in traditional approach 1. identify the issue 2. collect the necessary data 3. define the problem rigorously by stating: objectives assumptions constraints if there is more than one objective: define the relationship between objectives by quantifying them in commensurate terms, i.e. express each objective in the same units, usually in dollars e.g. wish to minimize both cost of construction and impact on environment must express environmental impact in dollars, e.g. cost of averting impact then collapse the objectives into one objective e.g. minimize sum of construction and environmental costs 4. find appropriate solution procedure 5. solve the problem by finding an optimal solution Assumptions involved with this type of analysis the objectives can be expressed in commensurate terms the problem can be collapsed and simplified into a single objective for analysis decision makers agree on the relative importance of the commensurable objectives however, these assumptions don't necessarily hold, consider the following examples: Example 1: The fire station location problem Problem: to locate a new fire station in a city (Schilling, 1976) Objectives: maximize coverage of population maximize coverage of real estate something is "covered" if it is within an established response time of a fire station, e.g.
3 minutes Conflict: most valued real estate is not necessarily located where most people reside most valued real estate in downtown and industrial areas people live in the suburbs objectives are in spatial conflict Solution: traditional approach requires that the two objectives be collapsed into one by defining a relationship between the value of real estate and the value of life but the two objectives are noncommensurate can't place a monetary value on a human life Example 2: Land suitability assessment Problem: suitability evaluation of a number of sites for commercial development Objectives: maximize economic efficiency minimize environmental impact Conflict: decision makers have to express environmental quality in terms of economic efficiency (monetary values) different interest groups will value environment differently no consensus, therefore can't assess environmental quality in monetary terms objectives are again noncommensurate General observations in the real world, decision making problems rarely collapse into a neat single objective diagram in this classification of real world spatial decision-making problems, most fall in the bottom right cell real world problems are inherently multiobjective in nature consensus rarely exists concerning the relationships between the various objectives Conclusion more appropriate to identify and maintain the multiple criteria nature of real world problems for analysis and decision making decision makers are frequently interested in the trade off relationship between the various criteria this allows them to make the final decisions in a political environment e.g.
trading total population covered for total value of real estate covered Example 2: Land suitability assessment Solution: Identify and map the different land uses, land assessments and environmental impacts on separate layers construct several combinations of overlays based on various priorities derive suitability surfaces for the different combinations of priorities let politicians make the ultimate choice C. MULTIPLE CRITERIA AND GIS a GIS is an ideal tool to use to analyze and solve multiple criteria problems GIS databases combine spatial and non-spatial information a GIS generally has ideal data viewing capabilities - it allows for efficient and effective visual examinations of solutions a GIS generally allows users to interactively modify solutions to perform sensitivity analysis a GIS, by definition, should also contain spatial query and analytical capabilities such as measurement of area, distance measurement, overlay capability and corridor analysis D. THE CONCEPT OF NONINFERIORITY overhead - Noninferiority the figure shows the objective space for a two objective problem - the fire station problem two objectives, real estate and population coverage, are represented by the two axes of the graph the shaded area represents the set of all possible feasible locations (subject to constraints of cost, distance etc.) 
P1 represents the solution which optimizes coverage of population alone P2 represents the solution which optimizes coverage of real estate a site is noninferior if there exists no alternative site where a gain could be obtained in one objective without incurring a loss in the other P3 represents a feasible solution which is NOT noninferior P3 can move vertically to improve population coverage without changing real estate coverage solutions exist which are better than P3 on one axis (one objective) without necessarily being worse on the other axis the dark curved line represents the set of noninferior solutions P4 is an example of a noninferior solution to improve on P4 for one objective requires a loss on the other objective the set of noninferior solutions is the set of best compromise solutions or the "trade-off curve" in welfare economics any point on the "trade-off curve" represents a point of Pareto optimality a solution point where no one objective can be improved upon without a sacrifice in another objective P4 cannot move vertically to improve population coverage must slide along trade-off curve movement upwards along the curve will imply a change (loss) in the real estate objective P4 therefore is a Pareto optimal or a noninferior solution point Example 1: Fire station location problem Solution: Identify the set of all possible sites for the new fire station that represent noninferior solutions for each noninferior solution, examine the trade-off between covering more lives relative to more real estate make the final and informed decision in the political environment E. BASIC MULTIPLE CRITERIA SOLUTION TECHNIQUES there are a number of possible approaches to defining the noninferior solution set 1. Preference-oriented approaches: derive a unique solution by specifying goals or preferences this technique assumes the set of possible solutions is known and small an example is goal programming 2.
Noninferior solution set generating techniques: derive the entire set of noninferior solutions and leave the choice to the decision-maker these techniques are used when a very large number of options exist many of these may not be part of the noninferior set, so this allows the number of options to be reduced to a limited set an example is the weighting method F. GOAL PROGRAMMING one of the oldest and best-known multiobjective methods generally utilized where there are a number of competing goals or objectives Example 2: Land suitability assessment given a set of parcels of land, identify which best suits a set of development or search criteria the overall aim is to meet all the criteria or goals to the greatest extent possible, to choose the most desirable plan from a set of possible options Choose criteria and assign weights overhead - Goal programming example - criteria weights handout - Goal programming example (2 pages) suppose there are 4 sites to be evaluated 8 criteria have been identified these likely reflect opinions of different experts, different schools of thought, different objectives e.g. may wish to maximize profit (developer), to minimize cost (engineer) and to minimize environmental impact (environmentalist) weights have been given to each criterion to identify its importance weights must sum to 1 e.g. the developer's criteria may have a weight equal to the engineer's and less than the environmentalist's each site has been ranked on each of the criteria (see overhead) Build a concordance matrix overhead - Goal programming example - Building a concordance matrix take each ordered pair of alternatives - e.g. sites A and B, pair AB for each criterion, assign the pair to one of three sets: where A beats B (concordance set) e.g. criteria 2 (wt=.1), 4 (.2), 6 (.1), 8 (.1) where B beats A (discordance set) e.g. criteria 1 (wt=.1), 3 (.1), 7 (.1) where A and B tie (tie set) e.g.
criteria 5 (wt=.2) add up the weights of the criteria in each set if A always beats B on all criteria, all 8 criteria will be in the concordance set - total weight will be 1 actual weights for pair AB: concordance set: 0.5 discordance set: 0.3 tie set: 0.2 concordance for each pair is determined by summing the weights for criteria assigned to the concordance set plus half the sum of weights for criteria in the tie set for pair AB: 0.5 + 0.1 = 0.6 indicates a slight preference for A over B across all criteria create a matrix of concordance for each pair overhead - Goal programming example - Full concordance matrix row is first in pair, column is second row total yields index of preferability the larger the index, the more preferred the option over all criteria, site D is preferred to site C which is preferred to site A which is preferred to site B note: an example of this process is provided later in this unit Summary decision maker is asked to specify goals and relative weightings for the different criteria use relative weightings to find most preferred site change weighting to assess sensitivity of solution or to reflect different opinions G. WEIGHTING METHOD used when the set of possible solutions is extremely large identifies or reduces the number of solutions that need to be considered solution of multi-criteria problem is easier if the contents of the noninferior set are known this method finds the complete noninferior solution set rather than a single solution final selection is left to decision-makers strategy: combine the criteria using a range of different weightings for each criterion - ranging from 100% on one criterion to 100% on the other find best solutions for each combination due to the number of combinations that must be evaluated, this is not generally practical for more than 2 criteria note the weighting method does not guarantee that all solutions in the noninferior set will be found number found depends on how many combinations of weights are used
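The pairwise concordance arithmetic above is easy to verify in a few lines. The eight criterion weights are the ones given in the example; the per-criterion rankings are hypothetical, constructed only so that the ordered pair AB comes out as in the text (A beats B on criteria 2, 4, 6 and 8, loses on criteria 1, 3 and 7, and ties on criterion 5).

```python
# Concordance score for an ordered pair of sites, following the worked
# example in the text: 8 criteria, weights summing to 1.
# The rankings below are invented, chosen only to reproduce pair AB.

weights = [0.1, 0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.1]   # criteria 1..8

# higher value = better site on that criterion (hypothetical data)
ranks = {
    "A": [1, 2, 1, 2, 1, 2, 1, 2],
    "B": [2, 1, 2, 1, 1, 1, 2, 1],
}

def concordance(first, second):
    """Sum the weights where `first` beats `second`,
    plus half the weights where they tie."""
    total = 0.0
    for w, a, b in zip(weights, ranks[first], ranks[second]):
        if a > b:
            total += w          # concordance set
        elif a == b:
            total += w / 2      # tie set counts half
    return total

print(round(concordance("A", "B"), 2))   # → 0.6, as in the text
print(round(concordance("B", "A"), 2))   # → 0.4
```

Note that the two ordered scores always sum to 1, since each criterion's full weight is split between the pair; a full concordance matrix simply repeats this calculation for every ordered pair of sites.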
NORTH BAY BYPASS EXAMPLE this section is drawn from B.H. Massam's book Spatial Search which includes many examples of complex spatial decision-making a new route is needed for Ontario Highway 11 around the city of North Bay this study, conducted by the Ontario Ministry of Transportation and Communications, is similar in methodology to many highway routing studies many of these studies use GIS or automated mapping systems to analyze multi-layer databases routing studies follow a common strategy: identify factors which are important in evaluating impact of route identify a small number of feasible routes evaluate each route on each of the impact factors reach a decision by combining impact factors on some systematic basis this study is a particularly good example of the general strategy LOCATION-ALLOCATION ON NETWORKS A. INTRODUCTION Network problems Location-allocation problems Objectives Applications B.
EXAMPLE - OIL FIELD BRINE DISPOSAL Brine disposal Disposal options The location-allocation problem C. COSTS Pipe cost Truck cost Disposal well cost D. GIS IMPLEMENTATION E. LOCATION-ALLOCATION ANALYSIS MODULE Sensitivity analysis Problems with link-node models REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 58 - LOCATION-ALLOCATION ON NETWORKS A. INTRODUCTION Network problems a network can be represented digitally by nodes (junctions) and links (connections between nodes) common networks include streets in a city, airline routes, railroads a GIS is a convenient way of storing information about a network a large number of analytical problems have been developed for networks, e.g.: "shortest path problem" - algorithms to find the shortest route through the network between given origin and destination "traveling salesman problem" - algorithms to find the shortest tour through a given set of destinations, beginning and ending at a given origin "transportation problem" - find the pattern of shipments of goods from a number of factories to a number of outlets which will minimize total shipping cost "traffic assignment problems" - given the numbers of trips to be made between origins and destinations, predict how traffic will allocate itself to a network, i.e. how many vehicles will use each route numerous other problems in vehicle routing and scheduling some of these, e.g. shortest path problems, have been incorporated into GIS products, e.g. 
ARC/INFO's NETWORK, Caliper's TRANSCAD others can be used as stand-alone packages in conjunction with a GIS the GIS provides the input, output, display, simple analysis functions the stand-alone package provides the algorithm to solve the particular problem this unit examines an example of network problems Location-allocation problems concern the provision of a service to satisfy a spatially dispersed demand demand for the service exists at a large number of widely dispersed sites impossible to provide the service everywhere e.g. every household needs a source of groceries, but impossible to provide a grocery store at each household for reasons of cost (economies of scale) service must be provided from a few, centralized locations ("sites") sometimes the number of sites is known in advance, e.g. McDonald's wishes to locate 3 restaurants in city x in other cases the optimum number of sites is one aspect of the solution two elements to the problem: 1. Location where to put the central facilities (and possibly how many, how big) 2. Allocation which subsets of the demand should be served from each site ("trade areas", "service areas") Objectives important components: cost of operating the facilities - includes construction, operating costs - may be independent of locations chosen cost of travel to and from facilities - may be absorbed by the consumer or the provider depending on the context quality of service e.g. important in providing emergency fire service which is dependent on the response time of the fire truck different objectives define different versions of the location-allocation problem Applications retailing - locations of stores, restaurants emergency facilities - ambulances, fire stations schools warehouses regional offices of government departments recreation facilities - public pools B.
EXAMPLE - OIL FIELD BRINE DISPOSAL this is an example of both a location-allocation problem and the use of a network model concerns waste disposal for the Petrolia, Ontario oil field which has been producing oil since the 1850s oil extraction from the field generates large quantities of waste fluid waste fluid has been increasing as the field has become depleted waste fluid or "brine" is a salty, smelly fluid brine may be 90%-97% of total volume extracted, only 3%-10% oil 14 active producers in the field each producer may operate up to 30 wells each producer operates an oil collection facility to which all liquids from that producer's wells are piped oil and brine are separated by each producer at the collection facility using simple gravity separation oil is shipped to the refinery by truck Brine disposal brine disposed of by individual producers some of the methods used may lead to violations of provincial pollution standards brine may run onto fields or into surface watercourses thus need a better disposal method only effective method of disposal is by pumping to a geological formation below the oil producing layer alternative methods are too expensive or impractical, e.g. purification by reverse osmosis, evaporation in holding ponds Disposal options options include: 1. a single, central disposal facility minimum capital cost maximum transport cost 2. requiring each producer to install a facility maximum capital cost zero transport cost 3. some intermediate configuration of shared facilities The location-allocation problem find locations for one or more central facilities and allocate producers to them in order to minimize the total of capital and transport costs two alternatives for transport of waste brine to central facilities: pipe and truck assume that both transport routes would follow the same network C.
COSTS handout - Brine disposal study (2 pages) overhead - Brine disposal study costs Pipe cost must pay for pipe over its expected lifetime, plus cost of pumping brine through pipe Truck cost must pay for holding tanks for brine, with sufficient capacity to allow for delays in winter, plus cost of loading and unloading truck, and estimated driving time Disposal well cost includes cost of installing disposal well and running pump porosity of formation varies, so there is a risk of failure in a drilled disposal well new well costs $50,000-$75,000, success rate 60-80% brine contains dense hydrocarbons - waxes - which will build up over time and block the well problem with corrosion of pipes due to high acidity of brine D. GIS IMPLEMENTATION data structure defines network of streets and rights of way - potential routes for trucks/pipes links with attributes of length nodes with attributes of volume produced - nodes include producer sites plus other potential well locations GIS database with nodes and links and associated attributes provides: data input functions (editing) data display - graphics, plots storage of geographic data data to be passed to the analysis module analysis module interacting with GIS database obtains nodes and links from the GIS performs analysis, reports results directly to the user includes several heuristic methods for solving the optimization problem allows the user access to the display/analysis functions of the GIS an analysis module supported in this way by a GIS database provides a primitive spatial decision support system (SDSS) tailored to this specific, advanced form of spatial analysis see Unit 59 for more on spatial decision support systems E. LOCATION-ALLOCATION ANALYSIS MODULE overhead - Location-allocation analysis module 1. Finds shortest paths between points on network (could be a GIS function) 2. Defines and modifies model parameters (e.g. components of pipe and truck cost equations) 3.
Uses shortest paths and parameters to calculate transport costs by each mode 4. Searches for optimum solution using add, drop and swap heuristics add - start with no facilities, at each step place facilities in location which best improves objective drop - start with facilities at every node, at each step drop the facility which produces least deterioration in the objective swap - try to improve the objective by moving facilities from one node to another 5. Evaluates solutions and displays results overhead - Brine disposal options costs Sensitivity analysis many parameter values are uncertain e.g. cost of installing pipe, lifetime of pipe and wells important to know effect of uncertainty on results e.g. if pipe cost doubles, what will be impact on results? in sensitivity analysis, parameter values are changed one at a time to determine the effect of each on the solution overhead - Sensitivity analysis in each case, first line gives value assumed for option d, wells at producer locations subsequent lines give effect of changing the parameter e.g. increasing pipe cost leads to greater number of facilities Problems with link-node models some spatial decisions involving networks do not work well with the standard link-node model may need to put a facility or event anywhere on the network not just at intersections thus need the ability to identify a location along links this may be done by: identifying location by its distance along a link from a node thus network is not a set of links and nodes but an addressing system using link number and distance breaking a link at a given location to form a new node and 2 links e.g. "dynamic segmentation" if the break is temporary REFERENCES Ghosh, A. and G. Rushton, 1987. Spatial Analysis and Location-Allocation Models, Van Nostrand Reinhold, New York. Includes many applications of location-allocation methods. Golden, B.L. and L. Bodin, 1986.
"Microcomputer-based vehicle routing and scheduling software," Computers and Operations Research 13:277-85. Reviews the availability of network analysis modules for microcomputers. Goodchild, M.F. and J.A. Donnan, 1987. "Optimum location of liquid waste disposal facilities: formation fluid in the Petrolia, Ontario oilfield," in M. Chatterji, Editor, Hazardous Materials Disposal: Siting and Management, Gower, Aldershot, UK, pp 263-73. SPATIAL DECISION SUPPORT SYSTEMS A. INTRODUCTION B. DEFINITIONS AND CHARACTERISTICS Decision support systems C. SPATIAL DECISION-MAKING Example: site selection for a retail store D. SDSS ARCHITECTURE Data Base Management System Model Base Management System Graphical and Tabular Report Generators User Interface E. DEVELOPMENT OF DSS Three levels of technology Five functional roles F. CURRENT STATUS OF SDSS REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 59 - SPATIAL DECISION SUPPORT SYSTEMS Compiled with assistance from Paul Densham, State University of New York at Buffalo A. 
INTRODUCTION multiple criteria methods allow for the presence of more than one objective or goal in a complex spatial problem however they assume that the problem is sufficiently precise that the goals and objectives can be defined many problems are ill-structured in the sense that the goals and objectives are not completely defined such problems require a flexible approach the system should assist the user by providing a problem-solving environment spatial decision support systems (SDSS) are designed to help decision-makers solve complex spatial problems GISs fall short of the goals of SDSS for a number of reasons: analytical modeling capabilities often are not part of a GIS many GIS databases have been designed solely for cartographic display of results - SDSS goals require flexibility in the way information is communicated to the user the set of variables or layers in the database may be insufficient for complex modeling data may be at insufficient scale or resolution GIS designs are not flexible enough to accommodate variations in either the context or the process of spatial decision-making SDSS provide a framework for integrating: 1. analytical modeling capabilities 2. database management systems 3. graphical display capabilities 4. tabular reporting capabilities 5. the decision-maker's expert knowledge GISs normally provide 2, 3 and 4 the addition of 1 and 5 creates an SDSS B. DEFINITIONS AND CHARACTERISTICS Decision support systems spatial decision support systems have evolved in parallel with decision support systems (DSS) DSS developed for business applications (corporate strategic planning, scheduling of operations, etc.)
DSS literature contains a substantial body of theory and a large number of applications literature can be used to guide the design, development, implementation and use of SDSS texts on DSS include: Bonczek, Holsapple and Whinston, 1981; Sprague and Carlson, 1982; and House, 1983 many definitions of DSS require the presence of certain characteristics e.g. Geoffrion's definition requires 6 characteristics: 1. designed to solve ill- or semi-structured problems, i.e. where objectives cannot be fully or precisely defined 2. have an interface that is both powerful and easy to use 3. enable the user to combine models and data in a flexible manner 4. help the user explore the solution space (the options available to them) by using the models in the system to generate a series of feasible alternatives 5. support a variety of decision-making styles, and be easily adapted to provide new capabilities as the needs of the user evolve 6. problem solving is an interactive and recursive process in which decision making proceeds by multiple passes, perhaps involving different routes, rather than a single linear path these characteristics also define an SDSS in addition, in order to effectively support decision-making for complex spatial problems, an SDSS will need to: provide for spatial data input allow storage of complex structures common in spatial data include analytical techniques that are unique to spatial analysis provide output in the form of maps and other spatial forms C. SPATIAL DECISION-MAKING many spatial problems are complex and require the use of analysis and models many spatial problems are semi-structured or ill-defined because all of their aspects cannot be measured or modelled Example: site selection for a retail store objective is to pick the site which will maximize economic return to the company return is affected by: number of potential customers within market area accessibility of the site (e.g. is it on a main street? is it possible to turn left into the site?)
visibility, signage, appearance cost of site and construction some of these factors are difficult to evaluate or predict relative impacts of each of these factors on return may be unknown (except the last - direct cost) impossible to structure the problem completely - i.e. define and precisely measure the objective for every possible solution retail site selection problem is ill-structured a system to support retail site selection must be flexible allow new factors to be introduced allow the relative importance of factors to be changed to evaluate sensitivity or to reflect differences of opinion display results of analysis in informative ways solutions to this class of problems often are obtained by generating a set of alternatives and selecting from among those that appear to be viable thus, the decision-making process is iterative, integrative and participative iterative because a set of alternative solutions is generated which the decision-maker evaluates, and insights gained are input to, and used to define, further analyses participative because the decision-maker plays an active role in defining the problem, carrying out analyses and evaluating the outcomes integrative because value judgements that materially affect the final outcome are made by decision-makers who have expert knowledge that must be integrated with the quantitative data in the models D. SDSS ARCHITECTURE Armstrong and Densham (1990) suggest that five key modules are needed in a SDSS: 1. a database management system (DBMS) 2. analysis procedures in a model base management system (MBMS) - defined later 3. a display generator 4. a report generator 5. 
a user interface to the programmer, this modularity facilitates software development to the SDSS user, the system appears to be a seamless entity overhead - SDSS architecture one architecture for an SDSS is shown the five software modules are represented by the boxes on the left of the diagram with the user interface, an expert system shell, encompassing the other modules the arrows between the modules depict flows of data and information the right-hand part of the diagram shows the interaction with the user who receives and evaluates output (alternative solutions) from the system which is either accepted as a solution or used to define new analyses Data Base Management System GIS database management systems are designed to support cartographic display and spatial query database of an SDSS must support cartographic display, spatial query and analytical modelling by integrating three types of data: 1. locational (spatial primitives such as coordinates and chains) 2. topological (attribute-bearing objects, e.g. points, nodes and lines, and relationships between them) 3. thematic (attributes of the topological objects, including population, elevation, and vegetation) database must permit the user to construct and exploit complex spatial relations between all three types of data at a variety of scales, degrees of resolution and levels of aggregation database management systems found in many GIS use the relational data model however, alternative data models have proved effective in applications of DSS e.g. 
the extended network model is an enhanced form of the network model and is effective for representing the links and nodes of transportation networks transportation networks are a popular base for developing SDSS because of the importance of applications for site selection and the abundance of methods of analysis handout- Database for site selection shows the implemented database for a site selection problem locational component consists of COORD (coordinates), NODE and CHAIN topological objects are the records POINT, L.A. NODE (possible site), LINE, STATE and CITY thematic data are the six records on the extreme left of the diagram (LINE DISTANCE, LINE FEATURE, STATE DATA, CITY DATA, POINT FEATURE and NODE DATA) arrows between the records indicate relationships, both spatial and non-spatial, e.g.: the 1:1 relation between NODE and COORD means that each node "owns" one coordinate the 1:N relation between L.A. NODES and NODE DATA indicates that each possible site owns one or more sets of data the N:M relation between CHAIN and COORD means that each chain is made up of many coordinates and that each coordinate can be part of more than one chain multiple relations of a given type are indicated by numbers beside the relevant arrows L.A. NODE owns LINE in two relations, one indicates links to possible sites with lower identifiers, the other to possible sites with higher identifiers the system set is a construct that provides direct access to records so defined - there is no need to traverse intermediate record types as in other data models e.g. 
it is possible to access a coordinate pair record (COORD) directly without accessing any other type of record Model Base Management System one approach to incorporating analytical models in geoprocessing systems is to develop libraries of analytical sub-routines permits large numbers of models to be made accessible very quickly, because existing programs can be patched into a system wasteful in terms of replicated code second approach, used in business applications of DSS, is to develop a model base management system (MBMS) consists of small pieces of code, each of which solves a step in an algorithm as many of these steps are common to several algorithms, this approach saves large amounts of code the system developer only has to modify one piece of code to update a step in several algorithms the MBMS also contains information about how steps are sequenced to execute a given algorithm using an MBMS facilitates rapid development and testing of new algorithms implementation may be achieved simply by adding a new formula to the MBMS in other cases new code for additional steps also may be added to the model-base Graphical and Tabular Report Generators should provide the following capabilities: high-resolution cartographic displays general-purpose statistical graphics, including two and three-dimensional scatter plots and graphs specialized graphics for depicting the results from analytical models and sophisticated statistical techniques the full range of tabular reports normally associated with each of the above User Interface must be easy to use if it is to be effective in decision-making interfaces of many current GIS systems are modelled on those of business systems, using command lines, pull-down menus and dialogue boxes the move to graphical interfaces for operating systems provides an opportunity for system designers to develop more intuitive interfaces for geoprocessing systems by using a graphical display for communication between the decision-maker and
the system: icons can be used to represent system capabilities the user can select parameters, data, output, etc., easily and intuitively the user may be able to more easily visualize the processes represented within the model E. DEVELOPMENT OF DSS Sprague (1980) presents a development framework three levels of technological development five functional roles overhead - DSS development framework depicts the three levels of technology and the five functional roles Three levels of technology DSS technology ranges from simple, specific applications to broadly applicable systems: 1. a specific DSS is a system being used to address a specific problem 2. a DSS generator is a set of mutually compatible hardware and software modules used to implement the specific DSS 3. a DSS toolbox is a set of individual hardware and software items which can be used to build both DSS generators and specific DSS system vendors and consulting houses who must develop many different decision systems of broadly similar nature on a recurring basis will build generators and toolboxes that can be adapted for individual clients with specific problems Five functional roles the decision-maker is responsible for choosing, implementing and managing the solution the intermediary sits at a console and interacts physically with the system the DSS builder configures the specific DSS from the modules in the DSS generator the technical supporter adds capabilities or components to the DSS generator the DSS toolsmith develops new hardware and software tools these five roles may be filled by any number of people, and individuals may have more than one function during the decision-making process, the decision-maker uses output from the system to evaluate interim solutions the result of this evaluation may be a desire to investigate other aspects of the problem which may require new capabilities to be added to the SDSS the system is updated as required by people filling the
technical functional roles using the three levels of technology thus a process of system adaptation and evolution occurs rapidly during the decision-making process itself F. CURRENT STATUS OF SDSS at this point, SDSS as defined here remains a conceptual framework rather than an implemented strategy some systems approach a partial implementation of its concepts several implementations of GIS in forestry have been described as SDSS but do not satisfy the full definitions used in this unit SDSS is an important standard against which to measure spatial decision-making tools REFERENCES Armstrong, M.P. and P.J. Densham, 1990. "Database organization alternatives for spatial decision support systems," International Journal of Geographical Information Systems, Vol 3(1): . Describes the advantages of the extended network model for network-based problems. Bonczek, R.H., C.W. Holsapple, and A.B. Whinston, 1981. Foundations of Decision Support Systems, Academic Press, New York. Basic text on DSS. Densham, P.J. and G. Rushton, 1988. "Decision support systems for locational planning," in R. Golledge and H. Timmermans, editors, Behavioural Modelling in Geography and Planning. Croom-Helm, London, pp 56-90. Geoffrion, A.M., 1983. "Can OR/MS evolve fast enough?" Interfaces 13:10. Source for six essential characteristics of DSS. Hopkins, L., 1984. "Evaluation of methods for exploring ill- defined problems," Environment and Planning B 11:339-48. House, W.C. (ed.), 1983. Decision Support Systems, Petrocelli, New York. Basic DSS text. Sprague, R.H., 1980. "A framework for the development of decision support systems," Management Information Sciences Quarterly 4:1-26. Source for DSS development model. Sprague, R.H., and Carlson, E.D., 1982. Building Effective Decision Support Systems, Prentice-Hall, Englewood Cliffs NJ. Basic DSS text. REFERENCES Ghosh, A. and G. Rushton, 1987. Spatial Analysis and Location- Allocation Models, Van Nostrand, Reinhold, New York. 
Includes many applications of location-allocation methods. Golden, B.L. and L. Bodin, 1986. "Microcomputer-based vehicle routing and scheduling software," Computers and Operations Research 13:277-85. Reviews the availability of network analysis modules for microcomputers. Goodchild, M.F. and J.A. Donnan, 1987. "Optimum location of liquid waste disposal facilities: formation fluid in the Petrolia, Ontario oilfield," in M. Chatterji, Editor, Hazardous Materials Disposal: Siting and Management, Gower, Aldershot, UK, pp 263-73. Impact factors total of 35 criteria grouped into 7 clusters overhead - North Bay bypass study - Criteria clusters "Direct Cost" cluster includes construction and property costs "Traffic Service" cluster evaluates effectiveness of route from a traffic engineering viewpoint, includes number of miles with >2% grade "Community Planning" cluster evaluates routes against common planning criteria, including amount of land for potential development which will have improved access as a result of the highway "Neighborhood and Social Impact" cluster includes many factors measuring impact on local communities Alternative routes 9 alternatives identified each alternative is a complete route, evaluated as such two or more alternatives may share long stretches of common route, differ only in sections Combination of factors factors evaluated by a Technical Advisory Committee all major clusters represented by different members e.g. direct cost cluster represented by engineers, accountants, managers e.g.
neighborhood and social impact cluster by representatives of community groups each member begins by selecting the cluster most easily understood by him/her reviews supporting text, maps, tables documenting evaluation of routes on factors in selected cluster scores each route on each of the factors in the cluster - scale of 0 to 10, 10 is best score, 0 is worst each member moves to a new cluster, scores it, eventually scores all routes on all factors in all clusters scores are totaled for each cluster and each route result is a 7 by 9 matrix for each member of the committee big differences depending on background of committee member now total over all members to get one 7 by 9 matrix implies that all members get equal weight - so membership of committee is crucial Weighting how to combine scores from different clusters to get overall evaluation of each route? overhead - North Bay bypass study - Weighting schemes results in 9 routes, 7 clusters of evaluation factors, 6 weighting schemes Concordance analysis evaluate routes separately for each of the 6 weighting schemes results in a 9x9 concordance matrix for each of the 6 weighting schemes gives a matrix of concordances for all pairs of plans repeat for each weighting scheme Results routes 2,7,9 consistently best over all weighting schemes, 8 consistently worst order of 2,7,9 changes from one scheme to another - 2 is best when cluster 6 is given a high weight this provides the decision-makers with a limited set of routes to consider now can proceed with more formal evaluation and public hearings to assess the significance of other factors REFERENCES General introduction to multicriteria decision-making: Cohon, Jared L., 1978. Multiobjective Programming and Planning, Academic Press, Mathematics in Science and Engineering, Vol. 140 Massam, B.H., 1980. Spatial Search. Pergamon, London. Gives many examples of applications of multicriteria methods, in addition to the North Bay study used in this unit. Rietveld, P. 1980. 
Multiple Objective Decision Methods and Regional Planning, Studies in Regional Science and Urban Economics, Volume 7, North Holland Publishing Company. Goal Programming: Lee, S. M., 1972. Goal Programming for Decision Analysis, Auerbach, Philadelphia. A general introduction to Goal Programming. The following are examples of applications of Goal Programming: Barber, G., 1976. "Land-Use Plan Design via Interactive Multi-Objective Programming," Environment and Planning 8:239-245. Courtney, J. F., Jr., T.D. Klastorin and T.W. Ruefli, 1972. "A Goal Programming Approach to Urban-Suburban Location Preference," Management Science 18:258-268. Dane, C.W., N.C. Meador and J.B. White, 1977. "Goal Programming in Land Use Planning," Journal of Forestry 75:325-329. Weighting Method: discussed in: Cohon, Jared L., 1978. Multiobjective Programming and Planning, Academic Press, Mathematics in Science and Engineering, Vol. 140. EXAM AND DISCUSSION QUESTIONS 1. Compare the goal programming and weighting methods in terms of technique, practicality and effectiveness at reaching solutions to difficult problems. 2. Discuss the North Bay study as an exercise in community decision-making. What are its strengths and weaknesses? In what ways did it succeed or fail in involving the community in the decision-making process? 3. How might the methodology of the North Bay study be manipulated or distorted by an unscrupulous agency with a hidden agenda? What can be done to protect against this possibility? 4. One of the advantages of decision-making using GIS is that the effects of changes in criteria can be seen almost immediately, e.g. in the search for the best site for an activity. Discuss the impact that this capability might have on the decision-making process. Do you regard this impact as positive or negative? 5. Select a current local planning issue and discuss the decision-making criteria being promoted by various interest groups and individuals. SYSTEM PLANNING OVERVIEW A. INTRODUCTION B.
PROBLEM RECOGNITION/TECHNOLOGICAL AWARENESS Problem recognition Technological awareness Supply-push factors Demand-pull factors Collecting information on GIS Project plan C. DEVELOPING MANAGEMENT SUPPORT Example - AM/FM Project Life Cycle Administration of the project D. NEWPORT BEACH GIS PROJECT Needs awareness Management support Administration of the project Establishing the automation priorities Pilot projects REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES The introduction to this unit describes the outline for the next module. Many of the issues outlined in this unit are illustrated in a 20 minute video, GEOBASE - A Better Way, produced by and available from the City of Newport Beach, California. The video was originally intended for viewing by the City Council and other city officials to show the progress and promise of the GEOBASE system. UNIT 60 - SYSTEM PLANNING OVERVIEW Compiled with assistance from Frank Gossette, California State University, Long Beach and Warren Ferguson, Ferguson Cartotech, San Antonio and Ken Dueker, Portland State University A. INTRODUCTION in most cases, the design, purchase and implementation of a GIS is a significant commitment in terms of personnel time and money it is extremely important to understand the issues involved in the development of GISs these issues will ultimately affect the efficiency and value of the installed GIS it is possible to identify several stages in the development of a GIS these can be characterized in several ways the following general outline serves as an organizing framework for the next 6 units: development progresses through the following stages note that these are not necessarily sequential and some may operate concurrently with others 1. Problem recognition and technological awareness a necessary beginning point 2. Developing management support critical to the initiation and success of the project 3. 
Project definition includes identifying the current role of spatial information in the organization, the potential for GIS, determining needs and products, writing the proposal 4. System evaluation includes reviewing hardware and software options, conducting benchmark tests, pilot studies and cost-benefit analysis 5. System implementation includes completion of a strategic plan, system development and startup, design and creation of the database, securing on-going financial and political support this unit looks at the two initial stages, which are the least formal and structured: needs awareness and building management support B. PROBLEM RECOGNITION/TECHNOLOGICAL AWARENESS in order for an organization to become interested in acquiring a GIS, someone or some group within the organization: 1. must perceive that the methods by which they are currently storing, retrieving and using information are creating problems 2. must be aware of the capabilities of GIS technology Problem recognition Aronoff (1989) suggests six problems that prompt GIS interest 1. spatial information is out of date or of poor quality e.g. often land information documents (maps and lists) are seriously outdated and questions regarding the current situation cannot be answered without digging through a stack of "updates" since the last major revisions 2. spatial data is not stored in standard formats e.g. a city's parcel maps will often vary in quality from one area to another one area may have been "flown" and mapped using aerial photography at 1:1000 scale some years ago, but updated by hand drafting other areas may have been mapped by photographically enlarging 1:24,000 topographic maps, or city street maps of unknown quality, and hand drafting parcel boundaries maps may have been reproduced by methods which introduce significant errors, e.g. photocopy 3.
several departments collect and manage similar spatial data this may result in different forms of representation, redundancies and related inefficiencies in the collection and management of the data 4. data is not shared due to confidentiality and legal concerns 5. analysis and output capabilities are inadequate 6. new demands are made on the organization that cannot be met within the data and technological systems currently available. Technological awareness sometimes the "problem" is simply an awareness of newer technologies that offer a "better way" King and Kraemer (1985, p.5) distinguish between supply-push and demand-pull factors in leading to awareness and the eventual acquisition of computing technology Supply-push factors changes in technological infrastructure improvements in technological capability in GIS: improved hardware, software, peripherals; better access to existing digital datasets, e.g. TIGER files declining price-performance ratios in GIS: impact of introduction of 286- and 386-based PCs, workstations, reduction in cost of mainframes and minis improved packaging of technical components to perform useful tasks in GIS: better (more friendly, more versatile) user interfaces, better applications software concerted marketing efforts of suppliers advertising creates an aura of necessity in GIS: hard not to go with the current trend, in spite of the fact that GIS advertising is probably low-key relative to other areas of EDP direct contact of salespeople with potential buyers in GIS: demonstrations at trade shows, presentations at conferences by vendors long-term strategies of technology suppliers selective phase-outs - vendor drops support of existing system to encourage new investment price reductions or outright donations to universities to raise students' familiarity with product low-cost or cost-free pilot studies offered by vendors at potential customer's site interchange - at present, there are high costs to conversion from one GIS
vendor's system to another's - customers are "locked in" Demand-pull factors endemic demand for accomplishing routine tasks need for faster and more accurate data handling in report generation, queries, map production, analysis society's appetite for information is unlimited in GIS, there is no upper limit to need for spatial data for decision-making there is no totally satisfactory minimum level of accuracy for data more accurate data always means better decisions institutionalized demand "keeping current" with technology maintaining systems on which the organization has become dependent affective demand perceived need among organizational actors to exploit the political, entertainment and other potentials of the technology in GIS: GIS technology is impressive in itself - high quality, color map output, 3D displays, scene generation - GIS output may be perceived to have greater credibility than hand-drawn products Collecting information on GIS once the need for GIS is recognized, an individual or group may begin gathering information on GIS in order to develop a management proposal information will need to be collected on: the status of existing GIS projects the direction the GIS industry is moving the potential applications of GIS in the organization sources of information include: personnel within the company "missionaries" or GIS proponents may have familiarity through educational background, external contacts industry consultants, system vendors, conversion service companies will be very willing to provide information industry organizations such as AM/FM International or American Congress on Surveying and Mapping (ACSM) are excellent sources a growing number of newsletters and magazines are being marketed within the GIS industry a useful mechanism is a Request for Information (RFI) sent by the company to all known vendors of GIS software should ask for: general company information system capabilities hardware and software requirements customer references
general functional capabilities example applications customer support - training and maintenance programs general pricing information site visits to operating GIS projects are useful can observe the daily operations of the project gain insight from project personnel about system performance and support Project plan after consulting with industry experts, visiting other sites, considering corporate objectives, the first level of project definition and planning can occur project plan should be dynamic, adaptable, refined as better information becomes available plans will be very general, broad-brush at this stage - a general description of the desire to investigate systems further and a plan for proceeding for those charged with developing a project plan, it is important to discover who or what is the force behind the interest in GIS the individuals involved and the significance of the problem are important in determining how to proceed with selling the idea to the organization SYSTEM PLANNING OVERVIEW C. DEVELOPING MANAGEMENT SUPPORT once the need has been identified it is critical to gain support of the decision-makers who will be required to commit support in the way of funding and staff decision-makers need to be assured that the project will be developed and managed in a sound manner management will need to know: 1. what GIS is and what it can do for the organization 2. 
what the costs and benefits of the system will be a carefully managed development project is critical Example - AM/FM Project Life Cycle AM/FM projects tend to be very large (up to $100 million is not unusual) thus, the process of system planning and implementation must be rigorous in AM/FM because of the size of investment involved in the AM/FM area, this planning process is called the project life cycle overhead - AM/FM Project life cycle is a multi-step approach with well-defined decision points series of stages provides a generic, structured approach to planning this recommended sequence has been devised after reviewing numerous alternative methodologies decision points provide for financial analysis each decision point allows the project team to analyze progress and future risks before proceeding to the next level of commitment need to minimize risks while maximizing benefits Administration of the project with initial support assured, the project requires strong leadership to implement the system quite often, the agency realizes that its own people do not possess the expertise nor have the time to fully explore and evaluate the alternatives in this case, an outside consultant may be brought in to assist in a "needs assessment" the GIS consulting industry is growing rapidly, and now involves several of the "big 8" major international management consultancies D.
NEWPORT BEACH GIS PROJECT Newport Beach, California developed one of the early successful urban GISs the following section reviews the initiation and development of their GEOBASE project this provides a general introduction to the process of GIS system development Needs awareness interest in Geographic Information Systems for multi-purpose cadastral applications arose at about the same time in several major departments of the city data processing professionals were exposed to the technology at trade shows the Utility Department saw innovations in AM/FM at the major utility companies and some larger municipalities city planners were exposed to GIS by attending professional meetings these and other departments were becoming aware of these newer technologies being successfully implemented in other cities with a core of interested individuals, an informal committee was formed to study GIS and see what it could do for them Management support to gain administrative support for a LIS, the GEOBASE Committee set about educating the major departments within the city about the benefits of GIS and recruiting their support this included Data Processing (Finance), Utilities, Planning, Building and Safety, Public Works (Engineering), Fire, Police, and even the Library a series of units and demonstrations was set up to inform departmental personnel of the proposed project the result of these efforts was a proposal to the City Council and City Manager for funding for an integrated Land Information System this proposal had the endorsement of all the departments mentioned above the GEOBASE project was approved Administration of the project in Newport Beach, a GEOBASE Steering Committee, comprised of representatives from five departments (Utilities, Planning, Data Processing, Building, and Fire) was established to guide the project's implementation phases Establishing the automation priorities in Newport Beach, it was recognized that while potential benefits to all departments of
the city might be realized, difficult decisions needed to be made concerning the priorities of data entry and application building land parcel information was the highest priority, with other infrastructure elements (street centerlines, right-of-way, and utility lines) to be entered in the initial conversion effort importantly, because the City wished to have complete control over the accuracy of the data, it was decided to do the map conversion and data entry in-house Pilot projects in the GEOBASE project, two major pilot projects were undertaken during the first year of operation one took a portion of the city and converted the parcel and infrastructure data as a "Prototype" for the eventual city-wide basemap this project was useful to determine the best ways of entering the cadastral information (scanning versus digitizing versus coordinate geometry) and for establishing the ground control and accuracy standards for the database the second project involved digitizing the entire city, block-by-block, from a smaller-scale basemap to be used to revise the City's General Plan in this project, valuable skills were gained in map production, establishing symbolization standards for City maps, and dealing with attribute databases both projects produced useful and highly "visible" results REFERENCES Aronoff, S., 1989. Geographic Information Systems: A Management Perspective. WDL, Ottawa. This excellent text includes lengthy discussion of the GIS acquisition process. Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment. Clarendon, Oxford. Chapter 9 describes the process of choosing a GIS. King, J.L. and K.L. Kraemer, 1985. The Dynamics of Computing, Columbia University Press, New York. Presents a model of adoption of computing within urban governments, and results of testing the model on two samples of cities. Lucas, H.C., 1975. Why Information Systems Fail. Columbia University Press. EXAM AND DISCUSSION QUESTIONS 1.
There are over 3000 counties in the US, each with their own needs for LIS and multipurpose cadaster. What factors would you expect to influence the priorities and plans of each county in this area? Design a questionnaire survey that could be used to verify your answer. 2. Compare the circumstances in Newport Beach to those in your local area. Are they similar? How does the state of LIS development FUNCTIONAL REQUIREMENTS STUDY A. INTRODUCTION B. DEVELOPING AN FRS 1. Identify decisions 2. Determine information products needed 3. Determine frequencies 4. Identify data sets required 5. Determine GIS operations required Scope of the FRS within the organization C. METHODS FOR CONDUCTING AN FRS 1. Fully internalized 2. Focus group 3. Interviews 4. Questionnaire D. COMPONENTS OF THE COMPLETED FRS 1. Definitions of information products 2. List of input data sets 3. List of GIS functions required E. WEAKNESSES OF THE FRS PROCESS Invalid assumptions Awareness of GIS Funding uncertainty Changing needs Value of GIS F. IMPORTANCE OF THE FRS EXAM AND DISCUSSION QUESTIONS NOTES Obtain an FRS from a local government operation to use as an illustration for this unit. Unfortunately, there are no readily available references to support this unit. UNIT 61 - FUNCTIONAL REQUIREMENTS STUDY Compiled with assistance from Warren Ferguson, Ferguson Cartotech, San Antonio, TX A. 
INTRODUCTION once management support has been obtained, the next step is a functional evaluation of the current manual process existing functionality and any new requirements will be used to define the project scope and basic structure of the implemented GIS the result of this phase is the Functional Requirements Study (FRS) the functional requirements study (FRS) is the primary planning document for a GIS installation it lays out what data is needed, how it must be processed in order to make the necessary reports and products it forms the basis for a Request for Proposals (RFP) during installation and system startup, it provides the basic reference guide very structured methodologies for functional requirements studies have been developed by consulting companies these proprietary methods provide the basis for some of the competition in the lucrative GIS consulting market this unit will therefore take a broader viewpoint, not focusing on the mechanics of any one methodology B. DEVELOPING AN FRS an FRS is best created by working in the opposite direction to the GIS's processing 1. Identify decisions begin by identifying the decisions which people in the organization are required to make what is each person's area of management responsibility? what decisions must be made in carrying out that responsibility? 2. Determine information products needed identify the information products needed to support those decisions e.g. to schedule service crews, need a map showing locations of service calls at this point consideration of new methods and products is appropriate what additional products would be important in supporting each user's decision-making responsibilities? how might existing products be modified/improved to support decision-making better?
this process involves users in the project definition process opens communication channels helps increase support for the project allows potential problems to be identified and dealt with prior to commitment to the project users may not be familiar with GIS technology and its capabilities need to stress the irrelevance of technology at this stage - simply assume that the necessary technological capabilities exist, and concentrate on determining the user's needs for reports and products 3. Determine frequencies each information product will have an associated frequency e.g. the service call map must be produced every morning at 8 am 4. Identify data sets required identify the data sets which must be processed to create the required product e.g. the service calls come into my office as completed forms giving street addresses and details of the nature of the service request 5. Determine GIS operations required identify the processes or operations which must be performed on the data to create the products this step is most likely to require some knowledge of GIS operations however, it is possible to refer to operations in a generic way, or by analogy to manual operations, without knowledge of GIS technology Scope of the FRS within the organization a full FRS gives an organization a significant opportunity to examine its own operations the investigators should clearly identify the appropriate level at which to interact with each department of the organization interacting personnel need to be decision-makers and managers, not technical support since the study should focus on the decisions that are made, not on the data and procedures used an effective FRS requires a large commitment of time the organization as a whole must be willing to commit the necessary amount of time on the part of its staff less than full commitment (interruptions, absence from meetings) will destroy the purpose of the FRS C.
METHODS FOR CONDUCTING AN FRS many alternative methods can be used to elicit the necessary information for the FRS methods can be ordered by the level of commitment of the organization's time and the associated cost of the FRS the following begins with the most costly and works through to the least choice made will depend on the amount of time/money the organization is willing to commit to the FRS this depends in turn on the size of the eventual project e.g. a $2 million project may justify a $100,000 FRS, i.e. a 5% investment in good planning 1. Fully internalized Procedure: organization appoints an FRS team from its own staff FRS team trained by GIS consultant FRS team coordinates the definition of information products by organization's staff, acting as facilitators FRS team compiles information and identifies input data sets, functions required to make products under guidance of consultant consultant prepares final FRS Advantages: FRS team combines familiarity with the organization's operations with limited knowledge about GIS and FRS procedure acquired from consultant Disadvantages: cost of high level of organizational commitment 2. Focus group Procedure: consultant acts as leader at a series of group meetings of organization's staff meetings are used to discuss procedures, prepare and edit descriptions of products and define input datasets and system functions Advantages: focus group allows consultant to facilitate but leaves work mostly to organization's personnel excellent tool for building consensus on what is needed Disadvantages: by isolating FRS-related activity to focus group meetings, level of commitment of organization's staff is lower 3. Interviews Procedure: consultant gathers information at interviews, prepares FRS Advantages: minimal commitment of organization's personnel Disadvantages: organization has little or no group involvement in FRS 4.
Questionnaire Procedure: consultant prepares a questionnaire with advice from the organization, circulates it to all appropriate staff Advantages: low cost, appropriate for obtaining limited information from a large number of users Disadvantages: poor quality of information gathered, no opportunity for discussion FUNCTIONAL REQUIREMENTS STUDY D. COMPONENTS OF THE COMPLETED FRS handout - Functional requirements study example (4 pages) 1. Definitions of information products see Unit 68 for handouts of products identified in an FRS products may be maps, reports, lists for each product need: frequencies of production details of input data processing steps required to make the product for maps, need associated scales, legends, symbolization details for lists and reports, need details of formats useful to prepare rough samples of each product a large organization may generate descriptions of tens or hundreds of different products 2. List of input data sets need details of data to estimate input workload volume, e.g. how many map sheets, how many records, how many attributes? format, e.g. paper maps, digital tape, survey documents sources frequency of update data sets may be shared between products e.g. basic street map may be part of many different information products important to know product priorities products cannot be generated until data is input, and input may take a long time some products may be input data for other products, which creates problems in scheduling 3. List of GIS functions required some functions may be needed only for one or two products others (e.g. plotting) may be needed for all also need to include functions for data input, e.g. digitizing list of functions must make sense to staff with no GIS knowledge E. WEAKNESSES OF THE FRS PROCESS Invalid assumptions the assumptions of the method may be invalid it may be impossible to separate issues of technology from requirements, e.g. raster vs. 
vector it may be impossible to anticipate the information needed to make decisions it may be difficult to anticipate the decisions that will need to be made if the roles of personnel in the organization are not adequately defined or vary too frequently can decision-making be reduced to the simple model of analysis of information products? will the products really be adequate and reliable enough? Awareness of GIS varying awareness of GIS in the organization may bias the results staff will define products based on their personal awareness of GIS, not on an abstract need for information e.g. staff may be aware of GIS use in a parallel organization, familiar with some of its products e.g. awareness of 3D perspective views may lead to requests for them, independently of actual value in decision-making process Funding uncertainty FRS assumes continued funding over the projection period can the organization sustain funding over a long implementation period? many organizations find it difficult to commit funds up to 5 years ahead Changing needs will the FRS be sufficiently valid at the end of the implementation period? have to expect changes in the product set long before the system is in full operation need mechanisms for review and update Value of GIS has GIS technology been oversold? will the production schedule be delayed by data input bottlenecks? will the costs of the system overrun estimates? will the technology be obsolete by the time the project is implemented and in full production (up to 5 years may be needed for full database implementation)? F.
IMPORTANCE OF THE FRS despite all the uncertainty, planning, however unreliable, is undoubtedly better than no planning the exercise of a functional requirements study is beneficial to the organization in focusing discussion of its procedures irrespective of the eventual outcome management can conduct an initial financial feasibility study the costs of the existing operation are projected assuming the GIS project is not implemented these are weighed against the estimated costs of implementing the project, including costs of: pilot study (if required) system acquisition system development data conversion duplicate operation during system startup retraining EXAM AND DISCUSSION QUESTIONS 1. Discuss the methods you would adopt to carry out functional requirements studies for: a) a National Forest with a staff of 200 and responsibilities ranging from timber sales to management of historical heritage b) a small consulting firm with a staff of 5 specializing in site selection studies for retailers c) a One-Call operation answering 200 telephone queries per day about the locations of underground utility facilities likely to interfere with construction projects 2. List and review the assumptions made by the FRS process discussed in this unit 3. In what ways is the GIS FRS process different from any other FRS process in information processing? Do the differences justify a separate approach? 4. Define the input data, products and processing needed for your campus student records system. 5. RFPs and functional requirements studies are often public documents, especially when public agencies are involved. Obtain one from an agency in your area, and discuss its contents using the framework described in this unit. SYSTEM EVALUATION A. INTRODUCTION B. STRATEGIC PLAN C. REQUEST FOR PROPOSALS (RFP) Contents of the RFP Distribution of the RFP Vendor proposals D. HARDWARE AND SOFTWARE ISSUES Software Hardware E. 
SYSTEM CHOICE Evaluation factors Two stages of evaluation The winning proposal Risk factors REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 62 - SYSTEM EVALUATION Compiled with assistance from Warren Ferguson, Ferguson Cartotech, San Antonio, TX A. INTRODUCTION once the functional requirements study is complete and management gives the "go-ahead", the next step is to develop the document which will solicit proposals from interested GIS vendors this document is the Request for Proposals (RFP) results from the RFP will produce a number of different GIS options for the organization, each of which will have strong points and weaknesses at this point, difficult decisions will need to be made in an attempt to match needs with products available in the current marketplace management will need assurance that the system chosen is the best option available responses to the RFP will indicate the feasibility of achieving the project's goals an open attitude to the relationship with suppliers and the conduct of tests is essential evaluations must be open to outside scrutiny decisions may be (and frequently are) challenged by vendors and must stand up in court this unit examines these aspects of the system evaluation process: the strategic plan the RFP hardware and software issues system choice and reducing the risks B. STRATEGIC PLAN a strategic plan is essential in defining the limits of the project is important in providing guidance for many later decisions provides a level of planning above that of the FRS, less specific to the system decisions are made regarding the scale of the desired project will it be a small departmental activity or will it be integrated into operations of the whole organization? will it be centralized or distributed? how many people will be using the system at full implementation? need to address which activities need to be automated and which, if any, should remain manual how fast should acquisition of the system proceed?
what are the priorities of data input, software development and output? should the project development be directed by a consultant or by in-house committees? how will the project be funded? C. REQUEST FOR PROPOSALS (RFP) the functional requirements study along with decisions made for the strategic plan form the basis for the request for proposals success of the RFP is directly proportional to the quality of the analysis on which it is based Contents of the RFP handout - Extracts from an RFP (2 pages) the RFP describes in detail the following aspects: nature of the proposed database sources of database contents required functions to create and manipulate the database specification of required products, including frequency specifies the functional requirements for the project, not the specific technical processes underlying the functions must allow vendor to adapt capabilities of system to the organization's specific requirements e.g. must not specify raster vs. vector, or other data structure alternatives, but allow vendor to choose most appropriate the RFP must allow the vendor to determine the best configuration to satisfy the user's requirements what size of CPU how many input devices - digitizers, scanners etc.
how many output devices what software options and enhancements an RFP which is too rigid may exclude potential suppliers the details of the required proposal are made very clear: defines all the requirements outlines the form of response expected and format requirements sets deadlines Distribution of the RFP the RFP starts the formal relationship between organization and suppliers the RFP is sent to all interested suppliers potential suppliers can be identified by polling, or by inviting response to an RFI (request for information) or RFQ (request for qualifications) potential suppliers might be invited to a preliminary meeting to ask questions, reach agreement that it is worth proceeding further cost to vendor in responding to an RFP can be high, need to make sure it is worthwhile conventional approach is to distribute RFP, make first cut of vendors based on proposals received in response, then proceed with more detailed evaluations of the selected systems in the early days of GIS (pre-1984) it was common to receive very few (two or three) responses to an RFP, particularly if the RFP was detailed, because of the poor level of software development in the industry GIS industry has now advanced to the point where six to ten responses might be expected to an RFP for a large (multi-million dollar) project Vendor proposals respond in detail to the customer's requirements include details of proposed system configuration software hardware network and communications workstations and digitizers maintenance and training costs vendors may have relatively poor data on rates of throughput for specific configurations possible that the proposal is either under-configured (cannot meet the required workload) or over-configured (excess capacity) further tests, such as benchmarks (see Unit 63), are often required to reduce these uncertainties as much as possible D.
HARDWARE AND SOFTWARE ISSUES Software the proliferation of GIS software available makes the choice of a single system difficult (see Unit 24) however, there is no single best software for any particular application or organization mandates, decision-making processes and data and product requirements make each installation unique software choices that will need to be considered by an organization include: sophisticated applications-specific modifications of standard packages systems with built-in customization options immature systems with great potential for innovation different capabilities with regard to data model, functionality, output, database management system, etc. will each affect the overall operation of the GIS significantly and will need to be individually evaluated and compared Hardware decisions made with respect to hardware issues determine: number of people that can work at one time size of projects that can be handled cost of purchasing and maintaining the equipment need for a computer systems manager start-up effort update potential vendor support and stability many of these issues will be addressed by the technical requirements laid out in the FRS and RFP however, there will be several trade-offs required in the final decision E. SYSTEM CHOICE evaluation requires balancing many factors Evaluation factors costs of hardware and software - will vary despite identical functionality speed and capacity of hardware quality and costs of support supplier's background in addition to system capabilities, it is also necessary to evaluate suppliers on: financial stability position in the marketplace reports from other users about quality of support references are a useful way of obtaining this information appropriate customer references should be supplied by each vendor Two stages of evaluation does the vendor's proposal live up to the vendor's own claims? how does the vendor's proposal rate against other proposals?
The winning proposal must be good enough to get the project funded winning vendor and customer may need to work together in making final presentations to management justifying selection of supplier is only one part of winning project approval however a well-managed selection process is more likely to lead to a successful project Risk factors each vendor's system has certain risks associated with its implementation the vendor's product may not live up to expectations e.g. the hardware configuration may be insufficient for the planned workload e.g. the software may not carry out the functions as claimed many risks are associated with the project and become part of the final decision-making several of these risks and uncertainties regarding hardware and software issues have already been pointed out other risks are much more subtle e.g. since many vendors are US-based, foreign organizations must consider the stability of the value of the local currency against the US dollar the typical planning horizon for a GIS project is 5 years most factors are very difficult or impossible to forecast this far ahead however good the planning, there is a risk that the system will not satisfy the end-users in fact the winning vendor's system may fall short of requirements in several key areas it may be necessary to modify the system definition because of limited vendor capabilities - some products may have to be dropped in other cases, the final contract should require the vendor to develop software to deal with these problems when additional software development is required, the contract must include deadlines and penalties because success is heavily dependent on the additional software being supplied on time and fully debugged this situation is still common because of the immature state of the GIS industry in view of these risks, an investment of 5% or even 10% of project costs in planning and system evaluation is more than justified organizations wishing to reduce these risks
further may conduct one or more additional sophisticated, though costly, procedures before making the final commitment these include: benchmark tests pilot studies cost-benefit analyses REFERENCES Forrest, E., G.E. Montgomery and G.M. Juhl, 1990. Intelligent Infrastructure Workbook: A Management-Level Primer on GIS, A-E-C Automation Newsletter, P.O. Box 18418, Fountain Hills, AZ 85269-8418. Guptill, S., 1988. "A process for evaluating GIS," USGS Open File Report 88-105. The report of the Federal Interagency Coordinating Committee on Digital Cartography (FICCDC) on GIS evaluation. Smith, D.R., 1982. "Selecting a turn-key geographic information system using decision analysis," Computers, Environment and Urban Systems 7:335-45. EXAM AND DISCUSSION QUESTIONS 1. Review the approach to system selection documented in Smith (1982). What are the arguments for and against the rigorous decision-theoretic approach used in this paper? 2. Discuss the steps in planning and choosing a GIS system. What are the risks associated with a project, and how are these reduced in the project lifecycle approach? 3. "The best-laid plans of mice and men...". Despite the use of a well-defined framework, mistakes inevitably happen in the best-designed projects. Discuss the weaknesses in the approach described in these units. BENCHMARKING A. INTRODUCTION Two types of benchmarking Benchmark script B. QUALITATIVE BENCHMARKS C. QUANTITATIVE BENCHMARKS Performance evaluation (PE) Subtasks for GIS PE Requirements for a quantitative benchmark GIS PE is more difficult D. EXAMPLE MODEL OF RESOURCE UTILIZATION Subtasks Products and data input Frequency required Execution of tasks Prediction Forecast Summary of phases of analysis E. APPLICATION OF MODEL Three phases of benchmark Qualitative benchmark Quantitative benchmark Model F. LIMITATIONS G.
AGT BENCHMARK EXAMPLE Project Background REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit contains far more information than can possibly be covered in a single lecture. The middle sections, D and E, contain a detailed technical review of a benchmark model. Depending on the abilities and interests of your students you may wish to omit these sections and move on to the description of the AGT benchmark in section G, or focus the lecture on the technical aspects and omit the descriptive example. UNIT 63 - BENCHMARKING A. INTRODUCTION benchmarking is a key element in minimizing the risk in system selection often the customer does not have precise plans and needs - these will be determined to some extent by what the GIS industry currently has to offer no vendor's product yet meets the requirements of an ideal GIS customer needs reassurance - real, live demonstration - that the system can deliver the vendor's claims under real conditions the GIS industry is still young, there are too few success stories out there a benchmark allows the vendor's proposed system to be evaluated in a controlled environment customer supplies data sets and a series of tests to be carried out by the vendor and observed by the customer an evaluation team is assembled and visits each vendor, performing the same series of tests on each system tests examine specific capabilities, as well as general responsiveness and user-friendliness reinforces the written response from the vendor by actual demonstration of capabilities demonstration is conducted in an environment over which the customer has some control - not completely at the vendor's mercy as e.g.
at trade show demonstrations equipment is provided by the vendor, data and processes must be defined by the customer a benchmark can be a major cost to a vendor - up to $50,000 for an elaborate benchmark in some cases part of these costs may be met by the customer through a direct cash payment Two types of benchmarking qualitative benchmark asks: are functions actually present? do they live up to expectations? are they easy to use? quantitative benchmark asks: does the proposed configuration have the necessary capacity to handle the planned workload? Benchmark script handout - Benchmark script example (2 pages) benchmark uses a script which details tests for all of the functions required permits both: subjective evaluation by an observer (qualitative) objective evaluation of performance (quantitative) must allow all of the required functionality to be examined failure of one test must not prevent the remainder of the tests from being carried out must be modular customer must be able to separate the results of each test conditions must be realistic real data sets, realistic data volumes B. QUALITATIVE BENCHMARKS in the qualitative part of the benchmark it is necessary to evaluate the way the program handles operations functions cannot be evaluated simply as present or absent overhead - Qualitative assessment functions are not all equally necessary - they may be: necessary before any products can be generated, e.g. digitizing necessary to some products but not others, e.g. buffer zone generation necessary only to low-priority products, i.e. nice to have C. QUANTITATIVE BENCHMARKS in quantitative tests, procedures on problems of known size are executed analysis of results then establishes equations which can be used to predict performance on planned workload e.g. if it takes the vendor 1 hour to digitize 60 polygons during the benchmark, how many digitizers will be needed to digitize the planned 1.5 million polygons to be put into the system in year 1?
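The extrapolation in the example above can be worked through directly. A minimal sketch, using only figures that appear in this unit: the benchmark rate of 60 polygons per hour, the planned year-1 workload of 1.5 million polygons, and the figure of about 120,000 productive minutes per digitizing station per year (single daytime shift) used in the workload analysis later in this unit.

```python
import math

# Scale a benchmark measurement up to the planned production workload.
# Figures are taken from the text of this unit.
polygons_per_minute = 60 / 60          # benchmark rate: 60 polygons/hour = 1 per minute
planned_polygons = 1_500_000           # planned year-1 digitizing workload
minutes_per_station_year = 120_000     # productive minutes per station, one daytime shift

# Total staff minutes needed, then round up to whole digitizing stations.
minutes_needed = planned_polygons / polygons_per_minute
stations_needed = math.ceil(minutes_needed / minutes_per_station_year)

print(minutes_needed)    # 1500000.0
print(stations_needed)   # 13
```

At the benchmark rate, year 1 alone would keep about 13 single-shift digitizing stations busy; this is the kind of back-of-envelope result a quantitative benchmark is designed to produce.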
this is known in computer science as performance evaluation Performance evaluation (PE) developed in the early days of computing because of the need to allocate scarce computing resources carefully a subfield of computer science requires that tasks be broken down into subtasks for which performance is predictable early PE concentrated on the machine instruction as the subtask specific mixes of machine instructions were defined for benchmarking general-purpose mainframes e.g. the "Gibson mix" - a standard mixture of instructions for a general computing environment, e.g. a university mainframe multi-tasking systems are much more difficult to predict because of interaction between jobs time taken to do my job depends on how many other users are on the system it may be easier to predict a level of subtask higher than the individual machine instruction modern operating systems must be "tuned" to perform optimally for different environments e.g. use of memory caching, drivers for input and output systems Subtasks for GIS PE specifying data structures would bias the benchmark toward certain vendors e.g. cannot specify whether raster or vector is to be used, must leave the choice to the vendor similarly, cannot specify programming language, algorithms or data structures a GIS benchmark must use a higher level of subtask an appropriate level of subtask for a GIS benchmark is one that: is understandable without technical knowledge makes no technical specifications e.g. "overlay" is acceptable as long as the vendor is free to choose a raster or vector approach e.g.
"data input" is acceptable, specifying digitizing or scanning is not therefore, a GIS PE can be based on an FRS and its product descriptions, which may have been generated by resource managers with no technical knowledge of GIS Requirements for a quantitative benchmark need a mathematical model which will predict resource utilization (CPU time, staff time, plotter time, storage volume) from quantities which can be forecast with reasonable accuracy numbers of objects - lines, polygons - are relatively easy to forecast technical quantities - numbers of bytes, curviness of lines - are less easy to forecast the mathematical form of the model will be chosen based on expectations about how the system operates e.g. staff time in digitizing a map is expected to depend strongly on the number of objects to be digitized, only weakly on the size of the map (unless large maps always have more objects) requires a proper balance between quantitative statistical analysis and knowledge about how the procedures operate GIS PE is more difficult GIS PE is more difficult than other types of PE because: uncertainties over the approach to be adopted by the vendor (data structure, algorithms) high level at which tasks must be specified difficulty of forecasting workload no chance of high accuracy in predictions however even limited accuracy is sufficient to justify investment in benchmark D. EXAMPLE MODEL OF RESOURCE UTILIZATION this section describes a mathematical model developed for a quantitative benchmark overhead - Model of resource utilization handout - A model of resource utilization Subtasks begin with a library of subtasks L this is the set of all GIS functions defined conceptually e.g. overlay, buffer zone generation, measure area of polygons, digitize Products and data input FRS identified a series of products, denoted R_1, R_2, ..., R_i, ...
each product requires a sequence of subtasks to be executed data input also requires the execution of a series of subtasks for each dataset, e.g. digitize, polygonize, label Frequency required each product is required a known number of times per year Y_ij is the number of times product i is required in year j knowledge extends only to the end of the planning horizon, perhaps year 5 Execution of tasks execution of a subtask uses resources e.g. CPU, staff or plotter time these can be quantitatively measured e.g. CPU time measured in seconds e.g. staff time in minutes note: indications are (Goodchild and Rizzo, 1987) that staff time (human) is more predictable than CPU time (machine) because of complications of computer accounting systems, multitasking etc. M_ak is the measure of resource k used by subtask a k is one of the resources used a is one of the subtasks in the library L Prediction in order to predict the amount of resources needed to create a product, need to find a mathematical relationship between the amount of resource that will be needed and measurable indicators of task size e.g. number of polygons, queries, raster cells, lines P_akn is predictor n for measure k, subtask a M_ak = f(P_ak1, P_ak2, ..., P_akn, ...) e.g. the amount of staff time (M_ak) used in digitizing (a) is a function of the number of polygons to be digitized (P_ak1) and the number of points to be digitized (P_ak2) the general form of the prediction function f will be chosen based on expert insight into the nature of the process or statistical procedures such as regression analysis e.g.
use the results of the benchmark to provide "points on the curve" with which to determine the precise form of f Forecast given a prediction function, we can then forecast resource use during production with useful, though not perfect, accuracy W_kit is the use of resource k by the t-th subtask required for a single generation of product i W_ki = sum of W_kit over all t is the amount of the resource k used by all subtasks in making product i once V_kj = sum of (W_ki x Y_ij) over all i is the amount of resource k used to make the required numbers of all products in year j Summary of phases of analysis overhead - Summary of phases of analysis 1. Define the products and subtasks required to make them 2. Evaluate each subtask from the results of the qualitative benchmark 3. Analyze the system's ability to make the products from the qualitative evaluations in (2) above 4. Obtain performance measures for known workloads from the results of the quantitative benchmark 5. Build suitable models of performance from the data in (4) above 6. Determine future workloads 7. Predict future resource utilization from future workloads and performance models, and compare to resources available, e.g. how does CPU utilization compare to time available? E. APPLICATION OF MODEL this section describes the application of this model of resource use in a benchmark conducted for a government forest management agency with responsibilities for managing many millions of acres/hectares of forest land FRS was produced using the "fully internalized" methodology described in Unit 61 FRS identified 33 products 50 different GIS functions required to make them out of a total library of 75 GIS acquisition anticipated to exceed $2 million Three phases of benchmark 1. data input - includes digitizing plus some conversion of existing digital files 2. specific tests of functions, observed by benchmark team 3.
generation of 4 selected products from FRS these three phases provided at least one test of every required function for functions which are heavy users of resources, many tests were conducted under different workloads e.g. 12 different tests of digitizing ranging from less than 10 to over 700 polygons Qualitative benchmark each function was scored subjectively on a 10-point scale ranging from 0 = "very fast, elegant, user-friendly, best in the industry" to 9 = "impossible to implement without major system modification" score provides a subjective measure of the degree to which the function inhibits generation of a product maximum score obtained in the set of all subtasks of a product is a measure of the difficulty of making the product Quantitative benchmark since this was an extensive study, consider for example the quantitative analysis for a single function - digitizing digitizing is a heavy user of staff time in many systems delays in digitizing will prevent system reaching operational status digitizing of complete database must be phased carefully over 5 year planning horizon to allow limited production as early as possible as stated above, benchmark included 12 different digitizing tasks resource measure of digitizing is staff time in minutes predictors are number of polygons and number of line arcs line arcs are topological arcs (edges, 1-cells) not connected into polygons, e.g. streams, roads other predictors might be more successful - e.g. 
number of polygons does not distinguish between straight and wiggly lines though the latter are more time-consuming to digitize - however predictors must be readily accessible and easy to forecast sample of results of quantitative benchmark:

   polygons   line arcs   staff time (mins)
      766         0             930
      129         0             136
        0        95             120

benchmark digitizing was done by vendor's staff - well-trained in use of software, so speeds are likely optimistic Model overhead - Models of time resources required expect time to be proportional to both predictors, but constants may be different m = k_1 p_1 + k_2 p_2 m is measure of resource used p is a predictor - p_1 is polygons, p_2 is line arcs k_1, k_2 are constants to be determined Results the equation which fits the data best (least squares) is: m = 1.21 p_1 + 0.97 p_2 i.e. it took 1.21 minutes to digitize the average polygon, 0.97 minutes to digitize the average line arc to predict CPU use in seconds for the digitizing operation: m = 2.36 p_1 + 2.63 p_2 i.e. it took 2.36 CPU seconds to process the average polygon uncertainties in the prediction were calculated to be 34% for staff time, 44% for CPU time suggests that humans are more predictable than machines adding together staff time required to digitize the forecasted workload led to the following totals:

   Year   Time required (minutes)
     1        185,962
     2        302,859
     3        472,035
     4        567,823
     5        571,880
     6        760,395

the average working year has about 120,000 productive minutes in the daytime shift by year 6 the system will require more than 6 digitizing stations, or 3 stations working 2 shifts each, or 2 stations working 3 shifts each this was significantly higher than the vendor's own estimate of the number of digitizing stations required, despite the bias in using the vendor's own staff in the digitizing benchmark F.
LIMITATIONS difficult to predict computer performance even under ideal circumstances GIS workload forecasting is more difficult because of the need to specify workload at a high level of generalization the predictors available, e.g. polygon counts, are crude the model is best for comparing system performance against the vendor's own claims, as implied by the configuration developed in response to the RFP it is less appropriate for comparing one system to another it assumes that the production configuration will be the one used in the benchmark staff will have equal levels of training hardware and software will be identical it is difficult to generalize from one configuration to another - e.g. claims that one CPU is "twice as powerful" as another do not work out in practice however, any prediction, even with high levels of uncertainty, is better than none after a quantitative benchmark the analyst probably has better knowledge of system performance than the vendor G. AGT BENCHMARK EXAMPLE Project Background in 1983, Alberta Government Telephones (AGT) had been operating a mechanized drawing system for 5 years however, lack of "intelligence" in the automated mapping system was increasingly hard to justify given growing capabilities of GIS management was showing interest in updating the record-keeping system an FRS and RFP for an AM/FM system were developed by a consultant in cooperation with staff three companies were identified as potential suppliers and a benchmark test was designed tests included placement and modification of "plant" (facilities), mapping, report generation, engineering calculations, work order generation tests were designed to be progressively more difficult vendors were not expected to complete all tests data and functional requirements analysis were sent in advance to all vendors for examination actual benchmark script and evaluation criteria were not sent in advance vendors were asked to load supplied data in advance of benchmark methods
chosen to load and structure data were part of the evaluation visits were made to each vendor 5 weeks before the actual benchmark to clarify any issues providing the data before the script is typical of benchmarks for systems that are primarily query oriented prevents planning for the queries that are presented in the script on the other hand, benchmarks for systems that are product oriented will normally provide the script in advance in the AGT case, actual benchmarks were conducted by a team of 3, spending one full working week at each vendor during the benchmark the vendor's staff were responsible for interacting with the system, typing commands, etc. the benchmark team acted as observers and timekeepers, and issued verbal instructions as appropriate must recognize that the vendor's staff are more familiar with the system than the typical employee will be during production thus the benchmark is biased in favor of the vendor in its evaluation of user interaction - the vendor's staff are presumed to be better than average digitizer operators etc. during the benchmark, the intent of each phase of testing was explained to the vendor positive and negative evaluations were communicated immediately to the vendor the project team met each evening to compare notes a wrap-up session at the end of the benchmark identified major difficulties to the vendor, who was invited to respond when the three benchmarks were completed the results were assessed and evaluated and became part of the final decision-making stages REFERENCES Goodchild, M.F., 1987. "Application of a GIS benchmarking and workload estimation model," Papers and Proceedings of Applied Geography Conferences 10:1-6. Goodchild, M.F. and B.R. Rizzo, 1987. "Performance evaluation and workload estimation for geographic information systems," International Journal of Geographical Information Systems 1:67-76. Also appears in D.F.
Marble, Editor, Proceedings of the Second International Symposium on Spatial Data Handling, Seattle, 497-509 (1986). Marble, D.F. and L. Sen, 1986. "The development of standardized benchmarks for spatial database systems," in D.F. Marble, Editor, Proceedings of the Second International Symposium on Spatial Data Handling, Seattle, 488-496. EXAM AND DISCUSSION QUESTIONS 1. Discuss the Marble and Sen paper listed in the references, and the differences between its approach and that presented in this unit. 2. How would you try to predict CPU utilization in the polygon overlay operation? What predictors would be suitable? How well would you expect them to perform based on your knowledge of algorithms for polygon overlay? 3. Since a computer is a mechanical device, it should be perfectly predictable. Why, then, is it so difficult to forecast the resources used by a GIS task? 4. Compare the approach to GIS applications benchmarking described in this unit with a standard description of computer performance evaluation, for example D. Ferrari, 1978, Computer Systems Performance Evaluation. Prentice Hall, Englewood Cliffs, NJ. 5. In some parts of the computing industry, the need for benchmarks has been avoided through the development of standardized tests. For example such tests are used to compare the speed and throughput rates of numerically intensive supercomputers, and of general-purpose mainframes. Are such tests possible or appropriate in the GIS industry? 6. GIS product definition exercise - 2 following pages. PILOT PROJECT A. INTRODUCTION Formats for pilot projects B. MANAGEMENT OF A PILOT PROJECT Objectives Issues in pilot design Results of the pilot C. EXAMPLE PILOTS - AM/FM SYSTEMS Pilot projects in AM/FM Salt River Project Pilot comparisons REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES There are no widely available references for this unit. We have included a long handout if you want to give your students some background.
UNIT 64 - PILOT PROJECT Compiled with assistance from Warren Ferguson, Ferguson Cartotech, San Antonio, TX A. INTRODUCTION pilot project provides the first physical results from a GIS project is usually the last major milestone prior to corporate and technical commitment recognizes the difference between reading about the system and actually experiencing how it operates a pilot is part of the effort to "sell" the system within the organization the results of pilot projects can be shown to decision-makers as evidence of the system's immediate value provide a tangible way of communicating the potential of the system to skeptics within the organization some organizations may go to full production without a pilot in this case may need to rework the first deliverables as the system is "run in" this risks alienating users pilots are useful for verifying estimates of costs and benefits evaluating hardware, software, system and database design, procedures and alternatives in summary, pilots reduce a range of risks associated with the project before final commitment to full production is made Formats for pilot projects 1. demonstration of concepts a chance for the organization to see a similar system running in production, evaluate its products will be a demonstration of limited facilities on a small area, using a system which may not be part of the final production system, mainly for development and hands-on experience in some cases even the data may not be part of the organization's operation provides early visibility of the system to management and users 2.
prototype a full-scale model of the future system designed to identify any problems not foreseen by the FRS and benchmarks, to finalize design and the conversion process may be: a "Development and Technical Prototype" to test code and learn the system an "Applications Prototype" demonstrating potential applications development generally convert an entire region or operating division of the organization from existing procedures to the new system B. MANAGEMENT OF A PILOT PROJECT the pilot project should be defined and managed as effectively as the major project of which it is part objectives must be defined clearly Objectives evaluate system design hardware and software system performance database design updating of cost estimates for development test alternatives ways of generating products formats for products evaluate map input and conversion procedures evaluate whether or not to use outside suppliers for data input and conversion improve estimates of input and conversion schedules and costs improve information on data sources test management procedures training for staff production scheduling system management maintenance schedules market system to end-users and management Issues in pilot design enthusiasm and support of management if support is minimal, the pilot must be oriented to building a sound business case for the system funds available an effective pilot will have a substantial cost to be successful the pilot must justify this cost and the subsequent, larger cost of the production system geographical area of pilot if the pilot covers a region within the organization's service area, this region must be a significant proportion of the total area level of staff experience pilot project design must consider the current level of experience of the project staff must allow sufficient training and experience for those involved to permit realistic evaluation of the potential of the system the corporate environment success of the project depends on corporate climate
- how conservative, how risk-averse Results of the pilot at a bare minimum: experience in implementing a GIS project management approval to proceed with major project ideally, it will: reduce risk in all areas increase the effectiveness of the major project improve efficiency in the early stages of the major project a pilot can result in: trained staff and users well-developed technical, managerial and production procedures near-production computer code an improved implementation plan enthusiastic support of management and users C. EXAMPLE PILOTS - AM/FM SYSTEMS all pilots are unique to their corporate and technical context because of the major investments involved in AM/FM projects, AM/FM installations provide good examples of carefully planned pilot projects Pilot projects in AM/FM first in the late 1960s in Cheyenne, Wyoming, by Public Service Company of Colorado showed that technology and software cost and performance were not sufficiently advanced to support a large AM/FM project some pilots today use consultants and hardware/software environments that can produce results in 4 months these are generally for small municipalities and utilities, fewer than 100,000 customers larger projects requiring investments in the $10 million to $100 million range may require 1 to 2 year pilots to meet design objectives Salt River Project is a water management system in Arizona active in AM/FM since 1979 overhead - Salt River Project (2 pages) Pilot comparisons overhead - Comparison of several AM/FM pilots table summarizes 11 AM/FM pilots by size and schedule length of time used and size are functions of: scope of pilot definition resource commitment corporate experience in AM/FM type of service area (urban/rural) system purchased contents of database range of applications demonstrated system requirements REFERENCES "Pacific Gas and Electric project history", see handout following (7 pages). EXAM AND DISCUSSION QUESTIONS 1.
Review and discuss the handout provided on Pacific Gas and Electric project history. 2. Summarize the arguments for and against the use of a pilot project as part of the planning process for a major GIS project. COSTS AND BENEFITS A. INTRODUCTION What is benefit/cost analysis? Why do it? Accrual B. DEFINING COSTS One-time vs recurring costs C. BENEFITS OF A GIS Classifying benefits Examples of benefits D. COMPARING COSTS AND BENEFITS E. EXAMPLE - WASHINGTON STATE Background Installed system Data Costs Benefits Benefits vs Costs Intangible benefits - Orphan roads project REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 65 - COSTS AND BENEFITS Compiled with assistance from Holly J. Dickinson, State University of New York at Buffalo A. INTRODUCTION What is benefit/cost analysis? assessment of benefits of a GIS installation - what is the value of its products? assessment of costs (initial and recurring) comparison of benefits and costs project should go ahead only if benefits exceed the costs for comparison, benefits and costs must be comparable - measured in same units, over same period of time Why do it? a major GIS implementation is a large monetary investment and upper management wants to know the expected benefits of the system before they agree to the purchase overhead - Costs of 3 example systems three uses of benefit/cost analysis of computer systems: 1. planning tool for choosing among alternatives select the system which meets minimal benefit requirements and offers the highest benefit/cost ratio 2. quantitative support to politically influence a decision a major factor in influencing the decision to proceed 3. an audit tool for existing projects future planning for the system can be based on the outcome benefit/cost analysis is a standard procedure in many areas, including the information processing industry (see King and Schrems, 1978, p. 20) Accrual an organization will want to know the costs and benefits that accrue to the organization (i.e. 
must be borne by and benefit the organization respectively) these are not necessarily all of the costs some costs may be borne by government through cost-sharing arrangements some costs may be borne by the vendor the benefits which accrue to the organization are not necessarily all of the benefits of the system some government organizations may wish to make decisions based on costs and benefits to society as a whole, not to the organization alone B. DEFINING COSTS the most important aspect of reporting costs is to include all costs, not just the acquisition of the hardware and GIS software overhead - Possible cost categories not all of these categories will be relevant to all GIS implementations whether or not to include certain costs leads to questioning the purpose of the agency as well as the purpose of the GIS for example: should the cost of data collection be included in the total costs for a GIS implementation? No - if the data would have been collected whether or not a GIS was to be implemented Yes - if the data were collected specifically to create the GIS database Partially - if the data would have been collected, but not at the higher level of precision preferred for the GIS database One-time vs recurring costs one-time costs are incurred for hardware, software, possibly data, staff training recurring costs are incurred for maintenance contracts, staff salaries, rent, utilities, etc. one-time and recurring costs and benefits must be adjusted to identical time periods for purposes of comparison e.g. sum the one-time and recurring costs and benefits over the entire period of the project, e.g. 5 years e.g. express recurring costs and benefits on an annual basis, and apportion one-time costs appropriately e.g. assign 1/5 of one-time costs to each year of project - may have to add interest charges on initial investment, allowances for inflation, etc. C.
BENEFITS OF A GIS benefits are much more difficult to quantify than costs costs can be expressed in dollars benefits are often intangible, difficult or impossible to quantify are generally tied to the expected products products may be: the same products as before but created by using the GIS instead of the previous manual or CAD/CAM (i.e., non-GIS) methods generally the same amount can be produced for less cost, or more can be produced at the same cost new products that could not be produced without the GIS types of products 1. simple map output of the database or subsets thereof 2. map products requiring the spatial analysis functions of a GIS 3. products which may not be end products, but input to a decision-making process benefit/cost analysis based solely on map output is different from an analysis involving the spatial analysis and decision support system functions of a GIS the latter type is much more complex there is a need to understand how decision makers use information, specifically geographical information, and how they value that information difficult to define some "products" e.g.
the concept is clear enough in the case of a map or report but less so when the GIS is used to browse a database there is still much to be understood about supply and demand for GIS products Classifying benefits tangible benefits: cost reductions decreased operating costs staff time savings cost avoidances increased revenue intangible benefits: improved decision making decreasing uncertainty improving corporate or organizational image Examples of benefits total cost of producing maps by manual means was greater than total cost of making identical maps using GIS tangible benefit use of GIS allows garbage collection company to reduce staff through better scheduling of workload and collection routes tangible, possible to quantify emergency vehicles reduce average arrival time by using GIS-supplied information on road conditions tangible if we can quantify the increased cost resulting from delayed arrival (fire has longer to burn, heart attack victim less likely to survive, etc.) timber company reduced costs of logging because GIS could be used to avoid costly mistakes in locating roads and other logging infrastructure tangible but hard to quantify, implies we can predict the mistakes which would have been made in the absence of GIS information from GIS was used to avoid costly litigation in land ownership case tangible but hard to quantify, implies we can predict the outcome of the case if GIS information had not been available Forest Service finds a better location for a campsite through use of GIS intangible, implies we can predict the decision which would have been made in the absence of GIS some of the problems with measuring benefits might be subject to research e.g. take two managers, supply one with GIS information, compare resulting decisions - but the results would be hard to generalize D.
COMPARING COSTS AND BENEFITS those benefits easily quantified can be compared directly to costs however, it may be wrong to look at the problem as a matter of predicting costs and benefits as static, simple quantities realistically, a system is likely to change substantially over any extended planning horizon the ability to expand the system easily without major structural change may be a hidden benefit Dickinson and Calkins (1988) discuss a model of cost-effectiveness under varying levels of investment overhead - Cost-effectiveness curve the manual system produces good performance for low levels of investment, but performance fails to grow rapidly as investment increases the automated system has high initial cost, but expandability ensures that performance continues to increase as investment increases Case A shows the reduction in cost from switching from manual to automated at current levels of performance Case B shows the increase in performance from investing the amount currently spent on the manual system the old (manual) system is replaced not because its costs are currently high but because additional investment will produce little increase in system performance relative to the new GIS the appropriate point to switch from manual to automated is at the intersection of the two curves this argument assumes that the benefits of the two systems are the same, and makes the decision based on cost the argument is conservative if we believe that the benefits of GIS are at least as high as those of the manual system because of the difficulty of quantifying intangible benefits, one possibility is to document them as completely as possible and leave their evaluation to the final decision-making group (where the buck finally stops) E.
EXAMPLE - WASHINGTON STATE following is a brief analysis of the benefits and costs of a specific GIS implementation (note: the full case study can be found in Dickinson, 1988) Background the organization is Department of Natural Resources, State of Washington, Olympia, WA seven regional offices and one central office in Olympia manages three million acres of state-owned land, two million are forested; the rest are in urban, recreational, or agricultural uses charged with producing revenue, management of the natural resources, and public service involving such activities as: clearcutting, thinning, fire and insect control, stand conversion, market harvesting, replanting, land exchanges, recreation site planning these activities can create up to 200 changes daily in land use and land cover, affecting up to 13,000 ownership parcels pre-1980, activity centered around sustainable harvest forestry two computerized systems were used during this time: GRIDS (Gridded Resource Inventory Data System) - able to calculate sustainable harvest yields and produce forest inventory reports and line printer maps CALMA (Calmagraphics Mapping System) - a computer aided drafting system used to maintain soil maps for the state in the 1980s, the Forest Land Management Program was adopted required Multiple Use Forest Planning, environmental analysis, and overall, more effective analysis of geographic data possible answers to this need were either more staff or a GIS the choice was a GIS, and expected products included: overhead - Washington State study - Examples of Products base maps of land use and land cover data land lease and land exchange maps road and bridge maintenance maps environmental impact analysis potential debris flow hazard maps fire hazard maps timber harvest tracking spatial allocation of workloads Installed system overhead - Washington State study - Description of GIS GIS was installed in November of 1983 system is known as GEOMAPS (GEOgraphic Multiple use Analysis and
Planning System) consists of ARC/INFO software and associated macros (procedures) built around ARC/INFO Equipment: Central Office PRIME 9955 (upgraded as of 4/1/89) 6 Tektronix CRTs 11 CRTs of other types 5 digitizers 2 pen plotters Equipment: Regional Offices workstation consisting of one graphics CRT and one alphanumeric CRT, digitizer, pen plotter, line printer, modem communications Staff: Central Office 1 administrator, 3 user-coordinators (to coordinate needs between regional offices and central office), 4 programmers, 11 production staff Staff: Regional Offices 1 GEOMAPS coordinator Data overhead - Washington State study - Data database is centralized regional offices are responsible for updates to their area, but actual update to the master database is performed in the central office, only after the updates have been checked and verified two main data layers exist: 1. POCA - Public Land Survey Data, State Ownership Parcels, County and Administrative Boundaries 60% of this layer is at a scale of 1:12,000; 40% at 1:24,000 this layer took 3-8 people over an 8-year time period to digitize (40 person-years) 2.
LULC - Land Use and Land Cover Inventory Data; scale: 1:24,000 no records on digitizing time were available updates to this data layer occur approximately 2,000 times per year these two data layers were combined (polygon overlay) to produce the composite layer called POCAL approximately 64,600 polygons, each with 77 attributes; updates occur at a rate of about 35 polygons per week the other major data layer contains all soil data (300,000 polygons, 1:24,000 scale), which existed in digital form before GEOMAPS entry of road and hydrological data was being planned in 1988 Costs overhead - Detailed costs of GEOMAPS shows the detailed costs recorded for Fiscal Years 1984 to 1987 note the percentage of total costs that the different categories of costs cover: hardware and software = 33% maintenance contracts = 9% staff = 43% travel = 1% supplies and services = 14% overhead - Resource management system costs taken from a DNR report and shows costs of all three systems total costs for each system are: GEOMAPS (FY 82-87) = $4,611,000 CALMA (FY 80-86) = $947,302 GRIDS (FY 80-81) = $1,162,613 Benefits overhead - Summary of GEOMAPS benefits shows the summary of tangible benefits from GEOMAPS as estimated by the DNR staff figures appeared in the Post-Implementation Review approved by the DNR executives as well as State data authorities all estimates are considered to be very conservative the categories of tangible benefits are as follows: 1. increased revenue due to the increased net value of timber by optimal thinning choices based on analysis of information about physical parameters of timber stands, location of work camps, and market prices 2.
decreased costs better stewardship by means of better management based on improved calculations, planning tools, and the effective use and storage of data intensive management produced an estimated decrease of $7 per acre for thinning operations due to decreased number of ground visits, automatic preparation of contract maps, and ability to rank sites for priority harvest based on market information 3. staff savings estimated staff time savings by using GEOMAPS (this includes salary only, not benefits) 4. cost reductions DNR also claimed benefits from the cost reductions resulting from the phasing out of the two prior systems Benefits vs Costs there are two ways to treat the cost reductions from phasing out the old system: 1. cost reductions can be added to the benefits of GEOMAPS and compared to the costs of all three systems over the total time period (call this version 1) overhead - Benefits vs costs version one shows there is a positive benefit/cost ratio between total benefits and costs for all three systems for the fiscal years of 1982, 83, 84, 86, and 87 overhead - Benefits/costs - Version one graph 2. 
if we only want to look at the benefits and costs of GEOMAPS, we could subtract the cost reductions from the GEOMAPS costs, and then compare this total to the new tangible benefits of GEOMAPS only (version two on overhead) also shows a positive benefit/cost ratio between the new tangible benefits from GEOMAPS and the costs of GEOMAPS itself for fiscal years of 1984, 86, 87 and 88 Intangible benefits - Orphan roads project a very specific application of GEOMAPS was not entered into the benefit/cost analysis, primarily because the benefit could not be easily quantified however, the benefit is by no means trivial before the 1970 Forest Act, forest road construction was unregulated loggers would build temporary roads and bridges when they moved in to log a new area when the task was finished, the roads were left behind (i.e., orphan roads) since they were only temporary roads, many were constructed on steep gradients without the usual engineering controls this creates a high potential for debris flows where these roads cross streams two disasters, resulting in the loss of lives, were caused by the poor placement of such roads each of these disasters cost the DNR over two million dollars in lawsuits many other orphan roads exist and are still being used across the state GEOMAPS was used to identify potential hazard locations by locating potential debris flow trigger points data used included: road locations, categorized by year of construction (1941, 1947, 1956/62/65, 1969, 1976/78, and 1983) stream locations elevation data in a TIN data structure procedure: ARC/TIN was used to create a contour map from the elevation data this was overlaid with the stream data to trace to the stream heads, calculate gradients, and categorize the streams into those with a gradient of less than 3.6 degrees, between 3.6 and 8 degrees, and greater than 8 degrees ARC/ALLOCATE was used to flag all intersections of roads and streams with a gradient greater than 8 degrees for the allocation model,
the impedance factor was the gradient, and the resource was the debris in the stream these intersections were potential trigger points for debris flow obviously, a benefit exists by using GEOMAPS in this type of analysis but how to quantify the benefit, and how (or if) to include it in benefit/cost analysis? REFERENCES Dickinson, H.J., 1988, "Benefit/Cost Analysis of Geographic Information System Implementation," unpublished Master's Thesis, Department of Geography, State University of New York at Buffalo, NY. Dickinson, H.J., and H.W. Calkins, 1988, "The Economic Evaluation of Implementing a GIS," International Journal of Geographical Information Systems 2:307-327. Epstein, E., and T.D. Duchesneau, 1984, "The Use and Value of a Geodetic Reference System," University of Maine, Orono, Maine. Available from the National Geodetic Information Center (NOAA), Rockville, Maryland, USA. Joint Nordic Project, 1987. Digital Map Data Bases, Economics and User Experiences in North America, Publications Division of the National Board of Survey, Helsinki, Finland. King, John L., and E.L. Schrems, 1978, "Cost-Benefit Analysis in Information Systems Development and Operation," Computing Surveys 10:19-34. Stutheit, J., 1990. "GIS procurements: Weighing the costs", GIS World, April/May 1990:69-70. A general overview of a process conducted by the US Forest Service to determine the costs and benefits of a GIS project. Clapp, J.L., J.D. McLaughlin, J.G. Sullivan and A.P. Vonderohe, 1989. "Toward a method for the evaluation of multipurpose land information systems", URISA Journal, 1(1):39-43. Paper, originally published in 1985, describes a model for evaluating LIS which measures "operational efficiency, operational effectiveness, program effectiveness and contributions to well-being". EXAM AND DISCUSSION QUESTIONS 1.
Summarize the issues involved in assessing costs and benefits when a) a manual system is replaced by a digital system, b) an existing digital system is replaced, and c) a digital system is introduced to an organization which does not have any existing equivalent, manual or digital. 2. Design a series of experiments to determine as far as possible the intangible benefits which accrue from GIS-based decision-making in an organization such as a National Forest. 3. A parcel delivery service plans to install vehicle navigation systems in each of its vehicles. These feature continuous display of maps of the area surrounding the vehicle, and of the location of the vehicle in relation to a specified destination. Design a study to assess the benefits of such a system. 4. Discuss the problems presented by the dimension of time in the evaluation of costs and benefits. DATABASE CREATION A. INTRODUCTION B. DATABASE DESIGN Stages in database design C. ISSUES IN DATABASE CREATION D. KEY HARDWARE PARAMETERS Volume Access speed Network configuration E. DATABASE REDEFINITION F. TILES AND LAYERS Reasons for partitioning "Seamless" databases Organizing data into layers Selecting tile configurations G. DATA CONVERSION Database requirements In-house conversion H. SCHEDULING DATABASE CREATION Scheduling issues I. EXAMPLE - FLATHEAD NATIONAL FOREST DATABASE Background Examples of products Proposed database contents Example dataset characteristics Tiling Database creation plan System specific issues Schedule REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit is the longest one included in the Curriculum. It will be impossible to cover all this material in one lecture, but there is no clear break at which to split this cleanly. Some of the material is technical and some of it management-oriented. You will have to decide what to omit, if you need to, based on your students' interests and educational backgrounds. UNIT 66 - DATABASE CREATION A.
INTRODUCTION an FRS establishes: the products to be generated by the system the data needed to generate the products the functions which must operate on the data working from the outline provided by the FRS, the database design and creation process begins this unit examines the management and planning issues involved in the physical creation of the database note that specific implementation details will not be reviewed as these are highly dependent on the particular GIS used emphasis is on databases for resource management applications databases for facilities management are often extensions of existing non-geographic databases depend too much on specifics of systems the key individual involved at this stage is the Database Manager or Coordinator who is responsible for: definition of database contents and its "external views" see Unit 43 for a discussion of the different "views" of a database maintenance and update control day-to-day operation, particularly if database is distributed over a network B. DATABASE DESIGN provides a comprehensive framework for the database allows the database to be viewed in its entirety so that interaction between elements can be evaluated permits identification of potential problems and design alternatives without a good database design, there may be irrelevant data that will not be used omitted data no update potential inappropriate representation of entities lack of integration between various parts of the database unsupported applications major additional costs to revise the database Stages in database design recall from Unit 10 that the steps in database design are: 1. Conceptual software and hardware independent describes and defines included entities and spatial objects 2. Logical software specific but hardware independent determined by database management system (discussed in Unit 43) 3.
Physical both hardware and software specific related to issues of file structure, memory size and access requirements this unit focuses mainly on this last stage C. ISSUES IN DATABASE CREATION what storage media to use? how large is the database? how much can be stored online? what access speed is required for what parts of the database? how should the database be laid out on the various media? what growth should be allowed for in acquiring storage devices? how will the database change over time? will new attributes be added? will the number of features stored increase? how should the data be partitioned - both geographically and thematically? is source data partitioned? will products be partitioned? what security is needed? who should be able to redefine schema - new attributes, new objects, new object classes? who should be able to edit and update? should the database be distributed or centralized? if distributed, how will it be partitioned between hosts? how should the database be documented? who is responsible for maintaining standards of definition? standards of format? accuracy? should documentation include access to the compiler of the data? how should database creation be scheduled? where will the data come from? who determines product priorities? who is responsible for scheduling data availability? the following sections address some of these questions D. 
KEY HARDWARE PARAMETERS Volume databases for GIS applications range from a few megabytes (a small resource management project) to terabytes a small raster-based project using IDRISI, 100 by 200 cells, 50 layers might require a 10 Mbyte database on a PC/AT a mid-sized vector-based project for a National Forest using ARC/INFO might require 300 Mbytes a national, archival database might reach many hundreds of Gbytes the spatial database represented by the currently accumulated imagery of Landsat is of order 10^13 bytes Access speed overhead - Storage media data which can be accessed in order 1 second is said to be "on-line" to be on-line, data must be stored on fixed or removable disk relative to other forms of permanent storage, disk costs are high, and there is an effective upper limit of order 100 Gbytes for on-line storage when using common magnetic disk technology "archival" data (data which is comparatively stable through time) can be stored off-line until needed only extracts will be on-line for analysis at any one time archival systems incur additional time to mount media on hardware access time to extract subsets from archival data once mounted is order 1 minute archival media: magnetic tape removable disk CD-ROM no ability to edit data once written - this is acceptable for many types of geographical data copies are very cheap optical WORM (Write Once Read Many) "video" tape automatic multiple storage and access systems increase capacity and decrease access time magnetic tape stores can be automated, raising effective capacity to 1 Tbyte (order 10,000 tapes) order 10,000 tapes is also an effective upper limit to the size of a (conventional, manual mount) tape library optical WORM libraries can be automated much more easily using "jukebox" technology - automatic selection and mounting of platters devices to mount cassette tapes automatically are also available Network configuration should database be centralized or distributed? there are two answers: 1.
all departments share one common database, or 2. parts of the database exist on different workstations in an integrated network each department responsible for maintaining its own share of the database optimizes use of expertise with modern technology (e.g. NFS (Network File System)) user may be unaware of actual location of data being used some workstations may be "diskless", owning no part of the database distributed databases require careful attention to responsibilities, standards, scheduling of updates E. DATABASE REDEFINITION in some applications, all files, attributes, objects can be anticipated when the database is defined e.g. systems for facilities management typically do not allow redefinition of the database structure by user other applications, particularly those involving analysis, require ability to define new objects, attributes this capability is generally important in resource management applications important to determine who is allowed to change the database definitions database administrator only? project manager only? any user? F. TILES AND LAYERS many spatial databases are partitioned internally partitions may be defined spatially (like map sheets) or thematically or both the term tile is often used to refer to a geographical partition of a database, and layer to a thematic partition Reasons for partitioning capacity of storage devices may limit the amount of data that the system can handle as one physical unit update easier to update one partition (e.g. map sheet) at a time access speed may be faster if only one partition is accessed at a time distribution easier to copy a partition than to extract part of a larger database e.g. US Bureau of the Census chose to partition its TIGER files by county for distribution based on user needs e.g.
US Geological Survey partitions digital cartographic data by 1:100,000 map sheet user needs users need certain combinations of geographical area and theme more commonly than others illustrated by the conventional arrangement of topographic and thematic map series e.g. soils information is not normally shown on standard topographic maps the best source of usage patterns is conventional cartographic products because their traditions have been established through continual usage and improvement "Seamless" databases despite the presence of partitioning, system designers may choose to hide partitions from the user and present a homogeneous, seamless view of the database e.g. systems are available to automatically mosaic Landsat scenes, so users can work independently of normal scene boundaries in seamless databases, the data must be fully edgematched parts of an object which span geographical partitions must be logically related features which extend across tile boundaries must have identical geographic coordinates and attributes at adjacent edges every object must have an ID which is unique over the whole database the term Map Librarian is commonly applied to systems which remove partitions from the user's view of the database Organizing data into layers the source documents (maps) generally determine the initial thematic division of the data into layers these initial layers need not coincide with the way the data are structured internally e.g. the application may consider lakes and streams as one layer while the data structure may see them as two different objects - polygons and lines several distinct layers may be available from the same map sheet e.g.
topographic maps may provide contours, lakes and streams (hydrography), roads the Database Manager may choose to store these as different thematic partitions in the database when deciding how to partition the data by theme, need to consider: data relationships which types of data have relationships that need to be stored in the database these will need to be on the same layer or stored in such a way that relationships between them can be quickly determined functional requirements what sets of data tend to be used together during the creation of products it may be more efficient to store these on one layer user requirements how diverse will the users' requirements be more diversity may require more layers to allow flexibility updates data which needs to be updated frequently should be isolated common features features which are common to many coverages, such as shorelines and rivers, may be stored separately then used to create other coverages that incorporate these lines as boundaries internal organization of layers depends on the system chosen CAD systems treat each class of object as a separate layer many raster systems treat each attribute as a separate layer, although objects may have many attributes some newer GIS designs avoid the concept of layers entirely, storing all classes of objects and their interrelationships together Selecting tile configurations tiles may cover the same area throughout the database, or they may have variable sizes fixed size tiles are: generally inefficient in terms of storage since some tiles will have lots of data and others very little good when data volume changes through time since it is not necessary to restructure tiles with updates variable size tiles are: efficient in terms of storage difficult to restructure if new data is added boundaries may be: overhead - Tiling Variations regular e.g. based on map sheet boundaries free-form e.g.
based on political or administrative boundaries, watersheds, major features like roads or rivers tile sizes and boundaries can be chosen based on: areal extent of common queries or products scale needed in output balance between getting the largest areal coverage possible and speed of processing practically speaking, in most databases, partitions correspond to conventional map sheet boundaries, e.g. 7.5 minute quadrangles products will likely be created one tile at a time e.g. a forest manager wants maps of timber inventory at a scale of 1:24,000 the size of plots is limited by the plotter itself, and by physical constraints on handling and storage it makes sense to generate timber inventory maps in 7.5 minute quadrangles since data will be input from quadrangles, why not tile the entire database in quadrangles as well? however, a Map Librarian will be needed when small-scale products have to be generated using many tiles at once G. DATA CONVERSION the process of data input to create the database is often called data conversion involves the conversion of data from various sources to the format required by the selected system previously have examined the different ways of inputting data and various data sources consideration of these options is critical in planning for database creation Unit 7 discusses several issues related to integrating data from different sources often there are several alternative sources and input methods available for a single dataset Database requirements need to consider database requirements in terms of: scale accuracy scheduling priorities cost scale FRS specifies the scale required for output will determine the largest scale that is required for datasets may not need to go to added expense and time to input data at larger scales accuracy required accuracy will determine the quality of input necessary and the amount of data that may be created e.g. coarse scanning or digitizing versus very careful and detailed digitizing e.g.
field data collection versus satellite image interpretation scheduling priorities some datasets will be critical in the development of later datasets and early products these may justify expensive input methods or the purchase of existing sources alternatives for creating the database include: obtaining and converting existing digital data manual or automated input from maps and field sources contracting data conversion to consultants In-house conversion data entry is labor intensive and time consuming some GIS vendors assist in the conversion effort, and there are a number of companies which specialize in conversion some agencies do their conversion in-house, but many are reluctant to do so since the added personnel may not be needed once the initial conversion is complete advantages of in-house conversion agency personnel, who are familiar with the "ground truth" and unique situations of the areas of interest, are able to supervise the conversion effort this can be important for unanticipated situations in which general rules cannot be uniformly applied auxiliary maps and data are available if needed for interpretation if the maps are sent out for digitizing, what you send is all you get in-house validity checks can be made more easily disadvantages of in-house conversion additional equipment and personnel need to be added to the project plan long-term commitment to full-time employees can be expensive H. SCHEDULING DATABASE CREATION database creation is a time-consuming and expensive operation which must be phased over several years of operation the total cost of database creation will likely exceed the costs of hardware and software by a factor of four or five e.g.
over a 5 year period, of a total $5 million cost of a typical GIS project for resource management, $4 million went to data collection and entry, only $1 million to hardware, software, administration, application development since the benefits of the system derive from its products database creation must be scheduled so the system can produce a limited number of products as quickly as possible however, benefits will not be realized at the full rate until the database creation is complete need to know the complexity of data on each input source document to forecast data input workload e.g. numbers of points, polygons, lengths of lines, number of characters of attribute data Scheduling issues to generate a tile of a product, the required data layers for the correct tile must have been input to determine the order in which datasets must be input, must rank products based on: 1. perceived benefit 2. cost of necessary input highest ranked are those with high benefit, low cost of necessary input lowest ranked are low benefit, high data input cost some layers may be used by several products - once input, the cost to other products is nil the promotional benefit of a product is highest for a single tile, decreases for subsequent tiles a single tile of a product can be used to "sell" the system, draw attention to its possibilities high priority needs to be given to generating a product which can "sell" the system within each department or to each type of user need to know the relative payoffs of 1. producing a single tile of a new product and 2. producing further tiles of an existing product determining priorities under the constraint of data input capacity is a delicate operation for the Database Manager many layers of data may not exist, may have to be compiled from air photos or field observation the schedule for data input will have to accommodate availability of data as well as product priorities I.
EXAMPLE - FLATHEAD NATIONAL FOREST DATABASE Background Flathead NF located in Northwestern Montana on west slope of Continental Divide adjacent to Glacier National Park headquarters in Kalispell, MT total area within Forest boundary is 2,628,705 acres (1,063,822 ha) Forest area spread over 133 1:24,000 (7.5 minute) quadrangles resource management responsibilities include: timber fisheries wildlife water soils recreation minerals wilderness areas rangeland fire plus maintenance of Forest infrastructure (engineering) substantial investment in use of Landsat imagery for forest inventory and management, using VICAR image processing software FRS conducted in 1984/5, planning period extended to 1991 important to note that this plan considers the needs of Flathead NF in isolation may not be compatible with the national needs of the Forest Service or the national policy developed under the National GIS Plan may conflict with emerging concepts of service-wide Corporate Information (see Unit 71) Examples of products FRS identifies 55 information products handout - Examples of products (2 pages) extracted from a study by Tomlinson Associates, Inc.
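The product-ranking rule described under Scheduling issues above - rank products by perceived benefit against the cost of the input datasets they still need, with already-input layers costing nothing - can be sketched as follows. This is a minimal illustration, not part of the Flathead plan; all product names, layer names, and numbers are invented.

```python
# Hypothetical sketch of the scheduling rule: rank products by
# benefit minus the cost of the input layers they still require;
# once a layer has been input, its cost to later products is nil.

def rank_products(products, layer_costs, layers_done):
    """Order products best-first by benefit minus the cost of
    the layers they need that have not yet been input."""
    def net_score(product):
        pending = [l for l in product["layers"] if l not in layers_done]
        return product["benefit"] - sum(layer_costs[l] for l in pending)
    return sorted(products, key=net_score, reverse=True)

# Invented example: two candidate products sharing the "roads" layer.
layer_costs = {"roads": 10, "streams": 25, "stands": 40}
products = [
    {"name": "timber inventory map", "benefit": 60,
     "layers": ["stands", "roads"]},
    {"name": "road maintenance list", "benefit": 30,
     "layers": ["roads"]},
]

# Before any input the cheap road list ranks first (30 - 10 = 20
# versus 60 - 50 = 10); once "roads" and "stands" are input, the
# high-benefit timber inventory map moves to the top.
before = rank_products(products, layer_costs, layers_done=set())
after = rank_products(products, layer_costs, layers_done={"roads", "stands"})
```

Note how a layer shared by several products ("roads" here) is paid for only once, which is why input order matters so much in the scheduling discussion above.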
Proposed database contents total of 58 input datasets required total database volume estimated at 1 Gbyte 12 already in digital form in VICAR image processing system, running on mainframe outside Forest 3 interpreted from Landsat, available in raster form 9 digitized in vector form, rasterized by VICAR 2 are large attribute files (forest stand attributes, transportation data for roads) maintained as System/2000 database outside Forest all remaining datasets must be derived from non-digital maps or tables map scales range from 1:24,000 to 1:250,000 datasets vary in complexity number of map sheets varies depending on scale Example dataset characteristics see page 2 of handout Tiling 1:24,000 7.5 minute quadrangle dominates both input and output requirements therefore, makes sense to use quadrangles as tiles in the database if it must be tiled (depends on system chosen) could use aggregations of 7.5 minute quadrangles, e.g. 15 minute quadrangles Database creation plan needed to: assign input data to object types, layers determine which relationships to store in the database determine naming conventions for files, attributes scheduled input of data from 3,162 individual map sheets over 6 years need to allow for updates as well as initial data input some layers updated on a regular basis - e.g. timber harvesting some irregularly - e.g.
forest fires System specific issues preferred arrangement is a centralized database at Forest headquarters, access from workstations across the Forest implementation plan is based on scheduled generation of products system design provides little access to the database in query mode therefore product generation can be batched and data need be online only during product generation however 1 Gbyte is easily accommodated online Schedule database creation schedule determines ability to generate products FRS calls for generation of 4,513 individual map products and 3,871 list products in same 6 years digitizing need will be heaviest at beginning of period ability to produce will be highest at end input phasing: roads, PLSS section boundaries - all input in year 1 lakes and streams - phased over years 2-4 forest stands - phased over years 2-4 harvest areas - input to begin in year 4 over the 6 years of database creation there will be increasing output, diminishing input REFERENCES ACSM-ASPRS GIMS Committee, 1989. "Multi-purpose Geographic Database Guidelines for Local Governments," ACSM Bulletin, Number 121:42-50. Provides a general outline for the consideration of scale and content for municipal GIS databases. Calkins, H.W., and D.F. Marble, 1987. "The transition to automated production cartography: design of the master cartographic database," The American Cartographer 14:105-119. Stresses the need for rigorous database design and illustrates the use of the entity-relationship model for spatial databases. Nyerges, T.L., 1989. "Schema integration analysis for the development of GIS databases," International Journal of Geographical Information Systems 3:152-83. Describes methods for analyzing the differences and similarities between two or more databases. Nyerges, T.L., and K.J. Dueker, 1988.
"Geographic Information Systems in Transportation," US Department IMPLEMENTATION STRATEGIES FOR LARGE ORGANIZATIONS INTRODUCTION this unit examines issues that arise when GIS is implemented in large organizations these issues include: where in the organizational structure to locate the GIS operation problems and advantages of multi-participant projects B. LOCATION WITHIN THE ORGANIZATION even though a GIS may be an organization-wide tool and is seen as a decentralized resource within the organization, centralized coordination of the GIS operation is still necessary is needed to ensure efficiency and cost- effectiveness in the operation of the GIS e.g. avoid redundancy in the collection of data e.g. ensure expensive hardware is being used efficiently the location of the GIS manager and the support staff will be seen as the location of the GIS unit within the organization the location of this unit will affect the way the GIS staff interacts with the rest of the organization Somers (1990) suggests there are three basic options for the location of the GIS: 1. operational department location e.g. planning, public works, engineering or assessments GIS often develops from a small system obtained to deal with specific needs which have grown to support activities outside the mandate of the original department advantages: such systems are very responsive to original users needs disadvantages: departmental focus makes it difficult for other users to have their needs and priorities recognized may not have high level management support 2. support department location e.g. data processing, MIS or management services in these locations GIS is seen as a service operation like payroll, personnel and DP and will be supported by the organization as such advantages: objectivity of system design and management disadvantages: remote from the users of the GIS may not be responsive to the needs of users priorities of department may be different than users'' 3. 
executive level location advantages: high level visibility, support and attention objectivity disadvantages: distance from the real operations of the organization users may feel GIS support staff is out of touch with their needs the actual location of the GIS unit within an organization will reflect the circumstance of its introduction, the management structure and the organizational policies and mandates C. MULTIPARTICIPANT PROJECTS increasingly, GISs are being implemented by consortia of agencies with a wide range of legal foundations, including: local government agencies county governments state and federal government agencies public utilities non-profit organizations diverse organizations cooperating in such multi-participant GIS are bound by a common geographic setting and are motivated by the need for fiscal responsibility costs for data collection and management for a common geographic area can be shared among organizations are guided and coordinated through inter-agency committees consisting of representatives from the departments and agencies involved in the use and design of the GIS such committees generally have two structural levels: policy level - senior management technical level - technical and middle management Issues for multiparticipant projects Forrest et al (1990) list several issues that have to be addressed by these inter-agency committees: participation involved agencies need to commit financial and other resources to the project data ownership who owns the data collected? data maintenance which agency or agencies will have the ultimate responsibility of data maintenance and update how will this responsibility be partitioned? hardware and software ownership and maintenance how will the necessary hardware and software be distributed across the agencies? which vendors' products will be supported by the multi-agency agreement? standards what standards will be used for data exchange and communications? financing how will the project be funded?
how will the costs be shared equitably? new business activities GIS may provide the involved agencies the opportunity to venture into new business areas e.g. sale of digital data, maps D. US FOREST SERVICE EXAMPLE the following sections describe the development and implementation of a national GIS strategy within the US Forest Service Forest Service is an agency of the US Department of Agriculture responsible for management of nearly 200 million acres of federal lands organized into 155 National Forests mandate to manage land for multiple uses - timber and pulp production, mineral resources, recreation, wildlife, conservation Organization National Forests grouped into regions each National Forest has a headquarters, several district offices nature of each Forest varies depending on resources those in the Pacific Northwest are heavy timber producers others may have significant oil and gas, e.g. in Rocky Mountains "wealth" (annual budget) of Forest depends on resources, leases pattern of jurisdiction is typically complex area of Forest is not singly bounded many islands of private ownership within boundary complex system of access rights, grazing and timber leases map - a map of a local National Forest would be useful at this point, plus a description of its resources, management activities E. 
EARLY GIS ACTIVITIES many Forests and regional offices acquired assorted types of GIS prior to 1987 determining factors in early acquisition included: availability of funds - "rich" forests were early adopters presence of a "missionary" on Forest staff, able to persuade management that available funds should be spent on this high risk innovation examples of status of GIS circa 1985: San Juan National Forest large Forest in southern Colorado extensive mineral resources, recreation little marketable timber Forest broken into 80,000 irregularly shaped units, often called "integrated terrain units" (ITU) the ITU is an area object which is homogeneous on all attributes in the database i.e. a "smallest common denominator" parcel of land with uniform land use, vegetation, soil in essence, these units are the result of overlaying maps of all relevant themes in practice the map is divided up into areas which are both (a) as large as possible and (b) as homogeneous as possible each unit assigned a unique number attributes assigned to each unit, covering forest cover (species, age, density) administrative unit (county, ranger district) slope and aspect watershed soil type, drainage etc. data matrix of 80,000 units by 600 attributes (close to 50,000,000 individual data items) maintained at Region computing facility using System/2000 hierarchical database benefit: low cost of data entry - no digitizing problems: no geography - just a "flat file" of attributes no way of aggregating units based on spatial adjacency, making spatial queries no point or line objects, no associated operations, e.g. 
buffers around line objects no map products problems with quality control unlike geographical files, cannot make internal consistency checks, every entry must be checked individually - no possibility of using maps for data checking virtually impossible to achieve high quality redundancy if extended to too many attributes, the ITU approach leads to high levels of redundancy in the database e.g. there are only two counties in the Forest, these could be represented accurately as a single layer with two area objects, but using the ITU approach 80,000 entries must be made for county attribute thus while only two possible errors could occur in entering county attribute if county is a separate polygon layer, there are 80,000 chances of error with ITU approach Flathead National Forest large Forest in western Montana adjacent to Glacier National Park much marketable timber, some mineral resources wildlife conservation important because of adjacency to National Park heavy reliance on Landsat imagery as primary data source imagery interpreted with ground checks to provide forest inventory imagery registered to topographic mapping and DEM other layers input by rasterizing vector coverages (e.g. climatic variables) multi-layer raster database at Landsat resolution (80 m) manipulated using remote sensing system (VICAR) benefits: easy to use system for mapping, production of images easy to combine layers for modeling problems: difficult to use system to manage timber resource raster database has no concept of homogeneous stand difficult to link ground checks of timber type/size/density to pixels not easy to handle point or line datasets e.g. campsites, points of historical significance, sightings of endangered species e.g. 
Grizzly Bear, roads, streams difficult to attach extensive lists of attributes to pixels each attribute treated as a separate layer, no easy way of relating objects between layers Summary Flathead and San Juan NFs illustrate the problems of delivering GIS products using image processing and conventional database technology respectively other examples illustrate the problems of CAD systems by 1985 Forest Service had experience of many GISs in different Forests and regions: vector systems: COMARC ARC/INFO (ESRI) Strings (Geo-Based) Intergraph MOSS raster systems: ERDAS VICAR WRIS input methods included digitizing, scanning and interpretation of imagery Other technical issues in the early 1980s the Forest Service began implementation of a nationwide system of networked computing resources to automate office functions functionality includes electronic mail, word processing, limited database and analysis capabilities supplied by Data General, installed in every Forest, region and Washington headquarters compatibility of an eventual GIS with the DG hardware is therefore a major technical issue in GIS planning and acquisition could the GIS run on the (possibly expanded) DG network? of the GISs installed in various parts of the Forest Service, one vector system (MOSS) had been developed largely within the Department of the Interior and appeared to have much of the necessary functionality how should this system be judged relative to the remaining vendor-supplied systems in the acquisition process? F. 
1984/5 FUNCTIONAL REQUIREMENTS STUDIES as a result of pressure from both inside and outside the Service to acquire GISs for their operations, FRSs were conducted for a small sample of forests in order that functional requirements for the entire Forest Service could be determined 6 Forests with a variety of sizes, resource mixes were selected: George Washington (Virginia) Nicolet (Wisconsin) Flathead (Montana) San Juan (Colorado) Siuslaw (Oregon) Shasta/Trinity (California) full Functional Requirements Studies for GIS were carried out rather than a fully internal strategy (see Unit 61), the studies were contracted to a consultant - Tomlinson Associates Inc. - with a contract period of 8 months 30-60 information products identified per Forest, similar numbers of input datasets 60-90% of these were new products not previously generated Siuslaw National Forest FRS 60 information products identified: 10 are simple cartographic products generated by reformatting, rescaling and/or resymbolizing input data 2 require 3D graphics 7 are lists generated from input data 37 require use of GIS functions for simple analysis of input data 8 are the result of sophisticated analysis some are common to most Forests, e.g. timber inventory maps some are specific to local conditions, e.g. map to predict areas suitable for growing marijuana required by law enforcement department database requires input of data from approx. 15,000 map sheets during the 6 year planning period many of these are repeated updates 1200 in year 1 rising to 3500 in year 6 the 1200 maps in year 1 contain approx. 60,000 polygons and 13,000 points, plus 300,000 cm of line objects G.
THE NATIONAL GIS PLAN the circa-1985 situation was clearly uncoordinated duplication of effort, high cost of maintaining expertise in a range of systems no analysis of what was optimal for the Forest Service as a whole there was an awareness that information should be a corporate resource and managed as such corporate information is that information which must be commonly used, understood and shared to meet the agency's mission must be freely exchangeable between different departments, Forests, regions must have compatible formats and definitions - well-developed standards although the software to handle this information need not be standardized, the interfaces, methods of analysis and planning, and data structures and formats should be standard in January 1988 the Forest Service approved a plan for implementing a service-wide GIS by 1991 Objectives of the GIS support the management information needed by the Forest Service to accomplish its mission facilitate understanding and sharing of information horizontally and vertically within the organization, and with other organizations where possible allow access to information by managers through a non-technical, user-friendly interface take full advantage of existing Forest Service hardware and networks be flexible enough to incorporate new technologies in the future H. COMPONENTS OF THE PLAN plan is composed of 5 major components or phases: 1. Information Base and Structure identify the objectives, principles and assumptions of GIS implementation - the "vision" - and convert this into a "blueprint" for structuring resource information assemble information from a survey of 34 Forests to identify the kinds of data being used to characterize resources need to distinguish between "basic" and "interpreted" data "basic" is raw but relatively stable and accurate "interpreted" is more immediately useful for management which is more appropriately stored in the database?
is there a relatively small set of data types common to many Forest management efforts, but complicated by differences in definition and practice? describe the NFS GIS corporate information structure and the database environment describe the characteristics and functionality of the GIS database environment needed to support the information structure develop standards for the corporate information structure define the requirements for the user interface 2. Organizational Readiness improve awareness of the GIS plan develop guidelines for planning local implementations develop strategy for data conversion and acquisition the data currently available IMPLEMENTATION ISSUES A. INTRODUCTION B. STAGE THEORIES OF COMPUTING GROWTH Nolan model of computing growth Incremental model Radical model C. RESISTANCE TO CHANGE D. IMPLEMENTATION PROBLEMS Overemphasis on technology Rigid work patterns Organizational inflexibility Decision-making procedures Assignment of responsibilities System support staffing Integration of information requirements E. STRATEGIES TO FACILITATE SUCCESS Management involvement Training and education Continued promotion Responsiveness Implementation and follow-up plans REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 67 - IMPLEMENTATION ISSUES Compiled with assistance from Ken Dueker, Portland State University A. INTRODUCTION most organizations acquiring GIS technology are relatively sophisticated some level of investment already exists in electronic data processing (EDP) they have experience with database management and mapping systems and some combination of mainframes, minis and micros GIS technology will be moving into an environment with its own institutional structures - departments, areas of responsibility as an integrating technology, GIS is more likely to require organizational changes than other innovations the need for changes - cooperation, breaking down of barriers etc.
- may have been used as arguments for GIS existing structures are already changing - centralized computing services with large staffs are disappearing because new distributed workstation hardware requires less support organizational change is often difficult to achieve and can lead to failure of the GIS project organizational and institutional issues are more often reasons for failure of GIS projects than technical issues B. STAGE THEORIES OF COMPUTING GROWTH several models have been proposed for the growth of computing within organizations growth is divided into a number of stages Nolan model of computing growth the Nolan (1973) model has 4 stages: Stage 1: Initiation computer acquisition use for low profile tasks within a major user department early problems appear Stage 2: Contagion efforts to increase use of computing desire to use inactive resources completely supportive top management fast rise in costs Stage 3: Control efforts to control computing expenditures policy and management board created efforts to centralize computing and control formal systems development policies are introduced rate of increase in cost slows charge-back policies introduced Stage 4: Integration refinement of controls greater maturity in management of computing computing is seen as an organization-wide resource application development continues in a controlled way costs rise slowly and smoothly charge-back policy might be modified or abandoned how does this model fit GIS experience? 
two versions - incremental and radical Incremental model GIS is a limited expansion of existing EDP facilities, no major organizational changes required GIS will be managed by EDP department as a service probably run on EDP's mainframe this model fits AM/FM and LIS applications best - adding geographical access to existing administrative database GIS acquisition will likely be initiated by one or two departments, other departments encouraged to support by management thus it begins at stage 2 of Nolan's model if acquisition is successful, use and costs will grow rapidly, leading to control in stage 3 Radical model GIS is independent of existing EDP facilities, e.g. uses micros instead of EDP mainframe, may be promoted by staff with little or no history of EDP use EDP department may resist acquisition, or attempt to persuade management to adopt an incremental-type strategy instead may be strong pressure to make GIS hardware compatible with main EDP facility to minimize training/maintenance costs this model more likely in GIS applications with strong analytical component, e.g. resource management, planning model assumes that GIS will not require large supporting infrastructure - unlike central EDP facility with staff of operators, programmers, analysts, consultants unlike the incremental model, this begins at stage 1 of Nolan's model few systems have progressed beyond stage 2 - process of contagion is still under way in most organizations - GIS is still new stage 2 is slow in GIS because of the need to educate/train users in new approach - GIS does not replace existing manual procedures in many applications (unlike many EDP applications, e.g. payroll) support by management may evaporate before the contagion period is over - never get to stages 3 and 4 we have little experience of well-controlled (stage 3), well integrated (stage 4) systems at this point in time C.
RESISTANCE TO CHANGE all organizations are conservative resistance to change has always been a problem in technological innovation e.g. early years of the industrial revolution change requires leadership stage 1 requires a "missionary" within an existing department stage 2 requires commitment of top management, similar commitment of individuals within departments despite the economic, operational, political advantages of GIS, the technology is new and outside many senior managers' experience leaders take great personal risk ample evidence of past failure of GIS projects initial "missionary" is an obvious scapegoat for failure Rhind (1988), Chrisman (1988) document the role of various leaders in the early technical development of GIS - similar roles within organizations will likely never be documented GIS innovation is a sufficiently radical change within an organization to be called a "paradigm shift" a paradigm is a set of rules or concepts that provide a framework for conducting an organization's business the role of paradigms in science is discussed by Kuhn (1970) use of GIS to support various scientific disciplines (biology, archaeology, health science) may require a paradigm shift D. IMPLEMENTATION PROBLEMS Foley (1988) reviews the problems commonly encountered in GIS implementation, and common reasons for failure reasons are predominantly non-technical Overemphasis on technology planning teams are made up of technical staff, emphasize technical issues in planning and ignore managerial issues planning teams are forced to deal with short-term issues, have no time to address longer-term management issues Rigid work patterns it is difficult for the planning team to foresee necessary changes in work patterns a formerly stable workforce will be disrupted some jobs will disappear jobs will be redefined, e.g.
drafting staff reassigned to digitizing some staff may find their new jobs too demanding former keyboard operators may now need to do query operations drafting staff now need computing skills people comfortable in their roles will not seek change people must be persuaded of the benefits of change through education, training programs productivity will suffer unless the staff can be persuaded that the new job is more challenging, better paid etc. Organizational inflexibility planning team must foresee necessary changes in reporting structure, organization's "wiring diagram" departments which are expected to interact and exchange data must be willing to do so Decision-making procedures many GIS projects are initiated by an advisory group drawn from different departments this structure is adequate for early phases of acquisition but must be replaced with an organization with well-defined decision-making responsibility for the project to be successful it is usually painful to give a single department authority (funds must often be reassigned to that department), but the rate of success has been higher where this has been done e.g. 
many states have assigned responsibility for GIS operation to a department of natural resources, with mandated consultation with other user departments through committees project may be derailed if any important or influential individuals are left out of the planning process Assignment of responsibilities assignment is a subtle mixture of technical, political and organizational issues typically, assignment will be made on technical grounds, then modified to meet pressing political, organizational issues System support staffing a multi-user GIS requires at minimum: a system manager responsible for day-to-day operation, staffing, financing, meeting user requirements a database manager responsible for database design, planning data input, security, database integrity planning team may not recognize necessity of these positions in addition, the system will require staff for data input, report production applications programming staff for initial development, although these may be supplied by the vendor management may be tempted to fill these positions from existing staff without adequate attention to qualifications personnel departments will be unfamiliar with nature of positions, qualifications required and salaries Integration of information requirements management may see integration as a technical data issue, not recognize the organizational responses which may be needed to make integration work at an institutional level E. 
STRATEGIES TO FACILITATE SUCCESS Management involvement management must take a more active role than just providing money and other resources must become actively involved by supporting: implementation of multi-disciplinary GIS teams development of organizational strategies for crossing internal political boundaries interagency agreements to assist in data sharing and acquisition must be aware that most GIS applications development is a long-term commitment Training and education staff and management must be kept current in the technology and applications Continued promotion the project staff must continue to promote the benefits of the GIS after it has been adopted to ensure continued financial and political support projects should be of high quality and value a high profile project will gain public support an example is the Newport Beach, CA tracking of the 1990 oil spill (see Johansen, 1990) Responsiveness the project must be seen to be responsive to users' needs Implementation and follow-up plans carefully developed implementation plans and plans for checking on progress are necessary to ensure controlled management and continued support follow-up plans must include assessment of progress, including: check points for assessing project progress audits of productivity, costs and benefits REFERENCES Chrisman, N.R., 1988. "The risks of software innovation: a case study of the Harvard lab," The American Cartographer 15:291-300. Foley, M.E., 1988. "Beyond the bits, bytes and black boxes: institutional issues in successful LIS/GIS management," Proceedings, GIS/LIS 88, ASPRS/ACSM, Falls Church, VA, pp. 608-617. Forrest, E., G.E. Montgomery, G.M. Juhl, 1990. Intelligent Infrastructure Workbook: A Management-Level Primer on GIS, A-E-C Automation Newsletter, PO Box 18418, Fountain Hills, AZ 85269-8418. Describes issues in developing management support during project planning and suggests strategies for successful adoption of a project. Johansen, E., 1990. 
"City's GIS tracks the California oil spill," GIS World 3(2):34-7. King, J.L. and K.L. Kraemer, 1985. The Dynamics of Computing, Columbia University Press, New York. Presents a model of adoption of computing within urban governments, and results of testing the model on two samples of cities. Includes discussion of adoption factors and the Nolan stage model. Kuhn, T.S., 1970. The Structure of Scientific Revolutions, University of Chicago Press, Chicago. Nolan, R.L., 1973. "Managing the computer resource: a stage hypothesis," Communications of the ACM 16:399-405. Rhind, D.W., 1988. "Personality as a factor in the development of a discipline: the example of computer-assisted cartography," The American Cartographer 15:277-90. GIS STANDARDS A. INTRODUCTION Reasons for standards Standards organizations related to GIS B. TYPES OF STANDARDS FOR GIS Operating system standards User interface standards Networking standards Database query standards Display and plotting standards Data exchange standards C. IMPLEMENTING STANDARDS Start-up costs Management support Technical tradeoffs Potential for security risks Innovation D. WHAT TO STANDARDIZE? REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 69 - GIS STANDARDS A. INTRODUCTION this unit is based largely on information from Exler (1990) and Tom (1990) standards are needed as GIS users attempt to integrate their operations with other hardware, GISs and data sources challenge is to get industry, government and users to implement and promote the use of standards many standards are set simply through common use, though major attempts are currently being made to develop broad-ranging national and international standards Reasons for standards 1. Portability of applications need the ability to move developed applications to new hardware platforms in order that development efforts are not duplicated and can be shared 2. 
Data networks need ability to access digital data which is distributed through various offices, agencies, states and even countries 3. Common environments if applications use similar operating environments, learning curves are reduced and productivity is increased 4. Cost of program development standards are important to software developers as they reduce the need to develop interfaces for many different data formats, operating systems, plotters, etc. Standards organizations related to GIS overhead - Standards Organizations the following is from Exler (1990) ANSI - American National Standards Institute approves standards for US industrial and commercial sectors DCDSTF - Digital Cartographic Data Standards Task Force combines FICCDC-SWG and NCDCDS for digital cartographic standards FICCDC-SWG - Federal Coordinating Committee on Digital Cartography - Standards Working Group formed by the Interagency Coordinating Committee - Office of Management and Budget to serve as a focal point for the coordination of digital cartographic activities FIPS - Federal Information Processing Standards official source of information processing standards for federal departments and agencies IEEE - Institute of Electrical and Electronics Engineers develops standards for a broad range of subjects, including information processing ISO - International Standards Organization approves standards for the international community through national standards bodies such as ANSI NCDCDS - National Committee for Digital Cartographic Data Standards formed by ACSM (American Congress on Surveying and Mapping) and funded by USGS NIST - National Institute of Standards and Technology formerly the National Bureau of Standards oversees standards activities for the government recently opened a GIS laboratory OSF - Open Software Foundation a vendor consortium of IBM and Digital Equipment Corporation UNIX International - a vendor consortium of AT&T and SUN X/Open - a nonprofit independent consortium of 19 computer 
manufacturers representing 160 software developers from 17 countries attempting to define standards for a complete computing environment B. TYPES OF STANDARDS FOR GIS Operating system standards for micro-computers, most GIS use the DOS operating system, though applications are being written for OS/2 and Macintosh UNIX appears to be the current popular operating system for the powerful workstations and mainframe computers, though there are several other well accepted and newly developing options User interface standards affect the "look and feel" of GIS programs windowing is becoming popular as a standard in GIS as well as in most other applications at the micro-computer level: for PC computers, Presentation Manager available under the OS/2 operating system and as Microsoft Windows in DOS is becoming the standard Macintosh operating system has always been a windowing environment X-Windows is the de facto windowing standard for UNIX and other mainframe and workstation operating systems this allows different vendors' hardware to support a common interface in a networked environment Networking standards are critical to allow communications between remote computers networked environments are increasingly popular for GIS as the technology and data become widely used within organizations Database query standards SQL (Structured Query Language) is emerging as a standard across the data processing spectrum, though in its current form it is limited in its ability to handle spatial queries Display and plotting standards several standards have emerged in this area simply as a result of the popularity of specific hardware devices these include: CalComp and HPGL - line plotter formats PostScript - raster, page-oriented graphics Data exchange standards the largest standardization effort is currently being directed at this area US Federal government has recognized the need to exchange data between different agencies (see Unit 68) and has formed committees to examine 
aspects of this note work done by NCDCDS, FICCDC and DCDSTF current efforts are directed towards the development of the Spatial Data Transfer Specification (SDTS) (see Tom (1990) for more details on SDTS) the Defense Mapping Agency's digital cartographic data standard DIGEST is part of an effort to establish standards within the international defense community, e.g. NATO however, there are several common data exchange formats currently in use (see GIS World, 1989) these include: overhead - Spatial data exchange formats USGS DEM - Digital Elevation Model format used by the USGS since early 1980s for gridded elevation data allows a single attribute per cell USGS DLG - Digital Line Graph all features of the USGS quadrangle map series are supported by this format is the most widely used format for exchange of digital cartographic data used primarily for coordinate information though it does support alphanumeric attributes GBF/DIME - Geographic Base File/Dual Independent Map Encoding (Census Bureau) original Census Bureau digital files developed in early 1970s allows both coordinate and attribute data TIGER - Topologically Integrated Geographic Encoding and Referencing (see Units 8 and 29) IGES - Initial Graphics Exchange Specification (National Bureau of Standards) used extensively for the exchange of CAD and CAM data only one attribute per feature SDDEF - Standard Digital Data Exchange Format (NOAA) primarily used to exchange data between NOAA, Defense Mapping Agency and the Federal Aviation Administration only supports point data SIF - Standard Interchange Format (Intergraph) developed to support exchange of data between Intergraph and other systems popular data exchange format for many GIS packages MOSS - Map Overlay and Statistical System (US Fish and Wildlife) originally developed as part of the MOSS GIS a non-topological format for vector data with translators to several common spatial data formats now used by several federal and local government agencies DXF - 
Drawing eXchange Format (Autodesk) developed by Autodesk, Inc. for AutoCAD like SIF is a popular data exchange format for many GIS packages C. IMPLEMENTING STANDARDS several issues are related to the implementation of GIS standards Start-up costs implementation of a standard can incur substantial costs in terms of money and time will be major short-term costs related to user training and reprogramming of software Management support management needs to recognize the positive impacts of standards on productivity and system costs and be willing to commit short-term resources for retraining and reprogramming Technical tradeoffs adopting standards requires tradeoffs between functionality and performance standards provide for broad functionality e.g. adopting software that uses a standard data exchange format allows access to a broad range of data sources e.g. adopting a standard operating system provides access to a large library of existing applications however, standards, by their very nature, do not allow fine tuning to specific hardware or applications e.g. plotter standards may not make the optimum use of the hardwired capability of your plotter some de facto standards are neither efficient nor the best available many exist simply due to the original popularity of the hardware or software, even though they may no longer be the state-of-the-art Potential for security risks wide availability of common operating systems allows for misuse and exploitation e.g. the spread of computer viruses depends on common operating systems Innovation broadly accepted standards make it very difficult to introduce innovations D. WHAT TO STANDARDIZE? 
the majority of standards effort in GIS to date has concerned data formats standards such as DIGEST provide standard record layouts, coding schemes although formats are standardized, these efforts deal primarily with the structure of the data, and not with its meaning data may be written into a standard format for transfer, and thus be readable by some other system, but it may still be virtually meaningless without extensive documentation the SDTS goes well beyond format standards by defining standard meanings for terms e.g. SDTS attempts to remove the confusion over the use of arc, link, edge, chain, segment in GIS by establishing a standard term for every type of object the USFS effort to establish a corporate database may similarly yield standards of meaning, e.g. standardized definitions of GIS layers, at least within this organization still missing is a standard for data models that would provide standard ways of representing geographic phenomena e.g. for digital elevation data, should the standard include all of contours, DEMs and TINs? should there be standard resolutions for DEMs? should there be standards of vertical accuracy? also missing are standards of data accuracy for GIS map accuracy standards deal only with cartographic features e.g. a GIS standard for digital elevation data might specify the accuracy of elevation for any point in an area, not the accuracy of positioning of a contour such standards would provide the GIS user with expectations about the reliability of the database as a window on the world, rather than a window on source documents, or a window on transferred databases REFERENCES GIS World, 1989. "Spatial data exchange formats", The GIS Sourcebook, GIS World, Fort Collins, CO, pp. 122-23. Exler, R.D., 1990. "Geographic Information Systems Standards: An Industry Perspective", GIS World 3(2):44-47. FICCDC, 1988. "The proposed standard for digital cartographic data", The American Cartographer 15(1). Thorley, G.A., 1987. 
"Standards - Why bother?," USGS Open File Reports, 87-314. Abstracts of papers presented at the GIS Symposium. Tom, H., 1990. "Geographic Information Systems Standards: A Federal Perspective", GIS World 3(2):47-52. EXAM AND DISCUSSION QUESTIONS 1. Standards can be imposed from above, or emerge through consensus. Discuss the pros and cons of top-down and bottom-up approaches to GIS standardization. 2. How successful do you think DXF can be as a GIS exchange standard? What aspects of information exchange does it standardize? 3. Review the approach taken by SDTS to standardizing the use of the term "chain". 4. "SDTS is a standard for cartography, not GIS" - discuss. LEGAL ISSUES A. INTRODUCTION The legal regime is the structure of laws, regulations and practices within which society operates components of the legal regime: statutes - laws, bills, acts passed by legislative bodies administrative regulations - rules established by branches of government, under the authority of statutes (enabling legislation) common law - past decisions of courts and judges have the effect of statute law unless overridden professional and related practices - the system of conventions and traditions which rely on law for their ultimate authority decisions of some bodies, e.g. medical professional associations, may have force of law in some cases Function of law in society to resolve disputes, e.g. over ownership of land, locations of boundaries etc. maintain order establish a framework for common expectations about the events of life secure efficiency, harmony and balance in the functioning of government protect against excessive or unfair power in government or the private sector assure an opportunity to enjoy the minimum decencies of life B. INFORMATION AS A LEGAL AND ECONOMIC ENTITY information is often considered as a commodity is information similar or dissimilar to other traded commodities? 
Quantity value of a traded commodity is generally proportional to quantity difficult to measure quantity of information quantity of geographic information might be based on geographical area covered e.g. US Bureau of the Census charges a flat fee for each county's TIGER file irrespective of the size or population of the county some Western counties with population <100 cost the same as a New York borough with population in the millions can buy specific information, e.g. health information from a doctor, directory information for phone, information in a book however, measuring the amount of information acquired for a given fee is a problem which distinguishes information from other products Property rights who owns information? precondition for operation of a market is that property rights are created and can be protected difficult to assign property rights to information degree of appropriability (extent to which something can be owned) varies for information in the absence or limitation of property rights, suppliers will either find it unprofitable to produce the product or, if it is produced, tend to underproduce it the patent system tries to assign ownership to information to create incentives for its production if the producer cannot prevent use in the absence of payment, then the market mechanism cannot operate properly Public goods information is sometimes a public good in economics, a good is public if its use by one person does not prevent or curtail its use by another in contrast, use of a car by one person denies its use by another a map is a common form of public good produced by governments at all levels other forms of spatial data produced by governments, e.g. 
Census statistics, are also public goods Value of information decision-makers seek data at reasonable cost to: reduce uncertainty in planning, investment, development decisions provide new opportunities information has value because it can be used to make decisions about the allocation of scarce resources lack of information results in decisions being made under uncertainty risk of bad decisions has an associated cost risks can be averted using information thus, value of information may equal the expected cost of the averted risk difficulties that apply to the definition of information as a legal entity also apply to the development of an economic model of the use of information and its value Information as evidence spatial information is a form of scientific evidence, contributes to resolution of conflict maps may be used as evidence in court during litigation all parties in litigation may have different views on maps and their interpretation in fact, the idea that a neutral agency might create a non-contentious set of spatial data for decision-making is inconsistent with the American legal system where individuals have the right to question the facts that are used to determine rights and interests as evidence, it is often assumed that spatial information has been created using scientific measurement, methods the assumption may not be true: some maps rely heavily on subjective interpretation maps contain errors maps can be used in inappropriate ways map designer has no control over map's use C. LIABILITY liability is determined in cases where a person alleges harm from a poorly made decision land management decisions may be later shown to have been based on inaccurate information e.g. 
when spatial information is used, decisions are often made without expert knowledge of the forms and accuracy of spatial information and associated processes of data collection and compilation policy formation process is often based on small-scale, generalized maps policy is often implemented using large-scale, detailed maps the problems of scale change and generalization are rarely understood suppliers and users of spatial information are concerned about liability for such decisions there are three types - contract, negligence, strict product liability Contract liability arises where the terms of an agreement between producer and user allocate responsibilities contract provisions are upheld by courts important for those who make contracts for production of maps, digitizing, data conversion, software maintenance a contract should set standards for products and services difficult to establish standards for data accuracy cost of checking data to see if it meets standards - does every data item have to be checked? 
where computer or product is involved: seller should carefully describe standards that apply buyer should obtain warranty that product is suitable for intended purpose purpose must be clearly laid out in contract however, written contracts between public agencies and users of their maps are rare Negligence arises where a person fails to exercise the standard of reasonable care normally expected of someone in the same situation, and harm results courts and legislatures have defined "reasonable care" for many situations map-producers and users are often covered by this type of standard computer-based information presents additional problems so-called "computer error" is often found to be a failure of system owners or operators to respond to known bugs in these circumstances it seems likely that the basic principles of "reasonable care" will apply failure to select and maintain the hardware and software that executes the required tasks may constitute negligence failure to market capabilities of products accurately may be negligence Strict product liability user is required to show that the product is of an inherently dangerous nature e.g. recent lawsuits over children''s toys does not require that the injured party present evidence that the producer acted improperly D. LIABILITY SCENARIOS errors will occur in information services, data and products the appropriate level of care in design and operation of systems is difficult to establish three types of errors are important in liability: Errors in represented location typically result from measurement and data handling mistakes national map accuracy standards prescribe a reasonable frequency of errors in locations court is likely to consider the process of data entry and whether a reasonable level of care was established and used in design and implementation of the system, and emphasized in training in the case of Indian Towing Co. 
vs United States, federal government was held to have negligently failed to maintain a lighthouse whose location was marked on charts and whose character was described in the official Light List in Reminga vs United States the federal government was held to have inaccurately and negligently depicted the location of a broadcasting tower on an aeronautical chart, contributing to the mistakes and fatalities in an airplane accident Representations of error-free data e.g. Aetna Casualty and Surety Company vs Jeppesen and Company asserted that fatal plane crash resulted from defective aeronautical chart published by Jeppesen and Company chart by Jeppesen and Co. depicted the instrument approach procedure to an airport - information based on tabular data from the Federal Aviation Administration parties did not dispute the accuracy of the data on the chart but rather the graphic depiction of it the chart showed two views of the approach, one from above, one from side two views appeared to have same scale on chart - actually scales differed by factor of 5 court found the crew were misled by representation Unintended and inappropriate uses user lacks expertise in interpreting map, and has no access to map's designers and compilers who could explain it e.g. 
in Zinn vs State: the state owns all land below the Ordinary High Water Mark (OHWM) of a lake evidence from botanists and surveyors at a regulatory hearing had established an elevation of 990 ft for the OHWM for a certain lake the report of the hearing included a 1:24,000 USGS quadrangle showing the OHWM, and thus the extent of the state's land, defined by the 990 ft contour on the map an adjacent landowner sued the state for the harm resulting from temporarily claiming part of her land (temporary in that the agency subsequently rescinded the report) the state was held liable - the 990 ft contour had been drawn for purposes other than definition of property rights but its use to depict the OHWM also implied a specific boundary of the property in question - i.e. the property boundary at the lake is defined by the OHWM, not a line on a map inappropriate uses of data are likely to increase with GIS technology unless safeguards can be built into systems E. ACCESS AND OWNERSHIP the general goal of law in the areas of access and ownership is to make as much publicly held data available as possible, subject to reservations about personal privacy and commercial value Privacy and Confidentiality these laws protect individual and commercial aspects of property from excessive government and private power privacy is a recently recognized concept, largely governed by common law protection is provided through statutes that require data and information gatherers and managers to: provide physical security to personal and property records design systems that prevent inappropriate access to publicly held records experience with computer "viruses" casts doubt on possibility of complete security given present level of interconnection between computer systems privacy rights problems arise when information is converted to digital form i.e. 
property ownership records in paper files do not allow easy searching in digital form it is possible to search these records for any combination of attributes this may produce publicly available information that infringes on privacy rights of individuals balance between public access to information and individual privacy rights changes when publicly held data becomes digital Open Records Laws, Freedom of Information Act Open Records Laws (states) and Freedom of Information Act (federal) designed to provide citizens with reasonable access to publicly held records provides citizens with the basis for understanding government functions and actions which concern them Open Records Laws define what are records those records not open to general scrutiny the conditions under which copies can be made available Copyright designed to protect the commercial, proprietary aspect of creative works in some countries (e.g. UK) data from public agencies can be copyright copyright laws make it easier to establish ownership, and therefore value, of information in the UK public data can be sold by government agencies (e.g. digital data by the Ordnance Survey) at prices which allow full cost recovery in the US this is impossible because data produced by federal agencies cannot be copyright - prices of digital data typically cover only direct costs of copying - no control over resale of data by corporations Conflict of laws guarantees provided by open access can conflict with protections of privacy and copyright concept of public information involves a complex balance between access, ownership and economic factors REFERENCES Aronoff, S., 1989. Geographic Information Systems: A Management Perspective, WDL Publications, Ottawa. Chapter 8 examines responsibilities for accuracy and access to information contained within GIS. Epstein, E.F., 1988a. "Litigation over information: the use and misuse of maps," in Proceedings, IGIS: The Research Agenda, NASA, Washington DC, 1:177-84. 
Good overview of legal issues in the context of conventional and digital mapping. Epstein, E.F., 1988b. "Legal aspects of global databases," in H. Mounsey and R.F. Tomlinson, editors, Building Databases for Global Science, Taylor and Francis, Basingstoke, UK. Introduces international legal issues and reviews the legal problems involved in building global databases. Mackaay, E., 1982. Economics of Information and Law, Kluwer Nijhoff, Boston. Chapter 5 discusses information, law and economics. DEVELOPMENT OF NATIONAL GIS POLICY A. INTRODUCTION GIS is a coalescence of many interests and fields: automation in the surveying and mapping industry automation of facilities management (AM/FM) demand for analysis and modeling to support resource management and planning interest in use of digital databases in marketing, transportation interest in applying the products of remote sensing need for automation of land records, and interest in multipurpose cadaster (MPC) each of these fields has its own societies and institutions, regulatory agencies in government, academic disciplines etc. coalescence leads to pressure for new institutional structures new series of conferences, e.g. GIS/LIS (San Antonio, 1988; Orlando, 1989) - jointly sponsored by surveyors and mappers (ACSM), urban managers and planners (URISA), geographers (AAG), private and public facility companies (AM/FM International) new structures in government - e.g. interdepartmental committees in some states, federal government new magazines, journals, textbooks, courses a clear national strategy could: speed the process of coalescence, e.g. 
by reorganization of government departments avoid duplication, mistakes, false starts provide much needed support for research and development promote training and education programs compare US attempts to develop national policy for MPC (see Unit 54 references) this unit looks at one country's efforts to develop national policy the United Kingdom particularly, the role of the "Chorley Report" (DOE, 1987) B. BACKGROUND Predecessors Ordnance Survey Review Committee reported in 1979 covered role of digital technology within premier mapping agency House of Lords Select Committee on Science and Technology reported in 1984 first recognition of potential role of GIS technology in integrating all forms of geographically referenced data raised awareness of obvious potential for duplication, inconsistency and incompatibility between different forms of geographical data led to formation of Committee of Enquiry (Chorley Committee) Charge to the committee "to advise the Secretary of State for the Environment within two years on the future handling of geographic information in the UK, taking account of modern developments in information technology and market needs" similar to Congress's 1989 charge to Department of the Interior in Public Law 100-409 (see references at end of Unit 53) with reference to land information (more narrowly defined than geographic information) Scope problems with interpretation of term "geographic information" in the charge thus, the committee included all information which can be related to specific locations on the Earth this is very broad - includes indirect as well as direct spatial referencing in fact, committee included: land and property data resource data - land use, ecological, environmental, etc. infrastructure data - utilities, facilities socioeconomic data - census statistics, health, etc. Membership of committee 11 members 65% from the private sector - vendors, utilities, market research companies, etc. 
chair (Lord Chorley) is a member of the House of Lords, accountant with major international management consultancy, familiar with subject, in part from work on previous House of Lords committee Role of committee many systems were in process of rapid development in UK in all these areas many were dependent on government agencies as sources of data, standards, policy committee's charge required it to define the role of government in fostering, coordinating, supporting system development identified the factors which are important in determining the way the technology is adopted and developed: the costs of adoption, particularly in staffing, training, equipment variations in the availability of data need for development of faster, more flexible, easier to use tools variation in awareness among managers of the benefits of GIS technology shortage of skilled personnel needed to define what role government, national policy can play in controlling these factors Comparison with North America evidence presented to the Committee indicated that the UK lagged behind North America in many of these areas lack of training and awareness was more critical much of the technology had been developed in North America these problems are likely even more severe in other countries, e.g. Eastern Europe Relationship to other technologies GIS is a comparatively small market segment many key technical developments originated in other areas e.g. peripherals developed for larger CAD markets other technologies may be less affected by non-technical factors lack of training less of a problem in more mature markets like CAD other technologies may be less innovative, require less reorganization, e.g. word processing C.
RECOMMENDATIONS Digital mapping in the UK, Ordnance Survey has copyright over its products, virtual monopoly over large-scale mapping government policy requires OS to stress cost recovery increasing demand from utilities for digital versions of basemaps accuracy levels required by utilities were substantially below those of OS private sector can produce digital data to utilities' specifications at substantially lower cost than OS OS's monopoly and copyright are under pressure from private sector committee encouraged OS to seek joint agreements with the private sector to relieve pressure Availability of data first comprehensive list of geographically related data holdings in government was prepared for committee evident that data were not sufficiently accessible to users outside government because of real or imagined concerns for privacy and confidentiality because government rules prevented departments from repackaging data and receiving financial benefit from sale Linking data sets difficult because of e.g.
incompatible reporting units for social statistics committee recommended maximizing use of common geographical referencing systems extend postal codes from limited application in mail to general role as reporting units for statistics of all kinds need for further development of data transfer standards Awareness, education and training recommended setting up demonstration projects need for expanded training courses, new teaching packages, greater role in business education Research and development generally, the report stressed the non-technical impediments to GIS adoption need for R&D in both fundamental and applied areas particular stress on the development of Intelligent Knowledge-Based Systems which incorporate rules derived from human experience development of better tools for estimating reliability of information from GIS Role of government government is one of the biggest users of GIS, also the biggest supplier of geographical data its level of commitment is critical to the development of the field potential roles of government in: development and implementation of standards legislation on relevant issues, e.g. copyright funding education programs carrying out or funding R&D increasing accessibility of data many submissions to committee urged establishment of a government organization to coordinate GIS committee recommended a Centre for Geographic Information (CGI) as: promoter of technology advisor on national GIS policy focus for users D.
GENERAL FINDINGS emergence of a discipline through coalescence of common interests usefulness of maps is increased enormously by digitizing, but digital systems allow access to vast stores of non-map data as well geographical data for small areas is very useful in social planning, but government must play an important role in handling such data to prevent invasions of privacy it is impractical to assemble all geographical data in one national archive - the role of government should be to increase access to geographical data through directories, compatibility etc. the commercial opportunities of GIS technology will continue to expand rapidly and internationally change in UK government policy since 1979 has had a profound effect on data collecting agencies because of pressure for cost recovery E. OUTCOMES the key technical recommendations - role of postcodes, production of digital basemaps - were rejected in the official government response government also rejected the recommendation for a Centre for Geographic Information (in effect, rejected the recommendation that it take the lead in organizing and funding the Centre) with no new organizational structure, there is doubt about whether the more far-reaching recommendations can be implemented efforts are under way to form an organization outside government to play at least part of the role intended for the Centre many non-technical recommendations were accepted and many are being implemented by relevant departments e.g. restructuring of legislation to make it easier to share and access data the committee's meetings, background work for submissions, and publicity given to the report may have had more impact than the recommendations themselves possibility of similar exercises in other countries, e.g. BLM report under PL 100-409 F.
RELATED ACTIVITIES IN OTHER COUNTRIES different countries have focused interest in the development of GIS in different ways (the following based on information in Shepherd et al, 1989) several aspects vary from country to country: perception of priorities in GIS scale of funding governmental/institutional context extent of involvement of the private sector emphasis upon applied as opposed to fundamental research other national initiatives include: UK Regional Research Laboratories established before the completion of the Chorley Report by the UK Economic and Social Research Council objectives include carrying out basic and applied GIS research, training, providing data services and promotion of the use of GIS in general U.S. National Center for Geographic Information and Analysis funded by the National Science Foundation created to promote basic research in GIS and to improve the education of GIS professionals The Netherlands research consortium funded by the Netherlands Science Research Council for four years at the University of Utrecht, the Technical University of Delft, the Agricultural University of Wageningen and the International Training Center at Enschede France creation of the Maison de la Geographie in Montpellier providing a network linking 49 research teams in France REFERENCES Department of Environment, 1987. Handling Geographic Information. Her Majesty's Stationery Office, London. The full Chorley Report. Lord Chorley, 1988. "Some reflections on the handling of geographical information," International Journal of Geographical Information Systems 2:3-10. Views from the chair, including a summary of the report's conclusions. Rhind, D. and H. Mounsey, 1988. "The Chorley Committee and 'handling geographic information'," Proceedings, Third International Symposium on Spatial Data Handling, International Geographical Union, Columbus, Ohio, 407-21. Excellent summary of the Chorley Committee and its report.
Shepherd et al., 1989. "The ESRC's Regional Research Laboratories: An Alternative Approach to the NCGIA?," AutoCarto 9, Sydney, Australia. Tomlinson, R.F., 1987. "Current and potential uses of geographical information systems: the North American experience," International Journal of Geographical Information Systems 1:203-18. Based on a background paper for the Chorley Committee which appears in the report's appendices. Ventura, S.J., 1990. "Federal land and geographic information system activities," Photogrammetric Engineering and Remote Sensing 56(5):631-4. A useful review of the need for coordination and standardization in the federal government. GIS AND GLOBAL SCIENCE A. INTRODUCTION B. SOURCES OF GLOBAL DATA Remotely sensed imagery Terrestrial-based sources C. CHALLENGES TO DATA INTEGRATION Multiple sources Data volumes Geometric rectification, geographic referencing Issues of data storage Database model Documentation, access, dissemination, archiving Internal dataset consistency Merging terrestrial and satellite data In summary D. EXAMPLES OF DATABASES AT GLOBAL SCALES CORINE UN Environment Program GRID project Global Change Diskette Project Digital Chart of the World REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 72 - GIS AND GLOBAL SCIENCE Compiled with assistance from Helen Mounsey, Birkbeck College, University of London A. INTRODUCTION why do we need GIS and databases for the globe?
ever-increasing concern over the quality of the earth's environment frequent press reports on issues such as global warming and the greenhouse effect, the ozone hole, deforestation and water pollution these are global issues, but we can also identify disasters, which, although local in origin, have pronounced continental or global scale consequences for example, the Brundtland Report noted that during the 900 days the World Commission on Environment and Development was at work: the African drought put at risk the lives of 35 million people, and probably killed up to 1 million of them the leak at a chemical factory in Bhopal, India, killed 2000 people and injured 200,000 more the explosion at a nuclear power plant at Chernobyl, USSR, caused environmental damage throughout Europe a chemical fire in Switzerland caused toxic materials to be transported by the Rhine as far as the Netherlands at least 60 million people died of diarrhoeal diseases caused by malnutrition and dirty water of these, only the Bhopal incident could be argued to be local in its effects there is clearly an ever-greater need to monitor processes at a global scale in order to gain knowledge of the earth's processes and how these affect and are affected by human activity this knowledge is very sketchy at present two developments contribute to improving the situation: technical development and ever increasing speed and power of digital computing increasing sources of data for use in environmental modeling the ultimate aim is a global database and associated GIS (access and analysis system) at a large enough scale (e.g. > 1:250,000) and with fine enough resolution (e.g. < 250 m) to enable environmental scientists to develop models which replicate, as near as is possible, the earth's processes would assist in data integration and visualization at global scales B.
SOURCES OF GLOBAL DATA global databases are derived from two sources remotely sensed imagery terrestrial-based sources - analog maps, statistics and digital data recording Remotely sensed imagery aircraft and (more usually) satellite-borne sensors provide much information at a global scale for environmental analysis characteristics: usually global (or near-global) coverage repeated coverage over intervals of hours to days (depending on sensor) enables construction of time series spatial resolution of data is improving, e.g. Landsat MSS - 80 m, SPOT - 10 m very many existing sources of remotely sensed imagery, the largest contributor being NASA major new development is the NASA Earth Observing System (EOS) comprehensive information system - includes data processing, access and analysis capabilities as well as hardware aims to be international in system provision, use and benefit will provide consistent, long-running datasets into the 1990s and beyond EOS is based on the collection of data from two proposed NASA polar platforms, one European Space Agency platform and one Japanese polar platform (a polar platform is a satellite in an orbit which passes over both poles) this will generate a massive dataflow (estimated at 10^12 bytes (1 terabyte) per day) Terrestrial-based sources Analog maps and tabular statistics digital data derived from maps are an important contributor to global databases, and, as a data source, complementary to remote sensing usually based on ground survey or checking, digitized cartographic data can provide: human assigned attributes (e.g.
place names or administrative boundaries) a more useful / detailed classification of features a historical (pre 'advent of remote sensing') data source to be useful the maps from which these data are derived should be: part of a series which offers global coverage and is based on common standards of accuracy of source material and common cartographic conventions at scales larger than 1:1 million - smaller scales are too highly generalized to represent reality with any degree of utility and are of use only for general reference maps are a frequent source of data on topography, soils, geology etc. tabular statistics originate from many national organizations (e.g. census gathering agencies) and are collected by international organizations into databanks (e.g. the UN, World Bank, OECD, etc.) mostly this provides a source of socio-economic data on the 'human' element in global modeling Digital data recording this source of data results from automatic data logging mostly in the geophysical and climatological sciences collected mostly on a national basis then assembled into international databases some examples include: the World Data Centers 27 centers worldwide which coordinate the global collection of data determine standards for collection and documentation hold multiple copies of the resultant datasets distribute them freely throughout the world emphasis on physical data geology, geophysics, meteorology, atmospheric physics, oceanography the World Meteorological Organization under the World Weather Watch program collects and supplies members with observational data and processed products for meteorological forecasting there are many other such international organizations, gathered together under the auspices of the International Council of Scientific Unions ICSU has endorsed the establishment of the International Geosphere Biosphere Project (IGBP), which has the long-term aim of describing the various processes which affect the Earth's environment, and the
manner in which they may be changed by human action C. CHALLENGES TO DATA INTEGRATION Multiple sources global modeling and prediction will in most cases demand data from multiple sources often there will be a mixture of remote sensing and analog input remotely sensed data is global in coverage and updated frequently remotely sensed data is most useful when calibrated with ground-based data but, ground-based data often lacks global coverage and is updated infrequently Data volumes possibly the most pressing problem, especially as far as remotely sensed sources of data are concerned volumes are potentially huge surface area of earth is order 10^14 sq m single coverage of SPOT imagery at 10 m resolution is order 10^12 pixels assuming a single value / pixel is stored in 1 byte - then dataset is order 10^12 bytes, or 1 terabyte note that this is for only one coverage! most applications will require more than one coverage in time series, and possibly data from other sources as well note that this is for current SPOT platform - future EOS will generate order one terabyte per day - this is 10^4 conventional magnetic tapes per day, or over a mile of shelves in a conventional tape library per week a number of other problems are a consequence of such massive data volumes: Geometric rectification, geographic referencing global databases must be referenced to a common coordinate system if they are to be merged and manipulated from a number of sources conventionally, latitude / longitude is used the cost of installing a referencing system into remotely sensed datasets may be prohibitively high Issues of data storage simple raster data structures are inadequate if rapid access is required for browsing and retrieval possible solutions include: vector - but spatial relationships in the data must be stored (which increases data volumes further) or computed every time (which increases access times further) hierarchical - structures based on recursive subdivision of
the earth's surface various forms of data compression Database model must be multi-purpose and global scale the number of possible relationships is large the object definition is inexact (what may be a point at one scale is an area at larger scales) Documentation, access, dissemination, archiving there is a not-insignificant administrative problem in devising methods of user access to global databases how to document datasets for international, multidisciplinary use? how to enable the user to access a centralized database, probably over computer networks? how to disseminate data and documentation - in what format and on what physical medium? how to handle the costs of archiving such large databases? are dual copies of every dataset strictly necessary? Internal dataset consistency have all the individual datasets being merged into a global database been collected and classified to consistent and high standards of accuracy, with a common definition of variables? this is less of a problem with remotely-sensed data can be a serious problem with terrestrial-based sources: e.g. there is no consistently produced topographic map series of the world at a scale greater than 1:1 million e.g. for soils, largest scale is 1:1.5 million, with considerable disagreement between soil scientists over a consistent, global classification of soil type e.g. there is no strictly consistent definition of "total population" in the UK Census of Population through time (some years include visitors, etc.) this is a problem within a well established national data source when multiplied to international scales such problems may become insurmountable Merging terrestrial and satellite data what errors may be generated through this process? how are missing data handled?
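the data-volume estimates in section C can be checked with a quick back-of-envelope calculation; the following Python sketch uses rounded order-of-magnitude constants, and the ~100 MB capacity assumed for a conventional magnetic tape is an illustrative figure, not one given in the text

```python
# Back-of-envelope check of the data-volume figures quoted above.
# All constants are rounded, order-of-magnitude values; the tape
# capacity is an assumed figure for a conventional reel of the period.

EARTH_SURFACE_M2 = 5.1e14   # total surface area of the Earth, order 10^14 sq m
PIXEL_SIZE_M = 10.0         # SPOT resolution
BYTES_PER_PIXEL = 1         # one stored value per pixel

pixels = EARTH_SURFACE_M2 / PIXEL_SIZE_M ** 2    # order 10^12 pixels
coverage_tb = pixels * BYTES_PER_PIXEL / 1e12    # a few terabytes per coverage

EOS_BYTES_PER_DAY = 1e12                         # ~1 terabyte/day from EOS
TAPE_CAPACITY_BYTES = 100e6                      # assumed ~100 MB per tape
tapes_per_day = EOS_BYTES_PER_DAY / TAPE_CAPACITY_BYTES   # order 10^4 tapes

print(f"single global coverage: {pixels:.1e} pixels, {coverage_tb:.1f} TB")
print(f"EOS archive load: {tapes_per_day:.0f} tapes per day")
```

even at this crude precision the arithmetic confirms the point of the section: one global coverage at 10 m already runs to terabytes, so time series and multi-source databases quickly dwarf conventional storage media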
In summary there are problems of data acquisition, in particular from terrestrial sources there are problems of spatial and temporal inconsistency both within and between datasets we have limited experience in handling very large databases, with consequent issues of structure, access and administrative support the cost of all this may at least in the short term limit the development of global databases the increasing application of GIS is, however, critical to enable users to: merge datasets from widely disparate sources handle, analyze and map the results model environmental processes at a global scale D. EXAMPLES OF DATABASES AT GLOBAL SCALES very few truly global environmental databases at present some are developing at a continental scale, e.g. CORINE CORINE Co-Ordinated Information on the European Environment established as a project in 1985, to build an environmental database covering the 12 Member States of the European Community (2.25 million sq km) has now assembled a large number of consistent datasets into a centralized database these include: topography soils climate nature reserves and other sites of scientific importance water resources atmospheric pollution to be of use to policy makers, these are supported by socio-economic data certain key findings from the project: many datasets are unavailable for reasons of cost, confidentiality, administrative inadequacies or non-collection in certain countries where available, they may mask massive discrepancies in data collection methods and huge internal inconsistencies e.g. in a climatological dataset we find 8 methods of calculation for evapotranspiration, and 5 for maximum monthly temperature enormous problems in merging datasets derived from maps of different scales and projections merging larger scale datasets derived from remotely-sensed sources with smaller scale ones from terrestrial sources involved fundamental decisions on generalization vs.
loss of detail important issues of user access and data use, especially by unskilled users who may not understand the 'fuzzy' nature of some of the datasets, and the likelihood of error propagation through application of GIS techniques nevertheless, the project is expanding both in content and scale likely to be subsumed into the Environmental Agency being established in the European Community for the provision of technical, scientific and economic information for use in environmental monitoring UN Environment Program GRID project GRID = Global Resources Information Database established in 1980, and now based in Nairobi GRID aims to: establish global, regional and, in some cases, national environmental datasets of known quality establish computer systems which can handle these establish regional nodes for the dissemination of local sub-sets of data train scientific staff in the use of this information unlike CORINE, it draws heavily on data from remotely-sensed sources (from NASA), and also from other bodies such as FAO (Food and Agriculture Organization of United Nations), UNESCO (United Nations Educational, Scientific and Cultural Organization) and IUCN (International Union for the Conservation of Nature) much of its work has been at regional or continental scale thus far e.g. projects on sea level rise in the Mediterranean, and the distribution of elephants in Africa moving towards global-scale studies e.g.
global deforestation project Global Change Diskette Project a project of the International Geosphere-Biosphere Program a project designed to create and distribute to research groups, particularly the developing countries, medium-resolution digital data sets on diskettes for microcomputers contains satellite imagery and complementary thematic data Digital Chart of the World sponsored by the Defense Mapping Agency contracted to ESRI source - the Operational Navigational Charts coverage at 1:1 million of all the world's land area show elevation (500 m contours), cultural features, hydrography maintained for air navigation currently being digitized is intended to be a general source of high resolution cartographic data for the globe to be delivered in 1991 on CD-ROM REFERENCES Most of the material in this unit is extracted from various papers in: Mounsey, H.M. (Ed.), 1988. Building Databases for Global Science, Taylor and Francis, London. See in particular papers by Simonett and by Peuquet in Part Two, and by Mooneyhan (on GRID) in Part Three. Additional material: Briggs, D.J. and H.M. Mounsey, 1989. "Integrating land resource data into a European Geographical Information System," Journal of Applied Geography 9:5-20. A good source on the CORINE project. IGBP, 1988. Global change report #4: a plan for action, International Geosphere Biosphere Project, Stockholm. Many other reports on global science are available from IGBP, ICSU and NASA. DISCUSSION AND EXAM QUESTIONS 1. Discuss the relative advantages of the various spatial data models in global database building. Give examples of datasets which might be best suited to each type. 2. The greatest problems in the construction of global databases lie not with the datasets, hardware or software, but with the "liveware" - the human element of use (or abuse!) of the databases. Discuss some of the issues which might lie behind this statement. 3.
Select one of the major disasters mentioned in this unit (or another known to you of similar magnitude). Discuss likely sources of data, and particular GIS techniques, which you would use to address this problem and its associated issues. 4. Some parts of the world are relatively rich in spatial data, and others are relatively poor. Examples of the latter include much of the Third World and Antarctica. Because of gaps in coverage and variable quality it could be argued that the globe as a whole is data-poor. Is spatial data handling technology more or less valuable in data-poor areas? Discuss the arguments on both sides of this issue. GIS AND SPATIAL COGNITION A. INTRODUCTION B. SPATIAL INFORMATION FROM GIS Components of the user interface Fundamental questions C. SPATIAL LEARNING 1. Developmental psychology perspective 2. Cognitive and environmental psychological perspective D. FORM OF SPATIAL REPRESENTATION Images or propositions? Hierarchical or non-hierarchical structures? Frames of reference E. EFFECTS OF INTERNAL REPRESENTATION ON SPATIAL REASONING Causes of errors in spatial reasoning F. HOW DOES NATURAL LANGUAGE STRUCTURE SPACE? Examples Fuzziness G. RELEVANCE TO GIS Design of better user interfaces and query languages Design of universal GIS systems New database models Improved data entry techniques Expert Systems REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 73 - GIS AND SPATIAL COGNITION Compiled with assistance from Suchi Gopal, Boston University A. 
INTRODUCTION the next two units (73 and 74) examine advanced topics: knowledge based techniques spatial cognition both are efforts to deal with the complexity of real GIS applications complexity of real-world problems - the number of goals and issues which have to be dealt with in real problem- solving complexity of the knowledge and rules which can be brought to bear on a problem complexity of the man/machine interaction which ultimately determines the effectiveness of GIS interest in both areas is high progress is still largely in the research domain B. SPATIAL INFORMATION FROM GIS GIS are tools for supporting human decision-making in applications such as car navigation systems, electronic atlases, GIS are tools to help people acquire spatial information, learn about geography e.g. research is under way on the design of a portable GIS to help visually impaired people navigate in complex spaces the information acquired through a GIS is used in this case to make simple route-finding decisions the interface between the GIS and the user is a filter which determines how successfully information can be transferred Components of the user interface physical design - keyboard, mouse, tablet, color or monochrome, screen resolution, sound, speech recognition how can these be combined to maximize transfer of information? a car navigation system might use either a display screen with a map, or spoken instructions (e.g. turn left), or some combination - which is the most effective mode of communication between the GIS and the driver? functionality - what set of operations is allowed? control technique - commands, picking from menus, pointing at icons? Fundamental questions to design effective user interfaces we need to know more about how people learn, reason with spatial information issues that need to be addressed are: 1. spatial learning how is spatial knowledge learned or acquired by people? 2. 
form of internal spatial representation what is the nature of people's internal representation of space? how is spatial information stored in the brain? can this help us design ways of representing spatial data in GIS that lead to better user interfaces? 3. effects on spatial reasoning how does this internal spatial representation affect decision-making and behavior e.g. navigation, search for housing how do people's naive models of space lead to errors in geographic reasoning? how to design GIS user interfaces to minimize these errors? 4. natural language how does the language people use to communicate (natural language) affect their ability to deal effectively with spatial information? would GIS interfaces be more effective if they used natural language to describe spatial relations? 5. relevance to GIS how should the results of research on these fundamental questions be used to improve GIS user interfaces? this unit looks at each of these major issues C. SPATIAL LEARNING how do people learn about space and the objects and routes within it? two disciplinary perspectives 1. Developmental psychology perspective study of the qualitative changes in the cognitive and perceptual development of a child most influential theory of spatial learning is the developmental stage theory proposed by Piaget describes the stages in a child's development of spatial skills 4 stages sensorimotor stage - from birth to about 2 years - locations of all objects with reference to self preoperational stage - 2 to 7 years of age - simple spatial problems are solved - an understanding of spatial relations between objects and self concrete operational stage - 7 to 11 years of age - properties of Euclidean space are understood - more complex spatial problems are solved - e.g.
concept of reversibility - n steps in one direction followed by n steps in the opposite direction returns one to the same place formal operational stage - 11 to adulthood - child masters more abstract spatial problems - self and other objects located in an independent frame of reference e.g. child begins to understand simple spatial relations - "in front of", "left of" in stage 2 - abstract coordinate systems such as UTM are not understandable until stage 4 2. Cognitive and environmental psychological perspective studies the sequence of development of knowledge about a space by an adult many alternatives proposed (see references) - following is a consensus: landmark knowledge - ability to recognize certain features, but no knowledge of their locations or relationships between them procedural knowledge - knowledge of certain routes, and the procedures necessary to navigate from one end to the other topological knowledge - knowledge of how the known routes intersect and form a network - ability to combine parts of known routes into new routes metric knowledge - ability to recall metric relations between locations - distances, angles - this level of knowledge is needed to reason about previously untraveled routes and shortcuts D. FORM OF SPATIAL REPRESENTATION how do our minds construct mental images of the world which somehow capture its basic properties and structure? three major questions: what form of representation? images or propositions? what types of structures are used in the representation of spatial relations? hierarchical or non-hierarchical? what frames of reference are used? Images or propositions? images preserve the visual properties of objects and relations between them the "map in the head" propositions provide abstract representation of both verbal and visual information e.g. 
images of street maps, memory of street names, verbal directions one form can be generated from the other if we assume the mind is capable of simple processing compare the GIS's ability to compute vector from raster impossible to determine if one form is more accurate than the other as a model of the way people store spatial information Hierarchical or non-hierarchical structures? hierarchical structures represent spatial information in a nested fashion local and global are different levels of a tree compare hierarchical data structures, e.g. quadtrees non-hierarchical structures have no clear differentiation of levels Frames of reference 1. egocentric frame moves with the individual - objects are always represented in their relationship to the individual 2. environmental frame uses a local point as reference, moves when the individual moves from one local area to another 3. global frame is constant irrespective of the location of the individual E. EFFECTS OF INTERNAL REPRESENTATION ON SPATIAL REASONING internal representation can be identified by the pattern of errors it produces errors in direction, distance, orientation, judgment of spatial relations have been studied Causes of errors in spatial reasoning lack of explicit representation in memory not all information is perceived or remembered use of incorrect procedures in storing or retrieving information e.g. errors because of incorrect rotation of information to or from internal alignment suppose the "mental map" has North at the top errors can be made in reasoning about which way to turn when approaching a junction from the North natural language used to describe spatial relations may be vague or context-dependent e.g. "is north of" does not indicate how far or how exactly north decay of information processing constraints limits to size of memory storage and type of representation e.g.
Reno, NV is actually to the west of San Diego, CA, however, because CA is largely west of NV and the mind stores a hierarchical relationship between states and cities, we expect Reno to be east of San Diego F. HOW DOES NATURAL LANGUAGE STRUCTURE SPACE? natural language appears to affect the way we think and reason about space basic components of spatial information - objects, relations between objects, motion - are roughly equivalent to nouns, locative expressions and verbs in natural language however correspondence is not exact natural language reflects the human view of the world, is more complex than abstract mathematical structures it may be very difficult to represent the complex human view of the world within a digital system Examples use of prepositions to convey spatial relations is subject to complex, hidden rules "in", "on", "between", "across", "near" convey complex meanings e.g. we say "the car is near the house" but not "the house is near the car" - why? e.g. "across the lake" suggests a different spatial relationship than "along the lake" e.g. in North America we live "in" a city but "on" a street the structure of names has hidden meanings e.g. whether the word "lake" occurs first or second in a name is determined to some extent by its size - "Lake Erie" vs. "Trout Lake" - but "Great Bear Lake" is very large nouns can be chosen to convey spatial relations e.g. "timber" has no spatial meaning by itself, but "stand of timber" suggests a small area occupied by trees - "forest" suggests a large area of trees translation of prepositions from one language to another poses enormous problems a multilingual natural language interface for a GIS would have to deal with these Fuzziness the spatial relationships defined by natural language are fuzzy and context-dependent e.g.
meaning of "near" an object depends on the size of the object and is imprecise a natural language GIS interface would have to know the range of distances conveyed by "near" G. RELEVANCE TO GIS research in the area of spatial cognition can have several benefits for GIS development, including: Design of better user interfaces and query languages given the problems of determining the meaning of natural language, are natural language interfaces worth pursuing? yes, because some applications must use natural language, e.g. GIS for the visually impaired yes, because other forms of interface may be impractical, e.g. car navigation aids must not distract the driver's visual attention to the road yes, because some applications require more than one mode of interaction to maximize effectiveness, e.g. voice can be used in digitizing to augment input from the cursor Design of universal GIS systems such systems should be compatible with cognitive models of the way we perceive and structure space thus would avoid the costly problem of transferring GIS technology between different countries and languages New database models understanding how spatial information is represented internally may provide novel designs for database models permit representations to be transformed from natural language into the GIS database and vice versa Improved data entry techniques natural language is the simplest way of collecting information about the world, but difficult to formalize into precise structures in a digital environment Expert Systems knowledge of how spatial information is stored and processed will provide fertile input to the design of intelligent expert systems for spatial information REFERENCES Herskovits, A., 1987. Spatial Prepositions in English. Cambridge University Press. Interesting book on the use and meaning of spatial prepositions. Kuipers, B., 1978. "Modeling spatial knowledge," Cognitive Science 2:129-53. One of the most influential papers on the classes of spatial knowledge. Piaget, J.
and B. Inhelder, 1967. The Child's Conception of Space. The classic developmental theory. Talmy, L., 1983. "How language structures space," in H. Pick and L. Acredolo, editors, Spatial Orientation: Theory, Research and Application, Plenum Press, New York. Argues that language affects the ways in which we think about spatial relationships. EXAM AND DISCUSSION QUESTIONS 1. Summarize the arguments for believing that an understanding of processes of spatial learning and reasoning is essential if we are to design better GISs, particularly better user interfaces. 2. What would be the desirable functions and other characteristics of a portable GIS for the visually impaired? 3. A paper by Openshaw and Mounsey ("Geographic Information Systems and the BBC Domesday Interactive Videodisk," International Journal of Geographical Information Systems 1:173-180, 1987) describes the design of the BBC Domesday Project, a form of electronic atlas using optical disk technology. What features of the conventional atlas does this system implement? In what ways does it go beyond the capabilities of the conventional atlas? How might principles of human spatial learning and reasoning be combined with the capabilities of GIS to significantly improve the usefulness of the atlas concept? (Note: a number of other atlas-like digital products are available and might be used as similar bases for discussion.) 4. A simple way to illustrate the problems of spatial relations in natural language is to take a formal representation of some spatial data - e.g. a small part of a topographic map or a city street map. One person is asked to describe the contents of the map using only natural language to another person, who must then try to reconstruct the map. Both are aware of the rules governing the map's contents, e.g. contour interval. The participants could be asked to summarize the results, including the role of non-verbal communication, e.g. facial expressions and gestures.
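The hierarchical-reasoning error described under "Causes of errors in spatial reasoning" (the Reno/San Diego example) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are invented for this example, and the longitudes are approximate.

```python
# Sketch of hierarchical spatial memory producing a directional error.
# The "mind" stores cities nested inside states plus a coarse state-level
# relation, and answers city-level questions from the state-level facts.
STATE_OF = {"Reno": "NV", "San Diego": "CA"}
WEST_OF = {("CA", "NV")}  # stored generalization: CA is (largely) west of NV

# Approximate actual longitudes, in degrees west (larger = further west)
LONGITUDE_W = {"Reno": 119.8, "San Diego": 117.2}

def hierarchical_is_west_of(a, b):
    """Answer by falling back on the nested state relation."""
    return (STATE_OF[a], STATE_OF[b]) in WEST_OF

def actual_is_west_of(a, b):
    """Answer from the cities' actual coordinates."""
    return LONGITUDE_W[a] > LONGITUDE_W[b]

print(hierarchical_is_west_of("Reno", "San Diego"))  # False - inferred from states
print(actual_is_west_of("Reno", "San Diego"))        # True - Reno really is west
```

Because the city-level question is answered from the state-level relation, the hierarchical answer reports Reno as east of San Diego, while the coordinates show the opposite - exactly the pattern of error the unit attributes to hierarchical internal representation.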
KNOWLEDGE BASED TECHNIQUES A. INTRODUCTION Example Elements of knowledge based systems Expert system "shells" B. KNOWLEDGE ACQUISITION Example of knowledge base constructed by experts Examples of knowledge inferred from interaction with experts C. KNOWLEDGE REPRESENTATIONS Trees Semantic networks Frames Production rules D. SEARCH MECHANISMS E. INFERENCE F. ISSUES REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES UNIT 74 - KNOWLEDGE BASED TECHNIQUES Compiled with assistance from David Lanter, University of California, Santa Barbara A. INTRODUCTION many geographical problems are ill-structured an ill-structured problem "lacks a solution algorithm and often even clear goal achievement criteria" goals are poorly defined data may be incomplete, or lack sufficient spatial resolution problem is complex - large volume of knowledge may be relevant to the problem e.g. past experience with similar cases e.g. precise knowledge in certain narrowly defined parts of the problem a DSS (decision support system) is one response to ill-structured problems concentrates on delivering a wide range of functions to the user, rather than one solution leaves the user with the role of expert knowledge based techniques are another concentrate on making use of all available knowledge goal is to emulate the reasoning of an expert - system takes the role of expert the term "artificial intelligence" suggests the role of the machine in emulating the reasoning power of humans Example where to put a label in a polygon?
(the "label placement problem") - important in designing map output from GIS goals are poorly defined - "maximize legibility", "maximize visual impact" cannot turn goals into simple rules one rule might be "draw the label horizontally, centered at the centroid" easy to turn the rule into an algorithm rule is too simple - no good if the centroid lies outside the polygon - not clear how it affects legibility, visual impact an expert system or knowledge based system should know when to use this rule, when not - may be many such rules there have been many attempts to reduce the label placement problem to a set of simple rules and build these into an "expert system" ideally, the expert system could then perform the functions of a cartographer Elements of knowledge based systems techniques for acquiring knowledge ways of representing knowledge internally computers are good at representing numbers, words, even maps, but knowledge is potentially much more difficult search procedures for working with the internally stored knowledge inference mechanisms for deducing solutions to problems from stored knowledge Expert system "shells" are software packages with functions which help the user construct special-purpose expert systems provide a framework for organizing and representing knowledge provide procedures for accessing knowledge in order to respond to queries or make decisions example applications of shells: building a system to make medical diagnoses - emulating the medical expert building a system to emulate the cartographer's knowledge of map projections, to pick the best projection for a particular problem B. KNOWLEDGE ACQUISITION how is a knowledge base constructed? two approaches: by asking experts to break their knowledge down into its individual facts, rules etc.
by deducing rules from the behavior of experts both have been used in a GIS context Example of knowledge base constructed by experts local government agency responsible for regulating land use in a vast, sparsely populated area - small staff must consider many hundreds of applications for land use permits annually, mostly from oil companies with large budgets and armies of lawyers decisions are subject to complex system of regulations, laws, past precedents, guidelines decisions must be defensible in court desirable to know precise regulations, rules etc. which led to each decision decisions must not be held to be arbitrary or capricious basic data - vegetation, soils, wildlife, geology etc. - in GIS knowledge base of all regulations, laws, precedents, guidelines decisions can be generated from knowledge base Examples of knowledge inferred from interaction with experts Knowledge Based GIS (KBGIS) developed by Smith and others system can reduce query time by anticipating queries e.g. certain overlay operations can be done in advance if the results will be needed frequently, redone when updates occur e.g. certain topological relationships might be computed in advance and stored KBGIS analyzes queries received to "learn" about the pattern of queries and organize its database to optimize response examines whether retrieving a stored fact takes longer than deducing it from other facts if deducing it takes longer, the fact will be stored the first time it is deduced - subsequently it will be retrieved rather than deduced systems such as KBGIS learn about important spatial facts through the user's interaction with the system C. KNOWLEDGE REPRESENTATIONS data structures in which knowledge can be stored more general than conventional databases four general methods for representing knowledge - trees, semantic networks, frames, production rules Trees way of organizing objects that are related in a hierarchical fashion tree structures are common in geographical data e.g.
quadtrees and octrees e.g. hierarchical nesting of census reporting zones Semantic networks knowledge is organized as a set of nodes connected by labeled links an algorithm can follow the links e.g. topological data structures for road and river networks, boundaries of polygons (arcs) the GIS operations required to build an information product from input data layers can be visualized as a network of nodes and links the links are GIS processes or functions, the nodes are datasets this is a useful way of tracking the propagation of error through processes (links) new datasets (nodes) inherit the inaccuracies of their predecessor datasets Frames usually consist of the name of a phenomenon and the attributes that describe it attributes are called "slots" increasing availability of frame-based expert system shells Production rules consist of two parts - a situation part and an action part if the situation exists, do the action - by convention the left side is the situation, the right side is the action most popular knowledge representation in geographical applications of the four areas of GIS - input, output, analysis and storage - output is most fully explored production rules used in output for label placement, assignment of class intervals to choropleth maps, choice of projection production rules for GIS analysis used in planning and resource management production rules for GIS input center on scanning - rules for interpreting the image seen by the scanner, and vectorizing the image to create objects D.
SEARCH MECHANISMS need a procedure for accessing knowledge "brute force" procedures test all knowledge contained in the database to obtain the best answer - only practical for small knowledge bases and simple problems "heuristic" search procedures use rules designed to obtain the best answer or one close to it while minimizing search time each knowledge representation has associated search mechanisms rules for searching trees dictate the branch to be taken at each fork semantic networks are searched by examining the links at each node frames - search for relevant frames, then relevant slots for production rules, look for matching conditions on the left side of each rule E. INFERENCE is the creation of new knowledge the solution to any problem is new knowledge which can be stored in the system a knowledge base can continue to grow as more knowledge is inferred from the existing base e.g. a GIS can create new knowledge by computing topological relationships between objects from their geometrical relationships deductive inference: creates new knowledge from existing facts through logical implication, e.g. using production rules e.g. if A=B and B=C, then the system can deduce that A=C inductive inference: produces new generalizations ("laws") which are consistent with existing facts e.g. if the database contains the knowledge that area A is woodland and area B is woodland, and no information on any other area, the system might infer that all areas are woodland F. ISSUES knowledge based systems have been only moderately successful in areas where problems are relatively straightforward e.g. medical diagnosis several factors may impede greater use: high cost of developing system - building the knowledge base uniqueness of every application dynamic nature of knowledge - knowledge base is not static inadequacy of alternatives for knowledge representation - few examples fit precisely within any one form, e.g.
production rules unwillingness to trust the decisions of a machine (no "bedside manner") response time deteriorates rapidly as the knowledge base grows most knowledge is "fuzzy" or uncertain - system must return many possible answers to a problem - few problems have a precise, single answer - technical difficulties of representing and processing fuzzy knowledge poor design of user interface - not "user friendly" user often wants the reasoning behind a decision, not just the decision itself some of the most successful applications have been for instruction e.g. use of medical expert system to develop diagnostic skills - encouraging students to structure knowledge and process it systematically in response to a problem as precise, analytical models of knowledge and the ways in which it is used, expert systems can enhance our understanding of human decision-making processes - e.g. how does a cartographer position labels on a map? REFERENCES Texts on artificial intelligence and expert systems: Luger, G.F. and W.A. Stubblefield, 1989. Artificial Intelligence and the Design of Expert Systems, Benjamin/Cummings Publishing Co, Redwood City, CA. Tanimoto, S.L., 1987. The Elements of Artificial Intelligence, Computer Science Press, Rockville, MD. Winston, P.H., 1980. Artificial Intelligence, Addison-Wesley, Reading, MA. KBGIS: Smith, T.R. and M. Pazner, 1984. "Knowledge-based control of search and learning in a large-scale GIS," Proceedings, International Symposium on Spatial Data Handling, Zurich, 2:498-519. Smith, T.R. et al., 1987. "KBGIS-II: a knowledge-based geographical information system," International Journal of Geographical Information Systems 1:149-72. Other: Freeman, H. and J. Ahn, 1984. "AutoNAP - an expert system to automate map name placement," Proceedings, International Symposium on Spatial Data Handling, Zurich, pp 556-571. Design of an expert system for polygon label placement. Imhof, E., 1975. "Positioning names on maps," The American Cartographer 2.
An analysis of rules for label positioning. Kubo, S., 1986. "The basic scheme of TRINITY: a GIS with intelligence," Proceedings, Second International Symposium on Spatial Data Handling, Seattle. International Geographical Union, Commission on Geographical Data Sensing and Processing, Williamsville, NY, 363-74. Walker, P.A. and D.M. Moore, 1988. "SIMPLE: an inductive modelling and mapping system for spatially oriented data," International Journal of Geographical Information Systems 2:347-63. EXAM AND DISCUSSION QUESTIONS 1. Compare the use of knowledge bases and inference in Smith's KBGIS, Kubo's TRINITY and Walker and Moore's SIMPLE. What general principles of knowledge based systems do they each exploit? Which application do you consider the most successful? 2. Artificial intelligence has often been called the study of a set of unsolved problems. However, once an algorithm has been devised to solve a given problem, it becomes simply a solved problem, no longer meriting the mystique associated with the term "artificial intelligence". Do you agree? 3. What areas of GIS - applications, input techniques, processes etc. - do you consider most suitable for development of expert systems? 4. Discuss the differences between spatial decision support systems and knowledge based systems as alternative approaches to solving poorly structured problems. THE FUTURE OF GIS A. INTRODUCTION GIS originated in the mid-1960s a continuous history since then nevertheless, many see GIS as a phenomenon of the late 1980s major growth phase began in the early 1980s due to combined effects of developments in software, cost-effectiveness of hardware expansion in the late 1980s has been fuelled by: continuing advances in computing technology increasing availability of major digital datasets, e.g. TIGER new application areas, e.g. political districting coalescence of existing application areas, e.g. specific CAD applications, AM/FM, automated mapping, spatial analysis how long can growth continue?
will GIS interests continue to converge, or will splits develop? will GIS software converge on a standard, mature product or diverge into specialized markets? will the term "GIS" eventually disappear, or will associated symbols of maturity emerge - university programs, textbooks, magazines what will GIS look like in 10 years? 20 years? this unit has 3 parts: historical analogy to history of remote sensing discussion of convergence/divergence issues prospects for the future B. THE REMOTE SENSING ANALOGY Remote sensing as precursor to GIS major efforts began in late 1960s origins of GIS and remote sensing at similar point in time remote sensing well funded strong incentive to develop peaceful uses of space technology potential value of a tool for gathering geographical data quickly and cheaply remote sensing systems widely installed in universities, research organizations by late 1970s growth of remote sensing in 1960s and 1970s vastly outpaced growth in GIS GIS virtually unknown until early 1980s GIS often seen as add-on to remote sensing systems potential for sophisticated modeling and analysis ability to merge ancillary information to improve accuracy of classification of images three major lessons can be learned from remote sensing analogy: Need for formal theory danger that GIS will suffer in the same way as remote sensing from lack of formal theory underpinning use much work in remote sensing has been purely empirical, limited to specific times and places impossible to generalize many results to other places or times much work is on a project basis little addition to general pool of knowledge strong theoretical framework would be basis for greater generality difficult to generalize results from one satellite/sensor to another much basic work must be repeated for every new satellite/sensor effects of scale are poorly understood results in unintentional "ecological fallacy" - falsely imputing results from one scale of analysis to another e.g. 
in US plains states, correlation may exist between % of area covered by structures and % tree cover at spatial resolutions down to approx. 200 m, but not below - trees and buildings do not generally occupy the same locations analysis of remote sensing data has not benefited from clear understanding of spatial effects e.g. effects of spatial dependence on statistical significance - frequently lead to overstating true significance many analyses treat each pixel as an independent observation, ignore spatial context the level of theoretical development in these areas is much higher in the 1980s possibility that GIS can avoid some of these mistakes however GIS designers operating in the commercial sector are often not aware of the problems or the available theory will require close liaison between basic research and GIS design Excessive expectations early promise of remote sensing was high e.g. possibility of remote monitoring of agricultural production, forest harvesting in practice, numerous problems degrade accuracy of classification seasonal, diurnal changes in spectral response effects of moisture continuing need for basic research few examples of production applications - i.e.
where a standard product can be developed using a standard processing method post-war Western society has been fascinated with technological solutions to problems remote sensing and GIS are particularly attractive, combining high technology with color graphics difficulty of defining adequate cost/benefit measures at the same time, technological change can be opposed by unconvincibles, confirmed nay-sayers, Luddites technological innovation can produce strong emotions on both sides which confuse rational arguments Potential for new paradigms many have expected remote sensing to produce fundamental changes in the ways people think about geographical information however, even today the magnitudes of its future effects on affected natural sciences are not clear much research still remains to be done minimal position: after 17 years of Landsat, remote sensing is here to stay and cannot be ignored maximal position: remote sensing is significant factor in emergence of Global Science, major technology of Global Monitoring view from space has played major role in encouraging view of planet as an integrated system situation in GIS has similarities: just as remote sensing led to global view, GIS can lead to integrated view - need to integrate many layers of spatial information - need to couple human and physical systems e.g. need to couple human occupation, settlement processes with effects on deforestation and CO2 increase Technical advances both GIS and remote sensing have benefited from developments in: workstation power - PCs, file servers, mass storage - availability of data, software through networking many vendors now offer the capability of integrating both technologies in the same workstation much research and development in remote sensing occurred in government laboratories - NASA, etc.
- funded by government NASA also major source of funds for university R&D GIS context is very different - level of public funding of GIS R&D has never been high GIS R&D has been funded by vendors, driven by strong market forces market forces are not necessarily consistent with needs of scientific research C. CONVERGENCE OR DIVERGENCE? GIS is a loose collection of interests how strong are the linkages between the subcultures of GIS (units 51-56)? are they strong enough for continued convergence? several views of possible divergences in GIS: GIS subcultures each of the groups identified in units 51-56 has its own tribal customs, ways of thinking the ties which currently bind the subcultures - e.g. allow AM/FM people to talk a common language with forest managers - may weaken current "glue" is common technology, terminology Marketplace is specialization emerging in the fast-moving PC GIS market? possible classification of current products: desktop mapping - produce simple thematic maps from input data spatial analysis systems - emphasize ability to overlay, combine layers, build buffers database systems - combine databases with limited geographical functions, e.g. display, data input geographical spreadsheets - generalize the concept of spreadsheets by adding geographical functions, e.g. ability to merge two adjacent areas or two rows of a spreadsheet into one area or one row, e.g. for political districting applications query systems - provide access to e.g. TIGER files, limited ability for geocoding, querying, finding optimum routes for vehicles image processing systems - built to process remotely sensed imagery, now with added GIS functions for data integration are there submarkets within GIS?
resource management applications need high functionality AM/FM applications need high data volumes, access speeds vendors will pursue the most lucrative submarket two alternative strategies for vendors: build a product to satisfy a common denominator market - product can then withstand shifts in the market adapt to the most lucrative submarket - long-term survival requires new adaptation with every shift in the market What does convergence require? institutions and symbols to provide focus e.g. programs, departments, societies, journals, magazines, books, conferences education and training to raise awareness of GIS technology and its applications a market strong enough to support continued vendor R&D, or its replacement by government R&D technology which can simultaneously deliver the requirements of each submarket e.g. must be possible to deliver high functionality required by one submarket without detracting from high access speed required by another submarket in an operating system context this is the idea of "tuning" - one common operating system can satisfy many specialized computing environments D. PROSPECTS FOR THE FUTURE several different "visions" for GIS Automated geography e.g. see Dobson (1983) almost all forms of use of geographical data can now be automated maps and atlases can be queried geographical information can be analyzed, used in models we can use digital spatial data for specific purposes or to develop general theories geographical information becomes much more powerful in a digital environment, e.g. overlay and integration measurement and simple map analysis seamless browse some have even envisioned "the death of cartography" - the "paperless map library" - along similar lines to the "paperless office" Don Cooke (Geographic Data Technologies, Inc) sees three stages in this process: 1. automating the cartographic process the objective is still to produce maps 2. 
the map as database the digital database becomes the archive, with the map as the major product 3. using the map database recognizing the far greater potential of data in digital form - new products, models, analysis - with the map playing a minor role as one form of hard copy display However: geographical information is used infrequently compared to text or numerical information people use maps only in certain limited contexts effective use of spatial information requires much higher levels of training than e.g. word processing e.g. the DIDS system - developed within the Executive Office of the President to display geographical information for decision-making - was discontinued in 1983 because of inadequate use but the potential of automated geography may lead to much greater levels of use - people might use geographical data more frequently if they had better access to it, and if it was easier to use Spatial information science GIS and its allied fields, e.g. remote sensing, add up to the makings of a science of spatial information, which would include: data collection - e.g. remote sensing, surveying, photogrammetry - data compilation - classification, interpretation, cartography data models - data structures, theories of spatial information data display - cartography, computer graphics navigation, spatial information query and access spatial analysis and modeling spatial information is sufficiently distinct, theory and problems are sufficiently basic and difficult to justify unique identity, status of minor discipline or subdiscipline Spatial processes space provides a framework within which to organize objects frame is useful for accessing records, e.g. by street address frame is useful for accounting, e.g. totals by county frame is basis for relating objects, e.g. by proximity, adjacency, connectedness what role does space have as a source of explanation and understanding? spatial coincidence or proximity may suggest explanation, e.g. 
coincidence of cancer cluster and asbestos mining operation spatial proximity may be basis for prediction, e.g. more customers will go to the closer store spatial accounting is used as basis for much analysis, e.g. county-to-county variations in employment, health statistics many processes operate in spatial frames, e.g. atmospheric, ocean dynamics measures of space are variables in many processes, e.g. measures of territory in ecology, measures of market area in retailing significance of GIS as a scientific tool - its value in explaining, understanding the world around us - depends on significance of spatial processes REFERENCES Dobson, J.E., 1983. "Automated geography," Professional Geographer 35:135-43. Pages 339-53 of the same volume include extensive discussion of Dobson's article. Everett, J.E. and D.S. Simonett, 1976. "Principles, concepts and philosophical problems in remote sensing," in J. Lintz and D.S. Simonett, editors, Remote Sensing of Environment, Addison-Wesley, Reading, MA, pp 85-127. A review of remote sensing from the mid-1970s with striking parallels with current debates within GIS. WHAT IS GIS? A. INTRODUCTION Objectives of this unit to examine various definitions of GIS - what factors uniquely differentiate it from other forms of automatic geographical data handling? to determine origins of the field - how does GIS relate to other fields such as statistical analysis, remote sensing, computer cartography? to give a brief overview of the relevant application areas What is a GIS?
a particular form of Information System applied to geographical data a System is a group of connected entities and activities which interact for a common purpose a car is a system in which all the components operate together to provide transportation an Information System is a set of processes, executed on raw data, to produce information which will be useful in decision-making a chain of steps leads from observation and collection of data through analysis an information system must have a full range of functions to achieve its purpose, including observation, measurement, description, explanation, forecasting, decision-making a Geographic Information System uses geographically referenced data as well as non-spatial data and includes operations which support spatial analysis in GIS, the common purpose is decision-making, for managing use of land, resources, transportation, retailing, oceans or any spatially distributed entities the connection between the elements of the system is geography, e.g. location, proximity, spatial distribution in this context GIS can be seen as a system of hardware, software and procedures designed to support the capture, management, manipulation, analysis, modeling and display of spatially-referenced data for solving complex planning and management problems although many other computer programs can use spatial data (e.g. AutoCAD and statistics packages), GISs include the additional ability to perform spatial operations Alternative names alternative names which people have used over the years illustrate the range of applications and emphasis Why is GIS important? "GIS technology is to geographical analysis what the microscope, the telescope, and computers have been to other sciences.... 
(It) could therefore be the catalyst needed to dissolve the regional-systematic and human- physical dichotomies that have long plagued geography" and other disciplines which use spatial information.1 GIS integrates spatial and other kinds of information within a single system - it offers a consistent framework for analyzing geographical data by putting maps and other kinds of spatial information into digital form, GIS allows us to manipulate and display geographical knowledge in new and exciting ways GIS makes connections between activities based on geographic proximity looking at data geographically can often suggest new insights, explanations these connections are often unrecognized without GIS, but can be vital to understanding and managing activities and resources e.g. we can link toxic waste records with school locations through geographic proximity GIS allows access to administrative records - property ownership, tax files, utility cables and pipes - via their geographical positions Why is GIS so hot? high level of interest in new developments in computing ____________________ 1Abler, R.F., 1988. "Awards, rewards and excellence: keeping geography alive and well," Professional Geographer, 40:135-40. GIS gives a "high tech" feel to geographic information maps are fascinating and so are maps in computers there is increasing interest in geography and geographic education GIS is an important tool in understanding and managing the environment Market value of GIS Fortune Magazine, April 24, 1989 published a major, general-interest article on the significance of GIS to business: GIS is described as a geographical equivalent of a spreadsheet, i.e. 
allows answers to "what if" questions with spatial dimensions an example of the value of GIS given in the article is the Potlatch Corporation, Idaho controls 600,000 ac of timberland in Idaho - 4,900 separate timber stands old method of inventory using hand-drawn maps meant that inventory was "hopelessly out of date" $180,000/year now being spent on GIS-based inventory "a bargain" GIS "gives Potlatch up-to-the-minute information on the status of timber.... A forest manager sitting at a terminal can check land ownership changes in a few minutes by zooming in on a map" $650,000 on hardware and software produces more than 27% annual return on investment GIS market Dataquest projected a market of $288 million in 1988, $590 million in 1992 for GIS, growing at 35% per year ESRI of Redlands, CA, developers of ARC/INFO, had 350 employees and sales of $40 million in 1988 and a reported 42% increase in sales in 1989 Intergraph had 1988 sales of $800 million in a more diverse but GIS-dominated market the 1989 edition of GIS Sourcebook listed over 60 different "GIS" programs (though not all of these have complete GIS functionality) and over 100 GIS consultants (US) B. 
CONTRIBUTING DISCIPLINES AND TECHNOLOGIES GIS is a convergence of technological fields and traditional disciplines GIS has been called an "enabling technology" because of the potential it offers for the wide variety of disciplines which must deal with spatial data each related field provides some of the techniques which make up GIS many of these related fields emphasize data collection - GIS brings them together by emphasizing integration, modeling and analysis as the integrating field, GIS often claims to be the science of spatial information Geography broadly concerned with understanding the world and man's place in it long tradition in spatial analysis provides techniques for conducting spatial analysis and a spatial perspective on research Cartography concerned with the display of spatial information currently the main source of input data for GIS is maps provides long tradition in the design of maps which is an important form of output from GIS computer cartography (also called "digital cartography", "automated cartography") provides methods for digital representation and manipulation of cartographic features and methods of visualization Remote Sensing images from space and the air are major source of geographical data remote sensing includes techniques for data acquisition and processing anywhere on the globe at low cost, consistent update potential many image analysis systems contain sophisticated analytical functions interpreted data from a remote sensing system can be merged with other data layers in a GIS Photogrammetry using aerial photographs and techniques for making accurate measurements from them, photogrammetry is the source of most data on topography (ground surface elevations) used for input to GIS Surveying provides high quality data on positions of land boundaries, buildings, etc.
Geodesy source of high accuracy positional control for GIS Statistics many models built using GIS are statistical in nature, many statistical techniques used for analysis statistics is important in understanding issues of error and uncertainty in GIS data Operations Research many applications of GIS require use of optimizing techniques for decision-making Computer Science computer-aided design (CAD) provides software, techniques for data input, display and visualization, representation, particularly in 3 dimensions advances in computer graphics provide hardware, software for handling and displaying graphic objects, techniques of visualization database management systems (DBMS) contribute methods for representing data in digital form, procedures for system design and handling large volumes of data, particularly access and update artificial intelligence (AI) uses the computer to make choices based on available data in a way that is seen to emulate human intelligence and decision-making - computer can act as an "expert" in such functions as designing maps, generalizing map features although GIS has yet to take full advantage of AI, AI already provides methods and techniques for system design Mathematics several branches of mathematics, especially geometry and graph theory, are used in GIS system design and analysis of spatial data Civil Engineering GIS has many applications in transportation, urban engineering C. 
MAJOR AREAS OF PRACTICAL APPLICATION Street network-based address matching - finding locations given street addresses vehicle routing and scheduling location analysis, site selection development of evacuation plans Natural resource-based management of wild and scenic rivers, recreation resources, floodplains, wetlands, agricultural lands, aquifers, forests, wildlife Environmental impact analysis (EIA) viewshed analysis hazardous or toxic facility siting groundwater modeling and contamination tracking wildlife habitat analysis, migration routes planning Land parcel-based zoning, subdivision plan review land acquisition environmental impact statements water quality management maintenance of ownership Facilities management locating underground pipes, cables balancing loads in electrical networks planning facility maintenance tracking energy use D. GIS AS A SET OF INTERRELATED SUBSYSTEMS Data Processing Subsystem data acquisition - from maps, images or field surveys data input - data must be input from source material to the digital database data storage - how often is it used, how should it be updated, is it confidential? Data Analysis Subsystem retrieval and analysis - may be simple responses to queries, or complex statistical analyses of large sets of data information output - how to display the results? as maps or tables? Or will the information be fed into some other digital system? Information Use Subsystem users may be researchers, planners, managers interaction needed between GIS group and users to plan analytical procedures and data structures Management Subsystem organizational role - GIS section is often organized as a separate unit within a resource management agency (cf. 
the Computer Center at many universities) offering spatial database and analysis services staff - include System Manager, Database Manager, System Operator, System Analysts, Digitizer Operators - a typical resource management agency GIS center might have a staff of 5-7 procedures - extensive interaction is needed between the GIS group and the rest of the organization if the system is to function effectively MAPS AND MAP ANALYSIS A. INTRODUCTION maps are the main source of data for GIS the traditions of cartography are fundamentally important to GIS GIS has roots in the analysis of information on maps, and overcomes many of the limitations of manual analysis this unit is about cartography and its relationship to GIS - how does GIS differ from cartography, particularly automated cartography, which uses computers to make maps? B. WHAT IS A MAP? Definition according to the International Cartographic Association, a map is: a representation, normally to scale and on a flat medium, of a selection of material or abstract features on, or in relation to, the surface of the Earth Maps show more than the Earth's surface the term "map" is often used in mathematics to convey the notion of transferring information from one form to another, just as cartographers transfer information from the surface of the Earth to a sheet of paper the term "map" is used loosely to refer to any visual display of information, particularly if it is abstract, generalized or schematic Cartographic abstraction production of a map requires: selection of the few features in the real world to include classification of selected features into groups (i.e.
bridges, churches, railways) simplification of jagged lines like coastlines exaggeration of features to be included that are too small to show at the scale of the map symbolization to represent the different classes of features chosen Types of maps in practice we normally think of two types of map: topographic map - a reference tool, showing the outlines of selected natural and man-made features of the Earth often acts as a frame for other information "Topography" refers to the shape of the surface, represented by contours and/or shading, but topographic maps also show roads and other prominent features thematic map - a tool to communicate geographical concepts such as the distribution of population densities, climate, movement of goods, land use etc. Thematic maps in GIS several types of thematic map are important in GIS: a choropleth map uses reporting zones such as counties or census tracts to show data such as average incomes, percent female, or rates of mortality the boundaries of the zones are established independently of the data, and may be used to report many different sets of data an area class map shows zones of constant attributes, such as vegetation, soil type, or forest species the boundaries are different for each map as they are determined by the variation of the attribute being mapped, e.g. breaks of soil type may occur independently of breaks of vegetation an isopleth map shows an imaginary surface by means of lines joining points of equal value, "isolines" (e.g.
contours on a topographic map) used for phenomena which vary smoothly across the map, such as temperature, pressure, rainfall or population density Line maps versus photo maps an important distinction for GIS is between a line map and a photo map a line map shows features by conventional symbols or by boundaries a photo map is derived from a photographic image taken from the air features are interpreted by the eye as it views the map certain features may be identified by overprinting labels photomaps are relatively cheap to make but are rarely completely free of distortions Characteristics of maps maps are often stylized, generalized or abstracted, requiring careful interpretation usually out of date show only a static situation - one slice in time often highly elegant/artistic easy to use to answer certain types of questions: how do I get there from here? what is at this point? difficult or time-consuming to answer other types: what is the area of this lake? what places can I see from this TV tower? what does that thematic map show at the point I'm interested in on this topographic map? The concept of scale the scale of a map is the ratio between distances on the map and corresponding distances in the real world if a map has a scale of 1:50,000, then 1 cm on the map equals 50,000 cm or 0.5 km on the Earth's surface the use of the terms "small scale" and "large scale" is often confused, so it is important to be consistent a large scale map shows great detail, small features representative fraction is large, e.g. 1/10,000 a small scale map shows only large features representative fraction is small, e.g.
1/250,000 the scale controls not only how features are shown, but what features are shown a 1:2,500 map will show individual houses and lamp posts while a 1:100,000 will not different scales are used in different countries in the US, 1:100,000 is the largest scale at which complete coverage of the continental states exists, but there is limited coverage at 1:62,500 and 1:24,000 in the UK, there is complete coverage at much larger scales (1:1,250 to 1:10,000) Map projections the Earth's surface is curved but as it must be shown on a flat sheet, some distortion is inevitable distortion is least when the map shows only small areas, and greatest when a map attempts to show the entire surface of the Earth a projection is a method by which the curved surface of the earth is represented on a flat surface it involves the use of mathematical transformations between the location of places on the earth and their projected locations on the plane numerous projections have been invented, and arguments continue about which is best for which purposes projections can be identified by the distortions which they avoid - in general a projection can belong to only one of these classes: equal area projections preserve the area of features by assigning them an area on the map which is proportional to their area on the earth - these are useful for applications which require measuring area, and are popular in GIS conformal projections preserve the shape of small features, and show directions (bearings) correctly - they are useful for navigation equidistant projections preserve distances to places from one or two points C. WHAT ARE MAPS USED FOR?
traditionally, maps are used as aids to navigation, as reference documents, and as wall decorations maps have four roles today: Data display maps provide useful ways of displaying information in a meaningful way in practice, the cost of making and printing a map is high, so its contents are often a compromise between different needs Data stores as a means of storing data, maps can be very efficient, high density stores a typical 1:50,000 map might have 1,000 place names on it the distances between all possible pairs of these 1,000 places would run to (1,000 x 999 / 2) or 499,500 numbers if stored in a table instead of scaled off the map when needed the information printed on the typical 1:50,000 topographic map sheet in the UK requires 25 million bytes of storage when it is converted to digital form, equivalent to one standard computer tape, or 10 full-length novels the information on all British topographic maps would require 150 gigabytes (150 x 10^9 bytes) Spatial indexes a map can show the boundaries of areas (e.g. land use zones, soil or rock types) and identify each area with a label a separate manual with corresponding entries may provide greater detail about each area Data analysis tool maps are used in analysis to: make or test hypotheses, such as the identification of cancer clusters examine the relationship between two distributions using simple transparent overlays D.
THE USE OF MAPS FOR INVENTORY AND ANALYSIS the following examples demonstrate how maps have been used for sophisticated applications in inventory and analysis, and point out some limitations Measuring land use change for example, two major land use surveys were carried out in the UK, in the late 1930s by Sir Dudley Stamp and in the 1960s by Professor Alice Coleman the results were published as maps in order to compare changes in land use between the 1930s and the 1960s, the area of each land use type was measured using a hand planimeter and counting overlaid dots despite interest in measuring the amount of change of land use through time, particularly from agricultural to urban, few results were produced using this method because the traditional techniques are slow and tedious, and because of the difficulty of overlaying or working from very different map sources Landscape architecture Ian McHarg pioneered the use of transparent map overlays for planning locations of highways, transmission corridors and other facilities in environmentally sensitive areas (McHarg, 1969) despite the popularity of this technique and numerous applications, this method remains cumbersome and imprecise E. AUTOMATED AND COMPUTER-ASSISTED CARTOGRAPHY Changeover to computer mapping personalities were critically important in the 1960s and early 1970s - individual interests determined the direction and focus of research and development in computer cartography (see Rhind, 1988) impetus for change began in two communities: 1. scientists wishing to make maps quickly to see the results of modeling, or to display data from large archives already in digital form, e.g. census tables quality was not a major concern SYMAP was the first significant package for this purpose, released by the Harvard Lab in 1967 2.
cartographers seeking to reduce the cost and time of map production and editing hardware costs limited interest in this technology prior to 1980 to the major mapping agencies the costs of computing have dropped dramatically, by an order of magnitude every six years: what cost $1 to compute in 1989 would have cost $10 in 1983 and $100,000 in 1959 the development of the microcomputer and the launch of the IBM PC in 1981 have had enormous influence an early belief that the entire map-making process could be automated diminished by 1975 because of difficulties of generalization and design has resurfaced in the context of Expert Systems where the computer chooses the proper techniques based on characteristics of the data, scale, map purpose, etc. today, far more maps are made by computer than by hand now few mapmakers are trained cartographers also, it is now clear that once created, digital data can serve purposes other than map-making, so it has additional value Advantages of computer cartography lower cost for simple maps, faster production greater flexibility in output - easy scale or projection change - maps can be tailored to user needs other uses for digital data Disadvantages of computer cartography relatively few full-scale systems have been shown to be truly cost-effective in practice, despite early promise high capital cost, though this is now much reduced computer methods do not ensure production of maps of high quality there is a perceived loss of regard for the "cartographic tradition" with the consequent production of "cartojunk" GIS and Computer Cartography computer cartography has a primary goal of producing maps systems have advanced tools for map layout, placement of labels, large symbol and font libraries, interfaces for expensive, high quality output devices however, it is not an analytical tool therefore, unlike data for GIS, cartographic data does not need to be stored in ways which allow, for example, analysis of relationships between different
themes such as population density and housing prices or the routing of flows along connecting highway or river segments F. GIS COMPARED TO MAPS Data stores spatial data stored in digital format in a GIS allows for rapid access for traditional as well as innovative purposes nature of maps creates difficulties when used as sources for digital data most GIS take no account of differences between datasets derived from maps at different scales idiosyncrasies (e.g. generalization procedures) in maps become "locked in" to the data derived from them such errors often become apparent only during later processing of digital data derived from them however, maps still remain an excellent way of compiling spatial information, e.g. field survey maps can be designed to be easy to convert to digital form, e.g. by the use of different colors which have distinct signatures when scanned by electronic sensors as well maps can be produced by GISs as cheap, high density stores of information for the end user however, consistent, accurate retrieval of data from maps is difficult only limited amounts of data can be shown due to constraints of the paper medium Data indexes this function can be performed much better by a good GIS due to the ability to provide multiple and efficient cross-referencing and searching Data analysis tools GIS is a powerful tool for map analysis traditional impediments to the accurate and rapid measurement of area or to map overlay no longer exist many new techniques in spatial analysis are becoming available Data display tools electronic display offers significant advantages over the paper map ability to browse across an area without interruption by map sheet boundaries ability to zoom and change scale freely potential for the animation of time dependent data display in "3 dimensions" (perspective views), with "real-time" rotation of viewing angle potential for continuous scales of intensity and the use of color and shading independent of the constraints of the 
printing process, ability to change colors as required for interpretation one of a kind, special purpose products are possible and inexpensive THE RASTER GIS A. THE DATA MODEL B. CREATING A RASTER Cell by cell entry Digital data C. CELL VALUES Types of values One value per cell D. MAP LAYERS Resolution Orientation Zones Value Location E. EXAMPLE ANALYSIS USING A RASTER GIS Objective Procedure Result Operations used REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES Although most of the material in this Curriculum is designed to be as independent as possible from specific data models, it is necessary to deal with this basic concept early so that students can start hands-on exercises with a GIS program. Following Unit 5, we return to the more fundamental concepts and do not address specific vector GIS issues until Units 13 and 14. There are several other places these topics could be placed in a course sequence. We have tried to make Units 4 and 5 as independent as possible so that you can move them within the Curriculum relatively easily. UNIT 4 - THE RASTER GIS Compiled with assistance from Dana Tomlin, The Ohio State University A. THE DATA MODEL geographical variation in the real world is infinitely complex the closer you look, the more detail you see, almost without limit it would take an infinitely large database to capture the real world precisely data must somehow be reduced to a finite and manageable quantity by a process of generalization or abstraction geographical variation must be represented in terms of discrete elements or objects the rules used to convert real geographical variation into discrete objects are the data model Tsichritzis and Lochovsky (1977) define a data model as "a set of guidelines for the representation of the logical organization of the data in a database...
(consisting) of named logical units of data and the relationships between them."1 current GISs differ according to the way in which they organize reality through the data model each model tends to fit certain types of data and applications better than others the data model chosen for a particular project or application is also influenced by: the software available the training of the key individuals historical precedent there are two major choices of data model - raster and vector raster model divides the entire study area into a regular grid of cells in specific sequence the conventional sequence is row by row from the top left corner each cell contains a single value is space-filling since every location in the study area corresponds to a cell in the raster one set of cells and associated values is a layer there may be many layers in a database, e.g. soil type, elevation, land use, land cover vector model uses discrete line segments or points to identify locations discrete objects (boundaries, streams, cities) are formed by connecting line segments vector objects do not necessarily fill space, not all locations in space need to be referenced in the model a raster model tells what occurs everywhere - at each place in the area a vector model tells where everything occurs - gives a location to every object conceptually, the raster models are the simplest of the available data models therefore, we begin our examination of GIS data and operations with the raster model and will consider vector models after the fundamental concepts have been introduced. ____________________ 1Tsichritzis, T.C., and F.H. Lochovsky, 1977. Data Base Management Systems, Academic Press, New York. B.
CREATING A RASTER consider laying a grid over a geologic map create a raster by coding each cell with a value that represents the rock type which appears in the majority of that cell's area when finished, every cell will have a coded value in most cases the values that are to be assigned to each cell in the raster are written into a file, often coded in ASCII this file can be created manually by using a word processor, database or spreadsheet program or it can be created automatically then it is normally imported into the GIS so that the program can reformat the data for its specific processing needs there are several methods for creating raster databases Cell by cell entry direct entry of each layer cell by cell is simplest entry may be done within the GIS or into an ASCII file for importing each program will have specific requirements the process is normally tedious and time-consuming a layer can contain millions of cells the average Landsat image is around 7.4 x 10^6 pixels, the average TM scene is about 34.9 x 10^6 pixels run length encoding can be more efficient values often occur in runs across several cells this is a form of spatial autocorrelation - the tendency for nearby things to be more similar than distant things data are entered as pairs, first run length, then value e.g. the array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 0 1 1 1 1 would be entered as 3 0 2 1 2 0 3 1 2 0 3 1 1 0 4 1 this is 16 items to enter, instead of 20 in this case the saving is 20%, but much higher savings occur in practice imagine a database of 10,000,000 cells and a layer which records the county containing each pixel suppose there are only two counties in the area covered by the database each cell can have one of only two values so the runs will be very long only some GISs have the capability to use run length encoded files note: Units 35 and 36 cover run length encoding and other aspects of raster storage in more detail Digital data much raster data is already in digital form, as images, etc.
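The run length encoding just described can be sketched in a few lines of Python. This is an illustrative sketch, not code from any particular GIS package; the function names are invented for the example, and the pairs are written in the (run length, value) order used in the array example above.

```python
def rle_encode(cells):
    """Encode a list of cell values as [length, value, length, value, ...] pairs."""
    encoded = []
    run_value, run_length = cells[0], 1
    for v in cells[1:]:
        if v == run_value:
            run_length += 1            # same value: extend the current run
        else:
            encoded.extend([run_length, run_value])  # emit the finished run
            run_value, run_length = v, 1
    encoded.extend([run_length, run_value])          # emit the final run
    return encoded

def rle_decode(encoded):
    """Expand [length, value, ...] pairs back into the full row of cells."""
    cells = []
    for i in range(0, len(encoded), 2):
        cells.extend([encoded[i + 1]] * encoded[i])
    return cells

# the 20-cell array from the text encodes to the 16 items shown above
row = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
print(rle_encode(row))   # [3, 0, 2, 1, 2, 0, 3, 1, 2, 0, 3, 1, 1, 0, 4, 1]
```

Note that the encoding only pays off when runs are long; for the two-county layer described above, a 10,000,000-cell raster would collapse to a handful of pairs.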
however, resampling will likely be needed in order that pixels coincide in each layer because remote sensing generates images, it is easier to interface with a raster GIS than any other type elevation data is commonly available in digital raster form from agencies such as the US Geological Survey C. CELL VALUES Types of values the type of values contained in cells in a raster depends upon both the reality being coded and the GIS different systems allow different classes of values, including: whole numbers (integers) real (decimal) values alphabetic values many systems allow only integers, others which allow different types restrict each separate raster layer to a single kind of value if systems allow several types of values, e.g. some layers numeric, some non-numeric, they should warn the user against doing unreasonable operations e.g. it is unreasonable to try to multiply the values in a numeric layer with the values in a non-numeric layer integer values often act as code numbers, which "point" to names in an associated table or legend e.g. the first example might have the following legend identifying the name of each soil class: 0 = "no class" 1 = "fine sandy loam" 2 = "coarse sand" 3 = "gravel" One value per cell each pixel or cell is assumed to have only one value this is often inaccurate - the boundary of two soil types may run across the middle of a pixel in such cases the pixel is given the value of the largest fraction of the cell, or the value of the middle point in the cell note, however, a few systems allow a pixel to have multiple values the NARIS system developed at the University of Illinois in the 1970s allowed each pixel to have any number of values and associated percentages e.g. 30% a, 30% b, 40% c D.
MAP LAYERS the data for an area can be visualized as a set of map layers a map layer is a set of data describing a single characteristic for each location within a bounded geographic area only one item of information is available for each location within a single layer - multiple items of information require multiple layers on the other hand, a topographic map can show multiple items of information for each location, within limits e.g. elevation (contours), counties (boundaries), roads, railroads, urbanized areas (grey tint) these would be 5 layers in a raster GIS typical raster databases contain up to a hundred layers each layer (matrix, lattice, raster, array) typically contains hundreds or thousands of cells important characteristics of a layer are its resolution, orientation and zone(s) Resolution in general, resolution can be defined as the minimum linear dimension of the smallest unit of geographic space for which data are recorded in the raster model the smallest units are generally rectangular (occasionally systems have used hexagons or triangles) these smallest units are known as cells, pixels note: high resolution refers to rasters with small cell dimensions high resolution means lots of detail, lots of cells, large rasters, small cells Orientation the angle between true north and the direction defined by the columns of the raster Zones each zone of a map layer is a set of contiguous locations that exhibit the same value these might be: ownership parcels political units such as counties or nations lakes or islands individual patches of the same soil or vegetation type there is considerable confusion over terms here other terms commonly used for this concept are patch, region, polygon each of these terms, however, has different meanings to individual users and different definitions in specific GIS packages in addition, there is a need for a second term which refers to all individual zones that have the same characteristics class is often used for this
concept note that not all map layers will have zones, cell contents may vary continuously over the region making every cell's value unique e.g. satellite sensors record a separate value for reflection from each cell major components of a zone are its value and location(s) Value is the item of information stored in a layer for each pixel or cell cells in the same zone have the same value Location generally location is identified by an ordered pair of coordinates (row and column numbers) that unambiguously identify the location of each unit of geographic space in the raster (cell, pixel, grid cell) usually the true geographic location of one or more of the corners of the raster is also known E. EXAMPLE ANALYSIS USING A RASTER GIS Objective identify areas suitable for logging an area is suitable if it satisfies the following criteria: is Jackpine (Black Spruce are not valuable) is well drained (poorly drained and waterlogged terrain cannot support equipment, logging causes unacceptable environmental damage) is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality) Procedure recode layer 2 as follows, creating layer 4 y if value 2 (Jackpine) n if other value recode layer 3 as follows, creating layer 5 y if value 2 (good) n if other value spread the lake on layer 1 by one cell (500 m), creating layer 6 recode the spread lake on layer 6 as follows, creating layer 7 n if in spread lake y if not overlay layers 4 and 5 to obtain layer 8, coding as follows y if both 4 and 5 are y n otherwise overlay layers 7 and 8 to obtain layer 9, coding as follows y if both 7 and 8 are y n otherwise Result the loggable cells are y on layer 9 Operations used recode overlay spread we could have achieved the same result using the operations in other sequences, or by combining recode and overlay operations e.g. overlay layers 2 and 3, coding as follows y if layer 2 is 2 and layer 3 is 2, n otherwise this would replace two recodes and an overlay e.g.
some systems allow layers to be overlaid 3 or more at a time
the names given to operations vary from system to system, but most of the operations themselves are common across systems

REFERENCES

Star, J.L. and J.E. Estes, 1990. Geographic Information Systems: An Introduction, Prentice Hall, Englewood Cliffs, NJ. An introduction to GIS with a strong raster orientation.

Further references can be found following Unit 5.

EXAM AND DISCUSSION QUESTIONS

1. What types of geographical data fit the raster GIS data model best? What types fit worst?

2. Review the issues involved in selecting a resolution for a raster GIS project.

3. What resolutions would be appropriate for the following problems: (a) determining logging areas in a National Forest, (b) finding suitable locations for backcountry campsites, (c) planning subdivisions to take account of noise from an airport?

4. Review the methods of planning described in Ian McHarg's classic book Design with Nature (1969, Doubleday, New York). In what ways would they (a) benefit and (b) suffer from implementation using raster GIS?

5. Using the documentation for the raster GIS program you have, determine how that program uses (a) the concept of "zone" as a contiguous group of cells of the same value, and (b) the concept of several groups of cells that all have the same value. Is there any ambiguity in the way your program deals with these two concepts?

RASTER GIS CAPABILITIES

A. INTRODUCTION
a raster GIS must have capabilities for:
input of data
various housekeeping functions
operations on layers, like those encountered in the previous unit - recode, overlay and spread
output of data and results
the range of possible functions is enormous; current raster GISs only scratch the surface
because the range is so large, some have tried to organize functions into a consistent scheme, but no scheme has been widely accepted yet
this unit covers a selection of the most useful and common functions
each raster GIS uses different names for the functions

B.
DISPLAYING LAYERS Basic display the simplest type of values to display are integers on a color display each integer value can be assigned a unique color there must be as many colors as integers if the values have a natural order we will want the sequence of colors to make sense e.g. elevation is often shown on a map using the sequence blue-green-yellow-brown-white for increasing elevation there should be a legend explaining the meaning of each color the system should generate the legend automatically based on the descriptions of each value stored with the data layer overhead - Simple display (IDRISI) on a dot matrix printer shades of grey can be generated by varying the density of dots if there are too many values for the number of colors, may have to recode the layer before display Other types of display it may be appropriate to display the data as a surface contours can be "threaded" through the pixels along lines of constant value the searching operation for finding contours is computer-intensive so may be slow the surface can be shown in an oblique, perspective view this can be done by drawing profiles across the raster with each profile offset and hidden lines removed the surface might be colored using the values in a second layer (a second layer can be "draped" over the surface defined by the first layer) the result can be very effective "LA The Movie" was produced by Jet Propulsion Lab by draping a Landsat image of Los Angeles over a layer of elevations, then simulating the view from a moving aircraft these operations are also computer-intensive because of the calculations necessary to simulate perspective and remove hidden lines C. 
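The automatic legend generation described above can be sketched in Python. The landcover values, descriptions and colour names below are invented for illustration; the point is only that the legend is built from the descriptions stored with the layer rather than typed by hand.

```python
import numpy as np

# Toy integer layer; the value -> description table is assumed to be
# stored alongside the layer, as the text suggests, and the colour
# assignments are arbitrary choices for this sketch.
landcover = np.array([[1, 1, 2],
                      [2, 3, 3],
                      [1, 2, 3]])
descriptions = {1: "water", 2: "forest", 3: "urban"}
palette = {1: "blue", 2: "green", 3: "grey"}

# Build the legend automatically from the values actually present.
legend = {palette[v]: descriptions[v] for v in np.unique(landcover)}
print(legend)
```

If a value were absent from the layer it would simply not appear in the legend, which matches the idea of generating the legend from the layer itself.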
LOCAL OPERATIONS
produce a new layer from one or more input layers
the value of each new pixel is defined by the values of the same pixel on the input layer(s)
neighboring or distant pixels have no effect
note: arithmetic operations make no sense unless the values have appropriate scales of measurement (see Unit 6)
you cannot find the "average" of soil types 3 and 5, nor is soil 5 "greater than" soil 3

Recoding
uses only one input layer
examples:
1. assign a new value to each unique value on the input layer
useful when the number of unique input values is small
2. assign new values by assigning pixels to classes or ranges based on their old values
e.g. 0-499 becomes 1, 500-999 becomes 2, 1000 and above becomes 3
useful when the old layer has different values in each cell, e.g. elevation or satellite images
3. sort the unique values found on the input layer and replace each by the rank of the value
e.g. 0, 1, 4, 6 on the input layer become 1, 2, 3, 4 respectively
applications: assigning ranks to computed scores of capability, suitability etc.
some systems allow a full range of mathematical operations
e.g. newvalue = (2*oldvalue + 3)^2

Overlaying layers
an overlay occurs when the output value depends on two or more input layers
many systems restrict overlay to two input layers only
examples:
1. output value equals the arithmetic average of the input values
2. output value equals the greatest (or least) of the input values
3. layers can be combined using arithmetic operations
x and y are the input layers, z is the output; some examples:
z = x + y
z = xy
z = x / y
4. combination using logical conditions
e.g. if y > 0, then z = y, otherwise z = x
note: in many raster packages logical conditions cannot be applied directly to input layers - must first create reclassified input images so that cells have 0 if they do not meet the condition and 1 if they do
5. assign a new value to every unique combination of input values, e.g.:

LAYER 1   LAYER 2   OUTPUT LAYER
   1         A           1
   1         B           2
   2         A           3
   2         B           4

D.
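Two of the local operations above, recoding by ranges and the unique-combination overlay, can be sketched with numpy. This is a hedged illustration: np.digitize, the toy layers and the string-pairing trick are choices made for the sketch, not the commands of any GIS.

```python
import numpy as np

# Recode by ranges: 0-499 -> 1, 500-999 -> 2, 1000 and above -> 3.
elev = np.array([[120, 480, 610],
                 [950, 1000, 1400]])
recoded = np.digitize(elev, bins=[500, 1000]) + 1

# Assign a new value to every unique combination of two input layers
# (the output numbering is arbitrary, as in the table above).
layer_a = np.array([[1, 1],
                    [2, 2]])
layer_b = np.array([["A", "B"],
                    ["A", "B"]])
pairs = np.char.add(layer_a.astype(str), layer_b)    # "1A", "1B", ...
codes = {p: i + 1 for i, p in enumerate(np.unique(pairs))}
out = np.vectorize(codes.get)(pairs)
print(recoded.tolist())   # [[1, 1, 2], [2, 3, 3]]
print(out.tolist())       # [[1, 2], [3, 4]]
```

The unique-combination result reproduces the small table above: each (layer 1, layer 2) pair gets its own output value.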
OPERATIONS ON LOCAL NEIGHBORHOODS
the value of a pixel on the new layer is determined by the local neighborhood of the pixel on the old layer

Filtering
a filter operates by moving a "window" across the entire raster
e.g. many windows are 3x3 cells
the new value for the cell at the middle of the window is a weighted average of the values in the window
by changing the weights we can produce two major effects:
smoothing (a "low pass" filter, removes or reduces local detail)
edge enhancement (a "high pass" filter, exaggerates local detail)
weights should add to 1
example filters:

1.   .11  .11  .11
     .11  .11  .11
     .11  .11  .11

replaces each value by the simple unweighted average of it and its eight neighboring values
severely smooths the spatial variation on the layer

2.   .05  .05  .05
     .05  .60  .05
     .05  .05  .05

gives the pixel's old value 12 times the weight of its neighboring values
slightly smooths the layer

3.   -.1  -.1  -.1
     -.1  1.8  -.1
     -.1  -.1  -.1

slightly enhances local detail by giving neighbors negative weights
filters can be useful in enhancing detail on images for input to GIS, or smoothing layers to expose general trends

Slopes and aspects
if the values in a layer are elevations, we can compute the steepness of slopes by looking at the differences between a pixel's value and those of its adjacent neighbors
the direction of steepest slope, or the direction in which the surface is locally "facing", is called its aspect
aspect can be measured in degrees from North or by compass points - N, NE, E etc.
slope and aspect are useful in analyzing vegetation patterns, computing energy balances and modeling erosion or runoff
aspect determines the direction of runoff
this can be used to sketch drainage paths for runoff

E.
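A 3x3 filter of the kind shown above can be sketched in Python. Leaving edge cells unchanged is only one of several common edge policies and is an assumption of this sketch, as is the single-spike test layer.

```python
import numpy as np

def filter3x3(layer, weights):
    """Weighted 3x3 moving window; edge cells keep their old values."""
    out = layer.astype(float).copy()
    w = np.asarray(weights, dtype=float)
    for i in range(1, layer.shape[0] - 1):
        for j in range(1, layer.shape[1] - 1):
            out[i, j] = (layer[i-1:i+2, j-1:j+2] * w).sum()
    return out

# Low-pass filter 1 from the text: simple unweighted average.
smooth = np.full((3, 3), 1 / 9)
spike = np.zeros((5, 5))
spike[2, 2] = 9.0                  # a single-cell spike of local detail
result = filter3x3(spike, smooth)
# The spike is averaged down to about 1.0 and smeared over its
# neighbours - exactly the smoothing effect described above.
```

Substituting the high-pass weights from filter 3 would instead exaggerate the spike relative to its neighbours.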
OPERATIONS ON EXTENDED NEIGHBORHOODS

Distance
calculate the distance of each cell from a cell or the nearest of several cells
each pixel's value in the new layer is its distance from the given cell(s)

Buffer zones
buffers around objects and features are very useful GIS capabilities
e.g. build a logging buffer 500 m wide around all lakes and watercourses
buffer operations can be visualized as spreading the object spatially by a given distance
the result could be a layer with values:
1 if in original selected object
2 if in buffer
0 if outside object and buffer
applications include noise buffers around roads, safety buffers around hazardous facilities
in many programs the buffer operation requires the user to first do a distance operation, then a reclassification of the distance layer
the rate of spreading may be modified by another layer representing "friction"
e.g. the friction layer could represent varying cost of travel
this will affect the width of the buffer - narrow in areas of high friction, etc.

Visible area or "viewshed"
given a layer of elevations, and one or more viewpoints, compute the area visible from at least one viewpoint
e.g. value = 1 if visible, 0 if not
useful for planning locations of unsightly facilities such as smokestacks, or surveillance facilities such as fire towers, or transmission facilities

F.
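The distance-then-reclassify route to a buffer, as described above, can be sketched with numpy. Straight-line distance between cell centres, a single one-cell source, and a one-cell buffer width are all assumptions of this toy.

```python
import numpy as np

# Distance of every cell from the nearest source cell, in cell units
# (straight-line distance between cell centres).
rows, cols = np.indices((5, 5))
sources = [(0, 0)]                     # e.g. a one-cell lake
dist = np.full((5, 5), np.inf)
for r, c in sources:
    dist = np.minimum(dist, np.sqrt((rows - r) ** 2 + (cols - c) ** 2))

# Reclassify the distance layer into the buffer coding from the text:
# 1 = original object, 2 = within a one-cell buffer, 0 = outside.
buffer_layer = np.where(dist == 0, 1, np.where(dist <= 1, 2, 0))
print(buffer_layer)
```

A friction layer could be introduced by accumulating cost along paths instead of using straight-line distance, narrowing the buffer where friction is high, as the text notes.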
OPERATIONS ON ZONES (GROUPS OF PIXELS)

Identifying zones
by comparing adjacent pixels, identify all patches or zones having the same value
give each such patch or zone a unique number
set each pixel's value to the number of its patch or zone

Areas of zones
measure the area of each zone and assign this value to each pixel instead of the zone's number
alternatively, output may be in the form of a summary table sent to the printer or a file

Perimeter of zones
measure the perimeter of each zone and assign this value to each pixel instead of the zone's number
alternatively, output may be in the form of a summary table sent to the printer or a file
length of perimeter is determined by summing the number of exterior cell edges in each zone
note: the values calculated for both area and perimeter are highly dependent upon the orientation of objects (zones) with respect to the orientation of the grid
however, if boundaries in the study area do not have a dominant orientation such errors may cancel out

Distance from zone boundary
measure the distance from each pixel to the nearest part of its zone boundary, and assign this value to the pixel
the boundary is defined as the pixels which are adjacent to pixels of different values

Shape of zone
measure the shape of the zone and assign this to each pixel in the zone
one of the most common ways to measure shape is by comparing the perimeter length of a zone to the square root of its area
dividing this ratio by 3.54 gives a measure which ranges from 1 for a circle (the most compact shape possible), through 1.13 for a square, to large numbers for long, thin, wiggly zones
commands like this are important in landscape ecology
helpful in studying the effects of geometry and spatial arrangement of habitat
e.g. the size and shape of woodlots affect the animal species they can sustain
e.g. linear park corridors across urban areas have value in allowing migration of animal species

G.
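Zone identification by comparing adjacent pixels can be sketched as a 4-connected flood fill in Python (an illustration, not any package's zone command); zone areas then follow by counting cells.

```python
import numpy as np
from collections import deque

def label_zones(layer):
    """Number each 4-connected patch of equal-valued cells (a 'zone')."""
    zones = np.zeros(layer.shape, dtype=int)
    nzones = 0
    for i in range(layer.shape[0]):
        for j in range(layer.shape[1]):
            if zones[i, j]:
                continue
            nzones += 1
            zones[i, j] = nzones
            q = deque([(i, j)])
            while q:
                r, c = q.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < layer.shape[0] and 0 <= nc < layer.shape[1]
                            and zones[nr, nc] == 0
                            and layer[nr, nc] == layer[r, c]):
                        zones[nr, nc] = nzones
                        q.append((nr, nc))
    return zones

layer = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [3, 3, 3]])
zones = label_zones(layer)
area = {z: int((zones == z).sum()) for z in np.unique(zones)}
print(zones)
print(area)   # {1: 3, 2: 3, 3: 3}
```

Given per-zone perimeters as well, a compactness measure could then be computed as perimeter / (3.54 * sqrt(area)), following the shape measure described above.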
COMMANDS TO DESCRIBE CONTENTS OF LAYERS
it is important to have ways of describing a layer's contents
particularly new layers created by GIS operations
particularly in generating results of analysis

One layer
generate statistics on a layer
e.g. mean, median, most common value, other statistics

More than one layer
compare two maps statistically
e.g. is the pattern on one map related to the pattern on the other?
e.g. chi-square test, regression, analysis of variance

Zones on one layer
generate statistics for the zones on a layer
e.g. largest, smallest, number, mean area

H. ESSENTIAL HOUSEKEEPING
list available layers
input, copy, rename layers
import and export layers to and from other systems
other raster GIS
input of images from remote sensing systems
other types of GIS
identify resolution, orientation
"resample" - changing cell size, orientation, portion of raster to analyze
change colors
provide help to the user
exit from the GIS (the most important command of all!)

SAMPLING THE WORLD

A. INTRODUCTION B. REPRESENTING REALITY Continuous variation C. SPATIAL DATA Location Attributes Time D. SAMPLING REALITY Scales of measurement 1. Nominal 2. Ordinal 3. Interval 4. Ratio Multiple representations E. DATA SOURCES Primary data collection Secondary data sources F. STANDARDS Sharing data Agency standards G. ERRORS AND ACCURACY Original Sin - errors in sources Boundaries Classification errors Data capture errors Accuracy standards REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES

This unit begins the section on data acquisition by looking at how the infinite complexity of the real world can be discretized and sampled.

UNIT 6 - SAMPLING THE WORLD
Compiled with assistance from Charles Parson, Bemidji State University and Timothy Nyerges, University of Washington

A.
INTRODUCTION
the world is infinitely complex
the contents of a spatial database represent a particular view of the world
the user sees the real world through the medium of the database
the measurements and samples contained in the database must present as complete and accurate a view of the world as possible
the contents of the database must be relevant in terms of:
themes and characteristics captured
the time period covered
the study area
this unit looks at techniques for sampling the world, and associated issues of accuracy and standards

B. REPRESENTING REALITY
a database consists of digital representations of discrete objects
the features shown on a map, e.g. lakes, benchmarks, contours, can be thought of as discrete objects
thus the contents of a map can be captured in a database by turning map features into database objects
many of the features shown on a map are fictitious and do not exist in the real world
contours do not really exist, but houses and lakes are real objects
the contents of a spatial database include:
digital versions of real objects, e.g. houses
digital versions of artificial map features, e.g. contours
artificial objects created for the purposes of the database, e.g. pixels

Continuous variation
some characteristics exist everywhere and vary continuously over the earth's surface
e.g. elevation, atmospheric temperature and pressure, natural vegetation or soil type
we can represent such variation in several ways:
by taking measurements at sample points, e.g. weather stations
by taking transects
by dividing the area into patches or zones, and assuming the variable is constant within each zone, e.g. soil mapping
by drawing contours, e.g.
topographic mapping
each of these methods creates discrete objects
the objects in each case are points, lines or areas
a raster can be thought of as:
a special case of a point sample where the points are regularly spaced
a special case of zones where the zones are all the same size
each method is approximate, capturing only part of the real variation
a point sample misses variation between points
transects miss variation not on transects
zones pretend that variation is sudden at boundaries, and that there is no variation within zones
contours miss variation not located on contours
several methods can be used to try to improve the success of each method
e.g. for zones:
map the boundaries as fuzzy instead of sharp lines
describe the zones as mixtures instead of as single classes, e.g. 70% soil type A, 30% soil type B

C. SPATIAL DATA
phenomena in the real world can be observed in three modes: spatial, temporal and thematic
the spatial mode deals with variation from place to place
the temporal mode deals with variation from time to time (one slice to another)
the thematic mode deals with variation from one characteristic to another (one layer to another)
all measurable or describable properties of the world can be considered to fall into one of these modes - place, time and theme
an exhaustive description of all three modes is not possible
when observing real-world phenomena we usually hold one mode fixed, vary one in a "controlled" manner, and measure the third (Sinton, 1978)
e.g.
using a census of population we could fix a time such as 1990, control for location using census tracts, and measure a theme such as the percentage of persons owning automobiles
holding geography fixed and varying time gives longitudinal data
holding time fixed and varying geography gives cross-sectional data
the modes of information stored in a database influence the types of problem solving that can be accomplished

Location
the spatial mode of information is generally called location

Attributes
attributes capture the thematic mode by defining different characteristics of objects
a table showing the attributes of objects is called an attribute table
each object corresponds to a row of the table
each characteristic or theme corresponds to a column of the table
thus the table shows the thematic and some of the spatial modes

Time
the temporal mode can be captured in several ways:
by specifying the interval of time over which an object exists
by capturing information at certain points in time
by specifying the rates of movement of objects
depending on how the temporal mode is captured, it may be included in a single attribute table, or be represented by a series of attribute tables on the same objects through time

D. SAMPLING REALITY

Scales of measurement
numerical values may be defined with respect to nominal, ordinal, interval, or ratio scales of measurement
it is important to recognize the scales of measurement used in GIS data as this determines the kinds of mathematical operations that can be performed on the data
the different scales can be demonstrated using an example of a marathon race:

1. Nominal
on a nominal scale, numbers merely establish identity
e.g. a phone number signifies only the unique identity of the phone
in the race, the numbers issued to racers, which are used to identify individuals, are on a nominal scale
these identity numbers do not indicate any order or relative value in terms of the race outcome

2.
Ordinal
on an ordinal scale, numbers establish order only
phone number 9618224 is not more of anything than 9618049, so phone numbers are not ordinal
in the race, the finishing places of each racer, i.e. 1st place, 2nd place, 3rd place, are measured on an ordinal scale
however, we do not know how much time difference there is between each racer

3. Interval
on interval scales, the difference (interval) between numbers is meaningful, but the numbering scale does not start at 0
subtraction makes sense but division does not
e.g. it makes sense to say that 20°C is 10 degrees warmer than 10°C, so Celsius temperature is an interval scale, but 20°C is not twice as warm as 10°C
e.g. it makes no sense to say that phone number 9680244 is 62195 more than 9618049, so phone numbers are not measurements on an interval scale
in the race, the time of day at which each racer finished is measured on an interval scale
if the racers finished at 9:10 GMT, 9:20 GMT and 9:25 GMT, then racer 1 finished 10 minutes before racer 2, and the difference between racers 1 and 2 is twice the difference between racers 2 and 3
however, the racer finishing at 9:10 GMT did not finish twice as fast as the racer finishing at 9:20 GMT

4. Ratio
on a ratio scale, measurement has an absolute zero and the difference between numbers is significant
division makes sense
e.g. it makes sense to say that a 50 kg person weighs half as much as a 100 kg person, so weight in kg is on a ratio scale
the zero point of weight is absolute, but the zero point of the Celsius scale is not
in our race, the first place finisher finished in a time of 2:30, the second in 2:40, and the 450th place finisher took 5 hours
the 450th finisher took twice as long as the first place finisher (5/2.5 = 2)
note: these distinctions, though important, are not always clearly defined
is elevation interval or ratio?
if the local base level is 750 feet, is a mountain at 2000 feet twice as high as one at 1000 feet when viewed from the valley? many types of geographical data used in GIS applications are nominal or ordinal values establish the order of classes, or their distinct identity, but rarely intervals or ratios thus you cannot: multiply soil type 2 by soil type 3 and get soil type 6 divide urban area by the rank of a city to get a meaningful number subtract suitability class 1 from suitability class 4 to get 3 of anything however, you can: divide population by area (both ratio scales) and get population density subtract elevation at point a from elevation at point b and get difference of elevation Multiple representations a data model is essential to represent geographical data in a digital database there are many different data models the same phenomena may be represented in different ways, at different scales and with different levels of accuracy thus there may be multiple representations of the same geographical phenomena it is difficult to convert from one representation to another e.g. from a small scale (1:250,000) to a large scale (1:10,000) thus it is common to find databases with multiple representations of the same phenomenon this is wasteful, but techniques to avoid it are poorly developed E. DATA SOURCES Primary data collection some of the data in a spatial database may have been measured directly e.g. by field sampling or remote sensing the density of sampling determines the resolution of the data e.g. samples taken every hour will capture hour-to- hour variation, but miss shorter-term variation e.g. samples taken every 1 km will miss any variation at resolutions less than 1 km a sample is designed to capture the variation present in a larger universe e.g. a sample of places should capture the variation present at all possible places e.g. 
a sample of times will be designed to capture variation at all possible times there are several standard approaches to sampling: in a random sample, every place or time is equally likely to be chosen systematic samples are chosen according to a rule, e.g. every 1 km, but the rule is expected to create no bias in the results of analysis, i.e. the results would have been similar if a truly random sample had been taken in a stratified sample, the researcher knows for some reason that the universe contains significantly different sub-populations, and samples within each sub-population in order to achieve adequate representation of each e.g. we may know that the topography is more rugged in one part of the area, and sample more densely there to ensure adequate representation if a representative sample of the entire universe is required, then the subsamples in each subpopulation will have to be weighted appropriately Secondary data sources some data may have been obtained from existing maps, tables, or other databases such sources are termed secondary to be useful, it is important to obtain information in addition to the data themselves: information on the procedures used to collect and compile the data information on coding schemes, accuracy of instruments unfortunately such information is often not available a user of a spatial database may not know how the data were captured and processed prior to input this often leads to misinterpretation, false expectations about accuracy F. STANDARDS standards may be set to assure uniformity within a single data set across data sets e.g. uniform information about timber types throughout the database allows better fire fighting methods to be used, or better control of insect infestations data capture should be undertaken in standardized ways that will assure the widest possible use of the information Sharing data it is not uncommon for as many as three agencies to create databases with, ostensibly, the same information e.g. 
a planning agency may map landuse, including a forested class
e.g. the state department of forestry also maps forests
e.g. the wildlife division of the department of conservation maps habitat, which includes fields and forest
each may digitize their forest class onto different GIS systems, using different protocols, and with different definitions for the classes of forest cover
this is a waste of time and money
sharing information gives it added value
sharing basic formats with other information providers, such as a department of transportation, might make marketing the database more profitable

Agency standards
state and national agencies have set standards for certain environmental data
the Soil Conservation Service (SCS) has adopted the "seventh approximation" as the national soil taxonomy
the US Geological Survey has set standards for landuse, transportation, and hydrography that are used as guidelines in many states
forest inventories are not standardized; agencies may use different systems while managing a contiguous region of forest land
Unit 69 covers standards for GIS in greater depth

G.
ERRORS AND ACCURACY note: Units 45 and 46 discuss this topic in detail there is a nearly universal tendency to lose sight of errors once the data are in digital form errors: are implanted in databases because of errors in the original sources (source errors) are added during data capture and storage (processing errors) occur when data are extracted from the computer arise when the various layers of data are combined in an analytical exercise Original Sin - errors in sources are extremely common in non-mapped source data, such as locations of wells, or lot descriptions can be caused by doing inventory work from aerial photography and misinterpreting images often occur because base maps are relied on too heavily a recent attempt in Minnesota to overlay Department of Transportation bridge locations on USGS transportation data resulted in bridges lying neither beneath roads, nor over water, and roads lying apparently under rivers until they were compared in this way, it was assumed that each data set was locationally acceptable the ability of GIS to overlay may expose previously unsuspected errors Boundaries boundaries of soil types are actually transition zones, but are mapped by lines less than 0.5 mm wide lakes fluctuate widely in area, yet have permanently recorded shorelines Classification errors are common when tabular data are rendered in map form simple typing errors may be invisible until presented graphically floodplain soils may appear on hilltops pastureland may appear to be misinterpreted marsh more complex classification errors may be due to the sampling strategies that produced the original data timber appraisal is commonly done using a few, randomly selected points to describe large stands information may exist that documents the error of the sampling technique however, such information is seldom included in the GIS database Data capture errors manual data input induces another set of errors eye-hand coordination varies from operator to operator and 
from time to time
data input is a tedious task - it is difficult to maintain quality over long periods of time

Accuracy standards
many agencies have established accuracy standards for geographical data
these are more often concerned with accuracy of locations of objects than with accuracy of attributes
location accuracy standards are commonly decided from the scale of source materials
for natural resource data, 1:24,000 scale accuracy is a common target
at this scale, 0.5 mm line width = 12 m on the ground
USGS topographic information is currently available in digital form at 1:100,000
at this scale, 0.5 mm line width = 50 m on the ground
higher accuracy requires better source materials
is the added cost justified by the objectives of the study?
accuracy standards should be determined by considering both the value of information and the cost of collection

REFERENCES

Berry, B.J.L. and A.M. Baker, 1968. "Geographic sampling." In B.J.L. Berry and D.F. Marble, editors, Spatial Analysis, Prentice Hall, Englewood Cliffs, NJ, 91-100. A classic paper on sampling geographical distributions.

Hopkins, Lewis D., 1977. "Methods for generating land suitability maps: A comparative evaluation," AIP Journal, October 1977:386-400. An excellent discussion of the different measurement scales is given in an appendix.

Sinton, D., 1978. "The inherent structure of information as a constraint to analysis: mapped thematic data as a case study," Harvard Papers on Geographic Information Systems, Vol. 7, G. Dutton (ed.), Addison Wesley, Reading, MA. A classic paper on the relationships between the database and reality.

Standard sampling theory is covered in many texts on scientific measurement.

EXAM AND DISCUSSION QUESTIONS

1. Take an example map showing the observed occurrences of some rare event, and discuss the factors influencing the sampling process. Good examples are maps of tornado sightings and herbarium records of rare plants.

2.
Using a topographic map, discuss the ways in which the contents and design of the map influence the user's view of the real world.

3. Review the accuracy information available for several different scales and types of maps, and spatial databases if available.

4. The Global Positioning System (GPS) will soon be capable of providing latitude and longitude positions to the nearest meter using portable receivers weighing on the order of 1 kg, in no more than one minute. This is significantly more accurate than the best base mapping generally available in the US (1:24,000). Discuss what effect this system might have on map makers and map users.

DATA INPUT

A. INTRODUCTION Modes of data input B. DIGITIZERS Hardware The digitizing operation Problems with digitizing maps Editing errors from digitizing Digitizing costs C. SCANNERS Video scanner Electromechanical scanner Requirements for scanning D. CONVERSION FROM OTHER DIGITAL SOURCES Automated Surveying Global Positioning System (GPS) E. CRITERIA FOR CHOOSING MODES OF INPUT F. RASTERIZATION AND VECTORIZATION Rasterization of digitized data Vectorization of scanned images G. INTEGRATING DIFFERENT DATA SOURCES Formats Projections Scale Resampling rasters REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES

This unit examines the common methods of data input. This may be a good time to take a field trip to a local GIS shop to show students the operation of these various devices. If you can't find local examples, the slide set contains some examples of the hardware items described.

UNIT 7 - DATA INPUT
Compiled with assistance from Jeffrey L. Star, University of California at Santa Barbara, and Holly Dickinson, SUNY Buffalo

A.
INTRODUCTION need to have tools to transform spatial data of various types into digital format data input is a major bottleneck in application of GIS technology costs of input often consume 80% or more of project costs data input is labor intensive, tedious, error-prone there is a danger that construction of the database may become an end in itself and the project may not move on to analysis of the data collected essential to find ways to reduce costs, maximize accuracy need to automate the input process as much as possible, but: automated input often creates bigger editing problems later source documents (maps) may often have to be redrafted to meet rigid quality requirements of automated input because of the costs involved, much research has gone into devising better input methods - however, few reductions in cost have been realized sharing of digital data is one way around the input bottleneck more and more spatial data is becoming available in digital form data input to a GIS involves encoding both the locational and attribute data the locational data is encoded as coordinates on a particular cartesian coordinate system source maps may have different projections, scales several stages of data transformation may be needed to bring all data to a common coordinate system attribute data is often obtained and stored in tables Modes of data input keyboard entry for non-spatial attributes and occasionally locational data manual locating devices user directly manipulates a device whose location is recognized by the computer e.g. digitizing automated devices automatically extract spatial data from maps and photography e.g. scanning conversion directly from other digital sources voice input has been tried, particularly for controlling digitizer operations not very successful - machine needs to be recalibrated for each operator, after coffee breaks, etc. B. 
DIGITIZERS digitizers are the most common device for extracting spatial information from maps and photographs the map, photo, or other document is placed on the flat surface of the digitizing tablet Hardware the position of an indicator as it is moved over the surface of the digitizing tablet is detected by the computer and interpreted as pairs of x,y coordinates the indicator may be a pen-like stylus or a cursor (a small flat plate the size of a hockey puck with a cross-hair) frequently, there are control buttons on the cursor which permit control of the system without having to turn attention from the digitizing tablet to a computer terminal digitizing tablets can be purchased in sizes from 25x25 cm to 200x150 cm, at approximate costs from $500 to $5,000 early digitizers (ca. 1965) were backlit glass tables a magnetic field generated by the cursor was tracked mechanically by an arm located behind the table the arm''s motion was encoded, coordinates computed and sent to a host processor some early low-cost systems had mechanically linked cursors - the free-cursor digitizer was initially much more expensive the first solid-state systems used a spark generated by the cursor and detected by linear microphones problems with errors generated by ambient noise contemporary tablets use a grid of wires embedded in the tablet to generate a magnetic field which is detected by the cursor accuracies are typically better than 0.1 mm this is better than the accuracy with which the average operator can position the cursor functions for transforming coordinates are sometimes built into the tablet and used to process data before it is sent to the host The digitizing operation the map is affixed to a digitizing table three or more control points ("reference points", "tics", etc.) 
are digitized for each map sheet these will be easily identified points (intersections of major streets, major peaks, points on coastline) the coordinates of these points will be known in the coordinate system to be used in the final database, e.g. lat/long, State Plane Coordinates, military grid the control points are used by the system to calculate the necessary mathematical transformations to convert all coordinates to the final system the more control points, the better; digitizing the map contents can be done in two different modes: in point mode, the operator identifies the points to be captured explicitly by pressing a button in stream mode, points are captured at set time intervals (typically 10 per second) or on movement of the cursor by a fixed amount advantages and disadvantages: in point mode the operator selects points subjectively two point mode operators will not code a line in the same way stream mode generates large numbers of points, many of which may be redundant stream mode is more demanding on the user while point mode requires some judgement about how to represent the line most digitizing is currently done in point mode Problems with digitizing maps arise since most maps were not drafted for the purpose of digitizing paper maps are unstable: each time the map is removed from the digitizing table, the reference points must be re-entered when the map is affixed to the table again if the map has stretched or shrunk in the interim, the newly digitized points will be slightly off in their location when compared to previously digitized points errors occur on these maps, and these errors are entered into the GIS database as well the level of error in the GIS database is directly related to the error level of the source maps maps are meant to display information, and do not always accurately record locational information for example, when a railroad, stream and road all go through a narrow mountain pass, the pass may actually be depicted wider than its
actual size to allow for the three symbols to be drafted in the pass discrepancies across map sheet boundaries can cause discrepancies in the total GIS database e.g. roads or streams that do not meet exactly when two map sheets are placed next to each other user error causes overshoots, undershoots (gaps) and spikes at intersection of lines diagram user fatigue and boredom for a complete discussion of the manual digitizing process, see Marble et al., 1984 Editing errors from digitizing some errors can be corrected automatically small gaps at line junctions overshoots and sudden spikes in lines error rates depend on the complexity of the map, are high for small scale, complex maps these topics are explored in greater detail in later Units Unit 13 looks at the process of editing digitized data Units 45 and 46 discuss digitizing error Digitizing costs a common rule of thumb in the industry is one digitized boundary per minute e.g. it would take 99/60 = 1.65 hours to digitize the boundaries of the 99 counties of Iowa C.
SCANNERS Video scanner essentially television cameras, with appropriate interface electronics to create a computer-readable dataset available in either black and white or color extremely fast (scan times of under 1 second) relatively inexpensive ($500 - $10,000) produce a raster array of brightness (or color) values, which are then processed much like any other raster array typical data arrays from video scanners are of the order of 250 to 1000 pixels on a side typically have poor geometric and radiometric characteristics, including various kinds of spatial distortions and uneven sensitivity to brightness across the scanned field video scanners are difficult to use for map input because of problems with distortion and interpretation of features Electromechanical scanner unlike the video scanning systems, electromechanical systems are typically more expensive ($10,000 to $100,000) and slower, but can create better quality products one common class of scanners involves attaching the graphic to a drum as the drum rotates about its axis, a scanner head containing a light source and photodetector reads the reflectivity of the target graphic and, by digitizing this signal, creates a single column of pixels from the graphic the scanner head moves along the axis of the drum to create the next column of pixels, and so on through the entire scan compare the action of a lathe in a machine shop this controls distortion by bringing the single light source and detector to position on a regular grid of locations on the graphic systems may have a scan spot size of as little as 25 micrometers, and be able to scan graphics of the order of 1 meter on a side an alternative mechanism involves an array of photodetectors which extract data from several rows of the raster simultaneously the detector moves across the document in a swath when all the columns have been scanned, the detector moves to a new swath of rows for an in-depth discussion of scanning techniques, see Peuquet and Boyle
(1984) Requirements for scanning documents must be clean (no smudges or extra markings) lines should be at least 0.1 mm wide complex line work provides greater chance of error in scanning text may be accidentally scanned as line features contour lines cannot be broken with text automatic feature recognition is not easy (two contour lines vs. road symbols) diagram special symbols (e.g. marsh symbols) must be recognized and dealt with if good source documents are available, scanning can be an efficient, time-saving mode of data input D. CONVERSION FROM OTHER DIGITAL SOURCES involves transferring data from one system to another by means of a conversion program more and more data is becoming available in magnetic media USGS digital cartographic data (DLGs - Digital Line Graphs) digital elevation models (DEMs) TIGER and other census related data data from CAD/CAM systems (AutoCAD, DXF) data from other GIS these data generally are supplied on digital tapes that must be read into the computer however, CD-ROM is becoming increasingly popular for this purpose: it provides better standards and the hardware is much less expensive - CD-ROM drive $1000, tape drive $14,000 Automated Surveying directly determines the actual horizontal and vertical positions of objects two kinds of measurements are made: distance and direction traditionally, distance measuring involved pacing, chains and tapes of various materials direction measurements were made with transits and theodolites modern surveyors have a number of automated tools to make distance and direction measurements easier electronic systems measure distance using the time of travel of beams of light or radio waves by measuring the round-trip time of travel, from the observing instrument to the object in question and back, we can use the relationship (d = v x t) to determine the distance an instrument based on timing the travel of a pulse of infrared light can measure distances on the order of 10 km with a standard deviation
of +/- 15 mm the total station (cost about $30,000) captures distance and direction data in digital form the data is downloaded to a host computer at the end of each session for direct input to GIS and other programs Global Positioning System (GPS) a new tool for determining accurate positions on the surface of the earth computes positions from signals received from a series of satellites (NAVSTAR) as of April 1990 there are 7 active satellites in orbit; the full constellation of 21 active satellites (24 including spares) should be in place in the early 1990s depends on precise information about the orbits of the satellites a radio receiver with appropriate electronics is connected to a small antenna and, depending on the method used, the system is able to determine its location in 3-D space in anywhere from an hour down to less than 1 second developed and operated by the US armed forces, but access is generally available and civilian interest is high particularly valuable for establishing accurate positional control in remote areas current GPS receivers cost about $5,000 to $15,000 (mid 1990) but costs will decline rapidly railroad companies are using GPS to create the first accurate survey of the US rail network and to track train positions recently, the use of GPS has resulted in corrections to the elevations of many of the world's peaks, including Mont Blanc and K2 current GPS positional accuracies are of the order of 5 to 10 m with standard equipment and as small as 1 cm with "survey grade" receivers accuracy will continue to improve as more satellites are placed in orbit and experts fine tune the software and hardware GPS accuracy is already as good as the largest scale base mapping available for the continental US E.
CRITERIA FOR CHOOSING MODES OF INPUT the type of data source images favor scanning maps can be scanned or digitized the database model of the GIS scanning easier for raster, digitizing for vector the density of data dense linework makes for difficult digitizing expected applications of the GIS implementation F. RASTERIZATION AND VECTORIZATION Rasterization of digitized data for some data, entry in vector form is more efficient, followed by conversion to raster we might digitize the county boundary in vector form by mounting a map on a digitizing table capturing the locations of points along the boundary assuming that the points are connected by straight line segments this may produce an ASCII file of pairs of xy coordinates which must then be processed by the GIS, or the output of the digitizer may go directly into the GIS the vector representation of the boundary as points is then converted to a raster by an operation known as vector-raster conversion the computer calculates which county each cell is in using the vector representation of the boundary and outputs a raster digitizing the boundary is much less work than cell by cell entry most raster GIS have functions such as vector-raster conversion to support vector entry many support digitizing and editing of vector data Vectorization of scanned images for many purposes it is necessary to extract features and objects from a scanned image e.g. a road on the input document will have produced characteristic values in each of a band of pixels if the scanner has pixels of 25 microns = 0.025 mm, a line of width 0.5 mm will create a band 20 pixels across the vectorized version of the line will be a series of coordinate points joined by straight lines, representing the road as an object or feature instead of a collection of contiguous pixels successful vectorization requires a clean line scanned from media free of cluttering labels, coffee stains, dust etc. 
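The vector-raster conversion described above (the computer calculates which cells lie inside the digitized boundary) can be sketched as a point-in-polygon test applied to each cell centre. The following Python sketch uses a simple ray-casting test; the square "county" boundary, grid extent and cell size are made-up illustrative values, not taken from any particular GIS:

```python
# Vector-to-raster conversion: given a polygon boundary captured as a
# list of (x, y) vertices (straight-line segments between digitized
# points), mark each raster cell 1 if its centre falls inside.

def point_in_polygon(x, y, vertices):
    """Ray-casting test: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

def rasterize(vertices, xmin, ymin, ncols, nrows, cell):
    """Return a row-major 0/1 grid (row 0 = top of the map)."""
    grid = []
    for r in range(nrows):
        y = ymin + (nrows - r - 0.5) * cell   # cell-centre y
        row = []
        for c in range(ncols):
            x = xmin + (c + 0.5) * cell       # cell-centre x
            row.append(1 if point_in_polygon(x, y, vertices) else 0)
        grid.append(row)
    return grid

# a hypothetical square "county" from (1,1) to (3,3) on a 4x4 grid of 1-unit cells
county = [(1, 1), (3, 1), (3, 3), (1, 3)]
for row in rasterize(county, 0, 0, 4, 4, 1):
    print(row)
```

production rasterizers use scanline methods for speed and must also decide how to classify cells split by a boundary; the cell-centre rule used here is only the simplest choice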
to create a sufficiently clean line, it is often necessary to redraft input documents e.g. the Canada Geographic Information System redrafted each of its approximately 10,000 input documents since the scanner can be color sensitive, vectorizing may be aided by the use of special inks for certain features although scanning is much less labor intensive, problems with vectorization lead to costs which are often as high as those of manual digitizing two stages of error correction may be necessary: 1. edit the raster image prior to vectorization 2. edit the vectorized features G. INTEGRATING DIFFERENT DATA SOURCES Formats many different format standards exist for geographical data some of these have been established by public agencies e.g. the USGS in cooperation with other federal agencies is developing SDTS (Spatial Data Transfer Standard) for geographical data, will propose it as a national standard in 1990 e.g. the Defense Mapping Agency (DMA) has developed the DIGEST data transfer standard some have been defined by vendors e.g. SIF (Standard Interchange Format) is an Intergraph standard for data transfer see Unit 69 for more on GIS standards a good GIS can accept and generate datasets in a wide range of standard formats Projections there are many ways of representing the curved surface of the earth on a flat map some of these map projections are very common, e.g.
Mercator, Universal Transverse Mercator (UTM), Lambert Conformal Conic each state has a standard SPC (State Plane Coordinate system) based on one or more projections see Unit 27 for more on map projections a good GIS can convert data from one projection to another, or to latitude/longitude input derived from maps by scanning or digitizing retains the map's projection with data from different sources, a GIS database often contains information in more than one projection, and must use conversion routines if data are to be integrated or compared Scale data may be input at a variety of scales although a GIS likely will not store the scale of the input document as an attribute of a dataset, scale is an important indicator of accuracy maps of the same area at different scales will often show the same features differently e.g. features are generalized at smaller scales, enhanced in detail at larger scales variation in scales can be a major problem in integrating data e.g. the scale of most input maps for a GIS project is 1:250,000 (topography, soils, land cover) but the only geological mapping available is 1:7,000,000 if integrated with the other layers, the user may believe the geological layer is equally accurate in fact, it is so generalized as to be virtually useless Resampling rasters raster data from different sources may use different pixel sizes, orientations, positions, projections resampling is the process of interpolating information from one set of pixels to another resampling to larger pixels is comparatively safe, resampling to smaller pixels is very dangerous REFERENCES Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. Chapter 4 reviews alternative methods of data input and editing for GIS. Chrisman, N.R., 1987. "Efficient digitizing through the combination of appropriate hardware and software for error detection and editing," International Journal of Geographical Information Systems 1:265-77.
Discusses ways of reducing the data input bottleneck. Drummond, J., and M. Bosman, 1989. "A review of low-cost scanners," International Journal of Geographical Information Systems 3:83-97. A good review of current scanning technology. Ehlers, M., G. Edwards and Y. Bedard, 1989. "Integration of remote sensing with GIS: a necessary evolution," Photogrammetric Engineering and Remote Sensing 55(11):1619-27. A recent review of the relationship between the two technologies. Goodchild, M.F. and B.R. Rizzo, 1987. "Performance evaluation and work-load estimation for geographic information systems," International Journal of Geographical Information Systems 1:67-76. Statistical analysis of costs of scanning. Lai, Poh-Chin, 1988. "Resource use in manual digitizing. A case study of the Patuxent basin geographical information system database," International Journal of Geographical Information Systems 2(4):329-46. A detailed analysis of the costs of building a practical database. Marble, D.F., J.P. Lauzon, and M. McGranaghan, 1984. "Development of a Conceptual Model of the Manual Digitizing Process," Proceedings of the International Symposium on Spatial Data Handling, Volume 1, August 20-24, 1984, Zurich, Switzerland, Symposium Secretariat, Department of Geography, University of Zurich-Irchel, 8057 Zurich, Switzerland. Conceptual discussion of the digitizing process. Peuquet, D. J., 1981. "An examination of techniques for reformatting digital cartographic data, part I: the raster-to-vector process," Cartographica 18:34-48. Peuquet, D. J., 1981. "An examination of techniques for reformatting digital cartographic data, part II: the vector-to-raster process," Cartographica 18:21-33. Peuquet, D. J., and A. R. Boyle, 1984. Raster Scanning, Processing and Plotting of Cartographic Documents, SPAD Systems, Ltd., P.O. Box 571, Williamsville, New York, 14221, U.S.A. A comprehensive discussion of scanning technology. Tomlinson, R.F., H.W. Calkins and D.F. Marble, 1976. Computer Handling of Geographical Data, UNESCO Press, Paris.
Comparison of input methods and costs of 5 GISs. DISCUSSION AND EXAM QUESTIONS 1. In his book Computers and the Representation of Geographical Data (Wiley, New York, 1987), E.E. Shiryaev argues that maps must be redesigned to be equally readable by humans and computer scanners, and that this would ultimately make scanning much more cost-effective than digitizing. How might this be done, and what advantages would it have? 2. The cost of digitizing has remained remarkably constant over the past 20 years despite dramatic reductions in computer hardware and software cost. Why is this, and what impact has it had on GIS? Do you predict any change in this situation in the future? 3. "Digitizing is a suitable activity for convicted criminals." Discuss. 4. As manager of a GIS operation, you have the task of laying out rules which your staff must follow in digitizing complex geographical lines. What instructions would you give them to ensure a reasonable level of accuracy? Assume they will be using point mode digitizing, and that points will be connected by straight lines for analysis and output. 5. What types of documents are best suited for automatic scanning? 6. After reading the article by Marble, Lauzon and McGranaghan on the conceptual model of digitizing, describe and explain the importance of map pre-processing. SOCIO-ECONOMIC DATA A. INTRODUCTION Socio-economic data Aggregate and disaggregate data Cross-sectional and longitudinal data B. SOCIO-ECONOMIC DATA FOR GIS Sources of socio-economic data "Geography" Issues in using secondary socio-economic data C. SOURCES OF SOCIO-ECONOMIC DATA Population census Economic census Agricultural census Labor force statistics Land records Transportation and infrastructure inventories Administrative records D. US CENSUS OF POPULATION AND HOUSING Process of taking the census Content Processing of returns Geographic referencing Census reporting zones Availability of Census data E.
TIGER Development Content Marketing TIGER files Non-census uses for TIGER F. LAND RECORDS Issues in land records modernization REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES It may be useful to illustrate this unit with several different examples of the data products described, including examples of census products such as summary reports, maps and even digital tapes. UNIT 8 - SOCIO-ECONOMIC DATA Compiled with assistance from Hugh Calkins, State University of New York at Buffalo A. INTRODUCTION Socio-economic data are data about humans, human activities, and the space and/or structures used to conduct human activities specific classes include demographics (age, sex, ethnic and marital status, education) housing (quality, cost) migration transportation economics (personal incomes, employment, occupations, industry, regional growth) retailing (customer locations, store sites, mailing lists) Aggregate and disaggregate data disaggregated data - data about individuals or single entities, for example: a person's age, sex, level of education, income, occupation, etc. gross sales, number of employees, profit, etc. for a retail store registration number and type for a single vehicle aggregated data - describing a group of observations with the grouping made on a defined criterion geographical data are often grouped by spatial units such as a census tract, traffic zone, etc. aggregation can also be by time interval e.g. number of persons leaving area in 5 years also by socio-economic grouping e.g. persons aged 5 through 14 years examples of aggregated data are: number of persons, average income, median housing value for a census tract number of commute trips and average trip length from a suburban traffic zone to the central business district Cross-sectional and longitudinal data recall from Unit 6 cross-sectional data gives information on many areas for the same single slice or interval of time e.g. average income in census tracts of Los Angeles for 1988 e.g.
numbers migrating out of each state in the period 1971-75 longitudinal data gives information on one or more areas for a series of times e.g. average income for State of New York from 1970-1988 by year B. SOCIO-ECONOMIC DATA FOR GIS Sources of socio-economic data field surveys much data used in marketing is gathered by door-to-door or street interview field surveys require careful sampling design how to obtain a representative sample how to avoid bias toward certain groups in street interviews government statistics statistics collected and reported by government as part of required activities, e.g. Bureau of the Census usually based on entire population, except sampling is used for some Census questions government administrative records records are collected by government as part of administrative functions, e.g. tax records, auto registrations, property taxes these are useful sources of data provided confidentiality can be preserved usually available only to government or for research purposes secondary data collected by another group, often for different purposes e.g. the original mandated purpose of the Census was to provide data for congressional districting increasingly socio-economic data is available in digital form from private sector companies retailers and direct-mail companies are major clients for these companies includes data originally from census augmented from other sources and surveys data can be customized for clients (special sets of variables, special geographical coverage or aggregation) customizing justifies costs, which are often higher than for "raw" census data "Geography" for use in GIS, socio-economic statistics are of little use without associated "geography," the term often used to describe locational data e.g. data on census tracts must be supported by digital information on locations of census tract boundaries geography also allows data to be aggregated geographically, e.g.
by merging data on individual cities into metropolitan regions thus, many suppliers of socio-economic data also supply digitized geography of reporting zones boundaries of many standard types of reporting zones change from time to time e.g. changes occur occasionally in county boundaries e.g. census enumeration districts are redefined for each census (see Redistricting in Unit 56) difficult to assemble longitudinal data for such units due to changing geography data is often needed for one set of reporting zones, only available for another set e.g. data available for census tracts, required for school districts which do not follow same boundaries solutions to such cross-area estimation problems are facilitated by GIS technology these problems are often grouped under the modifiable areal unit problem (MAUP) considerable effort has been expended recently to develop statistically sound techniques to deal with these problems (see Openshaw and Taylor, 1981) Issues in using secondary socio-economic data cost usually secondary data is much less expensive than field surveys large expenditures by government agencies on data collection (e.g. US Census) are indirect subsidies to users, who often pay much less than real cost of data documentation quality of documentation, supporting information (e.g. maps) is usually high for data collected by government data quality major difficulty is undercounting - census and other social surveys tend to miss certain groups, leading to bias in results undercounting in US Census may be as high as 25% for certain social groups data conversion conversion steps may be necessary to make data useful in GIS e.g. format, type of data may be incompatible aggregation are data available with suitable level of spatial, temporal aggregation? e.g. study to change elementary school district boundaries will require data at resolution of city blocks or higher e.g.
location for gas station will require city block level data, for regional shopping mall much lower resolution (greater aggregation of data) is adequate currency social data changes rapidly, can be quickly out of date because of births, deaths, migration, changing economy competitive edge in retailing depends on having current data US has a major census only every 10 years, so its data may be 10 years old often have to estimate current or future patterns based on old data accuracy of location census locates people by place of residence - "night-time" census "daytime" data would show locations during the day (place of work, school etc.) but is generally not available from standard sources medical records often locate individuals by place of treatment (hospital), not residence or workplace e.g. consider implications for detecting exposure to cancer-causing agents C. SOURCES OF SOCIO-ECONOMIC DATA Population census questions on age, sex, income, education, ethnicity, migration, housing quality etc. summary statistics used in research, planning, market research, available at high level of geographic resolution in many countries see detailed discussion following for US case (Census of Population and Housing) Economic census enumeration and tabulation of business activity is conducted in the US by the Census Bureau in years ending in 2 and 7 detailed information on classes of industry low level of geographic resolution (i.e. large reporting zones) data collected in many countries through annual, quarterly or monthly returns of information from companies Agricultural census annual data on crops, yields, livestock etc. more extensive periodic surveys of farm economy available in spatially disaggregated form to e.g. county level in US Labor force statistics enumeration of employment, unemployment produced from periodic (e.g. monthly) sample surveys of workforce other special-purpose surveys often combined with regular labor force survey - e.g. 
household expenditures, recreation activities often available for small areas, e.g. parts of city Land records record of land parcel description, ownership and value for taxation purposes updated on a regular basis (e.g. annually) by municipality or county government also used for land use planning source of current demographic information in some countries/states (i.e. local census) see detailed discussion following Transportation and infrastructure inventories planning, management and maintenance of facilities includes roads and streets, power lines, gas lines, water, sewer lines collected by local utilities, responsible government departments valuable to variety of users e.g. construction companies needing information on buried pipes e.g. emergency management departments needing data on hazardous facilities compiling agency often sees a substantial market for such data which can offset costs of collection Administrative records vehicle registrations, tax returns etc. useful for various marketing, research purposes based on 100% sample so can be disaggregated spatially however, disaggregation causes problems over confidentiality of records D. US CENSUS OF POPULATION AND HOUSING Process of taking the census purpose is to enumerate the population for redefining election districts taken every ten years (1960, 1970, etc.)
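As an aside on the cross-area estimation problem raised earlier (data available for census tracts but required for school districts): the simplest technique is area-weighted interpolation, which allocates each source-zone count to target zones in proportion to overlap area, assuming the count is spread uniformly within each source zone. A minimal Python sketch; the tract populations and overlap areas below are hypothetical:

```python
# Area-weighted cross-area estimation: reallocate counts from source
# zones (e.g. census tracts) to target zones (e.g. school districts)
# in proportion to overlap area. Assumes uniform density within each
# source zone; all names and numbers are illustrative only.

tract_pop = {"T1": 4000, "T2": 2000}

# overlap[(tract, district)] = area of intersection, consistent units
overlap = {
    ("T1", "D1"): 6.0, ("T1", "D2"): 2.0,   # tract T1 has total area 8
    ("T2", "D1"): 1.0, ("T2", "D2"): 3.0,   # tract T2 has total area 4
}

def cross_area_estimate(source_values, overlap_areas):
    """Estimate target-zone totals by area-weighted allocation."""
    # total area of each source zone, summed from its overlaps
    src_area = {}
    for (s, _), a in overlap_areas.items():
        src_area[s] = src_area.get(s, 0.0) + a
    targets = {}
    for (s, t), a in overlap_areas.items():
        share = source_values[s] * a / src_area[s]
        targets[t] = targets.get(t, 0.0) + share
    return targets

print(cross_area_estimate(tract_pop, overlap))
# prints {'D1': 3500.0, 'D2': 2500.0}
```

the uniform-density assumption is exactly what the modifiable areal unit literature warns about; more sophisticated methods use ancillary data (e.g. land use) to guide the allocation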
April 1st is census day, although complete enumeration takes a "few" weeks most households receive forms in mail, some require visit by enumerator Content two types of items - those completed by "100%" of the population, those by random sample Processing of returns automated encoding to digital form automated editing to correct obvious inconsistencies some missing items can be assigned automatically using simple rules other missing items are assigned based on probabilities data assembled into master database sample surveys processed to produce statistical summaries Geographic referencing initially returns are identified by street address address is converted into geographic location using a digital referencing system for the 1980 census, DIME (Dual Independent Map Encoding) files were used for digital geographic referencing of urbanized portions of the US for the 1990 census, TIGER files covering every county will be used since TIGER files will have a major impact on GIS databases in the next decade, they are discussed in detail in the next section Census reporting zones range from blocks to states as noted previously, the geographic boundaries and definitions of these areas may change from one census to the next Availability of Census data tabulation of statistics by reporting zones, e.g. population by county, population by age by county crosstabulation, e.g. population by age and sex by county special tabulations, e.g. for unusual combinations of characteristics, or for unusual or custom reporting zones number of possible tabulations and crosstabulations is infinite, volume of census products vastly exceeds volume of data collected alternative formats for products printed reports magnetic media - tapes, disks microfiche, microfilm, now CDs sources of census data state data centers distribute Census data private firms repackage and customize data, produce custom reports (e.g.
tabulation of population by distance from proposed mall location) geography products available base maps showing reporting zones atlases produced for urban areas digital products - boundary files, TIGER E. TIGER reference: beginning of this Unit (TIGER) Development TIGER stands for Topologically Integrated Geographic Encoding and Referencing designed to: support pre-census geographic and cartographic functions in preparation for the 1990 Census to complete and evaluate the data collection operations of the census to assist in the analysis of the data as well as to produce new cartographic products TIGER files were created by the Bureau of the Census with the assistance of the US Geological Survey Content TIGER/Line files are organized by county they contain: map features such as roads, railroads and rivers census statistical area boundaries political boundaries in metropolitan areas, address ranges and ZIP codes for streets Marketing TIGER files Census Bureau 1990 Census versions of TIGER/Line files will be available from the Census Bureau in early 1991 cost for prototype and precensus TIGER/Line files on magnetic tape is $200 (US) for the first county and $25 for each additional county in that state ordered at the same time the 50 states plus DC on tape cost $87,450 precensus files are also available on CD-ROM for $250 per disk, 40 disks are required for coverage of the entire country (all prices as of Jan. 1990) Third party vendors as of December 1989, 25 vendors had notified the Census Bureau that they will market repackaged versions of TIGER/Line files, in many cases with software which will enable users to access this data easily and quickly many of these products are being designed for use on micro-computers Non-census uses for TIGER TIGER files are valuable for other purposes e.g. locating customers from address lists e.g.
planning vehicle routes through city streets, for parcel delivery, cab dispatching for these purposes TIGER files need to be kept current at all times, but Bureau of the Census only requires them to be current every 10 years see Unit 29 for technical details of TIGER files F. LAND RECORDS many systems have been developed by local governments in the US to manage land, particularly in urban areas in other countries there has been more effective coordination at provincial and national levels, e.g. Australia practices in different countries depend on the system of land tenure the basic entity in land records systems is the land parcel, i.e. the basic unit of ownership traditionally, land records have been managed by hand using methods which often date back 200 years land records are the basis of the system of local taxation, administration, as well as transfer of ownership and subdivision Issues in land records modernization accurate land records systems require accurate base mapping at a large enough scale, e.g. 1:1,000 such base mapping is not normally available in the US, only the wealthiest governments can afford to create it, e.g. from air photos the term cadaster is used for mapping of land ownership the cost of building land records systems can often be recovered, at least partially, from sales of data (e.g. to utilities, real estate developers) and use in other departments the term multi-purpose cadaster (MPC) describes the idea of using the cadaster for many purposes because land records systems are being developed independently by many different jurisdictions, there is little standardization of approach, software, etc. see Unit 54 for a discussion of MPC applications REFERENCES The Bureau of the Census, US Department of Commerce produces numerous documents on the Census and its products, including TIGER. Factfinder for the Nation describes data available from the Census Bureau. 
Census '90 Basics describes the content, geographic areas and products of the census. Similar material is available from appropriate organizations in other countries, e.g. Statistics Canada. Marx, R. W., ed., 1990. "The Census Bureau's TIGER System," a special issue of Cartography and Geographic Information Systems Vol. 17(1). Contains several articles providing details on the contents and database structure of TIGER. Kaplan, C.P. and T.L. van Valey, 1980. CENSUS '80: Continuing the Factfinder Tradition, US Department of Commerce, Bureau of the Census. A good review of Census applications. Richards, D. and P.M. Jones, 1984. "General sources of information," in R.L. Davies and D.S. Rogers, eds., Store Location and Store Assessment Research, John Wiley and Sons, New York, Chapter 4. This chapter reviews sources of socio-economic data in both the US and the UK. Marx, R.W., 1986. "The TIGER System: Automating the Geographic Structure of the United States Census," Government Publications Review 13:181-201. Discusses the development of the TIGER system. Openshaw, S., 1977. "A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modelling," Institute of British Geographers, Transactions 2(NS):459-72. Openshaw, S., and P.J. Taylor, 1981. "The modifiable areal unit problem," in N. Wrigley and R.J. Bennett, editors, Quantitative Geography: A British View, Routledge, London. EXAM AND DISCUSSION QUESTIONS 1. Confidentiality is a major issue in the US Census, and the need to preserve privacy conflicts directly with the need for disaggregated data for numerous purposes. What are the factors to be considered in trying to reconcile these conflicting needs? Is the balance affected by use of GIS? 2. Devise a scheme for creating and maintaining a constantly updated digital file of all streets and associated address ranges etc., i.e. a perpetually current TIGER. 
What would be the costs of the scheme, and what advantages would it have over the current situation? 3. "The concept of a decennial census was devised almost two hundred years ago and has become increasingly inappropriate to the modern age". Discuss. 4. A spreadsheet (such as Lotus 1-2-3) allows the user to perform a variety of functions on tabular data. Discuss the possibility of a "geographical spreadsheet" - what would it do, and what applications would it have? ENVIRONMENTAL AND NATURAL RESOURCE DATA A. INTRODUCTION Contents of environmental databases B. CHARACTERISTICS Spatial management units C. SOURCES OF DATA Thematic Topographic Remote sensing D. REMOTE SENSING AND GIS Wavelengths Scale in images Elevation Image interpretation Classification Problems in classification Using remotely sensed data in GIS E. EXAMPLE DATABASE - MLMIS Minnesota Land Management Information System (MLMIS) Example use of MLMIS data layers REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES You may prefer to use a local example of a natural resources database in place of the section on the MLMIS. This section can then serve as an outline for the organization of information about your local example. Examples of different air photos (low level, high level, oblique), satellite (natural color, false color) and radar images would be useful illustrations for this unit. UNIT 9 - ENVIRONMENTAL AND NATURAL RESOURCE DATA Compiled with assistance from Charles Parson, Bemidji State University and Jeffrey L. Star, University of California, Santa Barbara A. 
INTRODUCTION natural resource-based GISs may be used: as an inventory tool to better manage the marketing of the resource to protect the resource from improper development to model the complex interactions between phenomena so that forecasts can be used in decision-making Contents of environmental databases there are several different kinds of information needed in an environmental database many of these are obvious: geology, vegetation, hydrology, soils however, to address a range of issues, the environmental database must include several characteristics that are not generally perceived as "natural" transportation network political boundaries management unit boundaries other data may be needed for modeling, e.g. variables relating to: erosion groundwater flow soil productivity B. CHARACTERISTICS natural resource data in GIS is comparatively static update can be infrequent spatial resolution can be relatively low e.g. grid cells covering large areas historically, natural resource GIS have been raster-based adequate for many planning and management applications can provide comprehensive coverage of a jurisdiction at reasonable cost could often run on existing mainframes - hardware requirements were modest Spatial management units the actual management units of most natural resources in North America are pseudo-rasters square, forty acre parcels are the standard building block for PLSS areas (areas surveyed under the Public Land Survey System) of the Midwest, and Western United States, and much of Canada "forties" are frequently broken into ten acre units, or combined into: quarter sections (160 acres) sections (640 acres, 1 square mile) townships (6x6 miles) farms are managed in rectangular fields and forest resources are sold in similar acreage units however, natural resources do not commonly conform to PLSS grids vector-based systems appear better able to accurately represent them on the other hand, satellite imagery, which is an important source of environmental 
data is raster-based C. SOURCES OF DATA Thematic thematic map series are compiled by various agencies: soil maps (e.g. Soil Conservation Service) land use (e.g. USGS land use series) vegetation (forestry agencies, many state governments) surficial geology (US and state geological surveys) Topographic topographic maps can supply: elevations roads and railroads cultural features streams and lakes political and administrative boundaries public land survey system (PLSS) - "township and range" this type of data from USGS topographic maps is becoming available in digital form as DLG (digital line graph) files elevation data is available from the USGS in the form of DEMs (digital elevation models) at various resolutions US Geological Survey supplies 30 m resolution data for much of US Remote sensing remotely sensed imagery data can be interpreted to yield many layers e.g. urban/rural, vegetation, crops, surface geology, land use LANDSAT and TM (Thematic Mapper) are commonly used sources D. REMOTE SENSING AND GIS definition of remote sensing "In the broadest sense, the measurement or acquisition of information of some property of an object or phenomena, by a recording device that is not in physical or intimate contact with the object or phenomena under study" (Manual of Remote Sensing) aircraft and satellite platforms can be used selection of a platform involves balancing a number of competing goals: ability to schedule the acquisition atmospheric distortions vs. 
platform stability the available suite of sensors for a given application issues of coverage and scale cost data can be captured in analog (photographs) or digital form (data, transmitted to a ground station or recorded onboard) Wavelengths key issue in a remotely sensed observation is the range of wavelengths of energy that will be observed the human eye sees only a limited range of wavelengths photographs capture visible light remotely sensed observations may include information in the infrared portion of the spectrum which is not visible to human eyes infrared sensors allow recording of the thermal characteristics of the earth's surface microwave wavelengths can also be used Radar is a form of microwave system sometimes of particular value due to the ability to penetrate clouds and carry their own source of illumination i.e. radar systems generate and collect radiation - they are active sensors objects with large differences in their electrical properties may be discriminated, and the size of the object compared to the wavelength of the radar system is also important Scale in images key concern is the scale of the images, and how the scale varies within each image due to distortion many sources of distortion focal length of the optical system, viewing geometry, surface topography greatly affect the scale at each location in the image Elevation information on elevation can be obtained by comparing photographs taken from different camera positions, i.e. 
stereographic images the simplest devices for viewing pairs of photographs in stereo, called stereoscopes, effectively recreate the illusion of one's eyes being in the same position as the camera lenses when the photographs were taken produce the impression of 3-D images more complex instruments known as stereoplotters allow operators to use pairs of photographs to develop accurate topographic maps and contours thus, by understanding the geometrical details of the camera system and the Earth's surface, one can determine both horizontal and vertical positions of objects with high accuracy and precision an analytical plotter is a partially automated form of stereoplotter which obtains contours by automatically comparing photographs Image interpretation the identification of objects and determination of their significance involves: Identification - recognizing features on the image Measurement - once features have been identified, can make measurements (i.e., the distance between objects, the number of features per unit area) Interpretation - normally based on a systematic examination of the primitive elements of the photograph, in conjunction with a wide range of ancillary data primitive elements include tone, color, size, shape, texture, pattern, shadow, site, association automated image analysis typically relies on only the first few primitive elements (tone, color, size) ancillary data are often very diverse, may include maps, vegetation phenologies, and many kinds of information about human activities in the general area human experts bring all these elements, plus their acquired skills and knowledge of related disciplines the best photointerpreters have expertise in such related disciplines as physical geography, geology and plant biology and ecology human interpretation also includes a significant perceptual or subjective component Classification the information obtained from a remote sensing instrument consists of reflectance measurements, often in several 
different bands or parts of the electromagnetic spectrum measurements are in discrete units with fixed range, e.g. 0-255 the process of classification, an important part of image interpretation, attempts to assign each pixel to one of a number of classes based on its reflectance in one or more bands e.g. vegetation types or land use classes ("urban", "pasture", "cropland", "water", "forested") many techniques exist for classification supervised classification develops the rules for assigning reflectance measurements to classes using a "training area", based on input from the user, then applies the rules automatically to the remaining image unsupervised classification develops the rules automatically Problems in classification since reflectances vary with time of day, season of the year, etc., classification rules vary from image to image classification is often uncertain or inaccurate also pixels may often contain several classes - mixed pixels despite this, classification assigns a single class to every pixel, ignoring uncertainty there is no best method of classification - successful classification is time-consuming and can be expensive Using remotely sensed data in GIS often difficult or time consuming to develop systematic products of known accuracy complex operations are required to force images to correspond to a known map projection and/or to have a consistent scale difficult to go from image (varying reflectance or emissivity in different wavelength bands) to interpreted features and objects however, since the value of a GIS is directly related to the quality and currency of its internal data remote sensing offers a suite of tools for quickly creating current, consistent datasets for input to a GIS conversely, remotely sensed data is best interpreted when additional spatial datasets (representing other dates, other scales, other sensors, other methods for acquiring data about the earth) are employed such data may be obtained from a GIS thus, strong links 
between remote sensing and GIS can improve both technologies E. EXAMPLE DATABASE - MLMIS Minnesota Land Management Information System (MLMIS) one of the most extensive natural resource databases a statewide inventory of layers for natural resource management and planning list is the result of over fifteen years of involvement in projects that added data to the system referred to as MLMIS40 because the fundamental structure is a raster with 40 acre cells to improve spatial resolution, this is being gradually replaced with vector files at a common scale of 1:24,000 (line-width resolution 12 m) raster files with hectare grid cells Example use of MLMIS data layers how might the database (and a GIS) be used to assist a county to locate a waste disposal incinerator? REFERENCES Marble, D.F. et al., 1983. "Geographic information systems and remote sensing," Manual of Remote Sensing. ASPRS/ACSM, Falls Church, VA, 1:923-58. Reviews the various dimensions of the relationship between the two fields. Niemann, Jr., B.J., et al., 1988. "The CONSOIL project: Conservation of natural resources through the sharing of information layers," Proceedings GIS/LIS '88, San Antonio, TX, pp. 11-25. Reviews a multi-agency project in Wisconsin to design and evaluate an LIS for soil conservation. Radde, G.L., 1987. "Under the Rainbow: GIS and Public Land Management Realities," Proceedings, IGIS '87, Arlington, VA, 3:461-472. A discussion of the MLMIS, describes some projects that have made use of the system and how policy makers' attitudes towards GIS have changed. Star, J.L., and J. Estes, 1990. Geographic Information Systems: An Introduction, Prentice-Hall, Englewood Cliffs, NJ. Chapter 5 reviews data sources. Sullivan, J.G., and B.J. Niemann, Jr., 1987. "Research Implications of eleven natural resource GIS applications," Proceedings, IGIS '87, Arlington, VA, 3:329-341. A short review of several LIS for natural resource applications, discusses common themes, problems and techniques. 
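As a concrete illustration of the supervised classification process described in this unit (rules derived from user-supplied training areas, then applied to the rest of the image), here is a minimal minimum-distance-to-means sketch; the class names, band values and training pixels are all hypothetical, and real classifiers are considerably more elaborate:

```python
# Minimum-distance-to-means supervised classification (sketch).
# Pixels are tuples of reflectances in two bands, 0-255.

def class_means(training):
    """Mean reflectance vector for each class from labelled training pixels."""
    means = {}
    for label, pixels in training.items():
        n = len(pixels)
        means[label] = tuple(sum(p[i] for p in pixels) / n
                             for i in range(len(pixels[0])))
    return means

def classify(pixel, means):
    """Assign the pixel to the class whose mean reflectance is closest."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda c: dist2(pixel, means[c]))

training = {                       # hypothetical "training area" samples
    "water":  [(20, 10), (25, 12), (18, 9)],
    "forest": [(60, 90), (65, 95), (58, 88)],
    "urban":  [(150, 140), (160, 150), (155, 145)],
}
means = class_means(training)
print(classify((62, 91), means))   # a pixel resembling the forest samples
```

Note that this sketch exhibits the weaknesses listed under "Problems in classification": every pixel receives exactly one class, with no expression of uncertainty or mixed-pixel composition.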
EXAM AND DISCUSSION QUESTIONS 1. Review the difficulties inherent in obtaining interpreted features and objects from remotely sensed images. 2. Assume that you have access to remotely sensed images of your city with a resolution of 80 m (roughly the pixel size of Landsat). What functions of city government or local business would be able to make use of this resolution? 3. Discuss the range of errors which may exist in a soils map. 4. Discuss each of the types of data mentioned in this class in terms of required frequency of update. 5. How does a soil map become outdated? 6. What layers might you want for siting a waste incinerator which are not in the MLMIS catalog? SPATIAL DATABASES AS MODELS OF REALITY A. INTRODUCTION the real world is too complex for our immediate and direct understanding we create "models" of reality that are intended to have some similarity with selected aspects of the real world databases are created from these "models" as a fundamental step in coming to know the nature and status of that reality Definition a spatial database is a collection of spatially referenced data that acts as a model of reality a database is a model of reality in the sense that the database represents a selected set or approximation of phenomena these selected phenomena are deemed important enough to represent in digital form the digital representation might be for some past, present or future time period (or contain some combination of several time periods in an organized fashion) Standards many of the definitions in this Unit have been standardized by the proposed US National Digital Cartographic Standard (DCDSTF, 1988) these standards have been developed to provide a nationally uniform means for portraying and exchanging digital cartographic data these cartographic standards will form part of a larger standard being developed for the digital representation of all earth science information B. 
DATABASE CONTENT AND AN ORGANIZATION'S MISSION Organization mandates organizations have mandates to perform certain tasks that carry out their missions mandates are the reasons they exist as organizations organizations have different needs for data depending on their mandates and the activities required to carry out these mandates mandates often help identify and define entities of interest, requiring a certain view of the world what might seem at first glance to be the same data need in two different organizations can actually be quite different when we look at a more detailed level e.g. wildlife and forestry departments both need information on vegetation but the detail needed is different Database contents Example: Transportation highway data from the different points of view of a natural resources organization and a highway transportation organization a natural resource organization might only need logging roads and the connecting access to state highways the transportation organization's main interest is in characterizing highways used by the public the database might also be used to store detailed highway condition and maintenance information we would expect their need for highway data to be more detailed than would the natural resource organization's Example: wetlands wetlands data from the different points of view of an ecological organization and a taxing authority ecological organization might define wetlands as a natural resource to be preserved and restricted from development that perspective might require considerable detail for describing the area's biology and physical resources a taxing authority might define a wetland to be a "wasteland" and of very little value to society that description might require only the boundary of the "wasteland" in the database Database design in each organization only certain phenomena are important enough to collect and represent in a database the data collection process involves a sampling of geographic reality, 
to determine the status of that reality (whether past, present or future) identifying the phenomena and then choosing an appropriate data representation for them is part of a process called database design see Units 11 and 66 for more on database design C. FUNDAMENTAL DATABASE ELEMENTS elements of reality modeled in a GIS database have two identities: 1. the element in reality - entity 2. the element as it is represented in the database - object a third identity that is important in cartographic applications is the symbol that is used to depict the object/entity as a feature on a map or other graphic display these definitions and the following concepts are based on those defined by the DCDSTF, 1988 (see references) handout - Definition of terms Entity an entity is "a phenomenon of interest in reality that is not further subdivided into phenomena of the same kind" e.g. a city could be considered an entity and subdivided into component parts but these parts would not be called cities, they would be districts, neighborhoods or the like e.g. a forest could be subdivided into smaller forests Object an object is "a digital representation of all or part of an entity" the method of digital representation of a phenomenon varies according to scale, purpose and other factors e.g. a city could be represented geographically as a point if the area under consideration were continental in scale the same city could be geographically represented as an area if we are dealing with a geographic database for a state or a county Entity types similar phenomena to be stored in a database are identified as entity types an entity type is any grouping of similar phenomena that should eventually get represented and stored in a uniform way, e.g. 
roads, rivers, elevations, vegetation provides convenient conceptual framework for describing phenomena at a general level organizational perspective influences this interpretation to a large degree precise definitions should be generated for each entity type helps with identifying overlapping categories of information aids in clarifying the content of the database the proposed US National Standard for Digital Cartographic Data Volume 2 (DCDSTF 1988) includes a large number of definitions for entity types handout - Sample entity definitions the first step in database development is the selection and definition of entity types to be included this is guided by the organization's mandate and purpose of the database this framework can be as important as the actual database because it guides the development the second step of database design is to choose an appropriate method of spatial representation for each of the entity types Spatial object type the digital representation of entity types in a spatial database requires the selection of appropriate spatial object types the National Standard for Digital Cartographic Databases specifies a basic list of spatial objects and their characteristics this classification is based on the following definition of spatial dimensions: 0-D - an object that has a position in space, but no length a point 1-D - an object having a length composed of two or more 0-D objects a line 2-D - an object having a length and width bounded by at least three 1-D line segment objects an area 3-D - an object having a length, width and height/depth bounded by at least four 2-D objects a volume overhead - Spatial object types (3 pages) handout (cont) - Spatial object types note very specific definitions for line segment, string, link, chain spatial objects as representations of reality are dealt with in depth in Unit 11 Object classes an object class is the set of objects which represent the set of entities e.g. 
the set of points representing the set of wells Attributes an attribute is a characteristic of an entity selected for representation usually non-spatial though some may be related to the spatial character of the phenomena under study e.g. area, perimeter Attribute value the actual value of the attribute that has been measured (sampled) and stored in the database an entity type is almost always labeled and known by attributes e.g. a road usually has a name and is identified according to its class - e.g. alley, freeway attribute values are often conceptually organized in attribute tables which list individual entities in the rows and attributes in the columns entries in each cell of the table represent the attribute value of a specific attribute for a specific entity note: attribute table is not an official DCDSTF term Database model is a conceptual description of a database defining entity type and associated attributes each entity type is represented by specific spatial objects after the database is constructed, the database model is a view of the database which the system can present to the user other views can be presented, but this one is likely useful because it was important in the conceptual design e.g. the system can model the data in vector form but generate a raster for purposes of display to the user need not be related directly to the way the data are actually stored in the database e.g. census zones may be defined as being represented by polygons, but the program may actually represent the polygon as a series of line segments examples of database models can be grouped by application area e.g. transportation applications require different database models than do natural resource applications Layers spatial objects can be grouped into layers, also called overlays, coverages or themes one layer may represent a single entity type or a group of conceptually related entity types e.g. 
a layer may have only stream segments or may have streams, lakes, coastline and swamps options depend on the system as well as the database model some spatial databases have been built by combining all entities into one layer D. DATABASE DESIGN almost all entities of geographic reality have at least a 3-dimensional spatial character, but not all dimensions may be needed e.g. highway pavement actually has a depth which might be important, but is not as important as the width, which is not as important as the length representation should be based on the types of manipulations that might be undertaken map-scale of the source document is important in constraining the level of detail represented in a database e.g. on a 1:100,000 map individual houses or fields are not visible Steps in database design 1. Conceptual software and hardware independent describes and defines included entities identifies how entities will be represented in the database i.e. selection of spatial objects - points, lines, areas, raster cells requires decisions about how real-world dimensionality and relationships will be represented these can be based on the processing that will be done on these objects e.g. should a building be represented as an area or a point? e.g. should highway segments be explicitly linked in the database? 2. Logical software specific but hardware independent sets out the logical structure of the database elements, determined by the data base management system used by the software this is discussed in greater detail in Unit 43 3. 
Physical both hardware and software specific requires consideration of how files will be structured for access from the disk covered in Unit 66 Desirable database characteristics database should be: contemporaneous - should contain information of the same vintage for all its measured variables as detailed as necessary for the intended applications the categories of information and subcategories within them should contain all of the data needed to analyze or model the behavior of the resource using conventional methods and models positionally accurate exactly compatible with other information that may be overlain with it internally accurate, portraying the nature of phenomena without error - requires clear definitions of phenomena that are included readily updated on a regular schedule accessible to whoever needs it Issues in database design almost all entities of geographic reality have at least 3-dimensional spatial character, but not all dimensions may be needed e.g. highway pavement has a depth which might be important, but is not as important as the width, which is not as important as the length representation should be based on types of manipulations that might be undertaken map-scale of the source document is important in constraining the level of detail represented in a database e.g. on a 1:100,000 map individual houses or fields are not visible REFERENCES Codd, E. F., 1981. "Data Models in Database Management," ACM SIGMOD Record 11(2):112-114. Explains the nature of data models, their role in constructing databases. DCDSTF - Digital Cartographic Data Standards Task Force. 1988. "The proposed standard for digital cartographic data," The American Cartographer 15(1). Summary of the major components of the proposed US National Standard. Robinson, A., R. Sale, J. Morrison, and P. Muehrcke, 1984. The Elements of Cartography, (5th ed.), John Wiley and Sons, New York. Useful survey of cartographic terminology and models. Unwin D., 1981. 
Introductory Spatial Analysis, Methuen, London. A spatial analysis perspective on spatial data models. SPATIAL OBJECTS AND DATABASE MODELS A. INTRODUCTION B. POINT DATA C. LINE DATA Network entities Network characteristics Attributes Networks as linear addressing systems D. AREA DATA 1. Environmental/natural resource zones 2. Socio-economic zones 3. Land records Areal coverage Holes and islands E. REPRESENTATION OF CONTINUOUS SURFACES General nature of surfaces Data structures for representing surfaces Spatial interpolation REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit continues the development of basic concepts about representing reality as spatial data. Here we look at how the representation of reality in the form of entities is handled with the spatial objects points, lines and areas. UNIT 11 - SPATIAL OBJECTS AND DATABASE MODELS Compiled with assistance from Timothy L. Nyerges, University of Washington A. INTRODUCTION the objects in a spatial database are representations of real-world entities with associated attributes the power of a GIS comes from its ability to look at entities in their geographical context and examine relationships between entities thus a GIS database is much more than a collection of objects and attributes in this unit we look at the ways a spatial database can be assembled from simple objects e.g. how are lines linked together to form complex hydrologic or transportation networks e.g. how can points, lines or areas be used to represent more complex entities like surfaces? B. POINT DATA the simplest type of spatial object choice of entities which will be represented as points depends on the scale of the map/study e.g. on a large scale map - encode building structures as point locations e.g. 
on a small scale map - encode cities as point locations the coordinates of each point can be stored as two additional attributes information on a set of points can be viewed as an extended attribute table each row is a point - all information about the point is contained in the row each column is an attribute two of the columns are the coordinates overhead - Point data attribute table here northing and easting represent y and x coordinates each point is independent of every other point, represented as a separate row in the database model C. LINE DATA Network entities infrastructure networks transportation networks - highways and railroads utility networks - gas, electric, telephone, water airline networks - hubs and routes natural networks river channels Network characteristics a network is composed of: nodes - junctions, ends of dangling lines links - chains in the database model diagram valency of a node is the number of links at the node ends of dangling lines are "1-valent" 4-valent nodes are most common in street networks 3-valent nodes are most common in hydrology a tree network has only one path between any pair of nodes, no loops or circuits are possible most river networks are trees Attributes examples of link attributes: direction of traffic, volume of traffic, length, number of lanes, time to travel along link diameter of pipe, direction of gas flow voltage of electrical transmission line, height of towers number of tracks, number of trains, gradient, width of most narrow tunnel, load bearing capacity of weakest bridge examples of node attributes: presence of traffic lights, presence of overpass, names of intersecting streets presence of shutoff valves, transformers note that some attributes (e.g. names of intersecting streets) link one type of entity to another (nodes to links) some attributes are associated with parts of network links e.g. part of a railroad link between two junctions may be inside a tunnel e.g. 
part of a highway link between two junctions may need pavement maintenance many GIS systems require such attributes to be attached to the network by splitting existing links and creating new nodes e.g. split a street link at the house and attach the attributes of the house to the new (2-valent) node e.g. create a new link for the stretch of railroad which lies inside the tunnel, plus 2 new nodes this requirement can lead to impossibly large numbers of links and 2-valent nodes e.g. at a scale of 1:100,000, the US rail network has about 300,000 links the number of links would increase by orders of magnitude if new nodes had to be defined in order to locate bridges on links Networks as linear addressing systems often need to use the network as an addressing system, e.g. street network address matching is the process of locating a house on a street network from its street address e.g. if it is known that the block contains house numbers from 100 to 198, house #124 would probably be 1/4 of the way along that link points can be located on the network by link number and distance from beginning of link this can be more useful than the (x,y) coordinates of points since it links the points to a location on the network this approach provides an answer to the problem of assigning attributes to parts of links keep such entities (houses, tunnels) in separate tables, link them to the network by link number and distance from beginning of link need one distance for point entities, two for extended entities like tunnels (start and end locations) the GIS can then compute the (x,y) coordinates of the entities if needed links need not be permanently split in this scheme D. AREA DATA is represented on area class maps, choropleth maps boundaries may be defined by natural phenomena, e.g. lake, or by man, e.g. forest stands, census zones there are several types of areas that can be represented 1. 
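The address-matching arithmetic just described (house #124 on a 100-198 block falling about 1/4 of the way along its link, then converted to an (x,y) position from the link geometry) can be sketched as follows; the node coordinates are hypothetical and the link is assumed straight for simplicity:

```python
def address_fraction(house_no, low, high):
    """Fractional distance of a house number along its street link,
    interpolated from the link's address range (TIGER-style)."""
    return (house_no - low) / (high - low)

def point_on_link(start, end, fraction):
    """Compute (x, y) from a linear reference (link plus fractional offset),
    assuming a straight link between its two end nodes."""
    (x0, y0), (x1, y1) = start, end
    return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))

f = address_fraction(124, 100, 198)   # about 0.24, i.e. roughly 1/4 of the way
xy = point_on_link((1000.0, 2000.0), (1100.0, 2000.0), f)
```

The same link-plus-offset idea carries extended entities such as tunnels by storing two offsets (start and end) instead of one.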
Environmental/natural resource zones examples include land cover data - forests, wetlands, urban geological data - rock types forestry data - forest "stands", "compartments" soil data - soil types boundaries are defined by the phenomenon itself e.g. changes of soil type almost all junctions are 3-valent 2. Socio-economic zones includes census tracts, ZIP codes, etc. boundaries defined independently of the phenomenon, then attribute values are enumerated boundaries may be culturally defined, e.g. neighborhoods 3. Land records land parcel boundaries, land use, land ownership, tax information Areal coverage overhead - Areal coverage 1. entities are isolated areas, possibly overlapping any place can be within any number of entities, or none e.g. areas burned by forest fires areas do not exhaust the space 2. any place is within exactly one entity areas exhaust the space every boundary line separates exactly two areas, except for the outer boundary of the mapped area areas may not overlap any layer of the first type can be converted to one of the second type each area may now have any number of fire attributes, depending on how many times it has been burned - unburned areas will have none Holes and islands areas often have "holes" or areas of different attributes wholly enclosed within them diagram the database must be able to deal with these correctly this has not always been true of GIS products cases can be complex, for example: Lake Huron is a "hole" in the North American landmass Manitoulin Island is a "hole" in Lake Huron Manitoulin Island has several large lakes, including one which is the largest lake on an island in a lake anywhere some of these lakes have islands in them some systems allow area entities to have islands more than one primitive single-boundary area can be grouped into an area object e.g. the area served by a school or shopping center may have more than one island, but only one set of attributes diagram SPATIAL OBJECTS AND DATABASE MODELS E. 
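The grouped area objects just described - one set of attributes, possibly several islands, possibly holes - can be sketched as a list of closed rings with an even-odd containment test, so that islands and holes are handled uniformly. This is only an illustrative sketch, not a structure prescribed by any particular GIS; all names and coordinates are hypothetical.

```python
# Sketch: an area object stored as a set of closed rings (hypothetical
# structure). Outer boundaries and holes are all rings; a point is inside
# the object if it falls inside an odd number of rings (even-odd rule).

def point_in_ring(pt, ring):
    """Ray-casting test: is pt inside the closed ring of (x, y) vertices?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal through pt
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def point_in_area_object(pt, rings):
    """Even-odd rule over all rings: an odd count means inside the object."""
    return sum(point_in_ring(pt, r) for r in rings) % 2 == 1

# a square "island" containing a square "hole" (a lake on the island)
island = [(0, 0), (10, 0), (10, 10), (0, 10)]
lake = [(4, 4), (6, 4), (6, 6), (4, 6)]
area_object = [island, lake]

print(point_in_area_object((2, 2), area_object))  # on the island -> True
print(point_in_area_object((5, 5), area_object))  # in the lake -> False
```

The same object could carry further rings for islands within the lake; the even-odd count still gives the correct answer at any depth of nesting.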
E. REPRESENTATION OF CONTINUOUS SURFACES

- examples of continuous surfaces are:
  - elevation (as part of topographic data)
  - rainfall, pressure, temperature
  - population density potential
- it must be possible to sample observations everywhere, on an interval/ratio level

General nature of surfaces
- critical points:
  - peaks and pits - highest and lowest points
  - ridge lines, valley bottoms - lines across which slope reverses suddenly
  - passes - convergence of 2 ridges and 2 valleys
- faults - sharp discontinuities of elevation - cliffs
- fronts - sharp discontinuities of slope
- slopes and aspects can be derived from elevations

Data structures for representing surfaces
- traditional data models do not have a method for representing surfaces
- therefore, surfaces are represented by the use of points, lines or areas
- note: the following series of three overheads on the Tiefort Mountains all represent the same area

1. points - grid of elevations
overhead - Elevation represented as points
- DEM or Digital Elevation Model
- based on sampling the elevation surface at regular intervals
- the result is a matrix of points
- much digital elevation data is available in this form

2. lines - digitized contours
overhead - Elevation represented as lines
- from the DLG hypsography layer, identical to the contours on the printed map, plotted directly from stereo photography
- based on the string object type - a line connecting sampled points of equal elevation
- elevation is the attribute
- could be done for rainfall, barometric pressure, etc.

3. areas - TIN (triangulated irregular network)
overhead - Triangulation of a terrain surface
overhead - Elevation represented as areas
- note: the perspective diagram is developed from the triangulated surface (TIN created by M.P. Kumler, USGS)
- sample points are often located at peaks and pits, and along ridges and valleys
- sampling can be varied depending on the ruggedness of the surface - a very efficient way of representing topography
- the result is a TIN composed of nodes, lines and triangular faces

Spatial interpolation
- frequently when using continuous data we wish to estimate values at specific locations which are not part of the point, line or area dataset
- these values must be determined from the surrounding values using techniques of spatial interpolation (see Units 40 and 41)
- e.g. to interpolate contours, a regular grid is often interpolated from an irregular scatter of points, or densified from a sparse grid

REFERENCES

Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. See chapter 2 for a review of database models.

Dueker, K.J., 1987. "Geographic Information Systems and Computer-Aided Mapping," American Planning Association Journal, Summer 1987:383-390. Compares database models in GIS and computer mapping.

Mark, D.M., 1978. "Concepts of Data Structure for Digital Terrain Models," Proceedings of the Digital Terrain Models (DTM) Symposium, ASP and ACSM, pp. 24-31. A comprehensive discussion of DEM database models.

Marx, R.W., 1986. "The TIGER System: Automating the Geographic Structure of the United States Census," Government Publications Review 13:181-201. Issues in the selection of a database model for TIGER.

Nyerges, T.L. and K.J. Dueker, 1988. Geographic Information Systems in Transportation, Federal Highway Administration, Division of Planning, Washington, D.C. Database model concerns in transportation applications of GIS.

Peuquet, D.J., 1984. "A conceptual framework and comparison of spatial data models," Cartographica 21(4):66-113. An excellent review of the various spatial data models used in GIS.

EXAM AND DISCUSSION QUESTIONS

1. How does a natural zone coverage differ from an enumeration zone coverage? Describe the differences in terms of (a) application areas, (b) visual appearance, (c) compilation of data.

2. Compare the various data models for elevation data. Which would you expect to be best for (a) a landscape dominated by fluvial erosion and dendritic drainage patterns, (b) a glaciated landscape, (c) a barometric weather map with fronts, (d) a map of population densities for North America?

3. What data models might be needed in a system to monitor oil spills and potential environmental damage to coastlines? Give examples of appropriate spatial objects and associated attributes.

4. Describe the differences between the data models commonly used in remote sensing, computer-assisted design, and automated cartography.

RELATIONSHIPS AMONG SPATIAL OBJECTS

A. INTRODUCTION
Three types of relationship
B. EXAMPLES OF SPATIAL RELATIONSHIPS
Point-point
Point-line
Point-area
Line-line
Line-area
Area-area
C. CODING RELATIONSHIPS AS ATTRIBUTES
Example - "flows into" relationship
Example - "is contained in" relationship
D. OBJECT PAIRS
E. CARTOGRAPHIC AND TOPOLOGICAL DATABASES
Strict definition of "topological"
Usage of "topological" in GIS
F. PLANAR ENFORCEMENT
Process
Objective
G. RELATIONSHIPS IN RASTER SYSTEMS
REFERENCES
EXAM AND DISCUSSION QUESTIONS
NOTES

This final unit in the spatial databases module looks at the complex issue of relationships and how they can be coded. The important concept of planar enforcement, introduced here, is referred to several times in later units.

UNIT 12 - RELATIONSHIPS AMONG SPATIAL OBJECTS

Compiled with assistance from Gerald White, California State University, Sacramento

A. INTRODUCTION
- there are a vast number of possible relationships in spatial data
- many are important in analysis
  - e.g. the "is contained in" relationship between a point and an area is important in relating objects to their surrounding environment
  - e.g. "intersects" between two lines is important in analyzing routes through networks
- relationships can exist between entities of the same type or of different types
  - e.g. for each shopping center, we can find the nearest shopping center (same type)
  - e.g. for each customer, we can find the nearest shopping center (different types)

Three types of relationship
1. relationships which are used to construct complex objects from simple primitives
  - e.g. the relationship between a line (chain) and the ordered set of points which defines it
  - e.g. the relationship between an area (polygon) and the ordered set of lines which defines it
2. relationships which can be computed from the coordinates of the objects
  - e.g. two lines can be examined to see if they cross - the "crosses" relationship can be computed
  - e.g. areas can be examined to see which one encloses a given point - the "is contained in" relationship can be computed
  - e.g. areas can be examined to see if they overlap - the "overlaps" relationship
3. relationships which cannot be computed from coordinates - these must be coded in the database during input
  - e.g. we can compute whether two lines cross, but not whether the highways they represent intersect (one may be an overpass)
- some databases allow an entity called a "complex object", composed of "simple objects"
  - e.g. objects representing "house", "lot" and "cable", with associated attributes, might be grouped together logically as an "account"

B. EXAMPLES OF SPATIAL RELATIONSHIPS

Point-point
- "is within", e.g. find all of the customer points within 1 km of this retail store point
- "is nearest to", e.g. find the hazardous waste site which is nearest to this groundwater well

Point-line
- "ends at", e.g. find the intersection at the end of this street
- "is nearest to", e.g. find the road nearest to this aircraft crash site

Point-area
- "is contained in", e.g. find all of the customers located within this ZIP code boundary
- "can be seen from", e.g. determine if any of this lake can be seen from this viewpoint

Line-line
- "crosses", e.g. determine if this road crosses this river
- "comes within", e.g. find all of the roads which come within 1 km of this railroad
- "flows into", e.g. find out if this stream flows into this river

Line-area
- "crosses", e.g. find all of the soil types crossed by this railroad
- "borders", e.g. find out if this road forms part of the boundary of this airfield

Area-area
- "overlaps", e.g. identify all overlaps between types of soil on this map and types of land use on this other map
- "is nearest to", e.g. find the nearest lake to this forest fire
- "is adjacent to", e.g. find out if these two areas share a common boundary

C. CODING RELATIONSHIPS AS ATTRIBUTES
- in the database model we can visualize relationships as additional attributes

Example - "flows into" relationship
overhead - Coding relationships as attributes I
- option A: each stream link in a stream network could be given the ID of the downstream link which it flows into
  - flow could be traced from link to link by following pointers
- option B: alternatively, the network could be coded as two sets of entities - links and nodes
  - the links could "point" to their downstream node
  - the nodes could "point" to the next downstream link

Example - "is contained in" relationship
overhead - Coding relationships as attributes II
- given: the locations of 4 wells, with attributes of depth and flow
- the wells lie in two different counties with attributes of population
- we wish to determine how much flow is available in each county:
1. find the containing county of each well (compute the "is contained in" relationship)
  - store the result as a new attribute, County, of each well
2. using this revised attribute table, total flow by county and add the results to the county table

County  Population  Flow
A       20,000      4,500
B       35,000      5,500
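The two-step well/county computation above can be sketched in a few lines. The county shapes (simple rectangles) and the individual well flows below are hypothetical; only the resulting county totals (4,500 and 5,500) come from the example table.

```python
# Sketch of the "is contained in" computation. County boundaries are
# hypothetical axis-aligned rectangles (xmin, ymin, xmax, ymax); well
# flows are hypothetical values chosen so the totals match the example.

counties = {
    "A": {"population": 20000, "bounds": (0, 0, 10, 10)},
    "B": {"population": 35000, "bounds": (10, 0, 20, 10)},
}

wells = [
    {"id": 1, "x": 2,  "y": 3, "depth": 50, "flow": 2000},
    {"id": 2, "x": 7,  "y": 8, "depth": 80, "flow": 2500},
    {"id": 3, "x": 12, "y": 4, "depth": 60, "flow": 1500},
    {"id": 4, "x": 18, "y": 6, "depth": 90, "flow": 4000},
]

# step 1: compute "is contained in" and store it as a new attribute, County
for w in wells:
    for name, c in counties.items():
        xmin, ymin, xmax, ymax = c["bounds"]
        if xmin <= w["x"] < xmax and ymin <= w["y"] < ymax:
            w["county"] = name

# step 2: total flow by county and add the results to the county table
for name, c in counties.items():
    c["flow"] = sum(w["flow"] for w in wells if w["county"] == name)

for name, c in counties.items():
    print(name, c["population"], c["flow"])
# A 20000 4500
# B 35000 5500
```

A real GIS would use a point-in-polygon test against arbitrary county boundaries rather than rectangles, but the relational pattern - compute the relationship once, store it as an attribute, then aggregate - is the same.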
D. OBJECT PAIRS
- distance is an attribute of a pair of objects
- there are other types of information which are similarly attributes of pairs of objects
  - e.g. flow of commuters between a suburb and downtown
  - e.g. trade between two countries
  - e.g. flow of groundwater between a sink and a spring
- in some cases these attributes can be attached to an object linking the origin and destination objects
  - e.g. on a map, trade can be an attribute of an arrow connecting the two countries - thick arrows indicate strong trade
  - however, such maps quickly become impossibly complex
- in general, it is necessary to allow for information which is not an attribute of any one object but of a pair of objects, including:
  - distance
  - connectedness - yes or no
  - flow of goods, trade
  - number of trips
- such attributes cannot necessarily be ascribed to any real object
  - e.g. commuting flows between a suburb and downtown are not necessarily attributes of any set of links in the transport network
  - e.g. flow of groundwater between a sink and a spring does not necessarily follow any aquifer or conduit
  - these are attributes of object pairs
- object pairs are important in various kinds of spatial analysis using GIS
- attributes of object pairs can be thought of as tables which have one set of objects as rows and the other as columns, with the value in each cell representing the interaction between them
- there are many different terms for the implementation of this concept, e.g. interaction matrix, turn table, Cartesian product
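The interaction-matrix idea above can be sketched as a small table with origins as rows and destinations as columns. All zone names and trip counts here are hypothetical illustrations, not data from the unit.

```python
# Sketch of an object-pair attribute stored as an interaction matrix:
# rows are origin zones, columns are destination zones, and each cell
# holds the commuter trips for that pair. All values are hypothetical.

origins = ["suburb_1", "suburb_2"]
destinations = ["downtown", "airport"]

# trips[i][j] = flow from origins[i] to destinations[j]
trips = [
    [5300, 400],   # from suburb_1
    [2100, 150],   # from suburb_2
]

def flow(origin, destination):
    """Look up the pair attribute for the object pair (origin, destination)."""
    return trips[origins.index(origin)][destinations.index(destination)]

print(flow("suburb_1", "downtown"))  # 5300
# note: the flow is an attribute of the *pair*, not of either zone alone,
# and not necessarily of any physical link between the two zones
```

This is why object-pair attributes resist being stored as columns of an ordinary entity table: the natural home for the value is the cell of a matrix indexed by two objects.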
E. CARTOGRAPHIC AND TOPOLOGICAL DATABASES

Strict definition of "topological"
- if a map is stretched and distorted, some of its properties change, including:
  - distances
  - angles
  - relative proximities
- other properties remain constant, including:
  - adjacencies
  - most other relationships, such as "is contained in", "crosses"
  - types of spatial objects - areas remain areas, lines remain lines, points remain points
- strictly, topological properties are those which remain unchanged after distortion

Usage of "topological" in GIS
- a spatial database is often called "topological" if one or more of the following relationships have been computed and stored:
  - connectedness of links at intersections
  - the ordered set of lines (chains) forming each polygon boundary
  - adjacency relationships between areas
- unfortunately the precise meaning of the term has become distorted by use
- in general, "topological" implies that certain relationships are stored, making the data more useful for various kinds of spatial analysis
- by contrast, a database is called "cartographic" if the above conditions are absent
  - objects can be manipulated individually
  - relationships between them are unavailable or are considered unimportant
- cartographic databases are less useful for analysis of spatial data
  - however, they are satisfactory for simple mapping of data
  - many packages designed for mapping only use cartographic database models
- a cartographic database can usually be converted to a topological database by computing relationships - the process of "building topology" through planar enforcement

F. PLANAR ENFORCEMENT
- objects and their attributes must be capable of describing the conditions existing on a map or in reality
- variation of a single property like soil type or elevation over a mapped area is achieved by including appropriate attributes for entity types
  - e.g. elevation described by giving attributes to elevation points
  - e.g. soil type described by giving attributes to areas
- in cases like soil type, the objects used to describe spatial variation must obey certain simple rules
  - e.g. two areas cannot overlap
  - e.g. every place must be within exactly one area, or on a boundary
- these rules are collectively referred to as planar enforcement
  - a set of objects obeying these rules is said to be planar enforced
- planar enforcement is a very important operation in a vector GIS

Process
- begin with a number of unrelated line segments - imagine a number of limp spaghetti noodles lying on a table
- the following elements are now defined (terminology from the US Census Bureau's development of digital spatial database concepts):
overhead - Planar enforcement
- a 0-cell (or node) is identified wherever two noodles cross or a noodle terminates, i.e. all intersections are calculated
- a 1-cell (or link, also "chain", "arc", "edge") is identified for each length of noodle between two consecutive 0-cells (nodes)
- a 2-cell (or area, also "face", "polygon") is defined for each group of consecutive 1-cells forming an enclosed area that does not contain any 1-cells that are not part of the boundary
- note that these definitions relate directly to the ordinary concept of dimensionality
- the results are:
  - 0-cells are either isolated ("points") or adjacent to one or more 1-cells ("nodes")
  - all 1-cells end in exactly two 0-cells
  - each line segment (chain) between adjacent 0-cells is assigned to exactly one 1-cell
  - all 1-cells lie between exactly two 2-cells
  - every place on the "map" between noodles is assigned to a single 2-cell (the rest of the world is a 2-cell as well, often given the ID zero)

Objective
- planar enforcement is used to build objects out of digitized lines (hence the phrase "building topology")
- it is a consistent and precise approach to the problem of making meaningful objects out of groups of lines
- simple rules can be used to correct some digitizing errors:
  - a very short 1-cell terminating in a 1-valent 0-cell indicates an overshoot
diagram
  - a long 1-cell terminating in a 1-valent 0-cell very close to another 1-cell indicates an undershoot
diagram
- planar enforcement is often needed when a set of data is being imported from another system
  - e.g. if the source is a cartographic database and needs to have relationships computed
  - e.g. if the database models of the two systems are incompatible, the data are transferred as unrelated noodles, then the objects are rebuilt
- planar enforcement must be applied one layer at a time
- planar enforcement concepts are built into many systems

G. RELATIONSHIPS IN RASTER SYSTEMS
- in general, it is easier to work with relationships in vector systems
- the concept of an object is not as natural for raster systems, which model the world as composed of pixels
- however, relationships can be handled in raster systems with simple techniques:
overhead - Relationships in raster systems
  - e.g. a map of county boundaries in one layer - each pixel has a county code attribute which is an ID pointing to an entry in a county attribute table
  - in a second layer, each well location is coded by giving the appropriate pixel an ID pointing to a well attribute table
  - the "is contained in" relationship can be computed by an overlay operation and stored as an additional column in the well attribute table
- only a few raster systems contain this type of capability to extract relationships into attribute tables - most do not handle relationships between spatial objects

REFERENCES

Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. Chapter 2 describes objects, attribute tables and relationships.

Goodchild, M.F., 1988. "Towards an enumeration and classification of GIS functions," Proceedings, IGIS '87, NASA, Washington DC, 2:67-77. Defines and discusses object pairs.

Keating, T., W. Phillips and K. Ingram, 1987. "An integrated topologic database design for geographic information systems," Photogrammetric Engineering and Remote Sensing 53. Good discussion of topological and cartographic database models.

EXAM AND DISCUSSION QUESTIONS

1. Discuss the use of planar enforcement for street networks, and the problems presented by overpasses and underpasses. Can you modify the basic rules to maintain consistency but allow for such instances?

2. What additional examples of relationships can you devise in each of the six categories used in section B?

3. Why have designers of raster GIS not commonly devised ways of coding spatial relationships between objects in their systems? Is this likely to change in the future, and if so, why?

4. "Topology is what distinguishes GIS from automated cartography". Discuss.
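As a coda to section G, the raster overlay that computes "is contained in" can be sketched directly: one layer of county IDs, one layer of well IDs, and a pass over the grid that fills in a County column in the well attribute table. All IDs, grid values and attribute values here are hypothetical.

```python
# Sketch of section G's raster overlay. In the county layer every pixel
# carries a county ID; in the well layer, 0 means "no well" and any other
# value is an ID pointing into the well attribute table. All values are
# hypothetical.

county_layer = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
well_layer = [
    [0, 7, 0, 0],
    [0, 0, 0, 9],
    [0, 0, 0, 0],
]

county_table = {1: {"name": "A"}, 2: {"name": "B"}}
well_table = {7: {"flow": 4500}, 9: {"flow": 5500}}

# overlay: wherever a well pixel is non-zero, read the county pixel at the
# same position and store it as a new attribute of that well
for row in range(len(well_layer)):
    for col in range(len(well_layer[row])):
        well_id = well_layer[row][col]
        if well_id != 0:
            well_table[well_id]["county"] = county_layer[row][col]

print(well_table[7]["county"])  # 1  (well 7 is contained in county A)
print(well_table[9]["county"])  # 2  (well 9 is contained in county B)
```

The overlay itself is a simple cell-by-cell comparison of co-registered layers; the relationship only becomes usable for queries once it is written back into the attribute table, which is exactly the capability section G notes is missing from most raster systems.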