| |||
Vector Comcepts DEM Catalogue SRTM SRTM USGS DEMs Database concepts Why choose a database approach? Database models Creating a database GIS database applications Developments in databases Data input and editing Methods of data input Data editing Towards an integrated database Data analysis Learning outcomes Introduction Measurements in GIS – lengths, perimeters and areas Queries Reclassification Buffering and neighbourhood functions Integrating data – map overlay Spatial interpolation Analysis of surfaces Network analysis Analytical modelling in GIS Process models Modelling physical and environmental processes Modelling human processes Modelling the decision-making process Problems with using GIS to model spatial processes Output: from new maps to enhanced decisions Maps as output Non-cartographic output Spatial multimedia Mechanisms of delivery GIS and spatial decision support GIS DATA MODELS Data Model SPATIAL OBJECTS AND DATABASE MODELS A. INTRODUCTION B. POINT DATA C. LINE DATA Network entities Network characteristics Attributes Networks as linear addressing systems D. AREA DATA 1. Environmental/natural resource zones 2. Socio-economic zones 3. Land records Areal coverage Holes and islands E. REPRESENTATION OF CONTINUOUS SURFACES General nature of surfaces Data structures for representing surfaces Spatial interpolation REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit continues the development of basic concepts about representing reality as spatial data. Here we look at how the representation of reality in the form of entities is handled with the spatial objects points, lines and areas. UNIT 11 - SPATIAL OBJECTS AND DATABASE MODELS Compiled with assistance from Timothy L. Nyerges, University of Washington A. INTRODUCTION the objects in a spatial database are representations of real-world entities with associated attributes the power of a GIS comes from its ability to look at entities in their geographical context and examine relationships between entities thus a GIS database is much more than a collection of objects and attributes in this unit we look at the ways a spatial database can be assembled from simple objects e.g. how are lines linked together to form complex hydrologic or transportation networks e.g. how can points, lines or areas be used to represent more complex entities like surfaces? B. POINT DATA the simplest type of spatial object choice of entities which will be represented as points depends on the scale of the map/study e.g. on a large scale map - encode building structures as point locations e.g. on a small scale map - encode cities as point locations the coordinates of each point can be stored as two additional attributes information on a set of points can be viewed as an extended attribute table each row is a point - all information about the point is contained in the row each column is an attribute two of the columns are the coordinates overhead - Point data attribute table here northing and easting represent y and x coordinates each point is independent of every other point, represented as a separate row in the database model C. LINE DATA Network entities infrastructure networks transportation networks - highways and railroads utility networks - gas, electric, telephone, water airline networks - hubs and routes natural networks river channels Network characteristics a network is composed of: nodes - junctions, ends of dangling lines links - chains in the database model diagram valency of a node is the number of links at the node ends of dangling lines are "1-valent" 4-valent nodes are most common in street networks 3-valent nodes are most common in hydrology a tree network has only one path between any pair of nodes, no loops or circuits are possible most river networks are trees Attributes examples of link attributes: direction of traffic, volume of traffic, length, number of lanes, time to travel along link diameter of pipe, direction of gas flow voltage of electrical transmission line, height of towers number of tracks, number of trains, gradient, width of most narrow tunnel, load bearing capacity of weakest bridge examples of node attributes: presence of traffic lights, presence of overpass, names of intersecting streets presence of shutoff valves, transformers note that some attributes (e.g. names of intersecting streets) link one type of entity to another (nodes to links) some attributes are associated with parts of network links e.g. part of a railroad link between two junctions may be inside a tunnel e.g. part of a highway link between two junctions may need pavement maintenance many GIS systems require such attributes to be attached to the network by splitting existing links and creating new nodes e.g. split a street link at the house and attach the attributes of the house to the new (2-valent) node e.g. create a new link for the stretch of railroad which lies inside the tunnel, plus 2 new nodes this requirement can lead to impossibly large numbers of links and 2-valent nodes e.g. at a scale of 1:100,000, the US rail network has about 300,000 links the number of links would increase by orders of magnitude if new nodes had to be defined in order to locate bridges on links Networks as linear addressing systems often need to use the network as an addressing system, e.g. street network address matching is the process of locating a house on a street network from its street address e.g. if it is known that the block contains houses numbers from 100 to 198, house #124 would probably be 1/4 of the way along that link points can be located on the network by link number and distance from beginning of link this can be more useful than the (x,y) coordinates of points since it links the points to a location on the network this approach provides an answer to the problem of assigning attributes to parts of links keep such entities (houses, tunnels) in separate tables, link them to the network by link number and distance from beginning of link need one distance for point entities, two for extended entities like tunnels (start and end locations) the GIS can then compute the (x,y) coordinates of the entities if needed links need not be permanently split in this scheme D. AREA DATA is represented on area class maps, choropleth maps boundaries may be defined by natural phenomena, e.g. lake, or by man, e.g. forest stands, census zones there are several types of areas that can be represented 1. Environmental/natural resource zones examples include land cover data - forests, wetlands, urban geological data - rock types forestry data - forest "stands", "compartments" soil data - soil types boundaries are defined by the phenomenon itself e.g. changes of soil type almost all junctions are 3-valent 2. Socio-economic zones includes census tracts, ZIP codes, etc. boundaries defined independently of the phenomenon, then attribute values are enumerated boundaries may be culturally defined, e.g. neighborhoods 3. Land records land parcel boundaries, land use, land ownership, tax information Areal coverage overhead - Areal coverage 1. entities are isolated areas, possibly overlapping any place can be within any number of entities, or none e.g. areas burned by forest fires areas do not exhaust the space 2. any place is within exactly one entity areas exhaust the space every boundary line separates exactly two areas, except for the outer boundary of the mapped area areas may not overlap any layer of the first type can be converted to one of the second type each area may now have any number of fire attributes, depending on how many times it has been burned - unburned areas will have none Holes and islands areas often have "holes" or areas of different attributes wholly enclosed within them diagram the database must be able to deal with these correctly this has not always been true of GIS products cases can be complex, for example: Lake Huron is a "hole" in the North American landmass Manitoulin Island is a "hole" in Lake Huron Manitoulin Island has several large lakes, including one which is the largest lake on an island in a lake anywhere some of these lakes have islands in them some systems allow area entities to have islands more than one primitive single-boundary area can be grouped into an area object e.g. the area served by a school or shopping center may have more than one island, but only one set of attributes diagram SPATIAL OBJECTS AND DATABASE MODELS . REPRESENTATION OF CONTINUOUS SURFACES examples of continuous surfaces are: elevation (as part of topographic data) rainfall, pressure, temperature population density potential must exist for sampling observations everywhere on an interval/ratio level General nature of surfaces critical points peaks and pits - highest and lowest points ridge lines, valley bottoms - lines across which slope reverses suddenly passes - convergence of 2 ridges and 2 valleys faults - sharp discontinuities of elevation - cliffs fronts - sharp discontinuities of slope slopes and aspects can be derived from elevations Data structures for representing surfaces traditional data models do not have a method for representing surfaces therefore, surfaces are represented by the use of points, lines or areas note: the following series of three overheads on Tiefort Mountains all represent the same area 1. points - grid of elevations overhead - Elevation represented as points DEM or Digital Elevation Model based on sampling the elevation surface at regular intervals result is a matrix of points much digital elevation data available in this form 2. lines - digitized contours overhead - Elevation represented as lines from DLG hypsography layer, identical to those on the printed map, plotted directly from stereo photography based on string object type a line connecting sampled points of equal elevation elevation is attribute could be done for rainfall, barometric pressure etc. 3. areas - TIN (Triangulated irregular network) overhead - Triangulation of a terrain surface overhead - Elevation represented as areas note: perspective diagram is developed from the triangulated surface (TIN created by M.P. Kumler, USGS) sample points often located at peaks, pits, along ridges and valleys sampling can be varied depending on ruggedness of the surface a very efficient way of representing topography result is TIN composed of nodes, lines and triangular faces Spatial interpolation frequently when using continuous data we wish to estimate values at specific locations which are not part of the point, line or area dataset these values must be determined from the surrounding values using techniques of spatial interpolation (see Units 40 and 41) e.g. to interpolate contours, a regular grid is often interpolated from an irregular scatter of points or densified from a sparse grid REFERENCES Burrough, P. A., 1986. Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. See chapter 2 for a review of database models. Dueker, K. J., 1987. "Geographic Information Systems and Computer-Aided Mapping," American Planning Association Journal, Summer 1987:383-390. Compares database models in GIS and computer mapping. Mark, D.M., 1978. "Concepts of Data Structure for Digital Terrain Models," Proceedings of the Digital Terrain Models (DTM) Symposium, ASP and ACSM, pp. 24-31. A comprehensive discussion of DEM database models. Marx, R. W., 1986. "The TIGER System: Automating the Geographic Structure of the United States Census," Government Publications Review 13:181-201. Issues in the selection of a database model for TIGER. Nyerges, T. L. and K. J. Dueker, 1988. Geographic Information Systems in Transportation, Federal Highway Administration, Division of Planning, Washington, D. C. Database model concerns in transportation applications of GIS. Peuquet, D.J., 1984. "A conceptual framework and comparison of spatial data models," Cartographica 21(4):66-113. An excellent review of the various spatial data models used in GIS. EXAM AND DISCUSSION QUESTIONS 1. How does a natural zone coverage differ from an enumeration zone coverage? Describe the differences in terms of (a) application areas, (b) visual appearance, (c) compilation of data. 2. Compare the various data models for elevation data. Which would you expect to be best for (a) a landscape dominated by fluvial erosion and dendritic drainage patterns, (b) a glaciated landscape, (c) a barometric weather map with fronts, (d) a map of population densities for North America. 3. What data models might be needed in a system to monitor oil spills and potential environmental damage to coastlines? Give examples of appropriate spatial objects and associated attributes. 4. Describe the differences between the data models commonly used in remote sensing, computer assisted design, automated cartography and GIS. RELATIONSHIPS AMONG SPATIAL OBJECTS INTRODUCTION Three types of relationship B. EXAMPLES OF SPATIAL RELATIONSHIPS Point-point Point-line Point-area Line-line Line-area Area-area C. CODING RELATIONSHIPS AS ATTRIBUTES Example - "flows into" relationship Example - "is contained in" relationship D. OBJECT PAIRS E. CARTOGRAPHIC AND TOPOLOGICAL DATABASES Strict definition of "topological" Usage of "topological" in GIS F. PLANAR ENFORCEMENT Process Objective G. RELATIONSHIPS IN RASTER SYSTEMS REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This final unit in the spatial databases module looks at the complex issue of relationships and how they can be coded. The important concept of planar enforcement, introduced here, is referred to several times in later units. UNIT 12 - RELATIONSHIPS AMONG SPATIAL OBJECTS Compiled with assistance from Gerald White, California State University, Sacramento RELATIONSHIPS AMONG SPATIAL OBJECTS there are a vast number of possible relationships in spatial data many are important in analysis e.g. "is contained in" relationship between a point and an area is important in relating objects to their surrounding environment e.g. "intersects" between two lines is important in analyzing routes through networks relationships can exist between entities of the same type or of different types e.g. for each shopping center, can find the nearest shopping center (same type) e.g. for each customer, can find the nearest shopping center (different types) Three types of relationship 1. relationships which are used to construct complex objects from simple primitives e.g. relationship between a line (chain) and the ordered set of points which defines it e.g. relationship between an area (polygon) and the ordered set of lines which defines it 2. relationships which can be computed from the coordinates of the objects e.g. two lines can be examined to see if they cross - the "crosses" relationship can be computed e.g. areas can be examined to see which one encloses a given point - the "is contained in" relationship can be computed e.g. areas can be examined to see if they overlap - the "overlaps" relationship 3. relationships which cannot be computed from coordinates - these must be coded in the database during input e.g. we can compute if two lines cross, but not if the highways they represent intersect (may be an overpass) some databases allow an entity called a "complex object", composed of "simple objects", e.g. objects representing "house", "lot", "cable", with associated attributes might be grouped together logically as "account" B. EXAMPLES OF SPATIAL RELATIONSHIPS Point-point "is within", e.g. find all of the customer points within 1 km of this retail store point "is nearest to", e.g. find the hazardous waste site which is nearest to this groundwater well Point-line "ends at", e.g. find the intersection at the end of this street "is nearest to", e.g. find the road nearest to this aircraft crash site Point-area "is contained in", e.g. find all of the customers located in this ZIP code boundary "can be seen from", e.g. determine if any of this lake can be seen from this viewpoint Line-line "crosses", e.g. determine if this road crosses this river "comes within", e.g. find all of the roads which come within 1 km of this railroad "flows into", e.g. find out if this stream flows into this river Line-area "crosses", e.g. find all of the soil types crossed by this railroad "borders", e.g. find out if this road forms part of the boundary of this airfield Area-area "overlaps", e.g. identify all overlaps between types of soil on this map and types of land use on this other map "is nearest to", e.g. find the nearest lake to this forest fire "is adjacent to", e.g. find out if these two areas share a common boundary RELATIONSHIPS AMONG SPATIAL OBJECTS CODING RELATIONSHIPS AS ATTRIBUTES in the database model we can visualize relationships as additional attributes Example - "flows into" relationship overhead - Coding relationships as attributes I option A: each stream link in a stream network could be given the ID of the downstream link which it flows into flow could be traced from link to link by following pointers option B alternatively the network could be coded as two sets of entities - links and nodes the links could "point" to their downstream node the nodes could "point" to the next downstream link Example - "is contained in" relationship overhead - Coding relationships as attributes II given: locations of 4 wells, with attributes of depth and flow wells lie in two different counties with attributes of population we wish to determine how much flow is available in each county: 1. find the containing county of each well (compute the "is contained in" relationship) store the result as a new attribute, County, of each well 2. using this revised attribute table, total flow by county and add results to the county table County Population Flow A 20,000 4,500 B 35,000 5,500 D. OBJECT PAIRS distance is an attribute of a pair of objects there are other types of information which are similarly attributes of pairs of objects e.g. flow of commuters between a suburb and downtown e.g. trade between two countries e.g. flow of groundwater between a sink and a spring in some cases these attributes can be attached to an object linking the origin and destination objects e.g. on a map, trade can be an attribute of an arrow connecting the two countries thick arrows indicate strong trade however, such maps quickly become impossibly complex in general, it is necessary to allow for information which is not an attribute of any one object but of a pair of objects, including: distance connectedness - yes or no flow of goods, trade number of trips such attributes cannot necessarily be ascribed to any real object e.g. commuting flows between a suburb and downtown are not necessarily attributes of any set of links in the transport network e.g. flow of groundwater between a sink and a spring does not necessarily follow any aquifer or conduit these are attributes of object pairs object pairs are important in various kinds of spatial analysis using GIS attributes of object pairs can be thought of as tables which have one object as rows and the other object as columns with the values in each cell representing the value of the interaction between them are many different terms for the implementation of this concept - e.g. interaction matrix, turn table, Cartesian product E. CARTOGRAPHIC AND TOPOLOGICAL DATABASES Strict definition of "topological" if a map is stretched and distorted, some of its properties change, including: distances angles relative proximities other properties remain constant, including: adjacencies most other relationships, such as "is contained in", "crosses" types of spatial objects - areas remain areas, lines remain lines, points remain points strictly, topological properties are those which remain unchanged after distortion Usage of "topological" in GIS a spatial database is often called "topological" if one or more of the following relationships have been computed and stored connectedness of links at intersections ordered set of lines (chains) forming each polygon boundary adjacency relationships between areas unfortunately the precise meaning of the term has become distorted by use in general, "topological" implies that certain relationships are stored, making the data more useful for various kinds of spatial analysis by contrast, a database is called "cartographic" if the above conditions are absent objects can be manipulated individually relationships between them are unavailable or are considered unimportant cartographic databases are less useful for analysis of spatial data however they are satisfactory for simple mapping of data many packages designed for mapping only use cartographic database models a cartographic database can usually be converted to a topological database by computing relationships - the process of "building topology" through planar enforcement F. PLANAR ENFORCEMENT objects and their attributes are capable of describing the conditions existing on a map or in reality variation of a single property like soil type or elevation over a mapped area is achieved by including appropriate attributes for entity types e.g. elevation described by giving attributes to elevation points e.g. soil type described by giving attributes to areas in cases like soil type, the objects used to describe spatial variation must obey certain simple rules e.g. two areas cannot overlap e.g. every place must be within exactly one area, or on a boundary these rules are collectively referred to as planar enforcement a set of objects obeying these rules is said to be planar enforced planar enforcement is a very important operation in a vector GIS Process begin with a number of unrelated line segments imagine a number of limp spaghetti noodles lying on a table the following elements are now defined (terminology from the US Census Bureau for development of digital spatial database concepts): overhead - Planar enforcement a 0-cell (or node) is identified wherever two noodles cross or a noodle terminates i.e. all intersections are calculated 1-cell (or link, also "chain", "arc", "edge") is identified for each length of noodle between two consecutive 0-cells (nodes) a 2-cell (or area, also "face", "polygon") is defined for each group of consecutive 1-cells forming an enclosed area that does not contain any 1-cells that are not part of the boundary note that these definitions relate directly to the ordinary concept of dimensionality the results are: 0-cells are either isolated ("points") or adjacent to one or more 1-cells ("nodes") all 1-cells end in exactly two 0-cells each line segment (chain) between adjacent 0-cells is assigned to exactly one 1-cell all 1-cells lie between exactly two 2-cells every place on the "map" between noodles is assigned to a single 2-cell (the rest of the world is a 2- cell as well, often given the ID zero) Objective planar enforcement is used to build objects out of digitized lines (hence the phrase "building topology") it is a consistent and precise approach to the problem of making meaningful objects out of groups of lines simple rules can be used to correct some digitizing errors: a very short 1-cell terminating in a 1-valent 0-cell indicates an overshoot diagram a long 1-cell terminating in a 1-valent 0-cell very close to another 1-cell indicates an undershoot diagram planar enforcement is often needed when a set of data is being imported from another system e.g. if the source is a cartographic database and needs to have relationships computed e.g. if the database models of the two systems are incompatible, data is transferred as unrelated noodles, then objects are rebuilt planar enforcement must be applied one layer at a time planar enforcement concepts are built into many systems G. RELATIONSHIPS IN RASTER SYSTEMS in general, it is easier to work with relationships in vector systems the concept of object is not as natural for raster systems, which model the world as composed of pixels however, relationships can be handled in raster systems with simple techniques: overhead - Relationships in raster systems e.g. a map of county boundaries in one layer each pixel has a county code attribute which is an ID pointing to an entry in a county attribute table in a second layer each well location is coded by giving the appropriate pixel an ID pointing to a well attribute table the "is contained in" relationship can be computed by an overlay operation and stored as an additional column in the well attribute table only a few raster systems contain this type of capability to extract relationships into attribute tables most do not handle relationships between spatial objects REFERENCES Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment. Clarendon, Oxford. Chapter 2 describes objects, attribute tables and relationships. Goodchild, M.F., 1988. "Towards an enumeration and classification of GIS functions," Proceedings, IGIS ''87. NASA, Washington DC 2:67-77. Defines and discusses object pairs. Keating, T., W. Phillips and K. Ingram, 1987. "An integrated topologic database design for geographic information systems," Photogrammetric Engineering and Remote Sensing Vol. 53. Good discussion of topological and cartographic database models. EXAM AND DISCUSSION QUESTIONS 1. Discuss the use of planar enforcement for street networks, and the problems presented by overpasses and underpasses. Can you modify the basic rules to maintain consistency but allow for such instances? 2. What additional examples of relationships can you devise in each of the six categories used in section B? 3. Why have designers of raster GIS not commonly devised ways of coding spatial relationships between objects in their systems? Is this likely to change in the future, and if so, why? 4. "Topology is what distinguishes GIS from automated cartography". Discuss. THE VECTOR OR OBJECT GIS INTRODUCTION Vector data model B. "ARCS" Storing areas C. DATABASE CREATION Building topology Editing Relationship between digitizing and editing Edgematching D. ADDING ATTRIBUTES E. EXAMPLE ANALYSIS USING VECTOR GIS Objective Procedure Result REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit begins a two part introduction to vector GIS. We have placed these units here since we feel this discussion benefits from an understanding of the previous introduction to spatial data concepts in Units 10 to 12. However, with a little revision, it is possible to move this module so that it follows Units 4 and 5 on raster GIS. UNIT 13 - THE VECTOR OR OBJECT GIS Compiled with assistance from Holly Dickinson, State University of New York at Buffalo THE VECTOR OR OBJECT GIS INTRODUCTION Vector data model based on vectors (as opposed to space-occupancy raster structures) fundamental primitive is a point objects are created by connecting points with straight lines some systems allow points to be connected using arcs of circles areas are defined by sets of lines the term polygon is synonymous with area in vector databases because of the use of straight-line connections between points very large vector databases have been built for different purposes vector tends to dominate in transportation, utility, marketing applications raster and vector both used in resource management applications B. "ARCS" when planar enforcement is used, area objects in one class or layer cannot overlap and must exhaust the space of a layer every piece of boundary line is a common boundary between two areas the stretch of common boundary between two junctions (nodes) has various names edge is favored by graph theorists, "vertex" for the junctions chain is the word officially sanctioned by the US National Standard arc is used by several systems arcs have attributes which identify the polygons on either side these are referred to as "left" and "right" by reference to the sequence in which the arc is coded arcs (chains/edges) are fundamental in vector GIS Storing areas two ways of storing areas: polygon storage every polygon is stored as a sequence of coordinates although most boundaries are shared between two adjacent areas, all are input and coded twice, once for each adjacent polygon the two different versions of each internal boundary line may not coincide difficult to do certain operations, e.g. dissolve boundaries between neighboring areas and merge them used in some current GISs, many automated mapping packages arc storage every arc is stored as a sequence of coordinates areas are built by linking arcs only one version of each internal shared boundary is input and stored used in most current vector-based GISs C. DATABASE CREATION database creation involves several stages: input of the spatial data input of the attribute data linking spatial and attribute data spatial data is entered via digitized points and lines, scanned and vectorized lines or directly from other digital sources once the spatial data has been entered, much work is still needed before it can be used Building topology once points are entered and geometric lines are created, topology must be "built" this involves calculating and encoding relationships between the points, lines and areas this information may be automatically coded into tables of information in the database Editing during this topology generation process, problems such as overshoots, undershoots and spikes are either flagged for editing by the user or corrected automatically automatic editing involves the use of a tolerance value which defines the width of a buffer zone around objects within which adjacent objects should be joined tolerance value is related to the precision with which locations can be digitized these edit procedures include such functions as snap, move, delete, split, join, etc. Relationship between digitizing and editing digitizing and editing are complementary activities poor digitizing leads to much need for editing good digitizing can avoid most need for editing both can be very labor-intensive the process used to digitize area objects can affect the need for later editing: in "blind" digitizing all linework is digitized once as "noodles" in any order it is unlikely that the building and cleaning operations will be able to automatically sort out area objects unambiguously from the resulting jumble some systems require the user to identify junctions between digitized "noodles" explicitly usually by touching a special button on the cursor mistakes in building topology are less likely some systems require the user to digitize each individual arc/chain separately much easier to sort out polygons - less need for editing some systems support the building of topology "on the fly" the system searches constantly for complete area objects as digitizing proceeds the user is informed by a sound or by blinking as soon as the object is detected Edgematching compares and adjusts features along the edges of adjacent map sheets some edgematches merely move objects into alignment others "join" the pieces together logically - conceptually they become one object the user "sees" no interruption an edgematched database is "seamless" - the sheet edges have disappeared as far as the user is concerned THE VECTOR OR OBJECT GIS ADDING ATTRIBUTES once the objects have been formed by building topology, attributes can be keyed in or imported from other digital databases once added to the database, attributes must be linked to the different objects attributes can be linked by pointing to the appropriate object on the screen and coding its corresponding object ID into the attribute table unlike many raster GIS systems, attribute data is stored and manipulated in entirely separate ways from the locational data E. EXAMPLE ANALYSIS USING VECTOR GIS compare with example analysis in Unit 4 (The Raster GIS) Objective identify areas suitable for logging an area is suitable if it satisfies the following criteria: is Jack pine (Black Spruce are not valuable) is well drained (poorly drained and waterlogged terrain cannot support equipment, logging causes unacceptable environmental damage) is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality) Procedure database consists of three layers note: polygons do not entirely fill the space in each case hence, areas not included fall in polygon ID 0 buffer hydrography out to 500 m merge buffer and lake extract Jack pine polygons (species = Jack pine) extract drained soil polygons (drainage = 2, therefore soil = A) overlay buffer, Jack pine and soil polygons build topology extract polygons not in the buffer but in others (buffer = n, Jack pine = y, drainage = y) Result loggable area shown in final map REFERENCES Beard, M.V. and N.R. Chrisman, 1988. "Zipping: a locational approach to edgematching," The American Cartographer 15:163-72. Describes a solution to the edgematching problem. Chrisman, N.R., 1990. "Deficiencies of sheets and tiles: building sheetless databases," International Journal of Geographical Information Systems 4:157-67. A more general discussion of building edgematched databases. ESRI, 1990. Understanding GIS: The ARC/INFO Way, ESRI, Redlands, CA. A general introductory tutorial for ARC/INFO, a well-known contemporary GIS. Tomlinson, R.F., H.W. Calkins and D.F. Marble, 1976. Computer Handling of Geographical Data. UNESCO Press, Paris. Excellent semi-technical description of CGIS, an early vector-based system. EXAM AND DISCUSSION QUESTIONS 1. List and describe the processes involved in constructing a vector database by digitizing maps. 2. By using simple sketches, describe and illustrate typical problem cases which lead to difficulties in building area- object topology in a vector database, and the strategies which various GISs use to minimize editing effort. 3. Discuss the applications of GIS, in relation to the vector data model. Give examples of cases where the model would be particularly inappropriate in comparison with raster. 4. Why did the designers of CGIS choose a vector data model, and yet use scanning as the major method VECTOR GIS CAPABILITIES INTRODUCTION B. SIMPLE DISPLAY AND QUERY Display Standard Query Language (SQL) Boolean operators SQL extensions for spatial queries C. RECLASSIFY, DISSOLVE AND MERGE Steps Forestry example City zoning example D. TOPOLOGICAL OVERLAY Point in polygon Line on polygon Polygon on polygon ("Polygon overlay") Example Spurious polygons E. BUFFERING REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES VECTOR GIS CAPABILITIES INTRODUCTION analysis functions with vector GIS are not quite the same as with raster GIS more operations deal with objects measures such as area have to be calculated from coordinates of objects, instead of counting cells some operations are more accurate estimates of area based on polygons more accurate than counts of pixels estimates of perimeter of polygon more accurate than counting pixel boundaries on the edge of a zone some operations are slower e.g. overlaying layers, finding buffers some operations are faster e.g. finding path through road network B. SIMPLE DISPLAY AND QUERY Display using points and "arcs" can display the locations of all objects stored attributes and entity types can be displayed by varying colors, line patterns and point symbols may only want to display a subset of the data e.g. want to display areas of urban landuse with some base map data select all political boundaries and highways, but only areas that had urban land uses how would the user do this? e.g. one of the layers in a database is a "map" of land use, called USE area objects on this layer have several attributes one attribute, called CLASS, identifies the area''s land use for urban land use, it has the value "U" need to extract boundaries for all areas that have CLASS="U" Standard Query Language (SQL) different systems use different ways of formulating queries Standard Query Language (SQL) is used by many systems SQL phrase structure: SELECT <attribute name(s)> FROM <table> WHERE <condition statement> e.g. SELECT FROM USE WHERE CLASS="U" this selects only the objects for display - no attributes are retrieved by the query SQL examples using a list of student names: SELECT name FROM list (selects all names) SELECT name FROM list WHERE grade = "A" (selects names of students receiving an "A") SELECT name FROM list WHERE cumgrade > 3.0 (selects names of students with a cumulative gpa greater than 3.0) SQL operators: relational: >, <, =, >=, <= arithmetic: =, -, *, / (only on numeric fields) Boolean: and, or, not Boolean operators used to combine conditions e.g. WHERE cumgrade > 3.0 AND grade = "A" (selects students satisfying both conditions only) Boolean operators can have a spatial meaning in GIS as well e.g. when two maps are overlayed, areas (polygons) that are superimposed have the "and" condition a spatial representation is used to illustrate Boolean operators in the study of logic, through the use of diagrams called Venn diagrams thus GIS area overlay is a geographical instance of a Venn diagram "XOR" is the "exclusive or" - A xor B means A or B but not both SQL extensions for spatial queries some systems allow specifically spatial queries to be handled under SQL e.g. WITHIN operator SELECT <objects> WITHIN <specific area> the criteria for these spatial searches may include searching within the radius of a point, within a bounding rectangle, or within an irregular polygon C. RECLASSIFY, DISSOLVE AND MERGE reclassify, dissolve and merge operations are used frequently in working with area objects these are used to aggregate areas based on attributes consider a soils map: we wish to produce a map of major soil types from a layer that has polygons based on much more finely defined classification scheme Steps 1. reclassify areas by a single attribute or some combination e.g. reclassify soil areas by soil type only 2. dissolve boundaries between areas of same type by delete the arc between two polygons if the relevant attributes are the same in both polygons 3. merge polygons into large objects recode the sequence of line segments that connect to form the boundary (i.e. rebuild topology) assign new ID #''s to each new object Forestry example consider a forestry GIS where the forest is divided into "stands", average size 10 ha: each stand carries a list of attributes, including tree species and average tree age attributes apply homogeneously to area of each stand boundary occurs between stands whenever at least one attribute changes problem: identify all cuttable areas of white spruce assign new attribute "cuttable" to each stand value = "y" if white spruce AND age > 50 years value = "n" otherwise after assigning new attribute, all others can be dropped now wish to identify cuttable areas, each may be merger of several individual stands dissolve boundaries between polygons with same value of "cuttable" attribute merge polygons into larger objects City zoning example need to know how many individual landuse zones have been created in the city and how these are distributed geographically each land parcel in the city has a zoning attribute attached to it dissolve boundaries between parcels if the zoning is the same result can be a map showing large areas of similar zoning classes VECTOR GIS CAPABILITIES TOPOLOGICAL OVERLAY suppose individual layers have planar enforcement (required in many systems, not all) when two layers are combined ("overlayed", "superimposed") the result must have planar enforcement as well new intersection must be calculated and created wherever two lines cross a line across an area object creates two new area objects topological overlay is the general name for overlay followed by planar enforcement relationships are updated for the new, combined map result may be information about relationships (new attributes) for the old (input) maps rather than the creation of new objects e.g. overlay map of school districts on census tracts result is map showing every school district/census tract combination for each combination, the database contains an area object however, concern may be with obtaining the number of overlapping census tracts as a new attribute of each school district rather than with new objects themselves Point in polygon overlay point objects on areas, compute "is contained in" relationship result is a new attribute for each point e.g. combine wells and planning districts, find district containing each well Line on polygon overlay line objects on area objects, compute "is contained in" relationship lines are broken at each area object boundary number of output lines is greater than number of input lines containing area is new attribute of each output line e.g. combine streams and counties, find county containing each stream segment Polygon on polygon ("Polygon overlay") overlay two layers of area objects boundaries are broken at each intersection number of output areas likely greater than the total number of input areas e.g. input watershed boundaries, county boundaries, output map of watershed/county combinations after overlay we can recreate either of the input layers by dissolving and merging based on the attributes contributed by the input layer Example wish to use find those areas that are the best land for timber harvesting after overlay, each original layer contributes attributes to the combined layer we get the final map by selecting the desired attributes of the combined layer SELECT FROM OVERLAY WHERE Species = "Jack pine" AND Soil = "C" Spurious polygons during polygon overlay, many new and smaller polygons are created, some of which may not represent true spatial variations the small, invalid polygons are called spurious or sliver polygons and can be a major problem in polygon overlay spurious polygons arise when two lines are overlaid which are actually slightly different versions of the same line if the same line occurs on two input maps, the digitized versions may be slightly different in many cases the lines on the source maps have been compiled from different sources, but are nevertheless the same line on the ground e.g. a road may be part of a county boundary, also the boundary between two fields or two soil types or two vegetation types the problem cannot be removed by more careful digitizing - more points simply leads to more slivers some GISs allow the user to set a tolerance value for deleting spurious polygons during overlay operations if the tolerance is set too high, some legitimate polygons may be deleted if set too low, some erroneous polygons will remain deletion rules might also be based on shape, as spurious polygons tend to be long and thin E. BUFFERING a buffer can be constructed around a point, line or area buffering creates a new area, enclosing the buffered object applications in transportation, forestry, resource management protected zone around lakes and streams zone of noise pollution around highways service zone around bus route (e.g. 300 m walking distance) groundwater pollution zone around waste site options available for raster, such as a "friction" layer, do not exist for vector buffering is much more difficult in vector from the point of view of the programmer sometimes, width of the buffer can be determined by an attribute of the object e.g. buffering residential buildings away from a street network: three types of street (1, 2, 3 or major, secondary, tertiary) with the setbacks being 600 feet from a major street, 200 feet from a secondary street, and only 100 feet from a tertiary street problems with buffer operations may occur when buffering very convoluted lines or areas REFERENCES Documentation for ARC/INFO (user manuals, Understanding GIS) provides an overview of vector GIS functionality for a commonly available system. Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. Chapter 5 on data analysis. Lusardi, Frank, 1988. The Database Expert''s Guide to SQL, McGraw-Hill Book Co., New York. Good introduction to Standard Query Language. EXAM AND DISCUSSION QUESTIONS 1. Compare the buffer function in raster and vector systems, in terms of results, options offered by systems, and flexibility. 2. The skeleton function moves the boundary of an area object inwards rather than outwards. Show how this would work using a simple diagram, and what happens as the amount of movement increases. What objects might be created by this operation, and what uses can you devise for them? 3. What are spurious polygons and what characteristics of data cause them? 4. Section D listed three types of topological overlay between points, lines and areas. Are there others? What applications might they have? SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS INTRODUCTION Review B. ANALYSIS OF ONE CLASS OF OBJECTS Using attributes Using locational information C. ANALYSIS OF OBJECT PAIRS D. ANALYSIS OF MORE THAN ONE CLASS OF OBJECTS Shortest path example What spatial objects are required? Spatial interaction example E. ANALYSIS WHICH DEFINES NEW OBJECTS Buffer example Street noise example Trade area example Polygon overlay example F. GIS ANALYSIS FUNCTIONS Measure Coordinate transformation Generate objects Select a subset of objects Modify attributes of objects Dissolve and merge area objects Generalize or smooth lines Compute statistics for a set of objects Topological overlay Operations on surfaces Network analysis Input and output management REFERENCES SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS Review B. ANALYSIS OF ONE CLASS OF OBJECTS Using attributes Using locational information C. ANALYSIS OF OBJECT PAIRS D. ANALYSIS OF MORE THAN ONE CLASS OF OBJECTS Shortest path example What spatial objects are required? Spatial interaction example E. ANALYSIS WHICH DEFINES NEW OBJECTS Buffer example Street noise example Trade area example Polygon overlay example F. GIS ANALYSIS FUNCTIONS Measure Coordinate transformation Generate objects Select a subset of objects Modify attributes of objects Dissolve and merge area objects Generalize or smooth lines Compute statistics for a set of objects Topological overlay Operations on surfaces Network analysis Input and output management REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES Having established a basis of the fundamental concepts in GIS data structures, this unit begins a large module looking at how GIS can be used. First we look at how spatial relationships can be analyzed and then present a summary of the range of functions that fall within present GIS capabilities. UNIT 15 - SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS A. INTRODUCTION Review the types of spatial objects are points, lines, areas, raster cells (Unit 10) these object types are digital representations of phenomena are defined by their dimensionality are sometimes further subdivided e.g. line objects are divided into chains, strings etc. in DCDSTF an entity type is a type of phenomenon, e.g. church, city, highway, lake the same entity type may be represented by different types of objects at different scales e.g. a city may be a point at one scale, an area at another in the database, several different types of entities may be represented by the same type of object e.g. points may represent both cities and churches an object class is a group of objects of the same type, representing the same type of entity e.g. cities and churches are different classes of the same type of object (point) the number and meaning of attributes is the same for all objects in a class e.g. church: denomination, capacity, date of construction e.g. city: name, population, date of charter object attributes may use various measurement scales (i.e. nominal, ordinal, interval, ratio) - see Unit 6 we think of a class of objects and its attributes as a table with rows corresponding to objects and columns corresponding to attributes classes of objects can be grouped into layers sometimes only one per layer, depending on the system the power of a GIS comes from its ability to store relationships among and between objects - see Unit 12 relationships can be between objects of the same class more often between objects of different classes relationships can identify object pairs which have their own attributes using this framework of spatial objects and relationships, the range of analysis possible with a GIS is explored B. ANALYSIS OF ONE CLASS OF OBJECTS Using attributes need only a single attribute table might be any object class example using city neighborhoods: attributes include: population count: ratio scale household count: ratio scale average income: $000s per household, ratio scale name: nominal scale average household expenditure on automobile purchases/year: $000s, ratio scale GIS operations on the attribute table might include: very primitive forms of analysis or enquiry are possible list neighborhoods by average income simple data retrieval, print table list all neighborhoods with average income greater than $40,000 select records satisfying criterion, print table compute the mean expenditure on automobile purchases requires weighting by household count in each area compute and print result look for relationship between average income and average expenditure on automobiles retrieve data and plot a graph all of these are capabilities of standard databases, e.g. DBase III none use GIS capabilities, no access to spatial data (locations) required Using locational information make a map of average household income, shading each neighborhood accordingly requires locational information to plot outline of neighborhood shading might be determined by a function of more than one attribute, e.g. the ratio of household expenditure on automobiles to household income this type of capability is offered by automated mapping packages compute the area of each neighborhood and store it as a new attribute area computed from locational information (digitized outline of neighborhood) useful in making meaningful maps e.g. map population density as population divided by area rather than simply mapping population over variable sized areas other similar measures that can be computed include perimeter, centroid location, distance, e.g. from downtown C. ANALYSIS OF OBJECT PAIRS e.g. pairs formed from each combination of neighborhoods, including neighborhoods paired with themselves with 5 neighborhoods have 15 combinations with n neighborhoods have n(n+1)/2 combinations example attributes: distance number of commuters in each direction time to travel by public transit draw a map of interactions overhead - Net flows between states map becomes exceedingly complex if all pairs are shown may need to show only most important flows, e.g. to downtown analyze commuter trips by time of travel by public transit e.g. how many take over 20 mins, 40 mins, 1 hour produces results in tables SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS ANALYSIS OF MORE THAN ONE CLASS OF OBJECTS one of the major strengths of GIS analysis Shortest path example find the shortest path through a street network between two places useful for fire truck dispatching, cabs, delivery vehicles navigation systems are being developed for mounting in vehicles able to display current locations and map of surrounding streets, follow vehicle''s path on map able to compute recommended route to given destination What spatial objects are required? links in the network attributes include length also factors such as traffic counts, congestion, number of lanes, average speed are important in finding optimum route nodes in the network intersections allow route to move from one link to another using link-node relationships important attributes include presence of traffic light, overpass or underpass what about turn restrictions? turn restrictions are not attributes of links or nodes e.g. "no left turn" is an attribute of a pair of links - cannot turn from link A to link B thus a link-link object pair is needed (see Unit 12 for more on object pairs) some systems define a "turn table" which is equivalent to this link-link object pair what about stop signs? stop signs are not attributes of nodes but are determined by the direction of entering the node, irrespective of the exit link thus a link-node object pair is needed Spatial interaction example useful for prediction of customer behavior handout - Spatial interaction data given 3 shopping centers, represented as points attributes: parking spaces, number of stores, quality of signage, age of construction given 5 neighborhoods, represented as areas attributes: population count, average household income, average age given information on # of shopping visits by neighborhood by shopping center (information gathered by street interviews) these are attributes of neither shopping centers or neighborhoods, but of the object pairs produces 15 object pairs plus attributes, including distance commonly used model in this situation is a "Spatial Interaction Model" (SIM) - requires information on: neighborhoods (from attributes of area objects), e.g. average household income shopping centers (from attributes of point objects), e.g. number of parking spaces spatial behavior (from attributes of object pairs) e.g. for each object pair, divide number of trips to appropriate shopping center by population of appropriate neighborhood, plot against distance handout - Spatial interaction modeling E. ANALYSIS WHICH DEFINES NEW OBJECTS many GIS operations produce new spatial objects from old ones may be same or different type, e.g. points producing points or points producing areas new objects may have attributes of the old objects which created them Buffer example build a buffer zone (area) around a stream network (a layer of line objects) stream layer has attributes of each stream link, including ID, discharge, length, depth buffer operation creates area objects may be: (a) an object for each link (b) one merged object for the entire network attributes of the new area object: in case (a) - length, ID, discharge, depth of channel in the buffer object (from link attributes) in case (b) - total length Street noise example street is a line object with attribute "traffic count" apply an equation to convert "traffic count" attribute to "noise level" build a buffer of 500 m around line attach noise level attribute to the new area object further development: includes houses as point objects identify all houses lying in buffer ("point in polygon" operation) attach noise level attribute to all houses lying inside area object produce list of all such houses, generate mailing labels from database and mail announcements of meeting to protest noise Trade area example given list of customers of shopping center, with home locations (point objects) create new attribute for each point giving distance to shopping center calculate average distance from all points find the "trade area" of the shopping center e.g. draw a circle with radius equal to the average distance to all customers produces an area object attach count of customers within trade area as an attribute of the new object Polygon overlay example perhaps the most important operation in GIS given two classes of area objects e.g. two maps for the same area, one showing soil types, the other vegetation zones "overlay" the two classes of objects creating a new set of area objects every new area object has two sets of attributes - soil type (copied from the soil map) and vegetation (copied from the vegetation map) F. GIS ANALYSIS FUNCTIONS functions should be defined independently of technical issues, understandable by users with little technical knowledge of GIS, independently of data model e.g. "buffer" - does not depend on choice of raster or vector, or require knowledge of technical detail functions are used to translate needs into specific GIS operations list of available functions is outgrowth of past GIS user needs emphasis on resource management applications because of strength of that market sector in last 10 years however, is no consensus on the possible domain of GIS, the total set of possible functions some GIS claim as many as 1,000 commands since functions and operations are defined at a higher level, each function may require several commands overhead - GIS Analysis Functions Measure results become attribute of objects measure length of line object measure area or perimeter of area object Coordinate transformation results in new coordinates for points register map to control points, transform coordinates accordingly change projection, scale, coordinate system (e.g. lat/long, State Plane, military grid) Generate objects by user input, e.g. mouse, digitizer tablet area, line, point objects circle around point, e.g. for query grid cell net, lat/long graticule from existing objects in the database buffer zones or "corridors" around points, lines, areas areas around points by assigning everywhere to the nearest point, producing polygons (Thiessen, Voronoi or Dirichlet polygons) - e.g. to create "trade areas" representative points in the middle of each area object (centroids) Select a subset of objects based on attributes, or regions, or "window" Modify attributes of objects by user input, e.g. keyboard by arithmetic based on existing attributes, e.g. find density by rules using relational and Boolean operators e.g. if white spruce and age > 50 years then new attribute is "y" Dissolve and merge area objects generates new, fewer objects Generalize or smooth lines reduce the complexity of a line or area boundary, or smooth it, or reduce the number of digitized points needed to represent it ("weed") diagram note: generalization is a very complex topic which is covered in detail in Unit 48 Compute statistics for a set of objects count them total or average a selected attribute compute statistical indices, e.g. standard deviation, correlation Topological overlay point in polygon line on polygon polygon overlay sliver polygon removal Operations on surfaces mostly for topographic surfaces recall there are several methods for digital representation, e.g. digitized contours, grid of heights (DEM), mosaic of triangles (TIN) - Unit 11 estimate height at a point find profile of surface along a line, e.g. a stream profile compute contours (line objects) from grid of heights, and vice versa compute grid of slopes, aspects find area objects of slope or aspect categories, e.g. slope <5% find watershed boundaries from DEM find the area visible from a point (viewshed) Network analysis many types of analysis can be carried out on networks, for transportation planning, utility management, airline scheduling, navigation find shortest path through the network between selected points determine whether one point on a stream network is downstream or upstream of another find the parts of the network which can be reached within a given travel time from a selected point Input and output management applications often consist of existing analytical packages running in conjunction with a GIS GIS does the "housekeeping" - handling data input and providing advanced output capabilities analytical package solves the problem REFERENCES Goodchild, M.F., 1988. "A spatial analytical perspective on GIS," International Journal of Geographical Information Systems 1:327-34. Examines the relationship between spatial analysis and GIS and discusses key issues. Goodchild, M.F., 1988. "Towards an enumeration and classification of GIS functions," Proceedings, IGIS: The Research Agenda. NASA, Washington, DC, II:67-77. Develops categories of analysis and provides examples. Unwin, D., 1981. Introductory Spatial Analysis, Methuen, London. A discussion of spatial analysis with a framework much like that provided by GIS: illustrates the range of spatial analysis. Upton, G.J.G. and B. Fingleton, 1985. Spatial Data Analysis by Example, Vol I: Point Pattern and Quantitative Data, Wiley, New York. EXAM AND DISCUSSION QUESTIONS 1. Compare the classification of map analysis functions in this Unit with that proposed for raster functions by Berry and referenced in Unit 5. 2. Can GIS functions be described and discussed generically, or is the raster/vector distinction unavoidable? 3. Make a list of 6 (10? 15?) functions which should be included in GIS because of their importance in various kinds of analysis of spatial data but which are not listed in this unit. 4. Define "object pairs" and give examples of their use in applications in (a) hydrology, (b) transportation planning. 5. "Spatial data is distinguished by its wealth of possible relationships between objects, and by the need to qualify such relationships in different ways". Discuss. 6. Design a database for an airline reservation system - what types of entities and relationships would be needed, and what attributes of each entity? Would the concept of an object-pair be useful? OUTPUT Compiled with assistance from Jeffrey L. Star, University of California at Santa Barbara A. INTRODUCTION Types of output B. TEXT OUTPUT Tables C. GRAPHIC OUTPUT Graphics peripherals Raster and vector devices D. HARDCOPY OUTPUT Line printers Dot matrix printers/plotters Pen plotters Optical scanners E. CRTS (CATHODE RAY TUBES) Storage tube technology Refresh image technology Color Bit planes and palettes 3-D display Memory and processing components F. GRAPHICS STANDARDS REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES OUTPUT A. INTRODUCTION output from GIS does not have to be a map in fact, many GIS are designed with poor map output capabilities Types of output text - tables, lists, numbers or text in response to query graphic - maps, screen displays, graphs, perspective plots digital data - on disk, tape or transmitted across a network other, not yet common computer-generated sound 3D images B. TEXT OUTPUT perhaps more important than maps for reporting results of analysis results might be a list or table of selected objects with attributes queries might result in numerical results, e.g. totals, distances, areas, counts text output might be delivered by voice generator, e.g. navigation instructions like "turn left at next traffic signal" Tables e.g. list of all cuttable areas of timber, giving area, species, age, estimate of yield in board feet list is not of great value without an accompanying map to identify each object in the list examples of specialized lists: personalized letter to be mailed to all households within 500 m of a planned expressway list of all hazardous materials stored within 100 m of a fire, transmitted by FAX to firetruck driving directions for a garbage collection route workorder and accompanying map and marked travel route for each service vehicle operated by utility company, giving day''s work locations, nature of work list and accompanying map of all city voting precincts ranked by degree of support for party in last election C. GRAPHIC OUTPUT Graphics peripherals provide graphic input and output of maps, diagrams and charts interactive graphics devices allow users to point to objects and identify them in their correct spatial context an early interactive version of the Space War computer game, developed at MIT in the early 1960s, ran on a PDP-1 computer using video displays television-like terminals became common in the early 1960''s, and are now the most common way users interact with computer systems in the next few sections we look at devices for graphic output and the ways their development has influenced GIS costs are approximate, correct to order of magnitude only Raster and vector devices graphic output devices can be classified into raster and vector groups raster devices build a picture by filling it with uniform picture elements, usually in rows e.g. line printers, dot matrix printers, scanners, most CRT terminals the elements of the picture are called pixels or pels resolution is sometimes expressed in megapels (approximately equal to 106 pixels) 640x480 pixel resolution is 0.3 megapels; 1280x1024 is 1.3 megapels vector devices build a picture by drawing lines, shading areas etc. e.g. plotters, storage tube CRT technology a raster device may be driven by vector commands, which it then converts for display, and vice versa conversion between raster and vector may thus occur at several points in a GIS between input and output D. HARDCOPY OUTPUT Line printers the first common device for computer output, cost $30,000 capable of printing 200-1,000 lines of text per minute each line composed of 132 characters in fixed positions entire line printed simultaneously by impact of hammers on ink ribbon limited set of printable characters image must be created row by row from the top drawbacks for graphical output because: rectangular cells resolution is fixed, e.g. 1/6 inch by 1/10 inch (4.2 mm by 2.5 mm) fixed cell location only limited set of characters can be printed difficult to draw continuous lines shades of grey must be generated by overprinting e.g. black = A + O + B + V + X results are best when viewed from a distance used as the output device for SYMAP, the earliest mapping package released in 1967 most useful for repeated mapping of statistics on a constant base, e.g. census atlas, weekly reports of crime statistics overhead - Line printer map Dot matrix printers/plotters image composed of rows of dots - often printed in blocks, e.g. 9 or 25 rows at a time to create shades of grey, control the fraction of dots which are printed in any small area the dots must be selected randomly, or "dithered", to avoid unwanted patterns early versions used hammers on ribbons for each dot - cost $500 more recent versions use lasers and xerography technology, resolutions up to 300 dots per inch - cost $2,000 color versions are available - squirt ink from jets of three or four primary colors - cost $2,000 electrostatic plotters use rows of dots, create images of map size - cost $40,000 Pen plotters create images by moving a pen under computer control most are incremental - draw a line using large numbers of movements of fixed size in fixed directions many have two motors, one for x and one for y get diagonal movement in only eight directions by using combinations of one or both motors however, step size is so small that lines appear to go equally easily in all directions diagram from a GIS perspective, an important advantage is the ability to plot on top of pre-printed base maps this avoids having to have all of the base map information in digital form Types of plotters lowest cost - $2,000 - are desktop size, take standard A4 or similar paper paper is flat pens can be changed automatically to generate different colors mid-range - $25,000 - are map size paper rolls over drum movement of pen is parallel to axis of drum second direction of movement is by rotating drum problems keeping paper in registration since it may move and stretch during construction of the map/graphic problems keeping pens moist during long plotting jobs typical plot jobs can last 3 to 6 hours top range - $100,000 are high precision used for drafting, map production usually flat ("flatbed"), medium held on exactly flat surface pen may be replaced by cutting tool on scribe coat demo - display a map from a plotter Optical scanners output on photographic paper paper mounted on inside of a rotating drum image created in helical fashion by rotating drum and moving light source along axis of drum common output devices for remote sensing, image processing other devices output directly to 35 mm slide film OUTPUT CRTS (CATHODE RAY TUBES) earliest (ca. 1968) could display rows of characters in fixed positions, little use for showing maps or images Storage tube technology Tektronix introduced terminals based on storage tube technology ca. 1970 major breakthrough in low-cost graphic display ($5,000) images drawn by moving electron beam over screen under computer control image is permanent, not refreshed, so must be erased completely - no selective deletion possible Refresh image technology terminals with refreshed images began to replace storage tube technology ca. 1975 significantly lower cost image redrawn from internal memory 25-50 times per second image created by lighting dots in fixed positions resolution determined by number of rows and columns of dots - some common screen resolutions: IBM PC Color Graphics Adapter (CGA) - 320x200 Enhanced Graphics Adapter (EGA) - 640x350 Video Graphics Array (VGA) - 640x480 1280x1024 is a common resolution for high quality graphics Color color is created by using groups of 3 dots, glowing red, green and blue respectively when illuminated by different electron guns different percentages of illumination create different colors overhead - Colors for RGB Bit planes and palettes recall from Unit 3: a bit is a unit of computer storage - it can be on or off a byte is a group of 8 bits a K (kilo) is 1024 - 64K bytes equals 65,536 bytes to display a black and white image, the terminal or display adapter must have one bit of storage per pixel 320x200 or 64,000 bits or 8,000 bytes for a CGA monochrome image to display color we use several bits per pixel 2 bits can have 4 combinations of on and off, so can display any of 4 colors to display any of 16 colors requires 4 bits per pixel if there are 4 bits per pixel, we say there are 4 bit planes a device with 20 bit planes (common for high resolution graphics) can display any of 220 colors in each pixel hint: to convert powers of 2 to powers of 10 approximately, multiply the exponent by 0.3 - 220 is about 106 or 1,000,000 - in fact 1,048,576 the number of bit planes determines how many colors can be displayed simultaneously this may be different from the number of possible colors e.g. the VGA has 4 bit planes, allows simultaneous display of 16 colors, but these can be defined using any combination of 64 levels of red, 64 of green and 64 of blue, for a total of 262,144 possible colors the limited set of colors chosen at any one time is called the palette the storage requirements of a VGA adapter (4 bit planes, 640x480 resolution) are 4x640x480 = 1,228,800 bits = 153,600 bytes = 150 Kbytes the requirements for a 20 bit plane, 1280x1024 device are 20x1280x1024 = 26,214,400 bits = 3,276,800 bytes = 3200 Kbytes 3-D display some vendors are now offering 3-D stereo display devices these create the illusion of depth by rapidly switching between two images, one for the left eye and one for the right a filter in front of the screen polarizes the images differently the user wears eyeglasses containing clear polarizing filters because each of the two images must be refreshed 25-50 times/second, the display must operate at 50-100 images/second Memory and processing components 1. Object memory some display devices have an optional local object memory to store the entire set of objects/vectors in the image to be displayed (often called "display list") allows rapid redisplay of objects without new input most useful for pan or zoom, or rotation of 3-D objects 2. Vector-raster converter since display is always by pixels it is necessary to compute raster images from vector input e.g. which dots must be turned on to display this line? 3. Display memory stores the color number to be displayed in each pixel 4. Color lookup table identifies the combination of red, green and blue corresponding to each color in the current palette by changing the color lookup table, can produce very rapid changes of color patterns on the screen without affecting display memory 5. Digital/analog (D/A) converter converts the digital signal stored in display memory to a voltage applied to the CRT F. GRAPHICS STANDARDS the large number of graphics input and output peripherals have a confusing array of data requirements the instructions sent to a laser printer may have nothing in common with those sent to create the same picture on a desktop plotter there have been many attempts to create standards for communicating with devices the format used by the early Tektronix storage tube terminals has been extended many times but is still widely used - known as Tektronix or 4010 format many devices recognize the format established by Hewlett-Packard and known as HPGL several companies have introduced common formats and provided "drivers" to convert these for specific devices, e.g. DI-3000 and DISSPLA the most successful recent format is PostScript, recognized by a large number of output devices REFERENCES Maguire, D.J., 1989. Computers in Geography, Wiley, New York. A good general introduction to computer use and spatial data: chapters 5 and 11 have excellent reviews of hardware. DISCUSSION AND EXAM QUESTIONS 1. Low-quality dot matrix printers use 9 hammer pins in a vertical column to create each line of print at 6 lines per inch, and thus achieve a resolution of about 50 dots per inch or 2 dots per mm. Compare the output of this device to the contents of the standard topographic map. 2. Compare the desktop plotter and the color CRT screen as display devices. Which would you find more useful as a location analyst for a major retailer? 3. Cartography is constrained by the two-dimensional nature of its paper medium. What new ways of displaying spatial data can you devise to take advantage of 3-D display capabilities? 4. "Raster and vector are not only different approaches to constructing a picture, but fundamentally different ways of looking at the world". Discuss. GRAPHIC OUTPUT DESIGN ISSUES A. INTRODUCTION Topics covered B. LABEL PLACEMENT Imhof''s basic rules Overposting Polygon labeling Some simple methods C. PRINCIPLES OF GRAPHICAL EXCELLENCE Graphical excellence D. DESIGN OF GRAPHIC OUTPUT Scale Base map General graphic design Screen display Scene generation REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES GRAPHIC OUTPUT DESIGN ISSUES A. INTRODUCTION Topics covered B. LABEL PLACEMENT Imhof''s basic rules Overposting Polygon labeling Some simple methods C. PRINCIPLES OF GRAPHICAL EXCELLENCE Graphical excellence D. DESIGN OF GRAPHIC OUTPUT Scale Base map General graphic design Screen display Scene generation REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES This unit introduces some fundamental concepts of graphic design. If you wish to do a more thorough job of this, consider combining this unit with the material in Unit 49. UNIT 17 - GRAPHIC OUTPUT DESIGN ISSUES A. INTRODUCTION previous unit described technical aspects of GIS output much GIS output is in the form of hard copy maps or graphic displays design of graphic output is critical if information is to be conveyed effectively to the user graphic output from GIS is often poorly designed e.g. colors used randomly without appropriate scaling conventional scale of colors used to display elevation on standard atlas maps has been optimized over centuries of cartographic experience design can benefit from principles of cartographic design developed in cartography screen display introduces a new set of issues because of greater capabilities compared with paper maps also see more general treatment of visualization of spatial data in Unit 49 Topics covered technical issues of label placement general principles of graphical excellence introduction to principles of map design B. LABEL PLACEMENT features shown on maps and displays can be differentiated and identified in various ways: symbols, e.g. church, bridge colors sizes labels labels provide the greatest flexibility to attach descriptions to point, line and area features names of administrative divisions, lakes, rivers etc. elevations of contours, spot heights highway numbers in cartography, positioning labels is a complex and sophisticated process there have been few attempts to write down the rules used (Imhof, 1975 is a well-known exception) it has proven difficult to emulate these rules in automated map production or GIS positioning labels on screen displays is especially difficult because of low resolution (e.g. 640 by 480 pixels), and the importance of speed by comparison, a plotted map may have an effective resolution of 300 dots per inch, and an hour computing time may be acceptable Imhof''s basic rules names on maps should: be legible be easily associated with the features they describe not overlap other map contents be placed so as to show the extent of the feature reflect the hierarchy of features by the use of different font sizes not be densely clustered nor evenly dispersed it may not be possible to satisfy all of these rules perfectly the best solution will balance conflicting objectives, e.g. need to associate name with feature vs. need to avoid overlap of contents label placement is a complex problem because of the vast number of possible positions that have to be searched and the number of conflicting objectives two labelling problems are particularly significant in automated mapping and GIS: Overposting when features are densely packed on a map or screen, it is difficult to keep labels separated labels may overlap (overposting) labels must be positioned to avoid overposting, but without destroying the eye''s ability to associate labels with appropriate features e.g. point features optimum position for a label is above and to the right below and to the right is less acceptable least acceptable positions are to the left label can be turned (non-horizontal) if necessary, but only by a small amount overposting is a problem because the computer must search a vast number of possible positions in practice, must limit the number of positions somehow some solutions define a fixed number of possible absolute positions, like a raster other solutions define a fixed number of positions relative to the feature diagram Polygon labeling labelling polygons has become notorious within automated mapping as a difficult and challenging programming problem the label should be central to the feature, may be reoriented or curved to fit the feature diagram in some cases the label may be connected with the feature by an arrow diagram Some simple methods 1. label centered on the polygon centroid problems: centroid may lie outside the polygon a long label may have to be multi-line to fit inside solution fails to meet Imhof''s criterion of showing the extent of the feature 2. variable rectangle positioned inside the polygon search for feasible positions for a rectangle wholly enclosed within the polygon ratio of width to height should be as high as possible solution will not curve the label to fit the feature largest enclosed rectangle may be in an inappropriate part of the polygon diagram 3. Skeleton shrink the polygon by moving its edges inward at a uniform rate the vertices trace out a network known as the skeleton (discussed in more detail in Unit 33) diagram position the label along the central part of the skeleton best for polygons like Florida which require curved labels practical labeling methods use combinations of rules for different shapes, sizes of polygons many developers have used the term expert system to describe label placement software an expert system works with complex sets of rules in a rule base the objective of the expert system is to emulate the complex decision process of a cartographer C. PRINCIPLES OF GRAPHICAL EXCELLENCE some very broad principles apply to the design of graphic output in general (includes graphics and charts) the following discussion relies heavily on Tufte (1983) Graphical excellence gives the viewer the greatest number of ideas, in the shortest time, with the least ink, in the smallest place maximize the data/ink ratio erase non-data ink erase redundant data-ink revise and edit the graphic it is difficult to get a good graphic first time around mobilize every graphical element, perhaps several times over, to show the data maximize data density and the number of data entries shown, within reason if the nature of the data suggests the shape of the graphic, follow that suggestion - otherwise, move toward horizontal graphics about 50% wider than tall GRAPHIC OUTPUT DESIGN ISSUES DESIGN OF GRAPHIC OUTPUT for GIS, graphic output must show: features appropriately symbolized or labeled objects computed by the GIS, e.g. buffer zones relationships it may be difficult to display the results of some forms of GIS data analysis because of the constraints of 2D display, e.g.: 3D data interaction data (migration, flows of goods) global data Scale the scale of output should be consistent with input scale e.g. inappropriate to digitize from 1:1,000,000 map, display at 1:24,000 because data will not be sufficiently accurate also inappropriate to digitize at 1:24,000, display at 1:1,000,000 without adequate generalization features will be too dense, too detailed scale on a CRT screen is as important as on a plotted map in principle a spatial database is "scale-free", but in practice scale is a crude indicator of data accuracy GISs should record and track scale in the database, but do not Base map to be useful, a map must include information for visual locational reference output of computed information alone is rarely useful need base map features as well e.g. map of cuttable forest stands needs to show locations of roads, watersheds, streams and lakes, besides cuttable stands, so user can locate stands on the ground, make decisions based on correct spatial context particularly important in raster systems display of a single layer is rarely useful without some form of basemap for locational reference basemap information will normally be vector, or at higher resolution than the raster this will be difficult if the raster system does not have vector capabilities input of basemap information can be expensive difficult to justify digitizing of data just to support interpretation of graphic output can plot output on top of pre-printed base map avoids need to digitize base map information base map must be accurately registered some GIS support this function General graphic design often desirable to create good-looking finished product e.g. as part of professional report, presentation undesirable to have map look "computer-produced", excessively abstract or schematic high cost of providing cosmetic output functions in GIS e.g. map border neatlines, symbols, north arrows, legends complexity of programming for these features may be much greater than for analytic functions time to plot these features may be high, particularly for pen plotters some GIS map products are now almost indistinguishable in quality from manual cartography is appearance really important in a map drawn to support decision-making? GIS output maps are to be used directly, not destined for walls or map libraries should GIS products be simple, schematic, avoid high cost of manual cartographic quality? marketplace seems to say "no" Screen display issues are different here because screen is: smaller, lower resolution than a printed or plotted map more flexible zoom, pan, interaction with user, animation, use of color principles of design of screen displays are still poorly developed black background or white? affects perception of color tradition (PC and mainframe terminals) is black background, Mac and many workstations use white hard copy map must display as much information as possible to satisfy possible user requirements because system is interactive, screen can display limited information but provide for access to more e.g. user "clicks" on or "picks" an object with a mouse, accesses lengthy text description access to an object''s attributes is not limited by constraints of static display Scene generation maps show geographic variation using symbols, objects, other abstractions of reality GISs do not have to do this - why not show a picture of the reality? - artist''s impression? scene generation is set of techniques for simulating real physical appearance e.g. GIS is used to plan a ski area on a mountain which is currently forested plan could be shown as a map, with contours, green tint for remaining forest, line objects for ski lifts scene generation would show oblique perspective view, cover hill with trees of varying height current technology allows appearance of trees to be varied depending on species, age we are still some way from having hardware fast enough to do this in "real time" REFERENCES Freeman, H. and J. Ahn, 1984. "AUTONAP - an expert system for automatic map name placement," Proceedings, First International Symposium on Spatial Data Handling, Zurich. Imhof, E., 1975. "Positioning names on maps," The American Cartographer 2(2):128-44. Robinson, A.H., R.D. Sale, J.L. Morrison and P.C. Muehrcke, 1984. Elements of Cartography, 5th edition, Wiley, New York. Excellent source of map design principles. Tufte, E.R., 1983. The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT. Contains numerous examples of graphical excellence (and its opposite) in map design. Zoraster, S., 1986. "Integer programming applied to the map label placement problem," Cartographica 23(3):16-27 A session on automatic names placement at AutoCarto 9, Baltimore, April, 1989 provides several reviews of the use of expert systems for map design: Doerschler, J.S., and H. Freeman, "An expert system for dense- map name placement," pp. 215-224. Ebinger, L.R., and A.M. Goulette, "Automated names placement in a non-interactive environment," pp. 205-214. Johnson, D.S., and U. Basoglu, "The use of artificial intelligence in the automated placement of cartographic names," pp. 225-230. Jones, C.B., and A.C. Cook, "Ruled-based cartographic name placement with Prolog," pp. 231-240. EXAM AND DISCUSSION QUESTIONS 1. "Map output is essential to GIS, yet GIS designers have ignored or failed to implement many of the well-known principles of map design". Discuss. 2. Examine and discuss a collection of GIS output maps, such as the ARC/INFO Maps volume produced annually by Environmental Systems Research Institute, Redlands, CA, or the slides supplied with Unit 53. How effectively do they implement principles of map design? 3. Discuss the differences in graphic design principles as applied to manually produced maps, GIS hard copy output and screen displays. 4. Review the Cartographic Design chapter of Elements of Cartography, and discuss its contents in the context of GIS output. MODES OF USER/GIS INTERACTION A. MODES OF INTERACTION Product mode Query mode Example queries by application Continuum B. TYPICAL QUERIES 1. Simple recall of data 2. Where is object A? 3. What is this object? 4. Summarize attributes of objects within distance x 5. Summarize attributes of objects within a region 6. What is the best route? 7. Show all of the objects satisfying the criteria 8. Use of relationship between objects C. CHARACTERISTICS OF QUERY MODE User training D. PRODUCT MODE Personnel requirements E. USER INTERFACES Command driven Questions or prompts Menu driven Icons Windows Plain language interfaces REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES MODES OF USER/GIS INTERACTION MODES OF INTERACTION Product mode system generates information products - lists, maps - which are later used for decision-making user of products does not interact with the system directly e.g. student records system generates class lists, transcripts which are used by committees, faculty to make decisions about student progress faculty need know nothing about student record system except some conceptual understanding of (a) what information is in it and (b) what its capabilities are e.g. can you give me a set of mailing labels for all students who took at least one course in geography last year? - anticipates (a) that the necessary data is in the system and (b) that the system has the functionality to do this resource manager should have similar levels of understanding of the agency''s GIS what is stored in it (data layers) what functions a GIS can perform e.g. can you make me a map showing all lands visible from this proposed waste disposal site? market researcher should have similar understanding of capabilities of firm''s GIS e.g. report numbers of people age 25-34, income over $30,000, living within concentric bands 1/4 mile wide around this proposed restaurant location Query mode user interacts directly with system, perhaps through an operator, to obtain answers to queries Example queries by application common type of query in land records office - "what easements and zoning restrictions are on the property?" GIS can handle different ways of identifying property, e.g. by pointing to map, street address, subdivision plan, adjacency to other property GIS can answer queries like "what police precinct(s) cover(s) this property?" by overlay of precinct layer and check for overlap common queries for publicly accessible records in a municipal data base: e.g. locations of water lines, sanitary and storm sewers, current, historical and proposed Official Plan designations, historical property designation, site plan details, state of approvals navigation - "how do I get there from here?" Continuum query and product modes are two extremes on a continuum vendors position themselves differently on this continuum e.g. vendors of navigation and utilities management systems provide products at the query end while vendors of resource management systems tend to focus at the product end choice of mode also depends on frequency of use and training e.g. in a city records office, a clerk will operate the GIS to determine answers to specific queries while the general public will refer to updated maps printed at regular intervals from the GIS e.g. travel agent learns commands to formulate queries for airline reservations while the traveler consults printed timetables (generated from same database) needing no technical knowledge of reservation system B. TYPICAL QUERIES 1. Simple recall of data given a way of identifying the object by unique attributes (name, ID number, street address, account number) get list of attributes other queries require searching for objects satisfying requirements there are several different types of search queries: 2. Where is object A? given a way of identifying the object by unique attributes (name, ID number, street address, account number) show the location of the object on the screen along with its surroundings scale of the surroundings depends on the application might show house in relation to neighborhood, then use an inset map to show neighborhood in relation to city common example of this query is address matching - finding the coordinates or location of a house/customer/lot from its street address address matching is used extensively in market research, processing of census returns, dispatching emergency vehicles to fires in some instances address matching is the only required function of a GIS special case is the automobile navigation system which shows location of the vehicle on a constantly updated map on a small screen next to the driver 3. What is this object? inverse of (2) object is identified by pointing ("picked") with an interactive device - mouse, cursor, light pen system returns attributes of object e.g. street address of lot, name of owner, price of house at last sale, production of oil well special case - given a grid of points, estimate the value of its attribute here e.g. with a DEM, estimate the elevation at the point indicated 4. Summarize attributes of objects within distance x extension of (3) to multiple objects related by distance give me (summary of, total of) attributes of objects within distance x of this point (pointing to screen) common query in site selection give me totals for potential customers within 1/4 mile rings of this proposed location, broken down by census information, e.g. income, age, sex, occupation statistics will be stored as attributes of point or area objects (reporting zones) and will be aggregated to respond to queries this query service is offered by many market research companies client dials up with coordinates (lat/long) of proposed site, queries database large chain (e.g. bank, supermarket, convenience store) will do this many thousands of times a year 5. Summarize attributes of objects within a region extension of (4) to user-defined polygons rather than circles centered on a point give me (summary of, total of) attributes of objects within this area (outlining polygon on screen) e.g. tell me how much prime agricultural land is within this area (floodplain of proposed dam) e.g. tell me how much prime timber was burnt by the fire which burned this (query) area common query in political, school districting report numbers of students or voters in proposed district 6. What is the best route? what is the best (least cost, least impact, fastest) route between these two points? database model may be discrete (links and nodes of a network) or continuous (raster or grid) discrete case used to dispatch fire trucks, emergency vehicles, cabs etc. requires constant updating of link attributes to include road construction, maintenance, congestion continuous case used to route transmission corridors, pipelines to minimize impact used by aircraft flying N. Atlantic routes to minimize impact of jetstream if flying from East to West, maximize benefit of tailwind if flying the other way military applications ("trafficability") in determining routes for e.g. tanks 7. Show all of the objects satisfying the criteria show all of the objects satisfying the following criteria (defined on their attributes) e.g. show all oilwells producing over 1 m3 per day extension of (2) to many objects 8. Use of relationship between objects some queries require use of relationships between objects if relationships are not currently stored they must be computed e.g. show all the links in the stream network downstream of this link - requires the "flows into" relationship between links e.g. show all the oilwells in the same county as this one - requires the "is contained in" relationship between oilwells and counties e.g. show the nearest road to this point - requires the "is nearest to" relationship between roads and the point e.g. show the counties adjacent to this one - requires the "is adjacent to" relationship between counties C. CHARACTERISTICS OF QUERY MODE provide soft copy operate in real time - maximum time allowed for response is a few seconds often reported verbally over phone response must be precisely what was required e.g. phone call to information asking for number for "Bill Smith" does not want 20 possible numbers, only most likely GIS query system often replaces anecdotal memory of staff traditional approach to address matching relied on personal knowledge of city - e.g. London cab drivers required to know every street in London as qualification for license address match by traditional approach can take minutes per enquiry GIS system provides answers which are more precise, more rapid and can handle many more transactions User training query requires a high level of user expertise user must make frequent use of system in order to justify and maintain the level of familiarity use must involve a relatively small number of functions unless user is to commit large amounts of time to training query mode works best when queries are repetitive, e.g. "onecall" operation to identify locations of underground pipes, cables prior to construction - query is always "what is near here" query requires a friendly interface menus, icons may be preferred over commands which are tedious to type on the other hand trained users may prefer cryptic commands to other methods of access, e.g. airline reservations systems are command-driven, have high level of functionality, speed MODES OF USER/GIS INTERACTION PRODUCT MODE product mode is necessary for more complex or recurring information products, often involving manipulation, analysis, and recombining stored data to derive new information produces hard copy which is useful for several months after production e.g. phone book valid for 1 year e.g. airline guide valid for 1 month e.g. class list valid for 1 semester e.g. map of current census boundaries good for 10 years products often need to be updated and reproduced on a regular basis typical resource management agency may have 50-100 standard products to generate monthly or annually some agencies may require generation of as many as 100 different map sheets to cover jurisdiction since there is no immediacy of demand, production can be treated as a batch processing operation according to a planned schedule contrast this with query mode which places variable load on the system to maintain minimum response time system must be configured for worst-case load, e.g. 10am due to the repetitive nature of the update of these products, their production can be pre-programmed through the use of macros a macro is an ordered list of computer operations designed to generate a standard result macros, initiated with a single command, will cause the execution of the long and complex set of required operations Personnel requirements GIS Analyst analyst required to translate user needs into products identify necessary data layers develop appropriate data collection strategies, plans design sequences of GIS functions to generate products from layers design products to meet needs of users design of products requires personnel with ability to: conceptualize the sequence of GIS processes required construct algorithms for compiling the data design the report format so that the information that is provided is worthwhile to the users these people must have: understanding of subject matter to interact effectively with decision-makers who need information level of technical expertise to develop GIS operation sequences to produce specified product clear understanding of limitations of technology and data GIS technicians need to know technical aspects of the operation of the software and hardware will use the macros to produce required products users generally product mode requires little technical expertise on the part of ultimate users of the data since products will usually be in traditional form - e.g. maps and tables E. USER INTERFACES overhead - Types of user interfaces there are several types of user interfaces: Command driven user types commands at generic prompt - e.g. C> commands usually cryptic user must absolutely follow system-defined syntax by using precise spelling and punctuation rules can be frustrating for poor or slow typists may be very large number of many commands, e.g. near 1,000 in some GISs online help may reduce need to learn all rules and syntax, especially for the infrequently used commands "toolbox" metaphor used to describe the collection of possible functions available at any time Questions or prompts system asks user for responses in sequence to determine parameters for analysis and output limited range of operations common in programs with minimal or restricted functionality Menu driven user picks options from menus by pointing or by typing single letters or numbers menus present the only options which are possible at that time consequences of choice may be listed beside each option with familiarity, complex menu systems become tedious to use may not provide the flexibility of command driven systems Icons a form of menu system providing symbolic icons to represent options available user drives system by pointing to icons for some operations and using menus for others some users feel that symbolic systems can be more intuitive to learn and operate than verbal ones e.g. measure of ease of use is whether "a child of 10 can do something useful with the system without any prior training" Windows GIS interfaces should take advantage of the nature of spatial data there are two natural modes of access to spatial data - through maps and through attributes the more sophisticated systems now use multiple windows with different ones for textual and graphic images windows allow several views of the map at once i.e. full area and zoomed in image Plain language interfaces efforts to construct plain language interfaces for GIS have not been successful to date range, syntax of geographic queries is too large stressing language ignores the importance of visual access through maps, pointing etc. REFERENCES Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. See pages 9-10 for a short review of different types of queries. EXAM AND DISCUSSION QUESTIONS 1. Describe the differences between query and product mode interaction with GIS. Which mode would you select in constructing a GIS for a state department of transportation? 2. Discuss the use of product and query modes in accessing (a) phone directory information, (b) airline flight information, (c) TV program guides. 3. How will geographical access to records most affect people''s lives in the coming decade? 4. What do you anticipate will be the level of production of 1:100,000 quad sheets by the US Geological Survey in the year 2010 - higher, lower than today or zero? GENERATING COMPLEX PRODUCTS . STEPS IN DEFINING A GIS PRODUCT B. EXAMPLE GIS PRODUCT DEFINITION Decisions Information needed Data needed Processing steps Summary of functions needed C. PRACTICAL PROBLEMS Management demands Data not available Data is available, but there are problems Data in wrong format Complexity of decision rules D. SITE SUITABILITY Spatial search Assigning suitability Decision theory Sensitivity REFERENCES EXAM AND DISCUSSION QUESTIONS EXERCISE NOTES GENERATING COMPLEX PRODUCTS A. STEPS IN DEFINING A GIS PRODUCT what decisions have to be made? what information products are needed to make those decisions? e.g. decision is whether student should be allowed to graduate information product is student''s transcript, generated from student records database e.g. decision is where to put access road information products may include perspective plots, location of timber stands what data must be input or available to the system to make those products? need to know geographical coverage required, thematic data needed and sources of data what GIS functions need to be carried out on the data? B. EXAMPLE GIS PRODUCT DEFINITION Decisions National Forest must manage forest land for multiple uses one use is recreation which may conflict with other uses, e.g. wildlife, timber harvesting question is "Where are the most accessible areas which could be considered for the development of recreation facilities?" accessibility is defined in this example in terms of proximity to public roads the best areas are large and close to roads Information needed a map showing forest lands, classified according to accessibility for recreation a scale of 1:24,000, or larger if possible the map should show zones and associated accessibility classes for Forest Service land base map information - roads, railroads, cities and towns, Forest Service boundary Data needed roads and railroads - from standard 1:24,000 topographic map Forest Service management area - have been drafted on 1:24,000 topographic map from legal descriptions shown as many individual areas, most are contiguous but some are not city and town boundaries - have been drafted on 1:24,000 topographic map from legal descriptions data will be input as 3 layers overhead - Accessibility analysis data management area boundaries as area objects (layer A1) roads and railroads as line objects (layer B1) city and town boundaries as area objects (layer C1) Processing steps overhead - Project flowchart overhead - Project steps (6 pages) 1. using the forest service areas data (layer A1), assign a new attribute FORESTLAND, value = 1 if area is forest service land, 0 otherwise 2. dissolve boundaries between areas with the same FORESTLAND value, and merge areas to create new area objects with one attribute - FORESTLAND - call the new layer A2 3. using the transportation map B1, select public access roads only - call the new layer B2 4. generate buffers 0.5 miles wide around all objects in layer B2 - call the new layer B3 - assign the attribute INHALF a value of 1 for the buffer area, 0 outside 5. generate buffers 1.0 miles wide around all objects in layer B2 - call the new layer B4 - assign the attribute INONE a value of 1 for the buffer area, 0 outside 6. topologically overlay the objects in layers A2, B3 and B4 (some systems may require two steps to overlay three layers) to obtain layer B5 with area objects with the following attributes: FORESTLAND INHALF INONE 7. using the city and town boundary layer (C1), assign a new attribute URBAN, value 1 for areas of cities or towns, 0 otherwise 8. topologically overlay the objects in layer C1 with those in layer B5 to obtain layer B6, adding attribute URBAN to the three attributes in B5 9. assign a new attribute ACCESS to the objects in B6 using the following rules: Value Criteria 0 not forest land (FORESTLAND=0) FU forest land and urban (URBAN=1) 1 forest land, non-urban and within 0.5 miles of rail/public road (INHALF=1) 2 forest land, non-urban, outside 0.5 miles but within 1.0 miles of rail/public road (INONE=1) 3 forest land, non-urban, outside 1 mile of rail/public road (INONE=0) note that the criteria are ordered so that in each step only areas that have failed all of the previous tests are considered e.g. test for FU assumes that the object has already failed the prior test FORESTLAND=0 10. Dissolve boundaries and merge areas with the same value of ACCESS call this new layer B7 assign unique ID numbers to each new object 11. Measure the areas of objects (in hectares) in B7 and assign this attribute to each object as AREA 12. Modify attribute ACCESS for cases where ACCESS ="1" and area is greater than 2500 to "1A": If ACCESS = "1" and AREA >= 2500 then ACCESS="1A" 13. Create a plot showing: Forest Service ownership boundary (layer A2) All roads and railroads (layer B1) All cities and towns (layer C1) Area objects in layer B7, shaded by value of ACCESS attribute and labelled with ID number assigned in step 10 14. Create a list of all area objects in layer B7, showing the following attributes: ID (number assigned in step 10) ACCESS AREA Summary of functions needed Assign a new attribute (from existing attributes based on mathematical or Boolean operators) Dissolve boundaries and merge areas (based on value of specified attribute) Select objects (based on attributes satisfying mathematical or Boolean operators) Generate buffers (to a specified width around line objects) Topologically overlay (two or more layers of area objects to obtain a new layer of area objects) Measure area (of area objects, assigning values to a new attribute) Modify an attribute (selectively, based on mathematical or Boolean operators) Create a plot (of specified classes of objects, showing selected attributes, using various symbol, shade and label options) Create a list (of a specified class of objects, showing selected attributes, plus subtotals, totals etc.) once this sequence of operations has been worked out, it is very easy to design a macro which will automatically execute this sequence of steps whenever a layer is updated GENERATING COMPLEX PRODUCTS PRACTICAL PROBLEMS practical problems commonly encountered when trying to produce information products include: Management demands to demonstrate the value of the GIS to management, and thus ensure continued support, some useful products will need to be available very early in GIS system implementation but products cannot be generated until all the needed data have been input data input, production need to be coordinated so that some useful products appear quickly mandates change, the responsibilities of the agency change from time to time "drop everything you''re doing - we need x" difficult to operate a systematically designed approach to GIS when agency''s responsibilities are poorly defined or too flexible Data not available information product may require input data which is not currently available as a digital data layer data may have to be collected, compiled and input to support the product most agencies do not plan their data acquisition systematically to support their decision-making mandates introduction of GIS into an agency often forces much more systematic data planning has led to the development of interagency committees to allow for sharing of data needed by several agencies Data is available, but there are problems scale of data is much too small e.g. a geology coverage is essential, but none is available at a suitably large scale tempting to use the small-scale coverage, but results will be questionable geographical coverage is incomplete, or currency varies, or accuracy varies e.g. parts of the city are accurately mapped, other parts are poor quality e.g. topographic maps have widely different update dates how to warn the user when the quality of data changes within a data layer Data in wrong format data may already be digital, but in inappropriate format e.g. forest management agency has forest inventory maps digitized as raster cells 16 by 16 array of cells for each 1 sq km of the area managed each cell contains stand number stand numbers point to attribute table of stand attributes these data had been used for simple tables, measurement of area based on counting cells data must now be merged into a vector GIS to support forest management functions raster/vector conversion creates area objects, boundaries follow old pixel edges diagram Complexity of decision rules rules for assigning new attributes can be complex, difficult to specify precisely e.g. rules for areas loggable for "Mature Sawlogs": overhead - Decision rules - Mature Sawlogs species association: S = spruce SH = spruce/hardwood HS = hardwood/spruce primary species: WS = white spruce texture: requires fine through moderately coarse soils drainage: requires inundated through wet soils as logging occurs when soils are frozen objects must satisfy all of conditions 1 through 5 to be acceptable assignment of "suitability" attributes can involve as many as 100 attributes D. SITE SUITABILITY Spatial search spatial search uses attributes to search for most suitable or most profitable or least noxious locations for activities or facilities the activity/facility might require a single point location, a line or an extended area point location examples: wells, observation towers line location examples: power transmission, oil and gas pipelines, highways area location examples: campsites, logging, airports, waste disposal sites requires measure of suitability derived from many underlying layers or attributes by assignment rules Assigning suitability the process of combining many data layers into a single layer of suitability has been called cascading many tens of layers may be involved in creating one index of suitability cascading rules can include arithmetic, conditional, logical operations, recoding e.g. a study to locate a power transmission corridor through an area about 100 km across used 30,000 cells each 500 m square used over 100 data layers which were cascaded into a single index of suitability ranging from 0 (impossible for route) to 5 (best) some example layers: existing power corridor (yes/no) soil capability for agriculture (score from 1 - best - to 7) urban area (value = population density) some types of spatial search are atomistic suitability depends only on the characteristics of the place itself other types of search are holistic suitability depends not only on characteristics of the place but also on locations of other facilities e.g. a point is not good for a firetower if there is already a firetower one kilometer away e.g. it makes no sense to consider individual pixels as locations for a highway route - the route as a whole has to make sense how to determine rules for assigning weights to contributing factors? different decision-makers will have different preferences - how to find consensus? e.g. committee has to recommend route for new power transmission corridor one member wants to preserve agriculture, suitability gives negative weight to farmland another wants to preserve natural areas, suitability gives negative weight to wetlands, woodlands, positive weight to farmland one wishes simply to minimize construction cost of the power line by using the shortest route Decision theory provides methods which have been used to implement complex choice rules a single utility function (SUF) defines the importance given to each value of an attribute by a decision-maker e.g. to the agricultural representative, the attribute CROP_TYPE may be valued as follows: cornland is worth 0.6, pasture 0.2, irrigated tobacco 0.8 a multiple utility function (MUF) defines the importance given to each attribute (or group of attributes) in the overall measure of suitability e.g. to the agricultural representative, CROP_TYPE gets 0.9, CONSTRUCTION_COST gets 0.1 SUFs and MUFs can be elicited from decision-makers by systematically presenting combinations of options and asking for preferences since different individuals give different weights to factors, there are methods for combining preferences which have been expressed using different MUFs decision theory is important to GIS because of the number of applications using spatial search see Unit 57 for more on multiple criteria decision making Sensitivity many choices have to be made in defining GIS operations, attribute assignment rules, SUFs and MUFs these choices may be difficult to make how to balance conflicting objectives? to assist, it may help to know how sensitive the results are to these choices when the results are sensitive, choices need to be made carefully and accurately when the results are insensitive, choices can be made less carefully e.g. must be very sensitive to the impact on endangered species (cannot use sites with endangered species) while different slope aspects may have little effect on the result need to distinguish between sensitivity in principle and in practice in principle, preserving wetlands may be a high priority concern in practice, there may be no wetlands in the study area in practice, wetlands may extend across the study area, so any route will have to cross them and create the same impact Unit 46 discusses sensitivity in more detail REFERENCES Massam, B.H., 1980. Spatial Search, Pergamon, London. Excellent discussion of spatial search and applications of decision theory. French, S., 1986. Decision Theory: An Introduction to the Mathematics of Rationality, Halsted, New York. Good source on decision theory. EXAM AND DISCUSSION QUESTIONS 1. How much flexibility is there in the sequence of operations in the recreation accessibility example? What changes could be made to the sequence without affecting the result? Can you devise a diagram or flow chart to show this? 2. Describe the relevance of decision theory to spatial search using GIS, with examples. 3. What is the difference between sensitivity in principle and sensitivity in practice in the result of a spatial search? 4. Describe the process you would follow as a consultant working with a resource management agency to determine the information products required from a GIS, and to plan the development of the GIS database. EXERCISE The Milk Marketing Board of Dairyland has been developing a network system for the management of the collection and distribution of milk across Dairyland. What is involved is the routing of 425 trucks varying in capacity from 9 000 to 45 000 liters, to collect approximately 2.3 billion liters of milk per year from 9 800 producers. The milk is carried over several thousands of kilometers of roadways to processing plants, within a very confining time period of two days. Design a GIS database to include information on the road network, production quantities at farms, processing capacities at dairies, requirements at markets (cities) and amounts shipped and shipping costs from each farm to each processor and from each processor to each market. Define the functions the system will need to (a) produce maps of the producers, processors, markets and shipments, (b) produce tables of quantities produced, processed, marketed and shipped, (c) evaluate changes such as closure of a processor, expansion of a market, change in production levels. Write a proposal for such a system to be submitted to the Milk Marketing Board. GIS AS ARCHIVES INTRODUCTION Archive databases Project databases B. NATURE OF ARCHIVES Currency of data Use of archives Data for digital archiving C. EXAMPLES OF SPATIAL DATA ARCHIVE SYSTEMS CGIS MIDAS STORET LUNR NARIS MLMIS D. FATE OF ORIGINAL ARCHIVE SYSTEMS Platform Static data Distribution User interface Costs vs benefits Development of general purpose systems E. SPATIAL DATA ARCHIVES TODAY REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES GIS AS ARCHIVES . INTRODUCTION Archive databases are spatial databases developed as stores of information for general use provide coverage of some political jurisdiction, e.g. globe, nation, state, province, county purposes often not clearly articulated system provides limited functionality oriented toward data retrieval Project databases are spatial databases developed to support specific projects coverage for study area only purposes usually better articulated system provides functionality adequate for project B. NATURE OF ARCHIVES map library is traditional archive data is partitioned by both theme and geography by theme, e.g.: base maps (usually topographic) include roads, railways, surface hydrology, topography, depending on scale individual thematic map series, e.g. vegetation, soils, transportation, energy by geography, e.g.: USGS 1:24,000 sheets are organized by state, within state by name of sheet index map shows sheets in correct geographical positions most atlases and map libraries partition data primarily by geography, secondarily by theme digital spatial data archives tend to be organized in an opposite way primary key is thematic topographic data is produced by USGS using DLG (Digital Line Graph) and DEM (Digital Elevation Model) format street network data is produced by Bureau of the Census in TIGER format remotely sensed images are produced by NASA and other space agencies secondary key is geographical topographic data organized by map sheet TIGER organized by county Landsat images organized by scene Currency of data to be suitable for archiving, data must be stable through time some geographical data never changes - e.g. Census data and satellite images as a representative slice in time never change some changes rarely - e.g. topography and hydrography some changes rapidly - e.g. street networks in rapidly developing cities in some cases geographical data needs to be available both in archived and current forms e.g. street network data from TIGER needs to remain in census period format so that census data can be referenced to it needs to be current for use in navigation systems, postal delivery advantage of digital archives is that updating, when it does occur, does not require reissuing of hardcopy products updates can be made to central archive updated version can be transmitted by high speed links or distributed as digital tapes Use of archives archives should be established to meet user needs however, users and their needs are often not adequately assessed need to ask: who really needs this type of information? for what purposes? would it get used? does it need to be digital? consider the example of a commonly used archive, the phone directory how often does the need to look up phone numbers occur? must be frequently enough to justify the cost of providing phone books with every phone however, even this obvious need for access to phone numbers has not yet justified the development of an on-line database accessible to individual phone customers - access must be through operator (in France the phone directory is available on-line to individuals through an interactive TV system - however, this may be due more to the facts that the TV and telecommunications systems are state-owned and the government is strongly interested in promoting new technology - presumably benefits do not justify the cost of similar systems in North America and elsewhere) what does this say about the need for digital geographical archives to replace such things as road maps and area code maps? the phone book archive is very simple compared to the complexity of databases and queries for geographical archives would a digital road map be used frequently enough to justify its cost and updating? Data for digital archiving therefore, data suitable for archiving must be of sufficiently general use to justify cost remain current for sufficient period of time, or be capable of constant update be sufficiently self-explanatory that users do not encounter significant problems of interpretation e.g. user must have access to definition of each object or attribute or attribute value e.g. user should have access to a data quality report GIS AS ARCHIVES C. EXAMPLES OF SPATIAL DATA ARCHIVE SYSTEMS many large raster GIS databases have been built as inventories of natural resources, land use, etc. CGIS started 1962 Canada Geographic Information System designed to allow computer-assisted analysis (measuring area, overlaying different themes) of the data collected by the Canada Land Inventory many technical innovations one of two claims to original development of the term GIS - the other at Northwestern University (Marble) MIDAS 1964 - U.S. Forest Service grid cell overlay and modeling first full service GIS for natural resource management STORET 1964 - U.S. Public Health Service Division of Water Supply and Pollution Control. standardized the data collected by different organizations relating to water quality, flows, treatment processes and location LUNR is a well-documented example of an early inventory project for the State of New York (see Tomlinson et al., 1976) cells were 1 km square which is too large for most planning purposes values could be recorded for percentages within cells, but with no information on where these proportions were located in the cell NARIS Natural Resource Information System active in Illinois in 1970s raster system based on 40 acre cell funded by Ford Foundation, state agencies located at Center for Advanced Computation, University of Illinois software never adopted for any other purpose, despite the large investment MLMIS is a successful resource inventory for the State of Minnesota - see Unit 9 the software developed as part of the analysis and delivery effort for MLMIS, EPPL, has been widely distributed for general purpose use MLMIS cells are 40 acres, but further subdivision is used D. FATE OF ORIGINAL ARCHIVE SYSTEMS most of these original archive systems have become obsolete, expensive experiments that are no longer used Why? Platform mainframe systems can not compete with low-cost mini and micro platforms which appeared in the 1980s mainframes tend to have unique operating systems which limit portability of software and data many mainframe GIS systems of 1960s and 1970s were never "ported" to newer platforms (e.g. CGIS) entire investment in system on a large mainframe may have to be abandoned because of costs of converting to new platform or operating system high cost of keeping large databases on-line (accessible within seconds) on 1970s platforms on-line storage available on typical 1989 workstation would have cost $000s/week on 1970s mainframe these recurring costs could not be offset by revenue generated by database alternative was to store data on tape, incur access delays and maintenance costs Static data systems assumed static data, did not allow for updating Distribution high cost of communicating with mainframes over long distances in 1970s - no high speed, low cost digital communication links access effectively restricted to users at central site difficult for remote site, e.g. county planning office, to justify cost of on-line connection User interface command syntax like plain English but rules are still extensive, difficult to learn how many users would need the data often enough to justify learning the syntax? e.g. how long does it take a travel agent to forget how to use the airline reservation system? command-driven interface requires lengthy, accurate typing difficult to make interface user-friendly, forgiving terminals of 1970s did not offer color, operated at frustratingly slow communication speeds Costs vs benefits use failed to materialize at sufficient levels to justify costs inadequate user survey system "over-sold" inadequate anticipation of high recurring costs cost over-runs on development Development of general purpose systems many of the old, special-purpose in-house systems of the 1970s like NARIS have been replaced by general-purpose, vendor-supplied vector systems in the early 1980s these use smaller platforms like the VAX, Prime, Sun costing 1/10 of the large mainframes special-purpose functionality of archival systems will not support other types of data, other needs high cost of development of system cannot be offset by sales for other uses by contrast, vendor incurs high development costs of general-purpose GIS but can offset costs by multiple sales into diverse market simple low-cost general-purpose raster systems offer comparable functionality without development costs general-purpose systems can survive where general-purpose databases cannot ultimately, interest in CGIS and many other archival GISs was in system, not data archive E. SPATIAL DATA ARCHIVES TODAY cheaper platforms, lower recurring cost allows functionality of older systems to be supplied by low-cost workstation better awareness of value of digital data archives and GIS-based solutions to resource management has produced a large market for GIS important archives today are very general databases like TIGER, USGS digital cartographic data, where user community demand is strong and supporting agency has specific mandate for data collection REFERENCES Mounsey, H. and R.F. Tomlinson, 1988. Building Databases for Global Science, Taylor and Francis, New York. A review of the current status of global spatial data archives Tomlinson, R.F., D.F. Marble and H.W. Calkins, 1976. Computer Handling of Spatial Data. UNESCO Press, Paris. Compares and assesses several archival systems, including CGIS, LUNR, MLMIS. Wiggins, J.C., R.P. Hartley, M.J. Higgins and R.J. Whittaker, 1987. "Computing aspects of a large geographic information system for the European Community," International Journal of Geographical Information Systems 1(1):77-78. A description of the Co-ordinated Information on the European Environment (CORINE) program, a multi-national natural resources GIS project. EXAM AND DISCUSSION QUESTIONS 1. Review and discuss a selected state natural resource data archive, such as MLMIS, CGIS. How does the system compare with NARIS? 2. The volume Building Databases for Global Science listed in the references contains chapters discussing a number of efforts to construct global databases. What particular problems do they present? 3. Design a study to assess the need for a state natural resource data archive, and to evaluate its potential benefits to the user community. 4. The unit drew a parallel between spatial databases and map libraries. Discuss the validity of this analogy, and the dangers in using it to make design decisions for spatial databases. THE RASTER/VECTOR DATABASE DEBATE . INTRODUCTION Raster or vector Basic issues B. COORDINATE PRECISION Raster precision Location of coordinates in raster Vector precision Data precision C. SPEED OF COMPUTING D. MASS STORAGE Raster storage Vector storage E. CHARACTERISTICS OF PHENOMENA Raster sampling Vector sampling Features, entities and objects F. SUMMARY Combinations of raster and vector REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES THE RASTER/VECTOR DATABASE DEBATE INTRODUCTION Raster or vector Unit 4 introduces definitions of raster and vector arguments about which was better have been commonplace since the earliest systems were created raster databases are appealing simplicity of organization speed of many operations, e.g. overlay, buffers especially appealing to the remote sensing community who are used to "pixel" processing on the other hand, there are many situations in which the raster approach may appear to sacrifice too much detail cartographers were appalled by the crude outlines of parcels that resulted in the "pinking shear" effect of diagonal boundaries represented by grid cell edges diagram surveyors were dismayed by the "inaccuracy" caused by the cells when portraying linear features and points situations in which the raster approach sacrificed too much detail however, computing times for overlaying vector based information can be excessive early polygon overlay routines were error-prone, expensive, slow today, there are situations in which it is clear that one approach is more functional than the other e.g. using "friction" layer to control width of buffer is only feasible in raster e.g. viewshed algorithms to find area visible from a point are feasible with elevation grids (raster DEMs), not with digitized contours e.g. land survey data can only be represented with precise lines an important current trend involves linking raster and vector systems, displaying vector data overlying a raster base raster data may be from a GIS file (perhaps a remotely sensed image) or from a plain scanned image file therefore, the question has evolved from "Which is best?" to "Under what conditions is which best and how can we have flexibility to use the most appropriate approaches on a case by case basis?" Basic issues four issues to the discussions of raster versus vector: coordinate precision speed of analytical processing mass storage requirements characteristics of phenomena B. COORDINATE PRECISION Raster precision e.g. MLMIS (Minnesota Land Management Information System, see Unit 9) early version used cell sizes of 40 acres depended on a rectilinear public land survey system "created" a state of perfectly square townships, composed of sections that were exactly one mile on each side variability in the original survey lines were only addressed if mis-alignment was more than an eighth of a mile locational precision was limited by size of cells - 40 acres, or 1/4 mile on each side all linear features had to be represented as 1/4 mile wide strips point features "occupied" 40 acres on the map at scales smaller than 1:250,000 these conditions were not difficult to accept for state-wide planning purposes, the database was adequate what about using smaller cells five meter cells have been scanned into systems this size is chosen on the basis that 5 m is the width of a #00 pen line on a 1:24,000 map of course, still not possible to represent objects smaller than 5 m e.g. fire hydrants, storm sewer grates, power poles precision not adequate for facilities managers on the other hand, at 5 m there is no appreciable loss of information for most natural phenomena, most occur at this scale or smaller Location of coordinates in raster in most cases it is unclear whether the center of the cell or one of its corners is the precise location of the coordinate e.g. top left corner of the grid may be referenced to a specific UTM coordinate, but it is unclear whether that location is in the middle of the top left cell or at the top left corner of that cell locational precision is thus 1/2 the cell''s width and height Vector precision can be encoded with any conceivable degree of precision precision is limited by the method of internal representation of coordinates typically 8 or 16 decimal digits are used ("single" or "double" precision) this limits precision to 1/108 or 1/1016 of the size of the study area respectively for equivalent raster precision we would need 108 by 108, or 1016 by 1016 cells respectively, neither of which is feasible, even with run length encoding however this argument may be artificial real vector data accuracy may be much worse than one line width e.g. digitizing from a 1:24,000 quadrangle map may appear to allow points to be recorded to the nearest 2 m, from a map that has common errors of 12 m Data precision vector precision is true for certain classes of data data captured from precision survey (Coordinate Geometry - COGO) plat maps created from land surveyors'' coordinates political boundaries defined by accurate survey few natural phenomena have true edges which can be accurately represented as mathematical lines soils, vegetation types, slopes, wildlife habitats, all have fuzzy boundaries due to the methods used to record the spatial information due to the transitional nature of variation in the phenomenon it can be argued that the fine lines from the vector system gave a false sense of precision lines on maps are typically 0.5 mm wide and are often assumed to represent the uncertainty in the location of the object in a raster system uncertainty is automatically reflected in the cell size true comparison in terms of precision is between raster cell size and the positional uncertainty of a vector object, not the coordinate precision THE RASTER/VECTOR DATABASE DEBATE SPEED OF COMPUTING raster data can be processed very quickly to answer most analytical questions involving overlays, proximity, and Boolean queries, no calculations are required to determine relative positions between layers in most cases, analysis requires cell-by-cell comparison of the contents of layers little arithmetic computation is required beyond simple conditional statements thus raster systems are suitable for small computers and have low program development costs the same questions require considerable computation in vector systems vector topology is complex complex geometrical problems must be solved, e.g. to find line intersection points complex geometrical algorithms are needed in polygon overlay to avoid generating spurious polygons calculation of distances may be complex depending on the projection/coordinate system used these requirements slow the response times of vector based systems may prevent large scale work, or multiple users on small computers on larger machines, the speed factor will still be evident: may limit the number of concurrent users may require that complex work to be done in batch mode at non-peak times D. MASS STORAGE Raster storage simplest raster data storage method requires one memory location (e.g. one or two bytes) per cell this is not at all efficient, but is used by several systems such systems severely limit the maximum numbers of rows and columns that can be used file compression is possible through a variety of approaches - see Unit 35 most common are forms of run length encoding degree of compression depends on spatial variability of data for very complex data the benefit of run length encoding can be negative - use of run length encoding should be optional there is a small overhead in packing and unpacking data compared to cell-by-cell storage Vector storage use very little storage for simple polygons memory requirements depend on complexity of objects also on precision of coordinates (i.e. single or double) volume also depends on which relationships between objects are stored in the database some systems store few relationships, require small amount of storage, compute other relationships as needed other systems offer more elaborate database models, store more relationships, require larger amounts of storage e.g. system A required 150 Kbytes to store 700 lots, system B required 5 Mbytes to store the same 700 lots, both using vector database models generally, vector systems should use less mass storage than a raster based system of high enough resolution to emulate the vectors assuming that the required resolution is defined by the line width on the input document, rather than by the width of the transition zone in reality E. CHARACTERISTICS OF PHENOMENA Raster sampling raster is a regularly spaced sampling of phenomena reflects lack of knowledge of spatial variation if we knew where the complex variation occurred, we would sample there more heavily - not wasting samples in areas of little variation e.g. knowing that population in California is concentrated in Los Angeles basin and San Francisco Bay area, we would collect more data in those areas than in the Mojave Desert raster is appropriate for remote sensing as the satellite is not intelligent enough to vary its sampling in response to variation on the earth''s surface data typically collected in raster format is satellite imagery (raw and classified) and elevation data Vector sampling vector representation permits more spatial variability in some areas than in others e.g. rapid variation at edge of area objects, none in the middle e.g. census tracts are small in urban areas and large in rural this is appropriate for social, economic, demographic variation which is much more intense in some areas than in others also appropriate for some natural phenomena e.g. variation in vegetation cover is more rapid near the Nile than in the Sahara e.g. variation in geology is instantaneous across a fault some objects are vector by definition variation in ownership is instantaneous at edge of lot variation in county is instantaneous at boundary data typically collected in vector format are coordinate geometry (surveyor''s records) and legal boundaries Features, entities and objects difficult to group cells together as an object with attributes in raster e.g. connect cells along a road e.g. connect cells as a numbered forest stand raster "sees" the world as populated by cells of uniform size raster arranges geography in fixed sequence - gives "sequential access" to world vector "sees" the world as populated by entities, represented in the database model as objects vector arranges geography in any sequence - gives "random access" to data operations on objects are easier in vector, e.g. analysis on a network - routing vehicles through a road network a point object must occupy a full cell in raster, this creates some problems: locations of water wells, emergency call boxes should not be indicated by raster cells, of some arbitrary cell size, in which they lie somewhere for environmental modelling of water quality, we may need to know precisely where the wells are to estimate costs of connecting wires to call boxes, we may need to be able to determine distances accurately on the other hand, some data can only be presented in aggregated form: e.g. census data is typically aggregated by census tract, both vector and raster representations may be appropriate depending on the application e.g. sensitive natural resource-related data, such as the location of rare plants or archaeological sites, may be presented in large cells to allow preservation of the phenomena by indicating its presence without identifying its location precisely F. SUMMARY the raster/vector debate can be summarized as a series of decision rules, for example: handout - Recommendations for the use of vector and raster structures Combinations of raster and vector is the best of both worlds available? to an extent it is, in two ways 1. can store data in one form and process it in another needs an efficient algorithm to convert from raster to vector and vice versa possible to capture and store data in vector mode, yet analyze it in raster form may save computing time and mass storage especially important for small machines 2. use combinations of systems which run raster and vector analytical systems in parallel e.g. install raster and vector systems on the same PC, use conversion functions in one or both systems e.g. overlay a vector based landuse parcel map over a Landsat image to improve the interpretation of the satellite image the image might then be used to correct a vector based vegetation parcel map REFERENCES Burrough, P.A., 1986, Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. See raster/vector summary on p. 169. Gahegan, M.N., and S.A. Roberts, 1988. "An intelligent object-oriented geographical information system," International Journal of Geographical Information Systems 2(2):101-110. Discusses an interface to a spatial analysis systems which allows the underlying geographical domain to be represented using a high-level, feature- oriented model. Star, J.L. and J.E. Estes, 1990. Geographic Information Systems: An Introduction. Prentice Hall, Englewood Cliffs NJ. Chapter 4 summarizes both sides of the raster-vector issue. EXAM AND DISCUSSION QUESTIONS 1. Summarize the dimensions of the debate between raster and vector database models. 2. Describe the design of a study to compare the amounts of storage used by raster and vector models in creating a digital representation of a specific map? If you feel this cannot be done, explain why not. 3. Summarize the arguments for raster GIS and the application areas in which it has distinct (a) advantages and (b) disadvantages. 4. What factors would determine the appropriate cell size for a raster GIS being developed for a power transmission corridor study? 5. In physics, light appears to behave sometimes as particles, sometimes as waves. Discuss whether a useful analogy exists between this debate in physics and the raster/vector debate in GIS. THE OBJECT/LAYER DEBATE A. INTRODUCTION B. THE LAYER VIEW Data models C. THE OBJECT VIEW Scale Time Object orientation Identity Inheritance Encapsulation D. COUNTER-ARGUMENTS FOR THE LAYER VIEW Do objects really exist? Environmental modeling Gradients Enlightenment E. AREAS OF APPLICATION Resource management (layer view) Utilities (object view) Transportation, hydrology (mixed) F. EXCEPTIONS Transport networks G. CONVERSION REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES THE OBJECT/LAYER DEBATE A. INTRODUCTION if we see the task of building a database as one of representing the contents of maps, then the choice is between: raster - divide the map into a sequence of identical, discrete elements and list the contents of each vector - list the features present on the map and represent each as a point, line or area object however the real purpose of the database is to represent the world maps are highly efficient ways of showing geographical variation, but representing the contents of maps is not the same as representing the world the objective of a map is visual - to capture geographic information and pass it to the user through the processes of visual perception the objectives of a database - measurement, analysis, modeling - may conflict with the objectives of a map the raster/vector debate is only one part of a much larger set of issues concerned with representing the world in spatial databases in meaningful ways this unit looks at the debate between layers and objects both use databases containing points, lines and areas to describe geographic variation the difference is in how the contents of the database represent the real world B. THE LAYER VIEW the real world is continuous an infinite number of places exist in the world locations are specified by some system of coordinates our ability to specify exact location is limited only by the precision of measuring devices in principle, location could be specified using coordinates with any number of digits geography can be described using a number of variables, e.g. elevation, soil type, mean January temperature, population density, county each of these variables can be determined at any place given suitable measurement instruments each variable can be conceptualized as a layer each layer captures the variation of one variable over the surface of the earth the database can be interrogated to determine the value of any variable at any place the result can be checked by visiting the specified place (ground truth) Data models GIS has developed certain data models for representing the layer view of the world, among them: raster - continuous geographic variation is approximated by finite-sized pixels polygon - the world is divided into irregular pieces, and the variable is assumed to be constant within each piece and to change suddenly at each boundary TIN - the world is divided into triangles, and variation is approximated by a plane within each triangle each of these models provides a way of capturing the variation of one variable over the earth''s surface each uses objects of various kinds - points, lines or areas - but the objects exist in the database, for the purpose of describing variation, and not in the real world e.g. contours are line objects, but exist to capture varying elevation C. THE OBJECT VIEW humans see the world as an empty space littered with various types of objects objects are used in speaking, writing and thinking about the world we find it more convenient to describe places in relation to objects of known location, than using coordinates of any kind e.g. the island of Mauritius is described as "in the Indian Ocean", not by its latitude and longitude objects are not simply artificial constructs used in describing variation (the layer view) but fundamental to our understanding of geography objects can be points, lines or areas any place can be occupied by any number of objects, and can be empty ("there''s nothing there") the layer view is inefficient when variables are defined only over limited geographic areas or classes of objects e.g. date of last forest fire is defined only over burned areas e.g. name of city is defined only over city points or areas, but name of county is defined everywhere in the US Scale the same object can be represented differently at different scales e.g. "San Francisco" can suggest: a city and county with legal geographic limits the entire metropolitan area of San Francisco Bay Time the object view has obvious advantages when well-defined objects move through time, e.g. people poorly-defined objects, e.g. clouds, present enormous problems e.g. how to track the movement of an object through time from one image to another when objects are changing form, splitting and merging Object orientation is a set of concepts originating in Computer Science and dealing with the design of both databases and processing algorithms includes programming languages (e.g. Smalltalk) and databases, many of which are still experimental argues that it is artificial and confusing to separate the definition of objects from the operations performed on them in GIS, to separate the nature of geographical variation from operations (analysis, modeling) performed on geographical variation has very recently stimulated debate within GIS about the nature and role of geographical objects contains several key concepts: Identity objects have identity which persists through various kinds of processing e.g. "Indian Ocean" is an identifiable and persistent geographical concept, even though it is not possible to delimit it precisely at any scale in GIS, object identity can persist through scale change and also change of graphic representation (e.g. city as point to city as polygon) Inheritance when new objects are created they can inherit the properties of their parents e.g. when a land parcel is created from survey records, it should inherit properties of those records, including the name of the surveyor and the date of the survey in GIS, it is sometimes desirable to deal with complex objects formed as collections of simpler objects, and to have the complex inherit properties of the simple objects, or vice versa e.g. the complex object "airport" might be composed of the simpler objects "runway", "tower", "perimeter" etc. Some of the attributes of "airport" should be accessible at the level of the simpler objects, such as "airport name" Encapsulation rather than separate the objectives of description and processing, it is desirable to couple objects with the operations performable on them in GIS, a "line" may be part of a county boundary, a river course, a highway or a contour - each has its own set of appropriate operations D. COUNTER-ARGUMENTS FOR THE LAYER VIEW Do objects really exist? if an object is real, it must be possible to visit a place and determine precisely whether the place lies in or on the object this is possible for certain mathematically defined objects, e.g. the Northern Hemisphere, the Equator for the object "house" to be real, every place on the earth''s surface must be either inside or outside we would need to define the precise footprint of the house we would need to be able to measure a place''s location exactly, and this is not possible because of limitations in geodesy and surveying technologies objects like "woodlot", "Indian Ocean" are not precisely defined in the layer view, very few geographical objects exist in any precise sense Environmental modeling theories of atmospheric, oceanic, geophysical processes are compatible with the layer view e.g. meteorology models fields, a field being a variable defined everywhere in two or three dimensions examples of such fields are temperature, wind speed and direction, atmospheric pressure environmental scientists use layers to describe soil and vegetation types, biodiversity, species ranges etc. much data for environmental modeling comes from remote sensing, which implies a layer view, at least until the data is analyzed and interpreted Gradients the object view of the world as populated by discrete objects is less compatible with notions of continuous geographical change: e.g. gradients, transition zones, zones of uncertainty, fuzziness, slopes, ecotones, clines Enlightenment people see the world, learn and talk about it and navigate using objects scientific understanding often requires a new perspective e.g. understanding electromagnetic radiation requires the concept of continuous, invisible fields e.g. atmospheric science requires continuous fields of temperature, pressure the scientist''s view of the world may have to be radically different from more traditional modes of human thought the layer view may be the scientist''s way of imposing a more enlightened perspective on the world THE OBJECT/LAYER DEBATE AREAS OF APPLICATION some areas of GIS application are more likely to take the object view, some the layer view Resource management (layer view) geographic variation can be described by a relatively small number of variables there are measurement problems in many areas, e.g. biodiversity conceptualization does not change radically from one scale to another difficulties occur in handling information on moving individuals, e.g. tracking grizzly bear, using the layer view Utilities (object view) the notion of empty space littered with well-defined objects fits the problems of managing large numbers of pipes, valves, meters etc. two objects may occupy the same location, but be separated vertically the notion of a variable measurable everywhere on the earth''s surface has little relevance Transportation, hydrology (mixed) roads, railways and streams are mostly well-defined line objects area objects are sometimes needed for analysis e.g. lake, noise buffer around highway some types of hydrologic modeling require a layer view e.g. overland, subsurface flows are not often confined to objects, better modeled as fields using layers F. EXCEPTIONS some information is not well suited to either view Transport networks often modeled as line and point objects (links and nodes) lying in the plane the DIME and TIGER databases of the US Census use a layer view to model street networks street intersections are nodes (0-cells) streets between intersections are links (1-cells) bounded by 0-cells blocks are areas (2-cells) bounded by 1-cells all places are in exactly one 2-cell or on a 1-cell as a consequence of this planar enforcement, all crossings of streets are also intersections - both overpasses and grade intersections are 0-cells this makes it difficult to use the model for routing in transportation, all information is limited to links and nodes - places that are not on links or nodes cannot have attributes of any kind the links and nodes form a 1-dimensional space embedded in the geographical 2-dimensional space objects are restricted to points on the network, or line segments of the network - there is no equivalent of the area object positions on the network can be specified in terms of link number and distance from the link''s beginning node in the object view, it would be necessary to create links and nodes wherever attributes change, thus fragmenting the network unnecessarily G. CONVERSION it is possible to convert from one view of the world to the other this is commonly done in the construction of a layer database from digitized "spaghetti", e.g. in digitizing a soil map the lines on the map (boundaries between soil classes) are digitized as separate objects objects may cross, one place may be empty or occupied by any number of objects the "building of topology" or process of planar enforcement essentially converts this object view of the world into a layer after conversion, every place either lies on a boundary, or has exactly one soil type REFERENCES Egenhofer, M. and A.U. Frank, 1987. "Object oriented databases: database requirements for GIS," Proceedings International Geographic Information Systems Symposium: The Research Agenda 2:189-211. Gahegan, M.N. and S. Roberts, 1988. "An intelligent, object- oriented geographical information system," International Journal of Geographical Information Systems Vol 2:101-10. Goodchild, M.F., 1989. "Modeling error in objects and fields," in M.F. Goodchild and S. Gopal, editors, Accuracy of Spatial Databases, Taylor and Francis, New York, 107-13. An accuracy perspective on the layer/object debate. Herring, J.R., 1987. "TIGRIS: topologically integrated geographic information system," Proceedings, AutoCarto 8:282-91. Herring, J.R., 1989. "A fully integrated geographic information system," Proceedings, AutoCarto 9:828-37. Maguire, D.J., M.F. Worboys and H.M. Hearnshaw, 1990. "An introduction to object-oriented GIS," Mapping Awareness 4(2):36-39. A good general introduction to the object oriented GIS. Worboys, M.F., H.M. Hearnshaw and D.J. Maguire, 1990. "Object-oriented data modeling for spatial databases". International Journal of Geographical Information Systems. EXAM AND DISCUSSION QUESTIONS 1. It has been said that geographical data is special because it is inherently infinite - the world contains an infinite number of variables defined at an infinite number of places. Do you agree? Are there any other examples? 2. Consider a set of driving directions from your house to your supermarket. Do they imply a layer or an object view of the world, or is neither appropriate? 3. Which layers would you identify as most important in describing the natural environment? HISTORY OF GIS A. INTRODUCTION B. HISTORIC USE OF MULTIPLE THEME MAPS C. EARLY COMPUTER ERA D. CANADA GEOGRAPHIC INFORMATION SYSTEM (CGIS) Purpose Technological innovations Key innovative ideas in CGIS Key individual E. HARVARD LABORATORY The Harvard packages Key individuals F. BUREAU OF THE CENSUS DIME files Urban atlases G. ESRI REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES GIS MARKETPLACE A. INTRODUCTION Purpose B. MARKET POTENTIAL C. PRODUCT CHARACTERISTICS Data model Platforms Cost Domain Performance issues Query/product mode Ease of integration Training needs User interface Maintenance Potential for creativity D. VENDORS'' PRODUCTS GIS products and vendors REFERENCES EXAM AND DISCUSSION QUESTIONS NOTES Current information on GIS vendors and products are available in several publications. Trade journals are also a source of current information. You may want to assign an exercise in which students will assess different products on the characteristics that are discussed here. UNIT 24 - GIS MARKETPLACE Compiled with assistance from Doug Banting, Ryerson Polytechnical Institute, Toronto A. INTRODUCTION Purpose to review current GIS marketplace products change all the time unfortunately this means the material in this unit will become outdated quickly sources for updating this material are provided in the References no intention to evaluate/compare/recommend different application areas are dominated by GISs of different functionality, approach even within application areas, capabilities vary widely different sets of capabilities meet different needs needs of each GIS installation vary widely no list can be exhaustive recent surveys list around 1,000 "GIS" vendors vagueness of definition of GIS makes it difficult to draw the line B. MARKET POTENTIAL many reports indicate the rapid growth of the GIS market C. PRODUCT CHARACTERISTICS GIS products can be evaluated and compared on a multitude of different aspects different characteristics are important for different applications and users variables to compare are include: Data model vector, raster or "integrated" Platforms range from Macintosh and PC to workstations to mainframes programs which work on micro-computers are generally very limited in their ability to handle large databases most major GIS programs now operate on several different platforms Cost cost ranges from free (several public domain GISs, e.g. GRASS from US Army Corps of Engineers) to tens of thousands of dollars for the major commercial programs Domain related to cost commercial products represent major investments in product development by a private company are designed to meet market needs and generally respond fairly quickly to changing needs are usually well supported by easily accessed customer service departments are regularly updated and enhanced public domain refers to the ownership of the software, however, practically speaking, it implies low cost but often a low level of support public domain programs can be freely copied and shared further listings of GIS-type products are published regularly in many newsletters and journals includes: raster based systems: GRASS - U.S. Army Corp of Engineers vector based systems: MOSS - U.S. Bureau of Land Management SAGIS - National Park Service ODYSSEY - developed at the Harvard Lab, now available from Department of Geography, University of Washington ROOTS - Laboratory for Computer Graphics and Spatial Analysis, Harvard University a third type of program are the educational programs which have been developed as teaching tools these programs general have significant limitations on the size of the database work on micro-computers are very low cost include extensive tutorial materials are updated regularly are not supported by extensive customer service departments includes: IDRISI - from Clark University OSU MAP - from Ohio State University GIS MARKETPLACE Performance issues systems can be arranged on three performance dimensions, each identified with one corner of an equilateral triangle corner 1 - volume of data corner 2 - richness of functionality corner 3 - speed of response no system can optimize for all three current systems choose two to optimize, sacrifice the third any system can be positioned somewhere in the triangle depending on what combination of factors it tries to optimize the nearer to a corner, the more that factor is important or optimized in the system "linear" (i.e. network) systems typical of AM/FM-type applications, need fast response, large volume, but can sacrifice functionality to achieve them locate between corners 1 and 3 "Choropleth" systems are those in which most objects are areas primary applications are with demographic, socio- economic and natural resource data for demographic and socioeconomic, areas are reporting zones for natural resources, areas are defined by homogeneous attributes these systems focus on integrity of polygon topology rather than the connectivity of vector nets applications in these areas require sophisticated analysis - simple query is relatively unimportant users can often afford to wait for product richness of functionality, volume of data more important than fast response to query these systems fall between corners 1 and 2 "Precision" systems are those in which absolute locations and boundaries must be identified these are large scale (high resolution) with lot boundaries, streets as double lines - "parcel" database analysis is unimportant mapping (subdivision plans) is very important these systems need large volume, low functionality, fast response not necessary CAD (computer assisted design) and computer mapping packages are adequate, topological database model not essential these are at corner 1 Query/product mode as noted previously, applications tend to be focused along a continuum of from strictly query oriented systems to strictly product oriented systems Ease of integration some GIS programs will import several different types of files from other programs but do not provide export facilities on the other hand, some GIS programs are designed specifically to operate in parallel with other programs and external databases Training needs some programs are easy to operate, some require week long training programs it is important to note that large, complex command structures are not necessarily a design flaw knowledgeable users will appreciate the flexibility and adaptability of these large programs also, ease of learning is not necessarily always a positive characteristic e.g. many statistical packages do not have simple "idiot-proof" interfaces specifically because their designers wish to ensure that only users who are knowledgeable about the statistical methods will understand how to access them User interface different types of interface were reviewed in Unit 18 Maintenance frequency of updates, cost Potential for creativity some programs allow many, flexible "tools" that can be combined in innovative ways to deal with problems not foreseen by the program''s designers D. VENDORS'' PRODUCTS GIS products and vendors note: GIS Sourcebook, 1989 includes an excellent summary of current GIS products and vendors listing cost, date first introduced, support, data structures, functionality an update will be published in 1990 and it may continue to be an annual publication REFERENCES American Farmland Trust, 1985. A Survey of Geographical Information Systems for Natural Resource Decision Making, Washington. Useful if fugitive survey. GIS Sourcebook, 1989, GIS World, Fort Collins, CO. Includes an excellent summary of current GIS products and vendors listing cost, date first introduced, support, data structures, functionality. An update will be published in 1990 and may continue to be an annual publication. Tomlinson, R.F., 1987. "Review of North American Experience of Current and Potential Uses of GIS," International Journal of Geographical Information Systems, 1(3):203- 218. Wheate, R., 1988. "GIS/Image Processing: A Summary of Software Packages Available for the PC," Operational Geographer, 14:30-33. Excellent review of PC software, including image processing. Several useful papers appear in the proceedings of an Atlanta GIS conference in 1986, published as Proceedings of GIS Workshop by ASPRS, Falls Church, VA: Caron, L. and J. Merchant, 1986. "Geographic information systems for non-urban local-level jurisdictions: existing alternatives," p. 110 Fleet, H., 1986. "SAGIS: a full-function public-domain GIS for micro and minicomputers," p. 301. Reeve, C. and J. Smith, 1986. "The varied viewpoints of GIS justification: a survey of vendors and users," p. 396. EXAM AND DISCUSSION QUESTIONS 1. Describe the "linear", "choropleth" and "precision" views of the world used in this unit. 2. Why are there so few vector-based GISs designed for small computers (e.g. PC)? 3. Describe the three-corner scheme used in this unit, and explain the nature of each corner or pole. Why is it not possible to develop a system which optimizes all three measures of performance? 4. Compare several different GIS products across the range of characteristics listed in this unit. Make some conclusions about the applications each of the products would be best suited for. TRENDS IN GIS A. INTRODUCTION this unit discusses trends in computer hardware and software for GISs new applications of GIS technology new sources of data B. HARDWARE Fast geoprocessing computing power is often measured in MIPS Million Instructions Per Second MIPS measures oversimplify the measurement of computing power but are nevertheless useful as broad bases for comparison arithmetic calculations can require execution of as little as two and as many as 100 instructions arithmetic on real numbers is better measured in MFLOPS ("megaflops") or millions of floating point (i.e. decimal) operations per second MIPS="meaningless information processing statistics" current personal computers and workstations used for GIS range from 1 to 5 MIPS within the next five years, 20 to 30 MIPS workstations are likely at roughly similar prices the power of personal workstations is currently increasing by close to a factor of 2 per year workstations will likely be coupled to 1,000 MIPS file servers which can extract software and data at high speed for analysis in personal workstations over the next decade, 1,000 MIPS machines will become common in large organizations to this point, advances in computing power have always found new areas of application. What new applications of GIS will take advantage of higher speeds? larger data sets, higher levels of spatial resolution more complex models more complex analysis for decision-making better methods of display and visualization Parallel Processing trend toward different computer architectures away from single processors operating on data in sequence parallel processors can perform tasks on several different processors simultaneously within the same computer what GIS processes will be suitable for parallel processing? analyses which require repeating the same steps everywhere on the map easier to see applications for raster data than for vector since each pixel is independent e.g. finding route for a vehicle across a rugged terrain e.g. image processing applications such as image classification, visualization, scene generation Memory trend is toward lower costs for ever larger computer memories cost of storing large GIS datasets will come down more data can be placed "on-line" for faster access the nature of geographical data (high volume, infrequent update) is suitable for optical storage once written, cannot be changed CD-ROM - 5 1/4 inch disks with 250 MBytes, enough to store all streets in Los Angeles once a "master" has been created, the unit cost of CD-ROM data is only about $10 optical WORM - 12 inch disks with 2 GBytes, enough for the contents of 100 topographic maps erasable optical disk is available (e.g. NEXT computer) very high density of data Workstations "dumb" terminals connected to a central processor are gradually being eliminated in favor of desk top computers especially popular are "workstations", which have excellent graphic performance and sophisticated user interfaces the exact distinction between personal computers and workstations is unclear workstations are generally more powerful workstations are generally more expensive ($10,000 vs. $4,000) workstations generally use UNIX operating system rather than DOS workstations have more powerful graphics capabilities (1280x1024 rather than 640x480) originally, workstations developed for the scientific and engineering market workstations function effectively as nodes on a network devoted to GIS processing a large CPU (a fileserver) for database management and centralized processing may still be required as part of the network, while most processing is done at individual workstations data can be distributed around the disks on the network Networks the dominant hardware system architecture of recent years, the multiuser host, is giving way to multiuser network architectures the network integrates compute servers, file servers, workstations and shared peripherals any user can access data, peripherals across the network the network will likely be linked to other networks through "gateways" this architecture requires fast and economical data transfer and the availability of powerful workstations data transfer rates of MBytes/second are common at low cost only hundreds of bytes per second were possible 15 years ago this change of cost has had enormous impact on the ways people organize computing hardware manufacturers are beginning to offer network architectures, and networking among hardware of different vendors is likely to become more available over time requires interchangeability of parts; standards for communication, data and software; common operating systems this will lead to reduction in the power of data processing centers in favor of "information management" systems based on control of transactions the role of the "computer center" based on a large mainframe is changing rapidly Hardware for specialized processing functions compute servers, file servers, sort servers (e.g. TRW''s Fast Data Finder) and search servers (e.g. Excel''s Sorting Engine) are now being developed for networks these are specialized computers attached to networks for specific functions map overlay using hardware intersecting tools will be developed in the future, perhaps other GIS functions as well this trend will continue since such hardware tools can provide enhanced system performance for the entire network Operating systems continued diversity is likely for the immediate future UNIX is making gains, perhaps especially in inexpensive machines, and in scientific and engineering applications, but is it likely to become universally supported? this will make software development and networking more difficult, and puts a premium on GIS software, database management systems, and applications which can work on different vendors'' machines Peripheral devices excellent raster devices are now available (e.g., electrostatic and laser printer/plotters) for graphic/cartographic output costs of these systems are directly related to the size of the product scanning has not taken over from digitizing in GIS applications, and is unlikely to do so until some means, perhaps artificial intelligence, can be applied to separating extraneous information Specialized workstations a data entry device is needed which will allow correction of data as they are acquired will probably have a large flat display, multiple graphic memory planes, and interactive data capture capabilities the workstation should be able to "check-out" and "check-in" work areas (e.g. mapsheets) from the larger database maintained by a server on the network an "electronic sandbox" will be useful for interactive, GIS-based analysis/modeling and land use planning the design of this workstation will require some very creative thinking workstations specialized for particular uses (e.g., land planning, water resources, forestry) are likely to be developed as the number of users increases in such specialized fields analyzing data on the globe (e.g. oceans, atmosphere) will require a specialized workstation which can display data on the globe''s curved surface e.g. the globe could be "browsed" using a track-ball to rotate the image as GIS becomes a standard decision support technology, entire conference rooms will be devoted to its use containing specialized GIS workstations, large GIS display devices, and GIS planning/conference tables TRENDS IN GIS SOFTWARE Database management systems while present DBMS''s are effective for managing tabular data, they are not effective for the "long transactions" required when cartographically referenced and topologically related data are altered such "long transactions" occur because simple changes in cartographically referenced data may require changes in topologically related items, and because the whole process of updating geographic information differs from updating tabular data transactions on geographical data could be confined to a single specialized workstation queries and analysis typically do not modify the data, so these are easier to execute across the network, e.g. between a workstation and a central fileserver Relational DBMSs trend toward using relational DBMSs (often with SQL style user interfaces), because of their "open architecture" it is becoming easier to exchange one DBMS for another within a GIS object-oriented database structures are being proposed for GIS use, but they present some problems e.g. they are not well adapted to storing natural feature information not well suited to complex spatial analysis often have embedded proprietary data structures DBMS versus Fourth Generation Languages the DBMS approach often involves highly structured application programming, often at the expense of ad hoc query capabilities user must learn complex rules of syntax may be a valid approach for static databases which are only used for simple, repetitive queries the trend in GISs is toward the use of Fourth Generation Languages (4GLs) which provide commands, tools, procedures, and report writers to permit easy ad hoc querying of a database these provide intelligent interfaces close to natural language, however, definitions become vague, less rigorous use of 4GL may detract from the GISs ability to perform complex analysis GIS system integration the marketplace increasingly demands compatibility between diverse hardware and GIS software at the same time, GIS software needs to interface to an increasing diversity of DBMSs, because different applications often require different DBMSs in many applications, records are already stored in a DBMS when the GIS capability is added to allow geographical access to these records, it must interface with the existing DBMS Display products improved cartographic products are certain, since there is both intense user demand and the technology required to support such improvements map output will continue to be judged against hand- made products 3D displays, overlaid with both cartographic data and representations of the built environment, are likely early developments in GIS technology but major questions remain about how to gather, compile, model and structure 3D data because of these problems, it is not clear what kinds of analysis are needed or appropriate for 3D data Interfaces to other technologies interfaces between GIS, CADD, remote sensing, image processing, architectural graphics, and other technologies are going to be increasingly easy to create the differing data types produced by these technologies will be more frequently combined in shared databases User interfaces more sophisticated, flexible and well managed graphic user interfaces are inevitable users are becoming increasingly impatient with software which requires any training or support from the vendor training and support add high and probably continuing costs to GIS acquisitions D. NEW APPLICATIONS OF GIS TECHNOLOGY because GIS technology is becoming more affordable, more reliable, more widely used and better known, new applications of GIS technology are likely to rapidly increase, just as the applications of computer graphics have increased Modeling and decision support Geographic Information Modeling System (GIMS) technology will be developed and used in providing decision support in a growing number of fields e.g. current interest in GIS applications in transportation planning, requires some modifications to standard GIS models, addition of new functions e.g. modifications to allow lines which cross but do not intersect e.g. functions to measure distances between objects via the network e.g. functions to solve standard problems in transportation, such as predicting traffic flows can applications in areas as different as transportation planning and forestry be served by the same GIS software? will there be fragmentation of the GIS field by area of application? Sciences and mathematics GIS technology will be applied widely in the sciences in the near and middle term future e.g. 3D capabilities for geology, geophysics, hydrology, mining GIS modeling in landscape ecology in the longer term, applications for GIS technology may develop in areas of image processing, e.g. X-rays, other types of medical imaging, where superimposition of data, analysis may have similar importance global issues - tropical deforestation, acid rain, greenhouse effects, endangered and threatened species, and similar problems are likely to be analyzed using GIS technology in the 1990s GIS networks, similar to the global weather monitoring and prediction network, may evolve they will probably make use of super computers, parallel processing, and artificial intelligence to cope with the massive databases and the complex models involved such models are currently in very rudimentary form, e.g. global climate models used to predict greenhouse warming effects use very large cell sizes (5 degrees lat/long) GISs will play a role in managing the global environment, perhaps used to identify the world''s most sensitive habitats; then a country''s agreement to conserve these habitats may be exchanged for forgiveness of international debts E. NEW SOURCES OF DATA Remote sensing following Landsat, ERTS, Thematic Mapper and now EOS, and competing with them to supply the demand for satellite- borne imaging systems are the SPOT system and systems which the Japanese, Russians, and others may bring on line in the 1990s as data resolution increases, costs fall, service becomes more reliable, and user demand increases, satellite remote sensing will become more important in supplying data for GIS use already the supply of data vastly exceeds our ability to analyze it better methods of scanning, archiving large amounts of data will be needed remote sensing cannot provide the information required for many kinds of analyses in the longer term (and with such new technologies as the fabrication of ultraminiature sensing devices on silicon chips), the trend will be toward "an instrumented universe" instruments to monitor the globe will be broadcast over the earth, probably telemetering their information to networks of users Error/uncertainty as more data and more forms of data are gathered, and there is increased pressure to combine these data for analyses, increasing attention will have to be paid to data error and uncertainty Data sharing essential to reduce costs, solve widespread problems, and fully utilize available technology security considerations, political divisions, and other factors will continue to inhibit sharing because problems will increasingly be recognized as crossing international boundaries and having global implications, and because the necessary technology is now becoming available, more global databases will be built in some situations, private businesses will compete with government agencies in supplying data to both government and the public questions will increase about how to best serve the public interest in such cases data gathering by government will always be under fire because of its cost F. CONCLUSION the immediate future will be a time of explosive growth in the development and use of GIS technology the signs are that current growth rates will continue it is very likely that these predictions about the period will prove to have been too conservative predictions about the value of technological change often prove to be totally wrong e.g. the development of microchip technology which was the key to cheap computers was driven first by the need to save space and weight in space vehicles REFERENCES Dangermond, Jack and Morehouse, Scott. 1987. "Trends in Hardware for Geographic Information Systems," Proceedings AUTOCARTO 8, ASPRS/ACSM, Falls Church, VA. (Also available from ESRI) Dangermond, Jack, 1987. "Trends in Geographic Information Systems Software," Proceedings IGIS: The Research Agenda, NASA, Washington, DC. (Also available from ESRI) U.K. Department of the Environment, 1987. Handling Geographic Information, Report of the Committee of Enquiry chaired by Lord Chorley (the "Chorley Report"), Her Majesty''s Stationery Office, London. DISCUSSION/EXAMINATION QUESTIONS 1. Imagine a few important future developments which you believe ought to occur in GIS technology. Do they depend chiefly on hardware or software, or both? Explain. 2. Select a particular discipline, field, or specialty to which you think GIS technology might be applied. Discuss some specific future applications and their implications for professional practice in that field. What specialized GIS features would these applications require? 3. What are the characteristics of an "ideal" source of data for use in a GIS? List them and discuss each one. Now, are there any potential sources of GIS data which have not yet been utilized but which meet most of these criteria? 4. Design a GIS workstation for the support of global science (i.e. the analysis and modeling of data for the entire globe). What operations, database models and user interface features would it have? GENERAL COORDINATE SYSTEMS INTRODUCTION B. PLANE COORDINATE SYSTEMS - CARTESIAN COORDINATES Determining Coordinates Measuring Distance C. STORING COORDINATES Integers vs real numbers Computer Precision Precision of Cartesian Coordinates Propagation of coordinate errors D. PLANE COORDINATE SYSTEMS - POLAR COORDINATES E. GLOBAL COORDINATES - LATITUDE AND LONGITUDE Determining Coordinates Important Terms Measuring distance Question of Precision F. DETERMINING POSITION REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES DISCRETE GEOREFERENCING A. INTRODUCTION B. STREET ADDRESS Using addresses in GIS Method Example - Addmatch using TIGER C. POSTAL CODE SYSTEMS US ZIP Codes Canadian Postal Code Problems D. US PUBLIC LAND SURVEY SYSTEM PLSS References E. GEOLOC GRID GEOLOC References Precision F. CENSUS SYSTEMS Converting to georeferences G. ISSUES CONCERNING DISCRETE GEOREFERENCING Hooks Purpose Standardization REFERENCES DISCUSSION OR EXAM QUESTIONS NOTES This lecture concludes the module on geocoding. Several important practical issues are raised here that will be important particularly for those who will be working with economic and demographic databases. UNIT 29 - DISCRETE GEOREFERENCING A. INTRODUCTION the georeferencing methods covered so far (latitude- longitude, Cartesian, projections from latitude/longitude to the plane) are continuous this means that there is no effective limit to precision, as coordinates are measured on continuous scales will now look at discrete methods - systems of georeferencing for discrete units on the earth''s surface many of these methods are indirect this means that the method provides a key or index, which can then be used with a table to determine latitude/longitude or coordinates for example: a Zip code is an indirect georeference rather than give latitude/longitude for a place directly, it provides a unique number which can be looked up on a map if coordinates are needed because these methods are indirect, it is important to consider the precision of these systems precision is related directly to the size of the discrete unit which forms the basis of the georeferencing system many methods of indirect or discrete georeferencing are in common use following are 5 of the most common B. STREET ADDRESS the precision of street addresses as georeferences varies: is highest for apartments or houses in cities is lowest for rural addresses or post office box numbers, where the address may indicate only that the place is somewhere in the area served by the post office Using addresses in GIS general approach is to match address to a list of streets (called address matching or "addmatch") spelling and punctuation variations make this difficult e.g., Ave. or Avenue, apartment number before or after street number a failure rate of 10% is regarded as good, 40% is not uncommon. In such cases it is necessary to find the street by hand, which may take as much as 5 minutes per address in large cities Method 1. identify the block containing address from table of address ranges in each block i.e., 551 B St. lies in the block running from 501 to 599 2. estimate position of house using the coordinates of the end points the exact position of the house can be estimated by linear interpolation i.e., 551 is roughly half way down the block such estimates are crude in many countries (e.g. India) addresses are not sequential along the street, but reflect date of construction if the street is curved the estimate can be improved by using intermediate points (called shape points) shape points are associated with the same information that block endpoints have, including building numbers and other georeferences databases to support addmatching exist in most industrialized countries. in the US, DIME files were developed for this purpose in the late 1960s by the Bureau of the Census, and are now being replaced by more comprehensive TIGER files see Unit 8 for an introduction to TIGER handout - TIGER system: An overview (9 pages) Example - Addmatch using TIGER handout - Map of west central Columbia, MO note: intersection of West Blvd and W. Broadway is W of center handout - Portion of the TIGER file (Boone County, MO) demonstration - the solution of this problem could be demonstrated using the TIGER file for Boone County, MO TIGER files can be readily accessed and displayed using the SAFARI package from Geographic Data Technologies, Inc Problem: find the latitude and longitude of 950 West Broadway Procedure: 1. search the TIGER file for Boone County for features with the name "West Broadway" or equivalent (W. Broadway, Broadway W. etc) get about 30 matches for the length of W. Broadway 2. find the record that lists the address range which includes address 950: record #6714 covers the block from Greenwood to West Blvd, and includes the following data: longitude 92.3503 to 92.3527 latitude 38.9519 to 38.9522 (indicating that the street has been coded from east to west) ZIP code 65203 on both sides census tract 6 on the left side, 7 on the right address ranges 900 to 998 on the left, 901 to 999 on the right no shape points, so we assume the block is straight 3. determine the coordinates of number 950: assume that the houses are evenly spaced along the street, and that the full range of addresses is used (this is not necessarily a good assumption, but it''s the best that can be done without more information). longitude is: 92.3503 + {(950-900) * (92.3527-92.3503) / (998- 900)} = 92.3515 latitude is: 38.9519 + {(950-900) * (38.9522-38.9519) / (998- 900)} = 38.9521 note that the results are given to the same precision as the block endpoints we could have calculated more digits, but they would have been meaningless given the accuracy of the inputs problems with determining georeferences by address matching: cases where matching fails (10 - 40% common) rural areas and box numbers where there are no street addresses long blocks with uneven houses street addresses do not always identify a parcel or lot, and some parcels have many street addresses (e.g., apartments, condominiums) address matching is very commonly used to determine georeferences for marketing and retailing, health and the collection of social statistics C. POSTAL CODE SYSTEMS postal code systems have been set up in many countries these often provide a high level of spatial precision US ZIP Codes in the US, zip codes are designed to assist with mail sorting and delivery the codes are hierarchically nested, states are uniquely identified by one or more sets of the first 2 numbers a 5 digit ZIP code identifies the area served by a single post office this gives precision of many city blocks the 9 digit ZIP potentially provides a much higher level of spatial resolution, but problems exist buildings may have different codes for different floors overlapping and fragmented boundaries Problems: addresses associated with a single zip code were developed from lists of addresses representing postal walks, rather than from maps. Addresses were seen as points along the streets rather than parcels of land as a result, the area associated with a single 5 digit zip code does not necessarily have a well- defined geographical boundary therefore, areas are sometimes not well defined, and they may overlap it is possible for the faces of a city block to have different ZIPs - boundaries of the zip code areas have been interpolated and files giving the coordinates of 5 digit ZIP code boundaries are available from a number of vendors warning: some of these have used simple Thiessen polygons to delineate associated areas i.e. the area of a ZIP code has been defined as the area closest to the corresponding post office, instead of the true area overhead - Rennie''s ZIP code map of Los Angeles note unusual shapes of zones and boundaries Canadian Postal Code the first 3 digits of the Canadian postal code define a Forward Sortation Area which is a useful unit for mapping (average population around 20,000) and is hierarchically nested within provinces the full 6 digits provide resolution of a few block faces files exist which allow the 6 digit code to be converted to census reporting zones and latitude/longitude Problems postal code systems have great potential as discrete georeferences however, they have not been designed for this purpose, hence the problems noted above since their purpose is, in principle, internal to the postal system, it is also difficult to ensure stability through time (codes frequently change) however, there is great demand for statistics based on postal georeferences because of their applications in retailing and marketing and the ease with which they can be merged with customer account data DISCRETE GEOREFERENCING D. US PUBLIC LAND SURVEY SYSTEM PLSS is the basis for land surveys and legal land description over much of the US unlike the previous systems, it is designed to reference land parcels because it is a comprehensive, systematic approach it is possible to use it as a georeference commonly used by agencies such as the Bureau of Land Management and the US Forest Service, and within the oil and gas industry. packages exist to convert PLSS descriptions to latitude/longitude PLSS References handout - US public land survey system (not included, see Strahler and Strahler 1987, pp. 485-487). begin with a surveyed principal Meridian, several of which were laid out as north-south baselines in the Western US the area on both sides of the meridian is then blocked off in 6 mile by 6 mile areas, identified by township and range numbers since this is a square grid system the township and ranges must be offset as one moves NS along the meridians the 36 square mile sections within each township are numbered from the top in a standard order each section is divided into four quartersections, and these can be further divided if higher spatial resolution is needed, as for example in describing the location of an oil well PLSS is most effective where the simple rules were followed closely, however: much of the Northeast was settled long before the advent of the PLSS there are major variations in the Southwest where the PLSS runs up against areas of early Spanish land tenure errors in the early surveys have become embedded in the system and must be replicated in packages which offer PLSS to latitude/longitude conversion E. GEOLOC GRID handout - GEOLOC description (3 pages) an elaborate and more systematic example is provided by the GEOLOC geographical referencing system (see Whitson and Sety, 1987), which can be used to index every 100 acre parcel in the continental US GEOLOC References the first level of partition consists of 2 rows and 3 columns, each partition or tile being 25 degrees of longitude by 13 degrees of latitude these tiles are ordered row by row from the top left (Pacific Northwest) and numbered 1 to 6 at the next level, each tile is divided into 26 rows of one half degree latitude and 25 columns of one degree longitude, the area covered by one 1:100,000 USGS quadrangle. each of these subtitles is given a two letter designation using a letter to represent the row (A through Z) and one to represent the column (A through Y) each subtile is divided into 4 rows and 8 columns of 7.5 minute quads, numbered row by row from 1 to 32 at the next level, these are divided into 4 rows and 2 columns, designated by assigning the letters A through H row by row finally, each of these divisions is divided into 5 rows, lettered A through E, and 10 columns numbered 0 through 9 to produce 50 cells of approximately 100 acres each an example of a full designator for a 100 acre parcel (in the Los Angeles area) is 4FG19DC6 Precision hierarchically nested systems like GEOLOC, and to some extent PLSS, allow the user to vary spatial precision depending on the application 4FG19 would identify a 7.5 minute quadrangle, or an area roughly 9 miles across the full 4FG19DC6 gives an area roughly 2000 ft across F. CENSUS SYSTEMS note: this topic was introduced in Unit 8; a handout and overhead from that unit are reproduced here the major source of social and economic data in many countries is the Census statistics are collected and reported using a complex system of several different types of reporting zones: political or administrative units used for reporting (province, county, city, electoral district) units defined for ease of data collection (block, block group, enumeration district) but often too small to use for data reporting due to privacy regulations units designed to be homogeneous for ease of analysis (census tract) in the US the major units are: block group (formerly enumeration district) the smallest reporting unit, about 1000 population census tract primarily in large cities, about 5000 population, intended for analysis Minor civil division (mostly on township boundaries) County State overhead - Hierarchy of census areas, 1990 census handout - US census units Converting to georeferences for the larger units, the main method of converting from census zone to georeference is through boundary files, which are digitized boundaries established for most of the major units and readily available from vendors or the Bureau for a smaller unit such as the block group (formerly ED) it is often possible to obtain from the Census Bureau a representative point or centroid which can be used as a georeference for units with uneven population distribution the centroid may be located in the area of highest population density G. ISSUES CONCERNING DISCRETE GEOREFERENCING Hooks is useful to consider how many different reference systems are related to specific datasets i.e. TIGER has street addresses, census zones and lat/long associated with each record allows linking of many different data sources Purpose many of these systems were set up for special purposes, and have only later become the basis for general georeferencing e.g. post office does not have a mandate to maintain these systems for georeferencing purposes, therefore will only add ZIPs when mail is delivered to the location zones may change without notice or record e.g. census is only updated every 10 years as a result, these systems do not necessarily have "quality control" in the georeferencing sense no agency maintains a file of new addresses Standardization general purpose systems such as GEOLOC use regular divisions of the earth''s surface, while special purpose systems tend to use irregular divisions in the past, efforts have been made to impose greater regularity on discrete georeferences e.g. "gridiron" system of rectangular street networks (Washington, DC) in the last century some city names were changed so that no two places in a single state had the same name introduction of the ZIP code however, such standardization efforts generally are not consistent or long-term rectangular street networks are no longer in fashion referencing systems such as PLSS are now fairly chaotic despite simple principles ZIPs are not consistent given their usefulness, is it possible to set up a single, common system of discrete georeferencing? REFERENCES Strahler, A.N. and A.H. Strahler, 1987. Modern Physical Geography, 3rd edition, Wiley, New York. Contains a thorough description of the US PLSS. U.S. Department of Commerce, Bureau of the Census, 1988. Tiger/Line File: Boone County, Missouri, Technical Documentation, Washington, D.C. Whitson, J. and M. Sety, 1987. "GEOLOC Geographic Location System", Fire Management Notes, 46:30-32. DISCUSSION OR EXAM QUESTIONS 1. Determine the resources available to you in geocoding street addresses for your local area. What sources exist for obtaining (a) street index (DIME or TIGER) files, (b) address matching software, (c) maps with address ranges marked on streets? Estimate the time it would take to geocode 1000 addresses in this area using various combinations of these resources. What percentages of hits and misses would you anticipate? Estimate the cost per address which you would have to charge a sponsoring agency for such a project. 2. Discuss the usefulness of the PLSS as a georeferencing system in your local area. How complete is it? What local agencies or organizations make use of the PLSS? What is its relationship to the local system of land tenure? 3. Determine the 5 discrete georeferences described in this unit for your own residence. What problems do you have in doing this? What is the potential or actual precision of each method? 4. Discuss the ways in which the system of discrete georeferencing in the US (or your own country) might be improved. What is the appropriate level or agency of government to sponsor or undertake such an improvement? Which existing system of georeferencing should it be based on? Who are the potential users of such a system, and how might cost be shared? STORAGE OF COMPLEX OBJECTS A. INTRODUCTION B. REPRESENTATION OF SIMPLE SPATIAL OBJECTS C. STORAGE OF OBJECT ATTRIBUTES D. REPRESENTATION OF TOPOLOGY Relationships in networks Relationships between areas The CanSIS data structure E. DISADVANTAGES OF ARC-BASED REPRESENTATIONS F. OTHER ISSUES ABOUT DATA STRUCTURES REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 30 - STORAGE OF COMPLEX OBJECTS Compiled with assistance from David H. Douglas, University of Ottawa A. INTRODUCTION previous units have been concerned with specifying and transforming locations GIS deals with objects such as lines and areas occupying extended locations, and with the complex relationships between them spatial data normally represented in vector systems as: objects (points, lines and areas) attributes associated with objects relationships between objects this unit considers how to construct objects out of sets of coordinates, and how to create digital representations of attributes and relationships? many alternatives exist for structuring spatial data within a digital store here we review some of the most common which have been proven useful by years of experience and application B. REPRESENTATION OF SIMPLE SPATIAL OBJECTS spatial objects - points, lines, areas - can be coded as x,y coordinate pairs: point: (x,y) line: (x1,y1), (x2,y2), ... , (xn,yn) area: (x1,y1), (x2,y2), ... , (xn,yn) note that the digital representation of the three spatial objects is identical, n=1 in the first case note the convention used throughout this unit: the name of the record type, followed by a colon, then the items forming the record to construct a line or area, we simply connect each consecutive pair of points with straight lines in the case of an area object we might insist that the last point be the same as the first, or alternatively assume that the last point is connected to the first by a straight line to close the area points need not always be connected by straight lines - see later discussion in this unit C. STORAGE OF OBJECT ATTRIBUTES attributes of objects can be stored as tables for points, the coordinates can be included as two additional attributes for each object, so that the entire data structure can be a simple table this is not possible for lines and areas because of the variable number of coordinates the data structure usually consists of two parts: coordinates in one file, each set representing a single object identified by a unique ID attributes in a table with one attribute identifying the objects to which each is linked in various GIS products are a number of different names used for these associated files: Attributes: Descriptive Data Set (DDS), Polygon Attribute Table (PAT) Coordinates: Geometry, Image Data Set (IDS), Locational Data. Geography databases of this type, populated by objects and their attributes, are common in cartographic or CAD (computer assisted design) databases. many common packages for mapping use this structure SAS/GRAPH and ATLAS (from Strategic Locations Planning) are examples D. REPRESENTATION OF TOPOLOGY the key to a GIS data structure, as distinct from cartographic databases, is the emphasis on the coding of relationships between objects in GIS, the term topology is used to refer to these relationships between objects however, the term topology has a much more precise meaning in mathematics topological properties are those which are preserved when an object is stretched or distorted, and are therefore distinct from geometrical properties e.g. a circle can be stretched to form any shape of polygon, but no amount of distortion will make it into a cube there is an enormous range of possible relationships between objects (see Unit 12 for a detailed discussion of relationships) simple examples include "nearest to", "crosses", "is connected to" these expressions can be used to relate two objects together for example, each object might be given an attribute which is the ID of the nearest other object in the same class, thus coding a relationship between pairs of objects two specific types of relationships are often coded in GIS databases: relationships in networks relationships between areas Relationships in networks networks consist of two types of objects: lines, also known as links, edges or arcs nodes, also known as intersections or junctions a simple way to code relationships between links and nodes is to give each link two additional attributes - the IDs of the nodes at each end (to-node and from-node) overhead and handout - Relationships in networks thus there will be two types of records: 1. arc coordinates: (x1,y1), (x2,y2), ... , (xn,yn) 2. arc attributes: to-node, from-node, length, attributes using this structure, it is possible to navigate from link to link by searching for links with matching node numbers the DIME datasets created by the Bureau of the Census for the 1970 Census used this concept to code US street networks each node or intersection was given a unique ID the TIGER line files do not have unique node IDs, but the network can be navigated by matching the (x,y) coordinates of link endpoints however, searching for matches is not particularly efficient a better data structure would list the links at each node this could be done by adding a third type of record: 3. node: (x,y), adjacent arcs (positive for to- node, negative for from-node) see overhead finally, it is awkward to have to store a variable number of arc IDs for each node, so it might be better to use two files: 1. a simple ordered list which compresses the node file into a string of arc IDs 2. a table in which the position of the first arc in the list is stored for each node positionarc nodeposition 1 1 a 1 2 -5 b 3 3 3 c 6 4 2 d 9 5 -1 6 -4 7 -2 8 5 9 4 10 -3 Relationships between areas knowing adjacency is important when working with area objects many programs are more efficient if we know which areas share common boundaries many systems store boundaries as several individual arcs and include arc attributes (pointers) which indicate which polygon falls on each side of the arc by storing common boundaries, instead of complete polygon boundaries, can avoid: duplication in digitizing problems which arise when the two versions of each common boundary do not coincide many systems would store this set of three areas using three datasets: overhead/handout - Relationships between areas a polygon attribute table an arc attribute table a set of (x,y) pairs representing the arc geometry note: in ARC/INFO these are referred to as the .PAT, .AAT and .ARC files respectively disadvantages: to construct polygons, must search for arcs with correct polygon IDs and then match node numbers for polygon B above, the result would be arcs 3, 4 and 5, with 5 in reverse order this data structure cannot represent area objects which are fragmented - islands, for example The CanSIS data structure an example of a more fully developed data structure is the database of the Canadian Soil Information System (CanSIS) developed by the Canadian Department of Agriculture in the 1970s has four interrelated datasets, with pointers a very simple summary of the CanSIS structure''s four datasets is: 1. Object: attributes, first-polygon, last- polygon soil types would be coded as objects an object can describe many discontiguous polygons sharing the same attributes 2. Polygon: object ID, next-polygon, first-arc, last-arc here "object" is the object of which the polygon is a part 3. Arc: R-polygon, L-polygon, next-R-arc, next-L- arc, previous-R-arc, previous-L-arc, first-point, last-point the arc pointers are to the next arcs around the left and right polygons diagram first-point and last-point identify the first and last (x,y) pairs of this arc in the point data below 4. Point: (x,y) the points owned by each arc are stored in sequence in this dataset note how each type of record points to records of other types e.g. each object points to the first and last polygons forming the object e.g. each arc points to the polygons on its left and right, and also to other arcs STORAGE OF COMPLEX OBJECTS E. DISADVANTAGES OF ARC-BASED REPRESENTATIONS areas do not always exhaust the space the method may be inefficient for coding data sets which consist of isolated polygons, e.g. woodlots in an agricultural area, various types of land use, house footprints on an urban map areas often overlap a database of old burns in a forest contains polygons which may overlap and do not exhaust the space, so there are few if any common boundaries although the great majority of programs work better for arcs than for polygon representations, it is sometimes necessary to rebuild complete polygons from arcs, e.g. for display when a polygon is to be filled F. OTHER ISSUES ABOUT DATA STRUCTURES the network and area data structures discussed above reflect common practice in existing GIS, but are far from comprehensive a data structure must be chosen to balance the need for: 1. efficient processing arcs are more efficient than polygons for many operations 2. accurate modeling of reality objects are abstractions of reality; the conditions imposed, e.g. non-overlapping polygons, will affect the accuracy of the abstraction the conceptual structure of the data which the system presents to the user need not be closely related to the actual data structure the simple structures described above can be used to present much more complex views to the user Example: some systems allow the user to work with complex features, which are aggregates of simple features a simple feature such as a point can be part of several complex features this idea is useful in utilities applications, where it may be necessary to group together several objects, such as a house, land parcel, pipe, shutoff valve and gas meter, into a complex object ("account") Example: analysts of spatial information must often deal with the fact that reporting zones, such as counties, change from time to time Great American History Project - to analyze the spatial distribution of the US population by county since 1800 requires a database which can present the user with different views of the set of counties at different times, as boundaries change one solution is to define a common set of arcs, but to build them selectively into area objects at each time period the arcs list contains every line which has ever been a part of a US county boundary the boundaries of objects (counties) are defined differently at each time period an arc is part of the network of boundaries at time period t if the polygon IDs on its right and left belong to different objects at time t data structure would have these record types: 1. Object: attributes at time t 2. Polygon: objects to which polygon belongs at each time 3. Arc: L-polygon, R-polygon REFERENCES Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford. See Chapter 2. Haralick, R.M., 1980. "A Spatial Data Structure for Geographic Information Systems," in H. Freeman and G.G. Pieroni, eds., Map Data Processing, Academic Press, New York. Peuker, T.K., and N. Chrisman, 1975. "Geographic Data Structures," American Cartographer 2(1):55-69. van Roessel, J.W., and E.A. Fosnight, 1984. "A relational approach to vector data structure conversion," Proceedings, International Symposium on Spatial Data Handling, Zurich, pp. 78-95. DISCUSSION AND EXAM QUESTIONS 1. Make a list of the kinds of relationships which can exist between pairs of spatial objects, for each pair of points, lines and areas, e.g. point to point, point to line, area to point etc. Are there any examples of relationships between triples of objects, e.g. point-point-point? 2. Write out the CanSIS data structure for a simple map of three or four polygons, forming an equal or smaller number of objects (include the x,y coordinate pairs) (need to include a drawn example). 3. The GIS industry has traditionally provided data models which assume that within any one layer of the database, polygon objects do not overlap, and exhaust the space available. Comment on the degree to which this assumption has limited the application of GIS databases in specific areas. Are these sufficiently significant to warrant a change of data models in the future? 4. Discuss areas of application in which the concept of a complex feature type would be useful. What operations would you want to perform on complex and simple features respectively? EFFICIENT STORAGE OF LINES - CHAIN CODES A. INTRODUCTION Terms B. TECHNIQUES FOR REPRESENTING IRREGULAR LINES 1. Straight line segments 2. Arcs of circles 3. Splines C. STORING CHAINS (ARCS) Advantages and disadvantages Freeman chain code Compressing code Repeating sequences Summary D. APPLICATIONS OF CHAIN CODES Vectorizing raster data Vectorizing scanner output REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 31 - EFFICIENT STORAGE OF LINES - CHAIN CODES Compiled with assistance from David H. Douglas, University of Ottawa A. INTRODUCTION the previous unit looked at structures which encode certain types of spatial relationships in this unit we look at the coding of the geometry of lines and areas overheads - Examples of geographic lines note the different characteristics of each of these lines each of these lines suggests different geographic features - can you identify them? most systems store lines and areas as sequences of points connected by straight lines is simple, but best for representing lines with sharp breaks of direction, not smooth curves for example, a meandering river or a railroad could be represented very much more efficiently as a few smooth curves than as a large number of straight lines this could greatly reduce the effort of digitizing there are many methods of discretizing a line this unit looks at some of them, and the applications where they are useful Terms because area objects are mostly represented by straight line segments they are often referred to as polygons a variety of terms are used to describe an irregular line coded as a sequence of straight line segments, particularly when the line is the common boundary between two area objects these terms include arc, segment, edge, and chain of these, chain has been established as the standard terminology for US digital cartography when the line is a common boundary, but arc is probably used most often B. TECHNIQUES FOR REPRESENTING IRREGULAR LINES 1. Straight line segments if the line on the ground is curved, e.g. a river bank, then straight line segments can only approximate the truth the degree of approximation depends on the length of straight segment used, but the segments would have to be infinitely short to represent a true curve thus, straight line segments are a way of representing a continuous line using a finite amount of information, often the coordinates of the segment endpoints 2. Arcs of circles some systems oriented toward survey data allow both straight lines and arcs of circles between points for curves, the system must store the radius of curvature as well as the start and end points it must also correctly identify those segments which are straight lines and those which are arcs this approach is used in Prime''s System9 GIS is useful for engineered features, like highways and railroads, which are designed as straight lines and curves 3. Splines can be used for describing smoothly curving features like meandering rivers the curve is modelled as a spline function, a mathematical function that passes through specified points but has minimal curvature splines are often used to smooth the curves drawn by plotters as part of the contouring operation are more commonly used in CAD than in GIS C. STORING CHAINS (ARCS) there is often significant redundancy when a line is stored as a sequence of coordinate pairs for example, a curved street represented by four points in Columbia, MO might have coordinates as follows: (38.9519, 92.3503) (38.9519, 92.3510) (38.9522, 92.3511) (38.9522, 92.3527) note that the first four digits of each coordinate (latitude,longitude) are the same for every point can economize greatly on storage by storing the offset from the previous point, in units of 0.0001 degrees: (38.9519,92.3503) (+00,+07) (+03,+01) (+00,+16) this would allow every subsequent point to be stored in 4 decimal digits instead of 12, or roughly 12 bits instead of 36 also need one bit each to store the signs of the change in longitude and latitude, for a total of 14 however this will limit the maximum change between any pair of points to .0099 degrees latitude or longitude Advantages and disadvantages the advantage is reduction in storage volume the disadvantage is loss of generality: we now have a fixed spatial resolution (.0001 degrees) and maximum distance between points (.0099 degrees latitude or longitude) this creates problems when using global references and converting coordinates variations on this technique are particularly applicable for data obtained from scanners and digitizers these devices have built-in resolution scanners have a resolution equal to the pixel size of the scanner, and all displacements between points are actually whole numbers of pixels digitizers also work in discrete units Freeman chain code instead of strings of coordinate pairs, systems which rely heavily on scanner input often code lines as lists of incremental movements a fixed number of move directions is established, usually 8, and assigned the integers 0 through 7 (the "Queen''s case" move set): diagram a series of four moves up would be coded as four 0s the string "01012" would be used to code the following curve: diagram each step along the line requires a digit between 0 and 7, or three binary digits between 000 and 111 this technique of incremental coding of lines or chain code is usually associated with Herbert Freeman however, it was described much earlier in the 19th Century by Francis Galton as a means of transmitting information on fingerprints over a telegraph see Freeman (1961) for description of a number of algorithms based on chain codes Compressing code certain pairs of codes are much more likely to occur in chain codes than others lines generally turn slowly, and sharp bends are unusual in much geographic data the next move is most likely the same as the previous one next most likely is 45 degrees right or left 180 degrees is the least likely next move thus, a code 4 is most likely to be followed by another 4, or a 3 or a 5, and least likely to be followed by a 0, 1 or 7 this can be exploited by using a code whose length depends on the sharpness of the turn angle: 1 ahead 01 45 degrees right 001 45 degrees left 0001 90 degrees right 00001 90 degrees left 000001 135 degrees right 0000001 135 degrees left in this way, the total number of binary digits needed to represent a line can be reduced overhead - Compressing code Repeating sequences if the line being coded has long straight stretches, sequences of chain codes will be repeated overhead - Repeating sequences for example, to go from one point to the other requires 6 moves to the east and 3 to the northeast (3 1s and 6 2s). a sequence such as 222222111 would be a serious distortion of the straight line the straight line is best approximated by the sequence 221221221 repeats the 221 pattern 3 times straight is always best approximated by a homogeneous mix of codes to code lines with long straight sections, a way of representing runs of patterns is needed for example 3(221) might indicate four repeats of the 221 sequence Summary handout - Comparison of different encoding schemes EFFICIENT STORAGE OF LINES - CHAIN CODES D. APPLICATIONS OF CHAIN CODES chain codes are useful in two particular applications: Vectorizing raster data output from raster data along the boundary separating zones A and B would appear as follows: A A A B A A B B A B B B B B B B a line can be assumed along the pixel edges which separate the A''s from the B''s a vector representation of the line could be coded in chain code as 202020 Vectorizing scanner output when a line is read by a scanner it looks something like this: 0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0 1''s indicate where the scanner "saw" the line and 0''s where it "saw" the blank spaces around the line the first step in deriving a vector representation of the line is to "thin" the pixels using a standard algorithm are many different algorithms available for this a good source on these is Pavlidis (1982) the result of thinning will be: 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 the line can then be derived by following the remaining skeletal 1s represented in chain code as 111 (three moves northeast) usually stored with the x,y coordinates of the line followed with the chain code chain code is most useful in applications where vector data originated in raster form, from a scanner or an image since it discretizes both distance and direction (both can take a limited set of possible values), it is not particularly good for more general applications REFERENCES Freeman, H., 1961. "On the encoding of arbitrary geometric configurations," Institute of Radio Engineers, Transactions on Electronic Computers, EC10:260-8. Pavlidis, T., 1982. Algorithms for Graphics and Image Processing, Springer-Verlag, Berlin. DISCUSSION AND EXAM QUESTIONS 1. Using a sample line drawn on a sheet of paper, write out (a) the Freeman chain code representation of the line using single digits ranging from 0 to 7, (b) its representation using the binary code based on turn angles, and (c) its representation using run encoded chains. Which option gives greater data compression? (express the length of each code in bits, and remember that a number between 0 and 7 requires 3 such bits) (need to include a drawn example). 2. The answer to the previous question depends on the type of line being coded. Discuss the characteristics which would give greatest data compression for each of the three options, and describe examples of lines with these characteristics. 3. The accuracy with which a line is represented as a series of straight line segments depends on the process used to select points during digitizing. Discuss the criteria you would use to select points in order to obtain an accurate representation of a line. How would these criteria change depending on the nature of the line? 4. Using a simple classified raster, write out a vectorized representation using arcs and Freeman chain code. Include the left and right polygons and pointers to adjacent arcs as attributes of each arc, and the classes as attributes of each polygon. Show each line as a coordinate pair for the beginning point, followed by a chain code using digits in the range 0 through 7. 5. In preparation for the Unit 32, figure out how you would set up a program to determine where two lines, defined by their end points, intersect. Start with a diagram. Under what conditions will your program fail? SIMPLE ALGORITHMS I - INTERSECTION OF LINES A. INTRODUCTION B. DEFINITIONS Algorithms Heuristics C. SIMPLEST CASE Question Procedure Solution General form Simple program D. SPECIAL CASES Vertical lines Parallel lines Solution E. COMPLEX LINES Minimum enclosing rectangle Monotonic sections Sorting lines REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 32 - SIMPLE ALGORITHMS I - INTERSECTION OF LINES Compiled with assistance from David H. Douglas, University of Ottawa and David M. Mark, State University of New York at Buffalo A. INTRODUCTION the intersection of two lines is a critical operation in GIS is used in polygon overlay operations, merging and dissolving polygons and lines is the basis for point in polygon operations and critical for sliver removal therefore, an efficient algorithm to determine the intersection of two lines is very important in any GIS. GIS algorithms for complex processes are often built up from simple ones this section will review a few simple algorithms, later sections will show how they can be built up into complex operations the first operation here determines if two lines cross begin by examining the algorithm for two straight lines, then go on to two complex lines or polygons. eventually will see that this algorithm forms the core of numerous GIS operations, including the point in polygon and polygon overlay processes. the algorithm illustrates one of the principles of this type of programming, that there are numerous special cases which have to be dealt with. B. DEFINITIONS Algorithms an algorithm is a procedure consisting of a set of unambiguous rules which specify a finite sequence of operations that provides the solution to a problem, or to a specific class of problems each step of an algorithm must be unambiguous and precisely defined the actions to be carried out must be rigorously specified for each case an algorithm must always arrive at a problem solution after a finite number of steps this must also be a reasonable number of steps every meaningful algorithm provides one or more outputs it is preferable that the algorithm be applicable to any member of a class of problems rather than only to a single problem in general, the cost of obtaining a solution increases with the problem size if the size of the problem is sufficiently small, even an inefficient algorithm will not cost much to run consequently, the choice of an algorithm for a small problem is not critical unless the problem has to be solved many times. Heuristics an heuristic is a rule of thumb, strategy, trick, simplification, or any other kind of device which drastically limits the search for solutions in large problem spaces they do not guarantee optimal solutions, in fact they do not guarantee any solution at all (definition from Feigenbaum, E.A. and J. Feldman, eds., 1963, Computers and Thought, McGraw-Hill, p.6) C. SIMPLEST CASE Question Does the line from (4,2) to (2,0) cross the line from (0,4) to (4,0)? If so, where? Procedure find the equations of the two lines, and solve them simultaneously for the intersection the equation of a line is: y = a + bx where b is the slope given two points on the line at (x1,y1) and (x2,y2), the slope b can be determined by the expression: b = (y1 - y2) / (x1 - x2) the value of a can then be found by solving the equation using either point Solution for line 1 above b = (2 - 0)/(4 - 2) = 1 using point 1 2 = a + 4 thus a = -2 the equation is y = -2 + x for line 2 similarly, y = 4 - x solving simultaneously, the two lines intersect at (3,1) General form in general, the two lines y = a1 + b1x and y = a2 + b2x intersect at: xi = - (a1 - a2) / (b1 - b2) yi = a1 + b1xi however, this merely finds the intersection point between two lines of infinite length passing through each pair of points it is not yet established that the intersection point lies between the pairs, rather than on the imaginary extensions of one or both lines: diagram an intersection point at xi lies between x1 and x2, i.e. on line 1, if: (x1 - xi) (xi - x2) >= 0 similarly the point lies on line 2 with endpoints (u1,v1) and (u2,v2) if: (u1 - xi) (xi - u2) >= 0 because either line can be vertical (e.g. x1 = xi = x2) we must also check similar conditions on the y coordinates to be sure the lines intersect Simple program overhead/handout - Simple program to compute the intersection of two lines the handout lists a rudimentary program for determining if two lines cross x and y are used for line 1, u and v for line 2 input x1,y1 input x2,y2 input u1,v1 input u2,v2 b1 = (y2-y1)/(x2-x1) (A) b2 = (v2-v1)/(u2-u1) (B) a1 = y1-b1*x1 a2 = v1-b2*u1 xi = - (a1-a2)/(b1-b2) (C) yi = a1+b1*xi if (x1-xi)*(xi-x2)>=0 AND (u1-xi)*(xi-u2)>=0 AND (y1-yi)*(yi-y2)>=0 AND (v1-yi)*(yi-v2)>=0 then print "lines cross at",xi,yi else print "lines do not cross" end if SIMPLE ALGORITHMS I - INTERSECTION OF LINES D. SPECIAL CASES unfortunately this program will get into trouble in certain special cases: Vertical lines if line 1 is vertical, the instruction labeled (A) will cause an error because of an attempt to divide by zero, as numerical processors cannot deal with infinity similarly if line 2 is vertical, line (B) will cause an error Parallel lines if the two lines are parallel, line (C) will cause an error Solution to deal with these special cases we must make the program a little more complex: handout - Revised program to calculate intersection these special cases occur in many simple geometrical algorithms. they can be avoided to some extent by using different approaches E. COMPLEX LINES consider two complex lines of n1 and n2 straight line segments respectively: diagram these can be processed for all intersections by looping the simple algorithm, testing every segment in one line against every segment in the other the amount of work to be done is proportional to the product (n1 x n2) can reduce the amount of work by using a heuristic that will save future computation although this requires an additional processing step, overall processing time should be reduced examples of such methods are: Minimum enclosing rectangle a minimum enclosing rectangle (MER) of a line is defined by the minimum and maximum x and y coordinates of the line diagram a very rough check for intersection can be made by seeing if the enclosing rectangles of two lines overlap overhead - Minimum enclosing rectangles if they do not intersect, the lines cannot intersect if they do intersect, then find the MERs for each straight segment of each line to see which, if any of these intersect Monotonic sections can divide each line into sections which are monotonically increasing or decreasing in x and y monotonic segments are such that: a straight line parallel to either x or y axis cuts the section at most once there is a break where ever x or y hits a local maximum or minimum overhead - Monotonic sections this sets up conditions which can be used to reduce the amount of work done in looking for intersections. as the segment continues to increase in one direction, it cannot turn and intersect the other line again overhead - Intersection of monotonic sections this is the approach used by ARC/INFO and other vendor GISs the number of calculations required to determine the intersection of two complex lines drops from a value proportional to (n1 x n2) to a value proportional to (n1 + n2) Sorting lines when there are many lines with many intersections to be processed, as in the overlay of two complex coverages can sort the lines by their ranges so that only lines in similar ranges will be considered together REFERENCES Douglas, David H., 1974. "It makes me so cross," a paper distributed by the Laboratory for Computer Graphics and Spatial Analysis, Graduate School of Design, Harvard University, September 1974. (This note has been reprinted numerous times in computing magazines, and in Marble, Calkins and Peuquet, 1984.) Lee, D.T. and F.P. Preparata, 1984. "Computational geometry: a survey," IEEE Transactions on Computers C-33(12):1072- 1101. A good introduction to basic algorithms for geometrical problems. Little, J.J. and T.K. Peucker, 1979. "A recursive procedure for finding the intersection of two digital curves," Computer Graphics and Image Processing 10:159-71. Marble, Duane F., Calkins, Hugh W. and Peuquet, Donna J., eds., 1984. Basic Readings in Geographic Information Systems, Williamsville N.Y., SPAD Systems Ltd., Saalfeld, Alan, 1987. "It doesn''t make me nearly as CROSS," International Journal of Geographical Information Systems 1(4), pp 379-386 Taylor, George, 1989. "Letters," International Journal of Geographical Information Systems 3(20):192-3. Another look at the line intersection problem. DISCUSSION AND EXAM QUESTIONS 1. Why is the "crossing segment" or line intersection problem so important in GIS? 2. Identify the rules which can be used to limit searching for intersections between two lines which are both monotonic in x and y. Each line will be one of four combinations - either increasing or decreasing in x, and either increasing or decreasing in y. You will need to deal with 16 combinations when discussing the options for two lines. 3. Review and discuss the technique for line intersection described in Little and Peucker, 1979. 4. Compare raster and vector approaches to the determination of the intersection between two lines. Are there circumstances under which a raster approach might be preferable? 5. Since geographical data is never perfectly accurate, the special cases identified in the algorithm should never occur precisely. Modify the algorithm to deal with special cases by treating data as imprecise. What are the advantages and potential problems with such an approach? SIMPLE ALGORITHMS II - POLYGONS A. INTRODUCTION B. POLYGON AREA Objective Method Problems Calculating areas of many polygons at once C. POINT IN POLYGON ALGORITHM Objective Generalization Strategy Sample code Special cases Fuzzy boundaries Many points in many polygons D. CENTROID LOCATION Method E. SKELETON Applications REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 33 - SIMPLE ALGORITHMS II - POLYGONS A. INTRODUCTION the previous unit looked at a simple algorithm for determining the intersection of two lines this unit considers at a second simple algorithm and at some extensions B. POLYGON AREA Objective find the area of a polygon which is defined by a sequence of vertices Method common method is to use an algorithm that finds the area of a number of trapezoids bounded by: a line segment vertical lines dropped to the x axis the x axis overhead - Calculating area of trapezoid diagram 1. Drop vertical lines to the x axis from two adjacent vertices find the area of the trapezoid so defined: A = (x2-x1) * (y2+y1) / 2 diagram i.e. area is difference in x''s times average of y''s 2. Continue around the polygon, finding the area of each trapezoid defined by each line segment 3. Sum the areas of all trapezoids defined to get the area of the polygon: (note: must have (x(n+1),y(n+1)) = (x(1),y(1)) ) A = S (x(i+1)-x(i)) * (y(i+1)+y(i)) / 2 ) when x(i+1) < x(i) the contribution to area is negative this allows for trapezoids formed by the lower portion of the polygon to be subtracted so that, in the end, the total A is the area of the polygon it self Problems this algorithm works fine when the polygon has holes and overhangs does not work with polygons whose boundaries self-cross (i.e. figure 8) if the polygon is digitized counterclockwise its area is negative problems occur when the polygon has negative y values: cannot "drop" a perpendicular to the x axis solutions: add a large value to all y''s drop vertical lines to points below the lowest y value if the coordinate system is very precise, i.e. UTM with large numbers, will lose accuracy as a result of the comparative lack of precision of the computer therefore, does not work on small polygons in large coordinate systems solution: move the origin temporarily up to a location near the vertices this reduces the size of the coordinate values Calculating areas of many polygons at once if polygons are defined by arcs with L and R polygons identified can do calculations of trapezoid area for each arc keep a running total of the area of each polygon L polygon gets negative area R polygon gets positive area after processing each arc, we have the areas of each polygon outside world will have area equal to the negative of the total area of all polygons C. POINT IN POLYGON ALGORITHM Objective determine whether a given point lies inside or outside a given polygon Generalization find out which polygon of a set contains each of a set of points examples determine which county contains each of a set of customers identified by their coordinates, using a county boundary file identify the attributes of the land use polygon containing a given house notation the point is at (u,v) the polygon has n points, (x(i),y(i)), i=1,...,n, and is closed by making (x(n+1),y(n+1)) = (x(1),y(1)) Strategy 1. draw a vertical line upwards from the point to infinity 2. count the number of times this line intersects the boundary of the polygon if the count is odd the point is inside if it is even the point is outside Sample code overhead - Simple point in polygon program handout - Simple point in polygon program ni = +1 for i=1 to n if x(i+1) <> x(i) then (A) if (x(i+1)-u) * (u-x(i)) >= 0 then (B) if x(i+1) <> u or x(i) <= u then (C) if x(i) <> u or x(i+1) >= u then (D) b = (y(i+1)-y(i)) / (x(i+1)-x(i)) a = y(i)-b * x(i) yi = a+b*u if yi > v then ni = ni*(-1) end if end if end if end if end if next i b is the slope of the line from (x(i),y(i)) to (x(i+1),y(i+1)) by substituting x=u in the equation y=a+b*x we get the y coordinate of the intersection between this line and the vertical line through x=u then if y(i) > v the intersection must be above the point, and so is counted the value of ni alternates between +1 and -1 it is flipped every time an intersection is found at the end, ni=-1 if the point is in, +1 if the point is out Special cases lines (A) through (D) deal with special cases, as follows: A. if the line from (x(i),y(i)) to (x(i+1),y(i+1)) is vertical there can be no intersection diagram B. there can only be an intersection if the ith point is on one side of the vertical line through u and the i+1th point is on the other side diagram C.D. if either (x(i),y(i)) or (x(i+1),y(i+1)) lies exactly on the line, i.e. the point at (u,v) lies directly below a vertex, we may miscount: diagram solution to this is: if (x(i),y(i)) lies exactly on the line count an intersection only if (x(i+1),y(i+1)) lies to the right, or if (x(i+1),y(i+1)) lies exactly on the line count an intersection only if (x(i),y(i)) lies to the left Polygon with isolated islands diagram works fine Polygon has hole with an island diagram works fine Concave polygons diagram works fine What if the point is on the boundary? some point in polygon routines deal with this explicitly, but this does not sometimes it finds the point inside, sometimes outside Fuzzy boundaries Blakemore (see Blakemore, 1984) described a "fuzzy" generalization of the point in polygon problem: location of the boundary is uncertain a band of width epsilon is drawn around the boundary, and represents the band within which the true boundary is assumed to lie points can now be classified as "definitely out", "probably out", "probably in" and "definitely in" Many points in many polygons polygons will likely be stored by arc check all arcs against vertical lines drawn upward from each point overhead - Many points in many polygons if an arc between polygons A and B intersects, record an intersection with the boundary of both A and B, irrespective of which side of the arc A and B lie on this will require a counter for each point/polygon combination not very efficient a better algorithm is as follows: when an arc is found to intersect a vertical line, record the y value of the intersection and the polygon lying below the intersection (right polygon if the arc runs from W to E, left polygon otherwise) for each point, keep account of: (a) the lowest such intersection found, and (b) the polygon below that intersection after all arcs have been examined, the polygon below the lowest intersection is the containing polygon now have just two things to store for each point will still need to check every arc against every point, unless some sorting is done first SIMPLE ALGORITHMS II - POLYGONS D. CENTROID LOCATION overhead - Examples of centroid locations the centroid is potentially useful as a representative, central point in the polygon the centroid can be defined as the point about which the polygon would balance if it were cut out in plywood of uniform thickness and suspended unfortunately the centroid is not always inside the polygon, which reduces its usefulness as a central point some programs calculate the centroid by averaging the x and y coordinates of the polygon vertices this does not give the centroid diagram Method a better algorithm uses trapezoids dropped to the axes, as in the area algorithm since each trapezoid consists of a triangle and a rectangle, the centroid of the polygon can be found from a weighted average of triangle and rectangle centroids overhead - Centroid location code handout - (cont) Centroid location code X = S ((y(i) - y(i+1)) (x(i)2 + x(i)x(i+1) + x(i+1)2)/6A) Y = S ((x(i+1) - x(i)) (y(i)2 + y(i)y(i+1) + y(i+1)2)/6A) where A is the area of the polygon note: as with the area algorithm, the polygon must be coded clockwise and all y coordinates must be non- negative E. SKELETON is the network of lines inside a polygon constructed so that each point on the network is equidistant from the nearest two edges in the polygon boundary nodes on the network are equidistant from the three nearest edges a skeleton is what remains when a polygon contracts each of its straight edges moves inward at a constant rate can be thought of as the opposite of the buffer operation overhead - Polygon skeleton at the convex corners of the polygon, the bisectors of their adjacent edges form lines that are traced inward at each concave corner, the reduced polygon consists of the arc of a circle centered on the corner as the process continues the bisectors and arcs eventually merge, forming a tree-like structure as the polygon gets smaller, it may form into two or more islands ultimately, the polygon is reduced to a point this point is: furthest from the original boundary the center of the largest circle one could draw inside the original polygon Applications finding good locations for labels for polygons label might be drawn: along the skeleton axis of a polygon at the point remaining after the polygon has been shrunk to a point breaking a city block into polygons, one for each face of the block, for labeling REFERENCES Blakemore, M., 1984, "Generalization and error in spatial databases," Cartographica 21:131-9. Shamos, M.I., and F.P. Preparata, 1985. Computational Geometry, Springer-Verlag, Berlin. The standard but technically complex work on geometric algorithms. DISCUSSION AND EXAM QUESTIONS 1. Discuss the results of using the polygon area and point in polygon algorithms when the polygon is not correctly structured (e.g. unclosed, figure-of-eight). 2. Modify the point in polygon algorithm to determine correctly if the point lies on the boundary of the polygon, in addition to inside or outside. 3. Derive the equations for polygon area and centroid from first principles. 4. Modify the polygon area algorithm to test for and accommodate negative y coordinates. THE POLYGON OVERLAY OPERATION A. INTRODUCTION Traditions of polygon overlay use B. GENERAL CONCEPTS OF POLYGON OVERLAY OPERATIONS Operations requiring polygon overlay C. OVERLAY ALGORITHMS Objective Given Procedure D. COMPUTATIONAL COMPLEXITY E. INTERSECTION PROBLEMS 1. Adjacent lines 2. Sliver polygons Removing slivers during overlay Removing slivers after overlay REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 34 - THE POLYGON OVERLAY OPERATION Compiled with assistance from Denis White, Environmental Protection Agency, Corvallis, OR A. INTRODUCTION the simple algorithms discussed previously form the basis for one of the most complex operations of vector GIS systems - polygon overlay Traditions of polygon overlay use the three traditions of polygon overlay are: 1. Landscape planning superimposing layers of geographical data (e.g. environmental and social factors) so that their spatial relationships can be used in making land use decisions a key reference is McHarg, 1969, Design with Nature 2. Set theoretic a polygon can be thought of as representing a set when two sets (polygons) A and B are overlaid, we have a graphic interpretation of the set concepts of intersection and union diagram the area of overlap of A and B is A.AND.B (intersection) the combined area is A.OR.B (union) overhead/handout Sixteen combinations of Boolean operations it is possible to identify 16 such combinations, or Boolean expressions including e.g. A.AND.(NOT.B), i.e. all of the area of A which is not overlapped by B NOT.(A.OR.B), i.e. the outside world (= (NOT.A).AND.(NOT.B) in most polygon overlay operations it is the intersection which is of most interest, i.e. the area which is common to A and B 3. Area interpolation Given: the population of area A and the fact that areas A and B overlap Estimate: the population of area B diagram this problem can be solved by apportioning the population of A so that the amount in the area covered by B is proportional to the area of overlap this is a simple way of estimating the population of areas from statistics based on other areal units e.g. estimating populations of voting districts from populations of census tracts this method assumes that density is uniform within A, but more realistic versions of the technique exist see Unit 41 B. GENERAL CONCEPTS OF POLYGON OVERLAY OPERATIONS in GIS, the normal case of polygon overlay takes two map layers and overlays them each map layer is covered with non-overlapping polygons if we think of one layer as "red" and the other as "blue", the task is to find all of the polygons on the combined "purple" layer diagram attributes of a "purple" polygon will contain the attributes of the "red" and "blue" polygons which formed it can think of this process as "concatenating" attributes usually a new attribute table is constructed that consists of the combined old attributes, or new attributes formed by logical or mathematical operations on the old ones number of polygons formed in an overlay is difficult to predict there may be many polygons formed from a pair of "red" and "blue" polygons, with the same "purple" attributes when two maps are overlaid, will result in a map with a mixture of 3 and 4 arc intersections four arc intersections do not generally occur on simple polygon maps Operations requiring polygon overlay 1. Windowing the windowing operation, in which a window is superimposed on a map and everything outside the window is discarded, is a special case of polygon overlay 2. Buffering buffering around points, lines and polygons is another case buffers are generated around each point or straight line segment the combined buffer is found by polygon overlay diagram 3. Planar Enforcement the process of building points, lines and areas from digitized "spaghetti" (see Unit 12) wherever intersections occur between lines, the lines are broken and a point is inserted the result is a set of points, lines and areas which obey specific rules: THE POLYGON OVERLAY OPERATION C. OVERLAY ALGORITHMS to overlay polygons it is necessary first to find all intersections between "red" and "blue" boundary lines arcs are more efficient than polygons for this operation because: intersections need to be found only once rather than four times more information is available about labels (see below) example of overlay operation: overheads - Simple overlay example Objective overlay two maps of different themes and determine the combined attributes of the new polygons e.g. overlay a soils map on a vegetation map and create a new set of polygons with a new set of attributes Given two overlapping polygons as follows: red map: a polygon with attribute A blue map: a polygon with attribute 1 the outside world labelled 0 on both maps two intersecting arcs defining the boundaries of the polygons: 1. Red Map (light lines): (0,1) (0,3) (2,3) (2,1) (0,1) Polygons - Right: A Left: 0 2. Blue Map (heavy lines): (1,0) (3,0) (3,2) (1,2) (1,0) Polygons - Right: 0 Left: 1 Procedure after intersections have been found, six new arcs are formed, three from arc 1 and three from arc 2: 1. (0,1) (0,3) (2,3) (2,2) 2. (2,2) (2,1) (1,1) 3. (1,1) (0,1) 4. (1,0) (3,0) (3,2) (2,2) 5. (2,2) (1,2) (1,1) 6. (1,1) (1,0) because the right and left polygon labels are known for each input arc, we know the labels of the new polygons as soon as the intersections have been found there are four new polygons their attributes combine red and blue attributes: 00, A0, A1 and 01 the arc right and left labels, deduced from the geometry of the intersections, are: Arc Right Left 1 A0 00 2 A1 01 3 A0 00 4 00 01 5 A0 A1 6 00 01 a more complex example: overhead - Complex overlay example in this case the right and left polygon labels for arcs 1, 2, 4, 5 and 7 would be known from the geometry of the intersections: 1R: A0 1L: 00 2R: A1 2L: 01 4R: A1 4L: 01 5R: 01 5L: 00 7R: A1 7L: A0 the labels of the remaining arcs must be determined labels can be passed from one arc to another around a polygon: 3R: must be the same as 2R and 4R 6L: from 2L, 4L arc 3 was part of the red network, so its soils labels are known, the remaining (blue) part of its left label must be the same as the blue part of its right label 3R is A1 - thus 3L is B1 thus 6R is B1 how to get the blue labels of arc 8? use a point in polygon routine to find the enclosing blue polygon use a data structure in which arcs on the inside of the polygon boundary "point" to arcs on the outside of any enclosed islands e.g. 5R -> 4L -> 6L -> 8L -> 2L -> 5R thus the labels of arc 8 are 8L: 01, 8R: C1 the final step in the algorithm is to identify all new polygons by following around each polygon from one arc to the next until every right and left side of every arc has been identified with a uniquely numbered polygon D. COMPUTATIONAL COMPLEXITY polygon overlay is numerically intensive and time consuming, therefore it is the most complex operation of most vector-based GIS programs notation: if computing time to process n objects is proportional to n, the computational complexity is "order n", or O(n) if it is proportional to n2, we way it is "order n squared" or O(n2) it is important to know: how long does it take to overlay a given number of polygons? what affects the amount of time taken? obviously, the number of arcs and polygons affects the number of computations required it is usually possible to determine the number of polygons being overlaid the number of arcs is roughly 3 times the number of polygons other factors, such as the wiggliness of boundaries, affect the time, but it is difficult to measure these if n1 polygons are overlaid on n2, how many polygons result? (assuming maps are different) minimum is n1+n2, polygons on the two maps do not intersect at all maximum is infinity, lines have infinite wiggliness typical is 3 or 4 times (n1+n2), discounting spurious polygons if every one of n1 "red" polygons has to be compared to every one of n2 "blue" polygons, the algorithm will be O(n1n2) if we could immediately find all "blue" arcs likely to intersect with a given "red" arc, we could build an algorithm which would be O(n1), which means it would be much more efficient for a given size of problem to find arcs in this way we would need an efficient method of spatial indexing one of the most successful methods uses the moving band concept: sort both "red" and "blue" arcs in ascending order of x process the arcs beginning at the left, moving to the right, sweeping a "band" across the map only those arcs falling in the band are examined since the arcs are sorted, we can find those in the band on either red or blue maps quickly some of the best polygon overlay routines now available in the GIS market operate in close to O(n1+n2) time a map with tens of thousands of polygons can be overlaid on another map with a similar number in an hour of computing time on a moderate-sized machine E. INTERSECTION PROBLEMS because of computer precision, lines will be represented in the computer with great precision even though the accuracy of the representation is low 1. Adjacent lines the following diagram represents a case where two lines cross diagram the following diagram represents a similar case where the two lines do not actually cross diagram it is necessary to provide algorithms which distinguish between these very different conditions 2. Sliver polygons overlay algorithms compute the exact intersections between lines (see Unit 14) in any overlay operation, it is likely that there will be pairs of lines which should coincide, but do not because of differences in digitizing these are called slivers, spurious polygons or "coastline weave" overhead - Overlay of two images if an arc or polygon of n1 points is overlaid on one of n2 points, up to: ( 2n1n2/(n1+n2) - 3 ) slivers may be generated two possible approaches: a. delete during the overlay operation, or b. delete afterwards Removing slivers during overlay this is the most common approach used in commercial GIS programs this approach deals with the line as if it were fuzzy diagram define a tolerance limit for each line which indicates the amount of uncertainty that exists regarding the true position of the digitized line provides a band of width "epsilon" around every line for digitized lines this width may be 1 mm using epsilon, can then conclude that lines which have a difference in position less than epsilon are the same line these two lines can then be collapsed to represent a single line Problem it is easy to get into difficulty: lines A and B within epsilon, therefore the same: lines B and C within epsilon, therefore the same: but lines A and C are not necessarily within epsilon of each other polygon overlay routines which do sliver removal "on the fly" must deal with this problem Removing slivers after overlay need intelligent criteria to distinguish between slivers and real polygons criteria for removal include: area: slivers are small shape: slivers are long and thin number of arcs: slivers generally have only 2 bounding arcs while real polygons rarely have only 2 alternating attributes: if a "red" arc between polygons A and B is overlaid on a "blue" arc between polygons 1 and 2, the slivers will alternate between A2 and B1 diagram junctions: slivers terminate in 4 arc junctions, but 3 arc junctions are more common in real polygons chaining: slivers tend to occur in chains it is likely that the confidence with which it can be concluded that two arcs are forming slivers will increase steadily as we work along the arc i.e. the attribute "sliver" is strongly correlated if a sliver is detected, it can be replaced by an arc along its center line REFERENCES McHarg, I.L., 1969. Design with Nature, Doubleday, New York Goodchild, M.F., and N. S. Lam, 1980. "Areal Interpolation," Geoprocessing 1:297-312. DISCUSSION AND EXAM QUESTIONS 1. McHarg described the overlay technique well before the advent of GIS and polygon overlay. Discuss the advantages and possible disadvantages of a computer implementation of the technique. 2. Write out and illustrate the 16 Boolean combinations of two polygons A and B. 3. Review and discuss the alternative forms of areal interpolation described by Goodchild and Lam, 1980. 4. Discuss the relative advantages of raster and vector approaches to polygon overlay. Identify the application areas likely to adopt each method given their advantages. RASTER STORAGE A. INTRODUCTION Why use raster? Objectives B. STORAGE OPTIONS FOR RASTER DATA What if there is more than one layer? What do raster systems store in each pixel? Raster/Vector combinations C. RUN ENCODING Problems D. SCAN ORDER 1. Row order 2. Row prime order (Boustrophedon) 3. Morton order 4. Peano scan (also Pi-Order or Hilbert) Comparing scan orders E. DECODING SCAN ORDERS Method Generalization REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES The latter portion of this unit and the following two require familiarity with numbering systems in base 2 and 4 as well as techniques for conversion between these and decimal. You may wish to provide your students with some background material on these topics before tackling these units. UNIT 35 - RASTER STORAGE Compiled with assistance from Donna Peuquet, Pennsylvania State University A. INTRODUCTION Why use raster? data are acquired in that form from remote sensing, photogrammetry or scanning is a common way of structuring digital elevation data raster assumes no prior knowledge of the phenomenon, sampling is done uniformly knowledge of variability would allow us to sample more heavily in areas of high variability (rugged terrain) and less heavily in smooth terrain data are often converted to raster as a common format for data interchange for merging with remote sensing images or DEMs raster algorithms are often simpler and faster e.g. buffer zone generation is simpler in raster raster may be appropriate if the solution requires uniform resolution, e.g. in finding optimum routes for linear features such as power lines, or in inferring the locations of stream networks from DEMs Objectives there are many options for storing raster data (many data structures) some are more economical than others in use of storage some are more efficient in access and processing speed this unit looks at some of the options and issues involved many of these issues were introduced in Unit 4, they are expanded upon here B. STORAGE OPTIONS FOR RASTER DATA by convention, raster data is normally stored row by row from the top left this is the European/North American reading order is also the order of scan of a TV image example the image A A A A A B B B A A B B A A A B would be stored in 16 memory positions, one for each pixel, in the sequence: A A A A A B B B A A B B A A A B What if there is more than one layer? two options: 1. store the layers separately this is the normal practice 2. store all information for each pixel together this requires extra space to be allocated initially within each pixel''s storage location for layers which might be created later during analysis this is usually difficult to anticipate (note in remote sensing, these concepts are called "band sequential" and "band interleaved by pixel" respectively) What do raster systems store in each pixel? some allow only an integer, in a fixed range, e.g. -127 to +127 (1 byte per pixel) or -32767 to +32767 (2 bytes per pixel) some allow integers, real (decimal) numbers and mixed alphabetic letters and numbers in each pixel in this case it helps if the system keeps track of what type of data is stored in each layer and stops the user doing wrong types of analysis on the data example: vegetation data is recorded as a class (A thru G) in each pixel elevation data is recorded as a decimal number (e.g. 100.3 m) the system should not allow the user to add the pixel values from the two layers (A + 100.3) or perform any other kind of arithmetic operation on the vegetation data Raster/Vector combinations many raster-based systems allow vector input Example: a polygon, defined by its vertices, is input convert this to a raster e.g. assign 1 to all pixels inside the polygon, 0 to all outside some forms of data are really hybrids of raster and vector: Freeman chain code has finite resolution based on pixels (raster-like) but defines lines and the boundaries of objects (vector-like) a raster can be used to define objects at fixed resolution if every pixel is given an object number instead of a value the object numbers are pointers to an attribute table: Raster ObjectAttributes 23 23 23 24 23 A 100.0 23 23 24 24 24 B 101.1 23 23 24 24 23 23 23 24 this gives us an object with its attributes, plus a list of pixels associated with the object instead of the object''s coordinates in this sense, a raster is a finite resolution geometry rather than an alternative way of structuring spatial data C. RUN ENCODING geographical data tends to be "spatially autocorrelated", meaning that objects which are close to each other tend to have similar attributes Tobler expressed it this way: "All things are related, but nearby things are more related than distant things" because of this principle, we expect neighboring pixels to have similar values so instead of repeating pixel values, we can code the raster as pairs of numbers - (run length, value) e.g. instead of 16 pixel values in original raster matrix, we have: 4A 1A 3B 2A 2B 3A 1B produces 7 integer/value pairs to be stored if a run is not required to break at the end of each line we can compress this further: 5A 3B 2A 2B 3A 1B = 6 pairs however, it helps to limit the possible size of the run so that we can use less space to store the run length, as the amount of space allocated must be sufficient for the maximum run length Problems layers now have different lengths depending on the amount of compression (lengths of runs) storing all layers together for each pixel now makes no sense run encoding would be little use for DEM data or any other type of data where neighboring pixels almost always have different values RASTER STORAGE D. SCAN ORDER 1. Row order described already are there better ways of ordering the raster than row by row from the top left? other orders may produce greater compression overhead/handout - Standard scan orders 2. Row prime order (Boustrophedon) suppose we reverse every other row: diagram this has the charming name boustrophedon from the Greek for "how an oxen plows a field" avoids a long jump at the end of each row, so perhaps the raster would produce fewer runs and thus greater compression this order is used in the Public Land Survey System: the sections in each township are numbered in this way one the original raster (page 35-3) it results in: 4A 3B 3A 3B 3A = 5 runs 3. Morton order overhead/handout - (cont) Standard scan orders Morton order is the basis of many efforts to reduce database volume named for Guy Morton who devised it as a way of ordering data in the Canada Geographic Information System however, this way of ordering or scanning a raster was well known long before Morton it is associated with the names of several mathematicians and geometers: Hilbert, Peano, and Koch coincidentally, Morton is the name of the lower left corner county in Kansas the strategy is to exhaust each area of the map in sequence, whereas row by row order scans from one side to the other this minimizes the number of large jumps diagram this is one of several hierarchical ordering systems it is built up level by level, repeating the same pattern at each level, as follows 2 3 10 11 14 15 42 43 46 47 58 59 62 63 0 1 8 9 12 13 40 41 44 45 56 57 60 61 2 3 6 7 34 35 38 39 50 51 54 55 0 1 4 5 32 33 36 37 48 49 52 53 10 11 14 15 26 27 30 31 8 9 12 13 24 25 28 29 2 3 6 7 18 19 22 23 0 1 4 5 16 17 20 21 it is only valid for square arrays where the numbers of rows and columns are powers of 2 e.g. 2x2, 4x4, 8x8, 16x16, 32x32, 64x64, etc. how does it do on our 4x4 array? 5A 3B 1A 1B 2A 2B 2A = 7 runs which is as long as row by row compression 4. Peano scan (also Pi-Order or Hilbert) the Peano scan or Pi-order is like boustrophedon in always moving to a neighboring pixel diagram the name Peano is associated with both this and Morton orders, though more often with this it is also hierarchical, but the pattern appears in different orientations at different levels Comparing scan orders it is useful to look at a comparison of the compression rates obtained by the different orders (see Goodchild and Grandfield, 1983) overhead/handout - Scan order comparison the comparison shown used a number of 64x64 pixel images: all pixels are either B or W vary from: images with large patches of black or white to very chaotic images in which each pixel is independently black or white in the table, H indicates the amount of chaos in the image: the higher H, the larger the patches low H corresponds to chaotic images each line in the table gives the numbers of runs required to code the same black and white image the values in the last line were calculated theoretically this table shows that scan order makes little difference to data compression the number of runs is not greatly affected by scan order for a given image however, the orders which move to adjacent pixels (boustrophedon and Peano) tend to do better than row and Morton E. DECODING SCAN ORDERS since Morton and Peano orders are useful but complex, two types of questions arise when they are used: 1. What are the row and column numbers for a given pixel? 2. What is the position in the scan order for a given row and column number? Method start by numbering the rows and columns from 0 up: 3 10 11 14 15 2 8 9 12 13 1 2 3 6 7 0 0 1 4 5 0 1 2 3 - row 2, column 3 is position 13 in the Morton sequence 1. How to go from row 2, column 3 to Morton sequence? a. convert row and column numbers to binary representations: 16s 8s 4s 2s 1s 1 0 row 2 1 1 column 3 b. interleave the bits, alternating row and column bits (called bit interleaving): 1 1 0 1 row col row col c. evaluate this sequence of bits as a binary number: Answer: 8 + 4 + 1 = 13 so to get the Morton position, interleave the bits of the row and column number 2. How to find row and column number from Morton position 9? a. convert the position number to a binary number 16s 8s 4s 2s 1s 1 0 0 1 (8 + 1 = 9) row col row col b. separate the bits: 1 0 row = 2 0 1 col = 1 Generalization can express the row and column number to any base, not just base 2 (binary), and including mixtures of bases example: row 6, column 15, using base 4 instead of base 2 64s 16s 4s 1s 1 2 row 6 = 1x4 + 2x1 3 3 col 15 = 3x4 + 3x1 interleaving: 1 3 2 3 1x64 + 3x16 + 2x4 + 3x1 = 123 answer: row 6 column 15 is position 123 what does this sequence look like? overhead - Base 4 x base 4 scan order arrays of 4 rows by 4 columns, scanned row by row, then repeated at higher levels can generate a wide range of possible scan patterns by interleaving digits of different bases the principle of digit interleaving is very widespread, and is built into the PLSS and the GEOLOC grid, as well as numerous systems for map indexing and georeferencing REFERENCES Abel, D.J., 1986. "Bit interleaved keys as the basis for spatial access in a front-end spatial database management system," Proceedings, Tesseral Workshop #2, Reading, England. Franklin, W., 1979. "Evaluation of algorithms to display vector plots on raster devices," Computer Graphics and Image Processing 11:377-397. Goodchild, M.F., and A.W. Grandfield, 1983. "Optimizing raster storage: an examination of four alternatives," Proceedings, AutoCarto 6, Ottawa, 1:400-7. Peuquet, D., 1981. "An examination of techniques for reformatting digital cartographic data, Part II, The vector-to-raster Process," Cartographica 18(3):21-33. DISCUSSION AND EXAM QUESTIONS 1. What systems are used for topographic map indexing in the US and other countries? Discuss the use of digit interleaving in this context, using different national examples. 2. The term metadata is used to refer to information carried with a map layer, such as its accuracy, numbers of rows and columns, type of data stored for each pixel, etc. Discuss the importance of metadata in limiting the operations which a user is allowed to perform on a map layer. 3. Raster and vector have developed as two partially independent traditions in GIS. Summarize the dimensions of the raster-vector debate, particularly in the importance of spatial objects in the two systems. 4. All of the scan orders discussed in this unit visit each pixel exactly once. Discuss the potential advantages, if any, of scan orders which visit certain pixels more than once. Give examples. 5. Any raster GIS places restrictions on what can be stored in each pixel of a map and what operations can be carried out. Discuss this point as it applies to IDRISI, and any other raster GIS to which you may have access. Will it let you store an alphabetic value such as A in a pixel and then allow you to carry out arithmetic operations on this layer? 6. Find out what raster storage option (row by row, run encoded, pixel by pixel, layer by layer, etc.) is used by IDRISI and any other raster GIS (GRASS, MAP, etc.) to which you have access. HIERARCHICAL DATA STRUCTURES A. INTRODUCTION B. INDEXING PIXELS Procedure Decoding locations C. THE QUADTREE Coding quadtrees Accessing data through a quadtree Comparison of different data structures D. VARIANTS OF QUADTREES E. ADVANTAGES OF HIERARCHICAL DATA STRUCTURES REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 36 - HIERARCHICAL DATA STRUCTURES A. INTRODUCTION different scan orders produce only small differences in compression the major reason for interest in Morton and other hierarchical scan orders is for faster data access the amount of information shown on a map varies enormously from area to area, depending on the local variability it would make sense then to use rasters of different sizes depending on the density of information large cells in smooth or unvarying areas, small cells in rugged or rapidly varying areas unfortunately unequal-sized squares won''t fit together ("tile the plane") except under unusual circumstances one such circumstance is when small squares nest within large ones there are, however, some methods for compressing raster data that do allow for varying information densities B. INDEXING PIXELS overhead - Raster to quadtree I and II handout - Raster to quadtree consider the 16 by 16 array in which just one cell is different notation: row and column numbering starts at 0 thus the odd cell is at row 4, column 7 Procedure begin by dividing the array into four 8x8 quadrants, and numbering them 0, 1, 2 and 3 as in the Morton order quads 1, 2 and 3 are homogeneous (all A) quad 0 is not homogeneous, so we divide only it into four 4x4 quads these are numbered 00, 01, 02 and 03 because they are partitions of the 8x8 quad 0 of these, 00, 01 and 02 are homogeneous, but 03 is divided again into 030, 031, 032 and 033 now only 031 is not homogeneous, so it is divided again into 0310, 0311, 0312 and 0313 what we have done is to recursively subdivide using a rule of 4 until either: a square is homogeneous or we reach the highest level of resolution (the pixel size) this allows for discretely adaptable resolution where each resolution step is fixed this concept is related to the use of Morton order for run encoding if we had coded the raster using Morton order, each homogeneous square would have been a run 8x8 squares are runs of 64 in Morton order, 4x4 are runs of 16, etc the run encoded Morton order would have been: 16A 16A 16A 4A 1A 1B 1A 1A 4A 4A 64A 64A 64A if we allow runs to continue between blocks we could reduce this to: 53A 1B 202A i.e. a homogeneous block of 2m by 2m pixels is equivalent to a Morton run of 22m pixels Decoding locations the conversion to row and column is the same as for decoding Morton numbers except that in this case the code is in base 4 in the example the lone B pixel is assigned code 0311 1. convert the code to base 2 hint: every base 4 digit converts to a pair of base 2 digits thus 0311 becomes 00110101 2. separate the bits to get: row 0100 = 4 column 0111 = 7 so the numbering system is just the Morton numbering of blocks, expressed in base 4 however, sequence and data compression are not the most useful aspects of this concept C. THE QUADTREE can express this sequencing as a tree the top is the entire array at each level there is a four-way branching each branch terminates at a homogeneous block the term quadtree is used because it is based on a rule of 4 overhead - Quadtree each of the terminal branches in the tree (the ones having values) is known as a leaf in this case there are 13 leafs or homogeneous square blocks Coding quadtrees to store this tree in memory, need to decide what to store in each memory location there are many ways of storing quadtrees, but they all share the same basic ideas one way is to store in each memory location EITHER: 1. the value of the block (e.g. A or B), or or 2. a pointer to the first of the four "daughter" blocks at the next level down all four daughter blocks of any parent always occur together overhead - Coding quadtrees thus, the quadtree might be stored in memory as: Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 Contents: 2 6 A A A A A A 10 A 14 A A A B A A (level):0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 the content of position 1 is a pointer indicating that the map is subdivided into four blocks whose contents can be found starting at position 2 position 2 indicates that the four parts of the 0 block can be found beginning at position 6 positions 3, 4 and 5 indicate that the other three level 1 blocks are all A and are not further subdivided Accessing data through a quadtree consider two ways in which this quadtree may be accessed: 1. find all parts of the map with a given value 2. determine the contents of a given pixel notation: if the array has 2n by 2n pixels there are n possible levels in the tree, or n+1 if we count the top level (level 0) use m for the number of leafs 1. to find the parts of a map with a given value we must examine every leaf to see if its value matches the one required this requires m steps as there are m leafs 2. to find the contents of a given pixel, start at the top of the tree if the entire map is homogeneous, stop as the contents of the pixel are known already if not, follow the branch containing the pixel do know which branch to follow: take the row and column numbers, write them in binary, interleave the bits, and convert to base 4 e.g. row 4, column 7 converts to 0311 at each level, use the appropriate digit to determine which branch to follow e.g. for 0311, at level 0 follow branch 0, at level 1 follow branch 3, etc. in the worst case, may have to go to level n to find the contents of the pixel, so the number of steps will be n Comparison of different data structures summary of the work needed to perform the two types of queries: OptionFind parts with given valueFind contents of pixel Quadtree m n Row by row 22n (a) 1 (b) Row by row run encoded m (c) m (d) Morton run encoded m (c) m (d) notes: (a) must examine every pixel in the array (b) can calculate the position of every pixel and access it directly (c) the number of runs will be approximately the same as the number of leafs although the orders are different, we decided earlier that order made little difference to the number of runs (d) each run must be examined to see if it contains the pixel thus, quadtree structures offer definite advantages over other systems for queries HIERARCHICAL DATA STRUCTURES D. VARIANTS OF QUADTREES the octree (or octtree) is a three-dimensional version of the quadtree, based on a rule of 8 the cube is divided recursively into eight pieces octrees are useful for 3D data, particularly in mining and geology, and in medical imaging see Unit 42 for more on this topic global data presents significant problems we might use a projection such as the Mercator, and represent the data as a raster on this projection the area and shape of the area represented by each pixel would be significantly distorted, particularly near the poles the relationships between neighboring pixels would be distorted as well in reality all of the pixels in the top and bottom rows are neighbors of each other across the poles diagram these problems create serious distortions in models based on such data a more suitable approach devised by Geoffrey Dutton is as follows: overhead/handout Global tesselation project the globe onto an octahedron, consisting of eight triangles the vertices of the octahedron are at the poles, and 90 degrees apart around the equator number these from 0 through 7 each triangle is recursively divided using a rule of 4 into 4 smaller triangles, by connecting the midpoints of the edges number these 0 (central triangle), 1 (vertically above or below), 2 (diagonally to the left) and 3 (diagonally to the right) level 20 in this scheme has a resolution (triangle size) of about 1 m on the earth''s surface its address requires one base 8 digit and 20 base 4 digits, or 43 (binary) bits E. ADVANTAGES OF HIERARCHICAL DATA STRUCTURES both coordinates are in a single address a single number indicates a 2D location every square meter on the earth''s surface has a consistent 21-digit address resolution is automatically known from the length of the address in the global scheme described above, a 21-digit address has 1 m resolution, for 1 km resolution we need only a 13-digit address in comparison, a lat/long address needs two numbers, and it is not always easy to tell resolution from the way the numbers are presented REFERENCES Gargantini, I., 1982. "An effective way to represent Quadtrees," Communications of the ACM 25:905-910. Mark, D.M., J.P. Lauzon and J.A. Cebrian, 1989. "A review of quad-tree based strategies for interfacing coverage data with digital models in grid form," International Journal of Geographical Information Systems 3(1):3-14. Samet, Hanan, 1984. "The quadtree and related hierarchical data structures," ACM Computer Surveys 16(2):187-260. Samet, Hanan, 1989. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA. Contains an excellent review of quadtrees. Samet, Hanan, 1989. Applications of Spatial Data Structures. Addison-Wesley, Reading, MA. Contains an extensive review of applications of quadtrees. Shaffer, C.A., H. Samet and R.C. Nelson, 1990. "QUILT: a geographic information system based on quadtrees," International Journal of Geographical Information Systems 4(2):103-32. Waugh, Thomas C., 1986. "A response to recent papers and articles on the use of quadtrees for geographic information systems," Proceedings, Second IGU Symposium, Seattle, pp. 33-37. An interesting critique of quadtrees. DISCUSSION AND EXAM QUESTIONS 1. "Hierarchical data structures are one of the few genuinely new concepts to come out of GIS research". Discuss. 2. Discuss the arguments presented for and against quadtrees in Waugh, 1986. 3 Summarize the arguments for and against the octahedral decomposition of the globe presented in the unit, as a means of representing and analyzing global databases. 4. Modify the quadtree concept so that it is suitable as a method of storing digital elevation data. What are the advantages and disadvantages of this approach over a simple raster? 5. What would be the advantages and disadvantages of using Dutton''s global tesselation as the primary means of discrete georeferencing, rather than street address? 6. Three-dimensional spatial databases are of great interest in geology, geophysics, subsurface hydrology, atmospheric science, oceanography and the mining and oil and gas industries. With reference to one or more of these, discuss the suitability of the octtree as a data model. QUADTREE ALGORITHMS AND SPATIAL INDEXES A. INTRODUCTION Definition B. AREA ALGORITHM Procedure Example C. OVERLAY ALGORITHM Procedure Result D. ADJACENCY ALGORITHM Problem Definition Two cases Tesseral Arithmetic Determining Adjacency Length of common boundary E. AREA OF A CONTIGUOUS PATCH ALGORITHM Problem Method Algorithm Results F. QUADTREE INDEXES Indexing using quadtrees Setting up the index Using the index Generalizations G. R-TREE INDEXES Method Problem REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES This unit is very long and deals with more advanced algorithms. Depending on the abilities and interests of your students, you may want to omit the third and fourth algorithms included or consider providing this as extra handouts. Advanced students may be pleased to have the opportunity to examine the more subtle, complex nature of these advanced algorithms. The later section on indexes does not depend on material covered in the earlier sections. UNIT 37 - QUADTREE ALGORITHMS AND SPATIAL INDEXES A. INTRODUCTION the previous unit defined the basic idea of a quadtree this unit examines how quadtrees are used in several simple processes, including: measurement of area overlay finding adjacent leafs measuring the area of contiguous patches in addition, this unit will look at how quadtrees can be used to provide indexes for faster access to vector-coded objects finally, alternative forms of spatial indexing will be reviewed Definition to traverse a quadtree: begin by moving down the leftmost branch to the first leaf after processing each leaf in this branch, move back up to the previous branching point, and turn right this will either lead down to another leaf, or back to a previous branching point diagram overhead - First map several of the following examples use this simple raster and its associated quadtree B. AREA ALGORITHM Procedure to measure the area of A on the map: traverse the tree and add those leafs coded A, weighted by the area at the level of the leaf Example in the example quadtree, elements at level 0 have area 16, at level 1 - area 4, at level 2 - area 1 thus, area of A is: 1 (leaf 00) + 1(leaf 02) + 1 (leaf 03) + 4 (leaf 2) + 1 (leaf 32) = 8 units C. OVERLAY ALGORITHM overhead - Second map note: this overhead can be physically overlayed on First map Procedure to overlay the two maps: traverse the trees simultaneously, following all branches which exist in either tree where one tree lacks branches (has a leaf where the other tree has branches), assign the value of the associated leaf to each of the branches e.g. node 3 is branched on map 1, not on map 2 the leafs derived from this node (30, 31, 32 and 33) have values B, B, A and B on map 1, all 2 on map 2 the new tree has the attributes of both of the maps, e.g. A1, B2 Result overhead - First map + Second map D. ADJACENCY ALGORITHM Problem find if two leafs (e.g. 03 and 2) are adjacent Corollary: find the leafs adjacent to a given leaf (e.g. 03) note that in arc based systems adjacencies are coded in the data structure (R and L polygons), so this operation is simpler with vector based systems Definition here adjacent means sharing a common edge, not just a common point diagram Two cases leaf codes are: 1. same length (same size blocks, e.g. 01 and 02) or 2. one is longer than the other (different size blocks, e.g. 03 and 2) solving this problem requires the use of: 1. conversion from base 4 to binary and back base 4 because of the "rule of 4" used in constructing quadtrees 2. bit interleaving 3. a new concept called Tesseral Arithmetic Tesseral Arithmetic tesseral arithmetic is an alternate arithmetic useful for working with the peculiarities of quadtree addressing to add binary numbers normally, a "carry" works to the position to the left e.g. adding 1 to 0001 gives 0010 this is the same as decimal arithmetic except that carries occur when the total reaches 2 instead of 10 in tesseral arithmetic, a "carry" works two positions to the left e.g. adding 1 to 0001 gives 0100 the reverse happens on subtraction 1000 less 1 is 0010 not 0111, as the subtraction affects only the alternate bits in other words, if we number the bits from the left starting at 1 adding or subtracting 1 affects only the even- numbered bits adding or subtracting 2 (binary 10) affects only the odd-numbered bits Determining Adjacency handout - Determining adjacency 1. same size blocks: two leafs are adjacent if their binary representations differ by binary 1 or 10 (decimal 1 or 2) in tesseral arithmetic example: 01 and 03 are adjacent because 0001 and 0011 differ by binary 10, or decimal 2 example: 033 and 211 are adjacent because in tesseral arithmetic 001111 + 10 = 100101, or 100101 - 10 = 001111 2. different size blocks: taking the longer of the two codes: convert it from base 4 to binary tesseral-add and -subtract 01 and 10 to create four new codes reject any cases where subtracting was not possible (a "negative" code would have resulted, or a "carry" would have been necessary to the left of the leftmost digit) discard the excess rightmost digits in the resulting transformed longer codes convert back to base 4 to get the leaf the two blocks are adjacent if any of the transformed and truncated codes are equal to the shorter code example: Are 02 and 2 adjacent? convert 02 to binary = 0010 0010 + 1 = 0011 0010 + 10 = 1000 0010 - 1 (impossible) 0010 - 10 = 0000 truncating gives 00 and 10 these are equal to 0 and 2 in base 4 therefore, 02 and 2 are adjacent (also 02 and 0 are adjacent) example: Are 033 and 2 adjacent? convert 033 to binary = 001111 001111 + 1 = 011010 001111 + 10 = 100101 001111 - 1 = 001110 001111 - 10 = 001101 truncating to two digits gives 01, 10 and 00 these are equal to 1,2 and 0 in base 4 therefore, 033 and 2 are adjacent example: Find leafs adjacent to 03 in the first map above method: find the codes of adjacent blocks of the same size, then work down the tree to find the appropriate leaf (note: can only find equal or shorter codes - equal or bigger leaf blocks) 0011 + 1 = 0110 = 12 : leaf 1 0011 + 10 = 1001 = 21 : leaf 2 0011 - 1 = 0010 = 02 : leaf 02 0011 - 10 = 0001 = 01 : leaf 01 Length of common boundary the length of common boundary between the two blocks is determined by the level of the longer code can use this to construct an algorithm to determine the perimeter of a patch e.g. the length of the A/B boundary in the first example map diagram QUADTREE ALGORITHMS AND SPATIAL INDEXES E. AREA OF A CONTIGUOUS PATCH ALGORITHM Problem find the area of a contiguous patch of the same value, e.g. all A Corollary: How many separate patches of A are there? note: this is a general method which can be used in both quadtree and vector data structures i.e. find contiguous sets of quadtree blocks or irregularly shaped polygons, given that adjacencies are known or can be determined the following example uses the original raster map note that there are only two contiguous patches; the areas of A and B form only one patch each Method handout - Area of a contiguous patch (2 pages) create a list of leafs, with their associated codes, by traversing the tree allow space for a "pointer" for each leaf, and give it an initial value of 0 (see handout) Algorithm for each leaf i: find all adjacent leafs j with equal or shorter length codes (4 maximum) if the adjacent leaf j has the same value, determine which of i and j has the higher (larger value) position in the list, and set its pointer to the lower position (note: if a pointer has already been changed, it may be changed again or left, the result is the same) this produces the final pointer list Results 1. the number of contiguous patches will be equal to the number of zeros in the example, two pointers are zero, indicating two contiguous patches 2. the value of each patch can be obtained by looking up the values of leafs with 0 pointers in the example, leafs 00 and 01 have 0 pointers these have the values A and B respectively 3. to find the area of each patch, select one of the zeros and sum its area plus the areas of any leafs which point to it directly or indirectly the component leafs of each patch can be found by starting at with a leaf at the end (or beginning) of the list and following the pointers until a 0 is found e.g. leaf at position 10 (code 33) points to 8, which points to 7, which points to 5, which points to 2, which has a zero pointer therefore, leaf position 10 (code 33) is part of the same patch as leaf 2 (code 01) and has the value B the areas can be found by summing the leaf areas for the example: A leafs: 00 02 03 2 32 A positions: 1 3 4 6 9 Area of A: 1 + 1 + 1 + 4 + 1 = 8 B leafs: 01 1 30 31 33 B positions: 2 5 7 8 10 Area of B: 1 + 4 + 1 + 1 + 1 = 8 F. QUADTREE INDEXES Indexing using quadtrees indexes are used in vector systems to get fast access to the objects in a particular area of a map very useful in searching for potentially overlapping or intersecting objects therefore, they are an essential part of a polygon overlay operation in Unit 34, looked at the usefulness of a simple sort of objects on one axis (e.g. x) in the moving band operation for intersection calculations now will look at methods which can be thought of as sorting on both axes simultaneously these use 2D coding systems and a simple one dimension sort Setting up the index overhead - Quadtree indexes steps are: 1. for each object (point, line, area) in the database, find the smallest quadtree leaf which encloses the object some large objects will have to be classified as NULL, as they span more than one of the four leafs in the first branching (0, 1, 2 and 3) other smaller objects may be enclosed within a small leaf, e.g. 031 2. sort or index the objects by the enclosing quadtree leafs Using the index to find all objects which might intersect an area, line or point of interest find the quadtree leaf enclosing the object of interest starting at this point follow up the quadtree through all branching points that contain the original cell and down the quadtree to all branching points and leafs below the cell example: the area of interest is enclosed in leaf 31 of the original example quadtree (overhead - First map) the objects which may intersect the area of interest are those in leaf 31 and all leafs above it thus, these are 3 and the null leaf objects in other (remote) leafs cannot intersect the area of interest, so need not be checked example: the area of interest is enclosed in leaf 0 the objects which may intersect the area are in leaf 0, the null leaf and all leafs below 0 - 00, 01, 02, 03 there may be other leafs below these as well such as 010, 011, 012, 013, etc Generalizations quadtree indexing is most effective for small objects, particularly points large objects tend to require large enclosing leafs even though they may not fill much of the space (i.e. highway corridors) these objects will always need to be checked for intersection it may pay to subdivide objects so that the pieces fall entirely within smaller leafs indexing in this way is intuitively more efficient than indexing by x or y alone since the quadtree index is effectively two-dimensional the divisions at each branching need not be equal in size it may pay to have some blocks of smaller area and some of larger area, rather than four equal squares at each branching however, for general efficiency the blocks should be rectangular G. R-TREE INDEXES R-tree indexes are a response to the problem of indexing large areas R stands for "range", a concept similar to MER Method overhead - R-tree find two, possibly overlapping, rectangles (aligned with x and y axes) such that: as many objects as possible are wholly within one or the other rectangle there are roughly equal numbers of objects wholly enclosed in each rectangle the overlap between the rectangles is minimum indexing is determined by the rectangle in which the object is contained objects which are wholly within a rectangle are associated with that rectangle objects which are not wholly within either of the two rectangles are associated with the undivided map apply the procedure recursively, finding two new smaller rectangles within each existing rectangle this creates a tree structure similar to the quadtree every object is associated with some node in the tree to find the objects which might intersect a given area of interest: find the smallest rectangle used in the indexing procedure which wholly encloses the area of interest the objects are those associated with this rectangle and all nodes above and below it in the tree Problem although benchmark tests have shown that R-trees are generally more efficient than quadtrees and simple 1-D sorts, they are computationally intensive to construct REFERENCES Buchmann, A., O. Gunther, T.R. Smith and Y.-F. Wang. Design and Implementation of Large Spatial Databases, Unit Notes in Computer Science 409, Springer Verlag, Berlin. Contains a collection of papers on spatial data indexing. Guttman, A, 1984. "R-trees: A dynamic index structure for spatial searching," ACM SIGMOD, pp. 47-57. Mark, D.M., and J.P. Lauzon, 1984. "Linear quadtrees for Geographic Information Systems," Proceedings, International Symposium on Spatial Data Handling, Zurich, 2:412-430. Noronha, V., 1988. "A survey of Hierarchical Partitioning Methods for Vector Images," Proceedings, Third International Symposium on Spatial Data Handling, Sydney, Australia, pp. 185-199. Oosterom, P. van, 1990. "A modified binary space partitioning tree for geographic information systems," International Journal of Geographical Information Systems 4(2):133-46. The two Samet books listed as references for Unit 36 contain useful discussions of quadtree algorithms. DISCUSSION AND EXAM QUESTIONS 1. Compare the formal methods of indexing (quadtree, R-tree, 1-D sort) with informal methods in common use (e.g. continents, nation-states, major civil divisions, ZIP codes, etc.). 2. How would you design a study to compare the effectiveness of different indexing schemes? What data would you use? What measures would you compare? 3. Current vector-based systems use a wide variety of indexing schemes. Why is there no consensus as to the best? What methods are best for what purposes and area of application? 4. Devise a means of measuring the Manhattan distance between two quadtree blocks (assume the codes have the same length). DIGITAL ELEVATION MODELS A. INTRODUCTION What is a Digital Elevation Model? Creation of DEMs Uses of DEMs Hydrologic functions on DEMs B. ESTIMATING ELEVATION C. ESTIMATING SLOPE AND ASPECT D. DETERMINING DRAINAGE NETWORKS Determining the watershed Determining the network Characteristics of automatically derived networks REFERENCES DISCUSSION AND EXAMINATION QUESTIONS NOTES UNIT 38 - DIGITAL ELEVATION MODELS Compiled with assistance from Brian Klinkenberg, University of British Columbia A. INTRODUCTION surfaces such as the surface of the earth, are continuous phenomena rather than discrete objects to fully model the surface, would need an infinite amount of points there are various ways of representing continuous surfaces in digital form using a finite amount of storage Unit 11 introduces spatial database models that are used for continuous surfaces this unit will look at digital elevation models as one way of representing surfaces and will examine some important algorithms based on DEMs What is a Digital Elevation Model? the term digital elevation model or DEM is frequently used to refer to any digital representation of a topographic surface however, most often it is used to refer specifically to a raster or regular grid of spot heights this is the definition that is used here digital terrain model or DTM may actually be a more generic term for any digital representation of a topographic surface, but it is not so widely used the DEM is the simplest form of digital representation of topography and the most common a variety of DEMs are available, including coverage of much of the US from the US Geological Survey the resolution, or the distance between adjacent grid points, is a critical parameter the best resolution commonly available is 30 m, with a vertical resolution of 1 m coverages of the entire globe, including the ocean floor, can be obtained at various resolutions Creation of DEMs several different methods have been used to create DEM series like those from the USGS see USGS (1987) for more details on the following conversion of printed contour lines existing plates used for printing maps are scanned the resulting raster is vectorized and edited contours are "tagged" with elevations additional elevation data are created from the hydrography layer i.e. shorelines provide additional contours finally, an algorithm is used to interpolate elevations at every grid point from the contour data by photogrammetry this can be done manually or automatically: manually, an operator looks at a pair of stereophotos through a stereoplotter and must move two dots together until they appear to be one lying just at the surface of the ground automatically, an instrument calculates the parallax displacement of a large number of points e.g. for USGS 7.5 minute quadrangles, the Gestalt Photo Mapper II correlates 500,000 points extraction of elevation from photographs is confused by flat areas, especially lakes, and wherever the ground surface is obscured (buildings, trees) there are two techniques for choosing sample points when using manual photogrammetry: 1. profiling the photo is scanned in rows, alternately left to right and right to left, to create profiles a regular grid is formed by resampling the points created in this process because the process tends to underestimate elevations on uphill parts of each profile and overestimate on downhill parts, the resulting DEMs show a characteristic "herringbone" effect when contoured 2. contour following contour lines are extracted directly from stereopairs during compilation of standard USGS maps contour data are processed into profile lines and a regular grid is interpolated using the same algorithms used for manual profiling data DEMs from each source display characteristic error artifacts e.g. effects of mis-tagged contours in the products of scanned contour lines Uses of DEMs determining attributes of terrain, such as elevation at any point, slope and aspect finding features on the terrain, such as drainage basins and watersheds, drainage networks and channels, peaks and pits and other landforms modeling of hydrologic functions, energy flux and forest fires Hydrologic functions on DEMs the principal components of a drainage basin are its topographic form and the topologic structure of its drainage network the quantification of these components is tedious and time consuming when accomplished manually the automated determination of these components is an ideal application of GIS technology watersheds comprise one method of completely partitioning space and many environmental phenomena can be related to watersheds determination of the drainage network and the associated drainage divides provides an important first step in the creation of a hydrologic information system registration and segmentation of digital imagery can be enhanced if use is made of the drainage basin information knowledge of the drainage divides and of the drainage network can be used to provide better estimates of slopes and aspects (e.g., slopes should break at divides and at channels) in this unit we look at a number of simple algorithms for DEMs B. ESTIMATING ELEVATION to estimate the elevation of some point, we need to know first whether the point of interest is exactly at a point in the raster, or in between in the first case, the elevation can be taken directly from the database in the second case, we need to develop some method of interpolation, or estimation of elevation can use the elevation of the nearest point, but this leads to sharp changes of elevation halfway between points instead, the normal approach is to fit a plane to the nearby raster points, and use it to estimate elevation at any point the plane passing through these points is represented as: z = a + bx + cy since a plane will generally not pass exactly through all the points the plane which minimizes the sum of squared elevation differences between the plane and the data at each of the nearby points is often used can determine the equation of the plane as follows: use the four nearest grid points (known as the "neighborhood" of the point or the "2x2 window" around the point) define an origin in the middle of the 2x2 window, and give the four neighboring points the coordinates (-1,-1), (-1,1), (1,-1) and (1,1) since the four points are evenly spaced, the coefficients in the equation can be calculated from the following: a = (z1 + z2 + z3 + z4)/4 b = (-z1 + z2 - z3 + z4)/4 c = (-z1 - z2 + z3 + z4)/4 note: the coefficients can be solved using larger neighborhoods, e.g. the nearest 9 points (see handout) having determined the coefficients, the elevation (z) can be determined from: z = a + bx + cy DIGITAL ELEVATION MODELS C. ESTIMATING SLOPE AND ASPECT slope and aspect can be calculated from the fitted plane to estimate these at a raster point, a 3x3 window centered on the point is usually used slope is calculated from: / (b2 + c2) aspect is calculated from: tan-1 c/b normally a "slope map" or "aspect map" will display the attribute values generalized over areas (regions) instead of at points, such that within each area, all slopes fall into a certain range (e.g. 10-15%) or all aspects fall into a certain quadrant (e.g. NW) to generate such a map, slope or aspect is determined at each raster point, and then these values are aggregated into polygons based on a set of predefined ranges this way of representing slope or aspect is not as accurate as the original raster form since both slope and aspect are derivable from elevation by a simple process, is there any need to store them as separate layers? D. DETERMINING DRAINAGE NETWORKS a raster DEM contains sufficient information to determine general patterns of drainage and watersheds think of each raster point as the center of a square cell the direction of flow of water out of this cell will be determined by the elevations of surrounding cells algorithms to determine the flow direction generally use one of the following cases: 1. assume only 4 possible directions of flow (up, down, left, right - the Rook''s move directions from chess); or 2. assume 8 possible directions (the Queen''s move directions) in both cases, number the move directions clockwise from up water is assumed to flow from each cell to the lowest of its neighbors if no neighbor is lower, the cell is a "pit" and gets code "0" combinations which would be hydrologically impossible, such as a 4 to the left of a 6 in the 8 move case, are logically impossible in this scheme Determining the watershed a watershed is defined here as an attribute of each point on the network which identifies the region upstream of that point to find a watershed begin at the specified cell and label all cells which drain to it, then all which drain to those, etc. until the upstream limits of the basin are defined the watershed is then the polygon formed by the labeled cells Determining the network to draw the drainage network, connect the moves with arrows a zero on the edge of the array is interpreted as a channel which flows off the area since in natural systems, small quantities of water generally flow overland, not in channels, we may want to accumulate water as it flows downstream through the cells so that channels begin only when a threshold volume is reached accumulation of volume proceeds as follows: start by setting each cell to zero then beginning at each cell, add one to it and all cells downstream of it, following the directions indicated in the network to simulate actual stream channels, assume a channel begins only when the accumulated water passing through a cell reaches some critical value this means that small tributaries in the examples above will be deleted in the example, channels start only when the flow reaches a volume of 2 the networks found by this process can be thought of as estimates of real channel networks real networks consist of junctions or forks, links, and sources, and all of these can be identified on the simulated networks Characteristics of automatically derived networks how do networks obtained from DEMs differ from real ones? real streams sometimes branch downstream but this is impossible using this method, the simulated networks cannot bifurcate DEM data contains large numbers of ties of elevation, because the vertical resolution is not very high using this method, water cannot "flow" from one cell to an adjacent cell with the same elevation as a result, ties can lead to large numbers of unwanted pits e.g. in this example, using Rook''s case (4 directions) central cell has no outflow direction to avoid the problem, allow water to flow between neighbors at the same elevation, determining the direction of flow by evaluating local slope (i.e. over a larger window) e.g. here the local slope is to the south alternatively, deal with the problem by regarding the cell as a very small lake, and simulating its overflow (see next point) pits occur frequently on DEMs, largely as a result of data errors if a cell has no lower neighbors, it is a pit the pit can be "flooded" to form a "lake" by the following process: initiate a lake at the elevation of the cell, with a "shoreline" defined by the cell''s perimeter find the lowest cell adjacent to the shoreline, raise the lake to that level and expand the shoreline to include it if one of the neighbors is now lower than the lake, it is the outlet: terminate the process if the lowest neighbor is part of another lake, merge the lakes and continue the number of streams joining at a junction, known as the valency of the junction, is almost always 3 in reality, but may be as high as 4 with the 4-move case, and 8 with the 8-move case junction angles are determined by the cell geometry in the simulation, but in reality are a function of the terrain and the erosion process in areas of uniform slope the technique generates large numbers of parallel streams in reality streams tend to wander because of unevenness, and the resulting junctions reduce the density of streams in areas of approximately uniform slope drainage density is very high in the simulations in many types of terrain, channels are incised, and often the width of the incised channel is too small to show on the DEM this problem can be dealt with by searching the DEM for possible channels - see Band (1986) for example Summary these methods do well on highly dissected landscapes of high drainage density they do better at modeling watershed boundaries than drainage channels therefore, ideally, a spatial database for modeling runoff and other processes related to hydrology should include both the DEM and the stream channels themselves (the "blue lines" of a topographic map) REFERENCES Band, L.E., 1986. "Topographic partition of watersheds with digital elevation models," Water Resources Research 22(1):15-24. Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. Chapter 3 reviews alternative methods of terrain representation. Evans, I.S., 1980. "An integrated system for terrain analysis and slope mapping," Zeitschrift fur Geomorphologie 36:274-95. Marks, D., J. Dozier and J. Frew, 1984. "Automated basin delineation from digital elevation data," Geoprocessing 2:299-311. O''Callaghan, J.F. and D.M. Mark 1984. "The extraction of drainage networks with lakes," Water Resources Research, 18(2):275-280. Pfaltz, J.L., 1976. "Surface networks," Geographical Analysis 8:77-93. Discussion of surface-specific points and their relationship to ridge and channel lines. USGS, 1987. Digital Elevation Models, Data Users Guide 5, US Department of the Interior, USGS, Reston, VA. Describes the creation and data structures of USGS DEMs in detail. DISCUSSION AND EXAMINATION QUESTIONS 1. Discuss some of the problems encountered with algorithms which extract drainage networks from digital elevation models, and present some possible solutions to those problems. 2. How would the incorporation of hydrologic information-- such as drainage divides and stream networks--into a GIS assist a resource manager? 3. Discuss the problems of obtaining maps of slope and aspect from DEMs. 4. What possible ways are there for displaying a DEM on a computer screen? Discuss the advantages and disadvantages of each from the point of view of a) the users and b) the programmers. THE TIN MODEL A. INTRODUCTION The TIN model Creating TINs B. HOW TO PICK POINTS 1. Fowler and Little algorithm 2. VIP (Very Important Points) Algorithm 3. Drop heuristic C. HOW TO TRIANGULATE A TIN 1. Distance ordering 2. Delaunay triangulation D. ALTERNATIVE METHODS OF CREATING TINS Break lines TINs from contours E. STORING TINS 1. Triangle by triangle 2. Points and their neighbors Comparison of the two structures F. ALGORITHMS ON TINS Slope and aspect Contouring Finding Drainage Networks REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 39 - THE TIN MODEL Compiled with assistance from Thomas K. Poiker, Simon Fraser University A. INTRODUCTION the Triangulated Irregular Network model is a significant alternative to the regular raster of a DEM, and has been adopted in numerous GISs and automated mapping and contouring packages the TIN model was developed in the early 1970''s as a simple way to build a surface from a set of irregularly spaced points several prototype systems were developed in the 1970''s commercial systems using TIN began to appear in the 1980''s as contouring packages, some embedded in GISs The TIN model irregularly spaced sample points can be adapted to the terrain, with more points in areas of rough terrain and fewer in smooth terrain an irregularly spaced sample is therefore more efficient at representing a surface in a TIN model, the sample points are connected by lines to form triangles within each triangle the surface is usually represented by a plane by using triangles we ensure that each piece of the mosaic surface will fit with its neighboring pieces - the surface will be continuous - as each triangle''s surface would be defined by the elevations of the three corner points it might make sense to use more complex polygons as mosaic tiles in some cases, but they can always be broken down into triangles for example, if a plateau is eroded by gullies, the remaining plateau would be a flat (planar) area bounded by an irregular, many-sided polygon. In the TIN model it would be represented by a number of triangles, each at the same elevation diagram for vector GISs, TINs can be seen as polygons having attributes of slope, aspect and area, with three vertices having elevation attributes and three edges with slope and direction attributes the TIN model is attractive because of its simplicity and economy in addition, certain types of terrain are very effectively divided into triangles with plane facets this is particularly true with fluvially-eroded landscapes however, other landscapes, such as glaciated ones, are not well represented by flat triangles triangles work best in areas with sharp breaks in slope, where TIN edges can be aligned with breaks, e.g. along ridges or channels Creating TINs despite its simplicity, creating a TIN model requires many choices: how to pick sample points in many cases these must be selected from some existing, dense DEM or digitized contours normally, a TIN of 100 points will do as well as a DEM of several hundred at representing a surface how to connect points into triangles how to model the surface within each triangle this is almost always resolved by using a plane surface however, if the surface is contoured, the contours will be straight and parallel within each triangle, but will kink sharply at triangle edges diagram consequently, some implementations of TIN represent the surface in each triangle using a mathematical function chosen to ensure that slope changes continuously, not abruptly, at the edges of the triangle B. HOW TO PICK POINTS given a dense DEM or set of digitized contours, how should points be selected so that the surface is accurately represented? consider the following 3 methods for selecting from a DEM all of them try to select points at significant breaks of the surface such breaks are common on terrain, absent on smooth mathematical surfaces 1. Fowler and Little algorithm this approach is based on the concept of surface-specific points which play a specific role in the surface e.g. represent features such as peaks and pits Procedure first examine the surface using a 3x3 window, looking at a small array of 9 points at each step label the 8 neighbors of the central point + if higher, - if lower a point is a peak if its 8 neighbors are all lower (8 +s) a point is a pit if its 8 neighbors are all higher (8 -s) a point is a pass if the +s and -s alternate around the point with at least two complete cycles, e.g + + - - - - + + (2 cycles) or + - + - - + - + (4 cycles) next the surface is examined using a 2x2 window except at the edges, every point appears in four positions of the window a point is a potential ridge point if it is never lowest in any position of the window a point is a potential channel point if it is never highest in any position of the window then starting at a pass, search through adjacent ridge points until a peak is reached similarly, search from the pass through adjacent channel points until a pit is reached Finishing the TIN the result of this process is a connected set of peaks, pits, passes, ridge lines and channel lines Fowler and Little recommend that the number of points in each ridge and channel line be reduced by thinning using a standard thinning algorithm it may be desirable to add additional points from the DEM which are not on ridges or channels if we can significantly reduce any substantial differences from the real surface by doing so triangles are built between all selected points the resulting surface will differ from the original DEM, perhaps substantially in some areas Comments the Fowler and Little algorithm is complex performs better on some types of landscape than others, particularly where there are sharp breaks of slope along ridges, and where channels are sharply incised it may require substantial "fine tuning" to work well 2. VIP (Very Important Points) Algorithm unlike the previous algorithm which tries to identify the major features of the terrain, VIP works by examining the surface locally using a window this is a simplification of the technique used in ESRI''s ARC/INFO Procedure each point has 8 neighbors, forming 4 diametrically opposite pairs, i.e. up and down, right and left, upper left and lower right, and upper right and lower left for each point, examine each of these pairs of neighbors in turn connect the two neighbors by a straight line, and compute the perpendicular distance of the central point from this line diagram average the four distances to obtain a measure of "significance" for the point delete points from the DEM in order of increasing significance, deleting the least significant first this continues until one of two conditions is met: the number of points reaches a predetermined limit the significance reaches a predetermined limit Comments because of its local nature, this method is best when the proportion of points deleted is low because of its emphasis on straight lines, and the TIN''s use of planes, it is less satisfactory on curved surfaces diagram 3. Drop heuristic this method treats the problem as one of optimization given a dense DEM, find the best subset of a predetermined number of points such that when the points are connected by triangles filled with planes, the TIN gives the best possible representation of the surface Procedure start with the full DEM examine each point in turn temporarily drop the point and modify the surrounding triangles accordingly diagram find the triangle containing the dropped point measure the difference between the elevation of the point, and the elevation of the new surface at the point restore the dropped point, storing the calculated elevation difference continue the process dropping each point in turn when all the points have been dropped, remove the point which produced the least difference and start the process again Comments the TIN will likely be more accurate if the differences are measured not only for the point being dropped, but for all previously dropped points lying within the modified triangles as well, but this would be time- consuming rather than select points from the DEM, the best solution (in the sense of producing the best possible TIN for a given number of points) may be to locate TIN points at locations and elevations not in the original raster these points may be chosen from air photographs or ground surveys THE TIN MODEL HOW TO TRIANGULATE A TIN having selected a set of TIN points, these will become the vertices of the triangle network there are several ways to connect vertices into triangles "fat" triangles with angles close to 60 degrees are preferred since this ensures that any point on the surface is as close as possible to a vertex this is important because the surface representation is likely most accurate at the vertices consider the following two methods for building the triangles in practice almost all systems use the second 1. Distance ordering Procedure compute the distance between all pairs of points, and sort from lowest to highest connect the closest pair of points connect the next closest pair if the resulting line does not cross earlier lines repeat until no further lines can be selected the points will now be connected with triangles this tends to produce many skinny triangles instead of the preferred "fat" triangles 2. Delaunay triangulation by definition, 3 points form a Delaunay triangle if and only if the circle which passes through them contains no other point diagram another way to define the Delaunay triangulation is as follows: partition the map by assigning all locations to the nearest vertex the boundaries created in this process form a set of polygons called Thiessen polygons or Voronoi or Dirichlet regions overhead - Delaunay triangles from Thiessen polygons two vertices are connected in the Delaunay triangulation if their Thiessen polygons share an edge this method produces the preferred fat triangles the boundary edges on the Delaunay network form the Convex Hull, which is the smallest polygon to contain all of the vertices Procedure there are several techniques for building the triangles: 1. since the convex hull will always be part of the Delaunay network start with these edges and work inwards until the network is complete 2. connect the closest pair which by definition must be a Delaunay edge search for a third point such that no other point falls in the circle through them continue working outward from these edges for the next closest point Problems Delaunay triangles are not hierarchical they cannot be aggregated to form bigger triangles if they are divided into smaller triangles, the results tend to be poorly shaped (not "fat") D. ALTERNATIVE METHODS OF CREATING TINS Break lines methods presented above concentrate on finding TIN vertices, then connecting them with triangles a major advantage of TINs is their ability to capture breaks of slope, if edges can be aligned with known ridges or channels this requires a different approach, where "breaklines" are incorporated into the triangle network as edges after the points have been triangulated the result is generally non-Delaunay, i.e. an edge need not be an edge in the Delaunay network of the vertices this approach is now incorporated into some TIN software, e.g. the ARC/INFO TIN module TINs from contours contours are a common source of digital elevation data rather than convert from contours to a grid (DEM) and then to a TIN, it is more direct to obtain the TIN from contours directly a TIN can be created by selecting points from the digitized contour lines selection may create a triangle with three vertices on the same contour (at the same elevation) such a "flat triangle" has no defined aspect, causes problems in modeling runoff several ways of avoiding this problem have been devised E. STORING TINS there are basically two ways of storing triangulated networks: 1. Triangle by triangle 2. Points and their neighbors overhead - Storing TINs 1. Triangle by triangle in this case, a record usually contains: a reference number for the triangle the x,y,z-coordinates of the three vertices the reference numbers of the three neighboring triangles since a vertex participates in, on the average, six triangles, repetition of coordinates can be avoided by creating a separate vertex file and referencing them in the triangle files 2. Points and their neighbors the alternative is to store for every vertex: an identification number the xyz coordinates references (pointers) to the neighboring vertices in clockwise or counter-clockwise order this structure was the original TIN structure (Peucker et al, 1978) Comparison of the two structures both structures are necessary, depending on the purpose slope analysis needs the first contouring and other traversing procedures work best with the second as long as one can be extracted from the other in close to linear time (i.e., without an exhaustive search per point), either will do the second generally needs less storage space however, the savings within different TIN structures is minor compared to the reduction of points from the regular grid to the triangular network F. ALGORITHMS ON TINS Slope and aspect compared to the DEM, it is simple to find slope and aspect at some location using a TIN - we simply find the slope and aspect attributes of the containing triangle Contouring overhead - Contouring TINs Example: find the 100 m contour begin by determining, with a different algorithm, that an edge intersects the contour level of 100 meters the vertex above the contour level is the "reference point" (r) and the one below the "sub-point" (s) establish a position in memory for the pair of points which we will call present_edge computing P1, the first contour point, is then a trivial linear calculation now shift over to the traversing procedure look at the third vertex in the triangle and ask whether it is a reference or a sub-point by testing whether it is above or below the contour level the result replaces the appropriate value in present_edge and the next contour point is calculated. the vertices in present_edge represent a second triangle whose third vertex is the next candidate Finding Drainage Networks two approaches can be used to find drainage networks and watersheds: 1. treat each triangle as a discrete element as with the DEM, water is passed from one triangle to another, selecting the neighbor in the direction of steepest slope in each case diagram 2. treat the surface as a mosaic of planes two forms of flow occur - channel and overland water flows over each triangle as a continuous sheet, and collects along edges in this model, it is possible for water to collect in a "channel" between two triangles, flow to a vertex, and flow into the top of one or more triangles in this case we must allow channel flow down the line of steepest slope from the apex of these triangles diagram if there is more than one such triangle, then a bifurcation is implied, with water flowing in more than one direction from the apex, and into more than one drainage basin this is awkward, as we no longer have a clear definition of watershed diagram REFERENCES Chen, Z., and J.A. Guevara, 1987. "Systematic selection of very important points (VIP) from digital terrain models for construction triangular irregular networks," Proceedings, AutoCarto 8, ASPRS/ACSM, Falls Church, VA, pp. 50-56. A description of ESRI''s VIP approach to constructing a TIN. Fowler, R.J., and J.J. Little, 1979. "Automatic extraction of irregular network digital terrain models," Computer Graphics 13:199-207. Heller, M., 1986. "Triangulation and Interpolation of Surfaces," in R. Sieber and K. Brassel (eds), A Selected Bibliography on Spatial Data Handling: Data Structures, Generalization and Three-Dimensional Mapping, Geo- Processing Series, vol 6, Department of Geography, University of Zurich, pp 36 - 45. A good overview with literature, mainly on triangulation. Mark, D. M., 1975. "Computer Analysis of Topography: A Comparison of Terrain Storage Methods," Geografisker Annaler 57A:179-188. A quantitative comparison of regular grids and triangulated networks. Mark, D.M., 1979. "Phenomenon-Based Data-Structuring and Digital Terrain Modelling," Geo-Processing 1:27-36. A very interesting conceptual article proposing a phenomenon-based approach to data structuring. Such an approach has to involve expert knowledge of the phenomenon. Peucker, T.K., R.J. Fowler, J.J. Little and D.M. Mark, 1978. "The Triangulated Irregular Network," Proceedings, American Society of Photogrammetry: Digital Terrain Models (DTM) Symposium, St. Louis, Missouri, May 9-11, 1978, pp 516-540. The basic description of the original TIN project. DISCUSSION AND EXAM QUESTIONS 1. Argue the differences between the regular grid and the triangular net approaches. Apply the argument to the computation of slope, contouring and visibility. 2. Mark''s article in 1979 argued that the TIN model was more appropriate to the nature of certain geographical phenomena. Do you agree? For what types of landforms is TIN most and least appropriate? 3. Discuss the various methods proposed for selecting TIN vertices from a DEM, and their relative strengths and weaknesses. 4. Describe how information on directions of flow can be obtained from a TIN, and the nature of the extracted stream network. How does this compare to networks derived from DEMs? SPATIAL INTERPOLATION I A. INTRODUCTION B. CLASSIFICATION OF INTERPOLATION PROCEDURES 1. Point Interpolation/Areal Interpolation 2. Global/Local Interpolators 3. Exact/Approximate Interpolators 4. Stochastic/Deterministic Interpolators 5. Gradual/Abrupt Interpolators C. POINT BASED INTERPOLATION - EXACT METHODS 1. Proximal 2. B-splines 3. Kriging Variograms Deriving the variogram Computing the estimates 4. Manual Interpolation or "eyeballing" D. POINT BASED INTERPOLATION - APPROXIMATE METHODS 1. Trend Surface Analysis 2. Fourier Series 3. Moving average/distance weighted average REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 40 - SPATIAL INTERPOLATION I Compiled with assistance from Nigel M. Waters, University of Calgary A. INTRODUCTION spatial interpolation is the procedure of estimating the value of properties at unsampled sites within the area covered by existing observations in almost all cases the property must be interval or ratio scaled can be thought of as the reverse of the process used to select the few points from a DEM which accurately represent the surface rationale behind spatial interpolation is the observation that points close together in space are more likely to have similar values than points far apart (Tobler''s Law of Geography) spatial interpolation is a very important feature of many GISs spatial interpolation may be used in GISs: to provide contours for displaying data graphically to calculate some property of the surface at a given point to change the unit of comparison when using different data structures in different layers frequently is used as an aid in the spatial decision making process both in physical and human geography and in related disciplines such as mineral prospecting and hydrocarbon exploration many of the techniques of spatial interpolation are two- dimensional developments of the one dimensional methods originally developed for time series analysis this unit introduces spatial interpolation and examines point based interpolation, while the next looks at areal procedures and some applications B. CLASSIFICATION OF INTERPOLATION PROCEDURES there are several different ways to classify spatial interpolation procedures: 1. Point Interpolation/Areal Interpolation point based given a number of points whose locations and values are known, determine the values of other points at predetermined locations diagram point interpolation is used for data which can be collected at point locations e.g. weather station readings, spot heights, oil well readings, porosity measurements interpolated grid points are often used as the data input to computer contouring algorithms once the grid of points has been determined, isolines (e.g. contours) can be threaded between them using a linear interpolation on the straight line between each pair of grid points point to point interpolation is the most frequently performed type of spatial interpolation done in GIS lines to points e.g. contours to elevation grids areal interpolation given a set of data mapped on one set of source zones determine the values of the data for a different set of target zones e.g. given population counts for census tracts, estimate populations for electoral districts diagram 2. Global/Local Interpolators global interpolators determine a single function which is mapped across the whole region a change in one input value affects the entire map local interpolators apply an algorithm repeatedly to a small portion of the total set of points a change in an input value only affects the result within the window global algorithms tend to produce smoother surfaces with less abrupt changes are used when there is an hypothesis about the form of the surface, e.g. a trend some local interpolators may be extended to include a large proportion of the data points in set, thus making them in a sense global the distinction between global and local interpolators is thus a continuum and not a dichotomy this has led to some confusion and controversy in the literature 3. Exact/Approximate Interpolators exact interpolators honor the data points upon which the interpolation is based the surface passes through all points whose values are known honoring data points is seen as an important feature in many applications e.g. the oil industry proximal interpolators, B-splines and Kriging methods all honor the given data points Kriging, as discussed below, may incorporate a nugget effect and if this is the case the concept of an exact interpolator ceases to be appropriate approximate interpolators are used when there is some uncertainty about the given surface values this utilizes the belief that in many data sets there are global trends, which vary slowly, overlain by local fluctuations, which vary rapidly and produce uncertainty (error) in the recorded values the effect of smoothing will therefore be to reduce the effects of error on the resulting surface 4. Stochastic/Deterministic Interpolators stochastic methods incorporate the concept of randomness the interpolated surface is conceptualized as one of many that might have been observed, all of which could have produced the known data points stochastic interpolators include trend surface analysis, Fourier analysis and Kriging procedures such as trend surface analysis allow the statistical significance of the surface and uncertainty of the predicted values to be calculated deterministic methods do not use probability theory (e.g. proximal) 5. Gradual/Abrupt Interpolators a typical example of a gradual interpolater is the distance weighted moving average usually produces an interpolated surface with gradual changes however, if the number of points used in the moving average is reduced to a small number, or even one, there would be abrupt changes in the surface it may be necessary to include barriers in the interpolation process semipermeable, e.g. weather fronts will produce quickly changing but continuous values impermeable barriers, e.g. geologic faults will produce abrupt changes SPATIAL INTERPOLATION I C. POINT BASED INTERPOLATION - EXACT METHODS Lam (1983) and Burrough (1986) describe a variety of quantitative interpolation methods suitable for computer contouring algorithms in this and the next sections, these are divided into exact and approximate methods this section deals with exact methods 1. Proximal all values are assumed to be equal to the nearest known point is a local interpolator computing load is relatively light output data structure is Thiessen polygons with abrupt changes at boundaries has ecological applications such as territories and influence zones best for nominal data although originally used by Thiessen for computing areal estimates from rainfall data is absolutely robust, always produces a result, but has no "intelligence" about the system being analyzed available in very few mapping packages, SYMAP is a notable exception 2. B-splines uses a piecewise polynomial to provide a series of patches resulting in a surface that has continuous first and second derivatives ensures continuity in: elevation (zero-order continuity) - surface has no cliffs slope (first-order continuity) - slopes do not change abruptly, there are no kinks in contours curvature (second order continuity) - minimum curvature is achieved produces a continuous surface with minimum curvature output data structure is points on a raster note that maxima and minima do not necessarily occur at the data points is a local interpolator can be exact or used to smooth surfaces computing load is moderate best for very smooth surfaces poor for surfaces which show marked fluctuations, this can cause wild oscillations in the spline are popular in general surface interpolation packages but are not common in GISs can be approximated by smoothing contours drawn through a TIN model see Burrough (1986), Davis (1986) and mathematical aspects in Lam (1983) and Hearn and Baker (1986) also described in "numerical approximation theory" 3. Kriging developed by Georges Matheron, as the "theory of regionalized variables", and D.G. Krige as an optimal method of interpolation for use in the mining industry the basis of this technique is the rate at which the variance between points changes over space this is expressed in the variogram which shows how the average difference between values at points changes with distance between points Variograms diagram De (vertical axis) is E(zi - zj)2, i.e. "expectation" of the difference i.e. the average difference in elevation of any two points distance d apart d (horizontal axis) is distance between i and j most variograms show behavior like the diagram the upper limit (asymptote) of De is called the sill the distance at which this limit is reached is called the range the intersection with the y axis is called the nugget a non-zero nugget indicates that repeated measurements at the same point yield different values in developing the variogram it is necessary to make some assumptions about the nature of the observed variation on the surface: simple Kriging assumes that the surface has a constant mean, no underlying trend and that all variation is statistical universal Kriging assumes that there is a deterministic trend in the surface that underlies the statistical variation in either case, once trends have been accounted for (or assumed not to exist), all other variation is assumed to be a function of distance Deriving the variogram the input data for Kriging is usually an irregularly spaced sample of points to compute a variogram we need to determine how variance increases with distance begin by dividing the range of distance into a set of discrete intervals, e.g. 10 intervals between distance 0 and the maximum distance in the study area for every pair of points, compute distance and the squared difference in z values assign each pair to one of the distance ranges, and accumulate total variance in each range after every pair has been used (or a sample of pairs in a large dataset) compute the average variance in each distance range plot this value at the midpoint distance of each range Computing the estimates once the variogram has been developed, it is used to estimate distance weights for interpolation interpolated values are the sum of the weighted values of some number of known points where weights depend on the distance between the interpolated and known points weights are selected so that the estimates are: unbiased (if used repeatedly, Kriging would give the correct result on average) minimum variance (variation between repeated estimates is minimum problems with this method: when the number of data points is large this technique is computationally very intensive the estimation of the variogram is not simple, no one technique is best since there are several crucial assumptions that must be made about the statistical nature of the variation, results from this technique can never be absolute simple Kriging routines are available in the Surface II package (Kansas Geological Survey) and Surfer (Golden Software), and in the GEOEAS package for the PC developed by the US Environmental Protection Agency 4. Manual Interpolation or "eyeballing" traditionally not a highly regarded method among geographers and cartographers however, Dutton-Marion (1988) has shown that among geologists this is a very important procedure and that most geologists actually distrust the more sophisticated, mathematical algorithms they feel that they can use their expert knowledge, modelling capabilities and experience and generate a more realistic interpolation by integrating this knowledge into the construction of the geological surface attempts are now being made to use knowledge engineering techniques to extract this knowledge from experts and build it into an expert system for interpolation see Unit 74 for more on this topic characteristics of this method include: procedures are local as different methods may be used by the expert on different parts of the map tend to honor data points abrupt changes such as faults are more easily modelled using these methods the surfaces are subjective and vary from expert to expert output data structure is usually in the form of a contour D. POINT BASED INTERPOLATION - APPROXIMATE METHODS 1. Trend Surface Analysis surface is approximated by a polynomial output data structure is a polynomial function which can be used to estimate values of grid points on a raster or the value at any location the elevation z at any point (x,y) on the surface is given by an equation in powers of x and y e.g. a linear equation (degree 1) describes a tilted plane surface: z = a + bx + cy e.g. a quadratic equation (degree 2) describes a simple hill or valley: z = a + bx + cy + dx2 + exy + fy2 in general, any cross-section of a surface of degree n can have at most n-1 alternating maxima and minima e.g. a cubic surface can have one maximum and one minimum in any cross-section equation for the cubic surface: z = a + bx + cy + dx2 + exy + fy2 + gx3 + hx2y + ixy2 + jy3 a trend surface is a global interpolator assumes the general trend of the surface is independent of random errors found at each sampled point computing load is relatively light problems statistical assumptions of the model are rarely met in practice edge effects may be severe a polynomial model produces a rounded surface this is rarely the case in many human and physical applications available in a great many mapping packages see Davis (1973) and Sampson (1978) for non- orthogonal polynomials; Mather (1976) for orthogonal polynomials 2. Fourier Series approximates the surface by overlaying a series of sine and cosine waves a global interpolator computing load is moderate output data structure is the Fourier series which can be used to estimate grid values for a raster or at any point best for data sets which exhibit marked periodicity, such as ocean waves rarely incorporated in computing packages simple program and discussion in Davis (1973) 3. Moving average/distance weighted average estimates are averages of the values at n known points: z = S wizi/ S wi where w is some function of distance, such as: w = 1/dk w = e-kd an almost infinite variety of algorithms may be used, variations include: the nature of the distance function varying the number of points used the direction from which they are selected is the most widely used method objections to this method arise from the fact that the range of interpolated values is limited by the range of the data no interpolated value will be outside the observed range of z values diagram other problems include: how many points should be included in the averaging? what to do about irregularly spaced points? how to deal with edge effects? REFERENCES Burrough, P.A., 1986. Principles of Geographical Information Systems for land Resources Assessment, Clarendon, Oxford. See Chapter 8. Davis, J.C., 1986. Statistics and Data Analysis in Geology, 2nd edition, Wiley, New York. (Also see the first, 1973, edition for program listings.) Dutton-Marion, K.E., 1988. Principles of Interpolation Procedures in the Display and Analysis of Spatial Data: A Comparative Analysis of Conceptual and Computer Contouring, unpublished Ph.D. Thesis, Department of Geography, University of Calgary, Calgary, Alberta. Hearn, D., and Baker, M.P., 1986. Computer Graphics, Prentice-Hall Inc, Englewood Cliffs, N.J. Jones, T.A., Hamilton, D.E. and Johnson, C.R., 1986. Contouring Geologic Surfaces with the Computer, Van Nostrand Reinhold, New York Lam, N., 1983. "Spatial Interpolation Methods: A Review," The American Cartographer 10(2):129-149. Mather, P.M., 1976. Computational Methods of Multivariate Analysis in Physical Geography, Wigley, New York. Sampson, R.J., 1978. Surface II, revised edition, Kansas Geological Survey, Lawrence, Kansas. Waters, N.M., 1988. "Expert Systems and Systems of Experts," Chapter 12 in W.J. Coffey, ed., Geographical Systems and Systems of Geography: Essays in Honour of William Warntz, Department of Geography, University of Western Ontario, London, Ontario. An important class of interpolation methods is missing here - so called radial basis functions, such as multiquadrics, thin plate spline, thin plate spline with tension, regularized spline with tension and a large number of other flavours of this approach (also sometimes refered to as variational approach). These methods are available in almost every GIS, from ArcINFO, GRASS, SURFER to specialized visualization packages. The description can be found at Mitas, L., Mitasova, H., 1999, Spatial Interpolation. In: P.Longley, M.F. Goodchild, D.J. Maguire, D.W.Rhind (Eds.), Geographical Information Systems: Principles, Techniques, Management and Applications, GeoInformation International, Wiley, 481-492. DISCUSSION AND EXAM QUESTIONS 1. Are there other techniques for surface generation? How many of the above procedures are commonly used? How would they be ranked in terms of popularity? Give examples from the literature of where they have been used. 2. How does hand contouring rate as an alternative? What did you think of it and have you changed your mind? What are the key features and processes involved in hand contouring? 3. Explain the advantages and disadvantages of manual interpolation as used in hand contouring over computer based interpolation as used in a computer contouring package. 4. Describe the different ways in which spatial interpolation algorithms can be classified. SPATIAL INTERPOLATION II A. INTRODUCTION B. AREAL INTERPOLATION - NON-VOLUME PRESERVING Procedure C. AREAL INTERPOLATION - VOLUME-PRESERVING 1. Overlay 2. Pycnophylactic Boundary conditions D. SPECIAL CASES OF SPATIAL INTERPOLATION 1. Mapping populated areas 2. Estimating trade areas E. A GIS PERSPECTIVE ON INTERPOLATION Expert systems for spatial interpolation algorithms Conclusion REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 41 - SPATIAL INTERPOLATION II Compiled with assistance from Nigel M. Waters, University of Calgary A. INTRODUCTION this unit continues the examination of spatial interpolation by looking at areal interpolation techniques and some applications areal interpolation is the problem of transferring data from one set of areas (source reporting zones) to another (target reporting zones) this is easy if the target set is an aggregation of the source set, but more difficult if the boundaries of the target set are independent of the source set later we look at applications that do not fall easily into either point or areal interpolation categories B. AREAL INTERPOLATION - NON-VOLUME PRESERVING e.g. interpolating population counts from census tracts to school districts Procedure overhead - Non-volume preserving areal interpolation calculate the population density for each source census tract by dividing population by area identify a centroid for each region assign to the point located at each centroid, the population density value determined for its enclosing area using this set of points, interpolate a gridded population density surface using any of the methods described previously convert each grid cell''s value to a population by multiplying the estimated density by the cell''s area overlay the interpolated grid on the target map and assign each grid value to each its target region (school district) calculate the total population in each target region this method is criticized because: choosing the center point is ill-defined inadequacy of point based interpolation methods most importantly, the total value of each zone is not conserved e.g. if a source zone is divided into two target zones, the total population of the target zones after interpolation need not equal the population of the source zone SPATIAL INTERPOLATION II C. AREAL INTERPOLATION - VOLUME-PRESERVING 1. Overlay discussed by MacDougall (1976) and Goodchild and Lam (1980) procedure involves: overlay of target and source zones determining the proportion of each source zone that is assigned to each target zone apportioning the total value of the attribute for each source zone to target zones according to the areal proportions assumes uniform density of the attribute within each zone e.g. uniform population density if the attribute is total zone population 2. Pycnophylactic see Tobler (1979) for the original algorithm the technique has two objectives: 1. create a smooth surface, no steps attribute values should not change suddenly at zone boundaries 2. the total value of the attribute within each zone must be correct procedure: 1. overlay a dense raster on a choropleth map 2. divide each zone''s total value equally among the raster cells that overlap the zone 3. smooth the values by replacing each cell''s value with the average of its neighbors 4. sum the values of the cells in each zone 5. adjust the values of all cells within each zone proportionally so that the zone''s total is the same as the original total e.g., if the total is 10% low, increase the value of each cell by 10% 6. repeat steps 3, 4 and 5 until no more changes occur does not require an assumption of homogeneity within zones but rapid variation within zones may affect the quality of interpolation output is a contour or continuously shaded map Boundary conditions at the boundary of the reporting zones, pixels will have neighbors outside the study area and therefore without values some decision must be made about the behavior of the surface outside the study area e.g. population density equals zero (a lake or rural area) e.g. population density unknown, assumed equal to the values of the outermost pixels of the study area D. SPECIAL CASES OF SPATIAL INTERPOLATION 1. Mapping populated areas objective is to create a map showing "populated areas", given point population values for a number of cities and towns this problem arises frequently when populated areas are represented as points it arises for small reporting zones when boundary files are unavailable, but data includes centroid locations e.g. US or UK census data are several methods that could be used a simple approach would be to estimate the populated area using an empirical relationship like: A is proportional to p0.84 and draw a circle around the point, of radius: / (A / p) Bracken and Martin (1989) have developed methods for replacing ED centroids by disks, the radius of each disk being estimated from the distances to neighboring centroids the method works very well with UK ED data an alternative approach might proceed as follows: establish a critical population density for defining an urban area spread the population over each urban area so that population density is highest in the center and decreases gradually outwards e.g. use a normal distribution function interpolate densities to a raster, accumulating values where the population spread from two urban areas overlap draw contours at the critical value to define the boundaries of the populated areas both of these methods fall within the general heading of density estimation a density is being estimated from a collection of points see Silverman 2. Estimating trade areas in marketing, it is often desirable to plot the boundary of a trade area for e.g. a store, given information on the home locations of customers simplest case is when the location of all customers and non-customers is known simply draw a boundary contour between them if the location of non-customers is not known: 1. calculate the average distance to all customers and draw a circle or 2. divide the area into sectors, average the distance to customers within the sectors and draw a distance arc for each sector (see Huff and Batsell, 1977) these techniques do not pick up islands or holes in the trade area or 3. give each customer a small probability surface accumulate values as in the populated areas example set critical value for delimiting trade area E. A GIS PERSPECTIVE ON INTERPOLATION both point and areal interpolation try to estimate a continuous surface in the point case, the surface has been measured at sample points in the areal case, the surface of population density is estimated from total population counts in each reporting zone in other cases it is impossible to conceive of a continuous surface e.g. if each point is a city and the attribute is city population if city A has population 1 million and city B 100 km away has population 2 million, there is no reason to believe in the existence of a city half way between A and B with population 1.5 million in this case, the variable population exists only at the points, not as a continuous surface in other cases the variable might exist only along lines e.g. traffic density on a street network we must distinguish here between layer and object views of the world a continuous surface of elevations is a layer view of the world - there is one value of elevation at an infinite number of possible places in the space the point map of cities is an object view of the world - the space in between points is empty, and has no value of the population variable the street map is an object view of the world - the world is empty except where there are streets - only along streets is traffic density defined spatial interpolation implies a layer view of the world, and it requires special techniques (e.g. density estimation) to apply it to objects such as store customers Expert systems for spatial interpolation algorithms a good GIS should include a range of spatial interpolation routines so that the user can choose the most appropriate method for the data and the task ideally, these routines should provide a natural language interface which would lead the user through an appropriate series of questions about the intentions, goals and aims of the user and about the nature of the data a number of prototype expert systems for guiding the choice of a spatial interpolation algorithm have been developed these may be written in the form of: an expert system shell (Waters, 1988) in one of the artificial intelligence languages such as Prolog or LISP (see Dutton-Marion, 1988) or in a high level language such as Pascal (Maslyn, 1987) Conclusion if computer contouring and surface generation techniques are to be incorporated successfully into GIS, they must be easy to use and effective "easy to use" implies that those without a detailed knowledge of the mathematical and statistical characteristics of the procedure should be able to choose the correct technique for displaying a particular data set for a particular purpose note: statisticians argue that this is not an ideal goal as people may use techniques without a proper understanding of the underlying assumptions "effective" means that these techniques should be informative, highlighting the essential nature of the data and/or surface and serving the purpose of the researcher/analyst the researcher''s measure of success will be largely subjective and visual - does the result look right? this purpose may vary from an attempt to model all the "real" intricacies of the surface to simply trying to highlight the general, spatial trend of the data in order to aid in the decision-making process REFERENCES Bracken, I. and D. Martin, 1989. "The generation of spatial population distributions from census centroid data," Environment and Planning A 21:537-44. Dutton-Marion, K.E., 1988. Principles of Interpolation Procedures in the Display and Analysis of Spatial Data: A Comparative Analysis of Conceptual and Computer Contouring, unpublished Ph.D. Thesis, Department of Geography, University of Calgary, Calgary, Alberta. Goodchild, M.F., and Lam, N., 1980. "Areal Interpolation: A Variant of the Traditional Spatial Problem," Geo- Processing 1: 297-312. Huff, D.L. and R.R. Batsell, 1977. "Delimiting the areal extent of a market area," Journal of Marketing Research 14:581-5. MacDougall, E.B., 1976. Computer Programming for Spatial Problems, Arnold, London. Maslyn, R.M., 1987. "Gridding Advisor: An Expert System for Selecting Gridding Algorithms," Geobyte 2(4):42-43. Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis, Chapman and Hall, London. Tobler, W.R., 1979. "Smooth pycnophylactic interpolation for geographical regions," Journal of the American Statistical Association 74:519-30. Waters, N.M., 1988. "Expert Systems and Systems of Experts," Chapter 12 in W.J. Coffey, ed., Geographical Systems and Systems of Geography: Essays in Honour of William Warntz, Department of Geography, University of Western Ontario, London, Ontario. DISCUSSION AND EXAM QUESTIONS 1. What are the main considerations to be aware of in computer contouring? What are the key aspects for the design of an expert system to aid in choosing a computer contouring algorithm within a GIS? How long do you think it will be before such expert systems become widely available? 2. Describe how Tobler''s pycnophylactic method differs from volume-preserving overlay. What model of the underlying spatial distribution is assumed by each? Give examples of phenomena and application which fit each method''s assumptions. 3. Describe the application of areal interpolation in political districting. 4. One test of a spatial interpolation method is that its results would be judged as equal or better to hand contouring by a specialist, e.g. a field geologist, with detailed knowledge of the phenomenon being mapped. How well do the methods discussed in these two units do against this criterion? TEMPORAL AND THREE-DIMENSIONAL REPRESENTATIONS A. INTRODUCTION B. VERTICAL DIMENSION ("3D") Uses of 3D representations C. CHARACTER OF THE PHENOMENON Distribution Topological complexity Geometric complexity D. METHODS OF REPRESENTATION 2 1/2 dimensions True three dimensional representations Summary E. TIME DEPENDENCE Possible models Summary REFERENCES DISCUSSION AND EXAM QUESTIONS NOTES UNIT 42 - TEMPORAL AND THREE-DIMENSIONAL REPRESENTATIONS Compiled with assistance from John H. Ganter, University of Pennsylvania A. INTRODUCTION although the vast majority of GISs currently work only in two dimensions, across the plane, certain applications require the addition of other dimensions, namely time or elevation/depth most geological applications require a consideration of attributes in the vertical dimension as well as the horizontal ones temporal variations are important in many economic and social studies oceanographic and meteorological models need to consider variations both in the vertical and the temporal dimensions this unit will look briefly at how these additional dimensions can be incorporated into GISs B. VERTICAL DIMENSION ("3D") there are two very different ways of looking at representations of the vertical dimension (normally called the third dimension) in GIS the most commonly recognized is a data structure where a z value (normally elevation) is recorded as an attribute for each data point (x,y) these z values can be used in a perspective plot to create the appearance of 3 dimensions this is not true three dimensional representation and is often referred to as "2 1/2 dimensions" overhead - Perspective plot - Tiefort Mountains, CA, same data used for graphics in Unit 11 these 2 1/2D plots are an attractive way of displaying topography and other continuous surfaces from DEMs or TINs perspective plots can be computed from any viewpoint additional layers can be "draped" over the surface using color "artist''s impressions" can be created by converting classes (e.g. of land cover) to simulated trees, etc. with powerful computers, it is possible to animate 2 1/2D plots to create simulations of flying over topography "LA the Movie" was created by Jet Propulsion Lab, Pasadena CA, by draping a Landsat scene over a DEM of LA, then simulating the view from a moving aircraft true three dimensional representations store data in structures that reference locations in 3D space (x,y,z) here z is not an attribute but an element of the location of the point this permits data to be recorded at several points with equal x and y coordinates, e.g., soundings in the ocean or atmosphere, geologic logs of wells true 3D representations allow: visualization of volumes is difficult to understand volumes when they are represented by several orthographic projections modeling of volumes algorithms for spatial analysis of volumes are simpler if the data is in a volumetric form Uses of 3D representations 3D representations of spatial information have several important applications: designing major developments such as mines, quarries, dams and reservoirs geologic exploration scientific explanation of three dimensional processes such as ocean currents here don''t necessarily know what is being sought therefore, the structure of the representation can constrain the types of analyses that are performed and what is found C. CHARACTER OF THE PHENOMENON a major determinant of the type of representation used is the ''phenomenon-structure'' itself three dimensional phenomena have several characteristics: Distribution 1. Continuous present in some quantity in all places e.g. land, stratigraphic or piezometric surface similar to 2D raster representations 2. Discrete distinct objects which occupy specific locations e.g. lithology, ore bodies, tunnels, caves similar to 2D vector objects Topological complexity how the object is composed this has a major effect on the data structure used overhead - Topological complexity 1. Compound (One class) composed of identical smaller objects well casing or well log ore body composed of smaller bodies 2. Mixed (Multiple classes) composed of smaller, dissimilar objects mine composed of shafts, adits, etc. which are hierarchically-arranged either adjacent to, or wholly within each other 3. Interpenetrating (Multiple phenomena) mixed, but objects may share subsets of each other''s volumes large-scale structural geology karst features intersecting with the water table and geologic structures Geometric complexity degree to which the representation is irregular or convoluted involves questions of: Accuracy how much is required? design applications (e.g. tunneling) - must be highly accurate prospecting or science applications - general relations may be more important Precision resolution of measurement and detail of analysis often depends on scale of examination and nature of phenomenon may have to filter and generalize to reduce storage and computation burden TEMPORAL AND THREE-DIMENSIONAL REPRESENTATIONS D. METHODS OF REPRESENTATION specific approach taken is a function of: user''s needs and capabilities character, distribution and complexity of phenomenon characteristics of the data available, or the means to collect it 2 1/2 dimensions 1. Single-valued surfaces single z (elevation) value for each coordinate pair usually continuous distributions topologic complexity is low geometric complexity can be high defines a surface with no thickness usually displayed with isolines geometrically 3-D, but topologically 2-D suited to visualization and some modeling available in many mapping and statistical packages 2. Multi-valued surfaces and volumes more than one z-value for each x,y pair usually continuous distributions topological complexity is low geometric complexity can be high can be displayed with isolines may become difficult to comprehend often subjected to geostatistical analysis in prospecting and scientific research not as widely available in turnkey systems True three dimensional representations 1. Boundary representations (B-reps) overhead - B-Rep of a Cave Passage objects are defined as polyhedra bounded by planes or faces can be displayed with hidden line removal for easier comprehension each object can be represented by a number of: faces - flat planes, usually triangular (a mixture of rectangles and hexagons on the overhead) edges - define the edges of the faces (3 per triangular face) points - define the ends of the edges (2 per edge) suited to discrete objects topological complexity can be high geometric complexity can be high well suited to design, some exploration and explanation applications widely available in CAD systems the TIN is a type of B-rep, constrained to be single-valued (i.e. one value of z for every x,y) requires a powerful user-interface to construct combinatorially-complex objects each part of the B-rep (planes, edges, points) must be carefully and consistently defined for each application in order to maintain validity performance degrades rapidly with high geometric complexity 2. Spatial occupancy enumeration (SOE) overhead - SOE of a Mine/Quarry volume is divided into cubes or voxels can have on (full) or off (empty) status or, can have attribute values vertical resolution is often different (higher) from horizontal resolution, e.g. modeling the atmosphere objects can be displayed as positive (casts) or negative (molds) suited to discrete objects or continuous distributions combinatorial complexity can be very high geometric complexity can be high, within limits of voxel resolution suited to exploration and explanation applications, also analytical operations in design some systems exist for mine modeling, also medical applications usually produced by converting from B-reps (similar to converting vectors to rasters in 2D) properties like mass, volume and surface area are quickly computed as Boolean operations or voxel counts these can be indexed using octtrees (also octrees) overhead - Octree is an extension of the quadtree concept to 3 dimensions cells are numbered by starting on one level and using the same pattern as a quadtree, then moving up a level and continuing with the same pattern but numbering from 4 to 7 Summary these 3D representations are relatively new, so there is little collective experience on how to implement them in the earth sciences and engineering it may be easiest to utilize technology developed in other fields (mechanical engineering, medicine) and adapt to needs however, the needs of medical imaging are different from earth science medical imaging technology is not designed for modelling, it does not need analytical tools for abstraction and interpretation that earth science applications do medical imaging is time dependent (it is usually necessary to track moving objects between one 3D image and the next) while many earth science applications do not require this E. TIME DEPENDENCE time dependence adds a third dimension to spatial data, just as the vertical dimension does Hagerstrand (1970) has used the vertical dimension to visualize movement in human systems - movements in the plane become trajectories in three dimensions suggested overhead - Time dependence as a third dimension based on Hagerstrand model (not supplied, see for example Haggett, P., A.D. Cliff, A. Frey, 1977, Locational Models, Edward Arnold, London, p. 16) computer science deals with time dependence of records in databases records may be valid only for limited times the geographical cases are more complex - objects may have limited existence, but may also move, change shape, and change attributes similarly to the 3D case, the set of database models for time dependent data has not been fully developed Possible models 1. boundaries of reporting zones change through time since the boundaries turn on and off rather than move, the solution is to store all boundary lines which ever existed, then to reconstruct objects from the boundaries at any given time e.g. Great American History Project stored boundaries for all definitions of US counties since the early 19th Century, reconstructed counties at any Census year from selected boundary pieces (see Unit 30) 2. attributes of objects change through time - define a limited number of time "slices", and store the attributes as separate tables for each time slice if attributes are needed between time slices, interpolate 3. shapes of objects change through time define time slices, and store the objects at each slice may be difficult to identify objects from one time slice to the next because objects may coalesce or split - e.g. kelp beds in the ocean off the coast of California may be easier to avoid identifying objects, and store classified but unrelated rasters at each time - equivalent to the SOE or voxel solution to 3D data alternatively, use a 3D space with the vertical dimension as time, populated by 3D objects, e.g. the lines in Hagerstrand''s diagrams - equivalent to B- reps attach attributes to these objects if the attributes change through time, we have a problem similar to that of continuously varying attributes on transportation networks Summary the main issue is the extent to which objects should be identified - either in 2 or 3D solutions vary from one extreme of no objects to the other of fully 3D objects: no objects at all - voxels e.g. remotely sensed images objects at each time slice, but unrelated from one time to another - layers e.g. GAHP US counties objects at each time slice, related or tracked from one time to another - related layers e.g. migration data objects defined continuously in the time dimension - 3D objects e.g. individual space-time travel behavior REFERENCES Bouille, F. 1976. "A Model of Scientific Data Bank and Its Applications to Geological Data," Computers and Geosciences 2: 279-291. Carter, J.R. 1988. "Digital Representations of Topographic Surfaces: An Overview," in American Congress on Surveying and Mapping and American Society for Photogrammetry and Remote Sensing, Technical Papers 5:54-60. Ganter, J.H. 1989. "A Comparison of Representations for Complex Earth Volumes," Auto-Carto 9: Proceedings of the Ninth International Symposium on Computer-Assisted Cartography, Baltimore, MD. Hagerstrand, T., 1970. "What about people in Regional Science?" Papers, Regional Science Association 24:7-21. Discusses the concept of space-time prisms in human spatial behavior. Kavouras, M. and S. Masry, 1987. "An Information System for Geosciences: Design Considerations," Auto-Carto 8: Proceedings of the Eighth International Symposium on Computer-Assisted Cartography, ASPRS/ACSM, Falls Church, VA, pp. 282-291. Langran, G., 1989. "A review of temporal database research and its use in GIS applications," International Journal of Geographical Information Systems 3(3):215-32. Can research on time dependence in databases help in representing the effects of time in GIS data? Langran, G. and N.R. Chrisman, 1988. "A framework for temporal geographic information," Cartographica 25(3):1- 14. Discusses models for representing temporal change in GIS. Mark, D.M. and J.A. Cebrian, 1986. "Oct-trees: A Useful Data- Structure for the Processing of Topographic and Sub- Surface Data," ACSM/ASPRS Technical Papers 1:104-113. Raper, Jonathan. Three Dimensional Applications in GIS, Taylor and Francis, New York. A collection of papers on the developing technology of 3D GIS. Requicha, A.A.G. 1980. "Representations for Rigid Solids: Theory, Methods and Systems," ACM Computing Surveys 12:437-464. DISCUSSION AND EXAM QUESTIONS 1. You are directed to build a 3-D representation for (a) a civil engineer; (b) a petroleum geologist; (c) a hydrogeologist; (d) a meteorologist. What kinds of questions might you ask each specialist about their job and what they want from the representation? 2. How do the distributions considered by each of the individuals (above) compare? 3. What would be the advantages and limitations of surface representations, B-reps and SOE in different disciplines and applications? 4. Many people have assumed that the problems of moving from 2D to 3D in GIS are comparable to those of moving from 2D to time-dependent 2D. Do you agree? NEXT PAGE |