SPATIAL DATA  

Exporting GPS to GIS Format
Data formats NMEA PCX5 TRK WPT GPL GPX RINEX

Discussion

The next step in a GPS project is to export your collected data files to a format compatible with your organization's GIS database. Formats may include, but are not limited to, shapefile, coverage, geodatabase feature class, and AutoCAD. This section identifies GPS configuration options and processing software for each GPS unit type, and provides links that walk you through the export process.

Deciding on the right GPS unit and processing software from the start will dictate how many options you have in configuring exports.


The capability to export to different formats gives you maximum flexibility in handing data to various agencies or other software platforms. GPS processing software varies based on the type of GPS unit you are using. See the GPS Unit & Project Needs Assessment step for more details on GPS units.

Exporting GPS data in the correct projection and datum is the most important component of any Export function.


The export software should be able to output in the spatial coordinate projection and datum of your GIS. Since the removal of Selective Availability, recreational-grade GPS units have been used more often for GIS mapping. Keep in mind that errors are introduced into data from these units when an inadequate or non-robust projection/datum transformation engine is used. Third-party software may rely on less robust datum transformations (for example, from WGS84 to NAD27). See the Watch Outs and Links section to learn more about datum transformations.
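To give a feel for why the datum matters, the sketch below applies a simple three-parameter geocentric shift from NAD27 to WGS84. This is the kind of low-precision transformation cautioned against above and is shown for illustration only. The ellipsoid constants are the standard Clarke 1866 and WGS84 values. The shift values (-8, 160, 176 m) are the commonly published mean parameters for the conterminous US and are an assumption here, as are all function names and the test point.

```python
import math

# Ellipsoids as (semi-major axis in metres, inverse flattening)
CLARKE_1866 = (6378206.4, 294.9786982)   # ellipsoid underlying NAD27
WGS84 = (6378137.0, 298.257223563)

# Assumed mean NAD27 -> WGS84 geocentric shift for the conterminous US (metres)
NAD27_TO_WGS84_SHIFT = (-8.0, 160.0, 176.0)


def geodetic_to_ecef(lat_deg, lon_deg, h, ellipsoid):
    """Latitude/longitude/height to earth-centred X, Y, Z on the given ellipsoid."""
    a, inv_f = ellipsoid
    f = 1.0 / inv_f
    e2 = f * (2.0 - f)
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + h) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x, y, z, ellipsoid, iterations=10):
    """Iterative inverse of geodetic_to_ecef (adequate away from the poles)."""
    a, inv_f = ellipsoid
    f = 1.0 / inv_f
    e2 = f * (2.0 - f)
    p = math.hypot(x, y)
    lon = math.atan2(y, x)
    lat = math.atan2(z, p * (1.0 - e2))  # first guess assumes zero height
    h = 0.0
    for _ in range(iterations):
        n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - e2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h


def nad27_to_wgs84(lat_deg, lon_deg, h=0.0):
    """Three-parameter shift only; grid-based methods are more accurate."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h, CLARKE_1866)
    dx, dy, dz = NAD27_TO_WGS84_SHIFT
    return ecef_to_geodetic(x + dx, y + dy, z + dz, WGS84)


def offset_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance between two nearby positions, in metres."""
    r = 6371000.0
    north = math.radians(lat2 - lat1) * r
    east = math.radians(lon2 - lon1) * r * math.cos(math.radians(lat1))
    return math.hypot(north, east)


lat27, lon27 = 40.0, -100.0              # hypothetical point recorded in NAD27
lat84, lon84, _ = nad27_to_wgs84(lat27, lon27)
shift = offset_m(lat27, lon27, lat84, lon84)
print(f"Ignoring the datum would misplace this point by roughly {shift:.0f} m")
```

A mismatch of this size is well beyond the error budget of most mapping work, which is why the export software's transformation method deserves attention.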


Mapping Grade GPS receivers allow additional information, called generated attributes (values such as PDOP, correction type, etc.), to be transferred with the GIS data.


These attributes will be stored in attribute tables alongside user-defined attributes recorded during feature collection. Exporting these attributes assists in describing horizontal and vertical accuracy for the data quality section in FGDC metadata. See the Data & Process Documentation step to learn more about metadata.
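As a rough illustration of carrying a generated attribute through an export, the sketch below reads waypoints from a GPX 1.1 document (one of the formats named at the top of this page) and writes them to comma-delimited text with the PDOP value as an attribute column. The sample waypoints, the function names, and the PDOP cut-off of 6 are assumptions made for the example. Vendor export software will have its own settings.

```python
import csv
import io
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hypothetical waypoints; <pdop> is an optional waypoint element in GPX 1.1
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="40.1000" lon="-74.5000"><name>WELL-1</name><pdop>2.1</pdop></wpt>
  <wpt lat="40.1010" lon="-74.5020"><name>WELL-2</name><pdop>7.8</pdop></wpt>
</gpx>"""


def gpx_waypoints(gpx_text):
    """Return one dict per waypoint, keeping PDOP as a generated attribute."""
    root = ET.fromstring(gpx_text)
    rows = []
    for wpt in root.findall("gpx:wpt", GPX_NS):
        pdop = wpt.findtext("gpx:pdop", default=None, namespaces=GPX_NS)
        rows.append({
            "name": wpt.findtext("gpx:name", default="", namespaces=GPX_NS),
            "lat": float(wpt.get("lat")),
            "lon": float(wpt.get("lon")),
            "pdop": float(pdop) if pdop is not None else None,
        })
    return rows


def to_csv(rows):
    """Write the rows as comma-delimited text that a GIS can import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "lat", "lon", "pdop"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = gpx_waypoints(SAMPLE_GPX)
csv_text = to_csv(rows)
# Assumed quality rule: keep only positions logged with PDOP of 6 or better
good = [r["name"] for r in rows if r["pdop"] is not None and r["pdop"] <= 6.0]
```

Keeping PDOP in the table means the accuracy statement in the metadata can be backed by the values actually recorded in the field.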

Spatial Data Online
Some of the data formats common to the GIS marketplace are listed below. Please note that most formats are only utilized for graphic data. Attribute data is usually handled as ASCII text files. Vendor names are supplied where appropriate.

IGDS - Interactive Graphics Design Software (Intergraph / Microstation)
This binary format is a standard in the turnkey CAD market and has become a de facto standard in Canada's mapping industry. It is a proprietary format; however, most GIS software vendors provide DGN translators.

DLG - Digital Line Graph (US Geological Survey)
This ASCII format is used by the USGS as a distribution standard and consequently is well utilized in the United States. It is not used very much in Canada even though most software vendors provide two way conversion to DLG.

DXF - Drawing Exchange Format (Autocad)
This ASCII format is used primarily to convert to/from the Autocad drawing format and is a standard in the engineering discipline. Most GIS software vendors provide a DXF translator.

GENERATE - ARC/INFO Graphic Exchange Format
A generic ASCII format for spatial data used by the ARC/INFO software to accommodate generic spatial data.

EXPORT - ARC/INFO Export Format

Canadian Geospatial Data

Metadata GEMINI
Before using online data, find out:
1. in what format the data are provided
2. when the data were produced
3. what projection has been used
4. what scale has been used
5. who created the data

GIS Data Formats

Digital Map Formats
Vector File Formats
Raster File Formats
Information Types
Software File Formats


Digital Map Formats
The term file format refers to the logical structure used to store information in a GIS file. File formats are important in part because not every GIS software package supports all formats. If you want to use a data set, but it isn’t available in a format that your GIS supports, you will have to find a way to transform it, find another data set, or find another GIS.

Almost every GIS has its own internal file format. These formats are designed for optimal use inside the software and are often proprietary. They are not designed for use outside their native systems. Most systems also support transfer file formats. Transfer formats are designed to bring data in and out of the GIS software, so they are usually standardized and well documented.

If your data needs are simple, your main concern will be with the internal format your GIS software supports. If you have complex data needs, you will want to learn about a wider range of transfer formats, especially if you want to mix data from different sources. Transfer formats will be required to import some data sets into your software.

Vector Formats

Many GIS applications are based on vector technology, so vector formats are the most common. They are also the most complex because there are many ways to store coordinates, attributes, attribute linkages, database structures, and display information. Some of the most common formats are briefly described below and summarized in Table 1.

Arc Export
Arc Export is a transfer format, either ASCII or compressed into binary, used to transfer files between different versions of ARC/INFO. It is undocumented and will work only with ESRI products.

ARC/INFO Coverages
An ARC/INFO "coverage" is a set of internal binary files used by ARC/INFO, a GIS program. This file format is proprietary and not readily usable by other programs.

AutoCAD® Drawing Files (DWG)
DWG is the internal, proprietary format used in AutoCAD® software, which is a computer-aided design/drafting (CAD) program. Despite its proprietary nature, AutoCAD can convert any DWG file to a DXF file (described below) without loss of graphic information. As with DXF files, there are a number of ways to store attribute information in DWG files. The emerging standard is one that uses Extended Entity Data (EED) to link attributes, but many others are possible. However, the lack of one standard for linking attributes can cause problems when data is transferred between systems.

Autodesk’s Data Interchange File (DXF) Format
DXF is probably the most widely used vector data transfer format, and a file in DXF format offers some very strong advantages. It contains very complete display information, and almost every graphics program can read it. However, there are several different ways to store attribute information in DXF and to link DXF entities to external attributes. Because there are no attribute standards, many programs that claim to read DXF files still do not import attribute information properly.

Digital Line Graphs (DLG)
DLG, a transfer format used by the US Geological Survey (USGS), depicts vector information portrayed on printed paper maps. It carries very accurate coordinate information and sophisticated feature-classification information but no other attribute data. DLG does not include any display information. The DLG standard is significant because the USGS and other US government agencies have used it to publish large numbers of digital maps.

Hewlett-Packard Graphic Language (HPGL)
HPGL is a language that controls computer plotters; it contains display information but no geographic coordinates or attribute data. It is usually not appropriate for the storage or transfer of GIS data.

MapInfo Data Transfer Files (MIF/MID).
MIF/MID is a transfer standard used by MapInfo, a desktop mapping system. It carries all three types of GIS information: geographic, attribute, and display. Attribute links are implicit in the file format.

MapInfo Map Files.
MapInfo has its own internal binary format, known as a map file. It is undocumented and proprietary, so it cannot be used outside a MapInfo system.

MicroStation Design Files (DGN).
DGN is the internal format used by Bentley Systems Inc.’s MicroStation, a CAD program. It is well documented and standardized, so it may also be used as a transfer standard. DGN files contain detailed display information. The most common way to store attributes is to place them in an external database file and record links in the MSLINK field, a data item carried for each element in the DGN file.

Spatial Data Transfer Standard (SDTS)
SDTS, a new transfer format developed by the US government, was designed to handle all types of geographic data. SDTS can be either binary or ASCII but is generally binary. Virtually all geographic concepts can be encoded in SDTS, including coordinate information, complex attribute information, and display information. This versatility causes a corresponding increase in complexity. To simplify things, several standard subsets of SDTS have been adopted. The first of these, the Topological Vector Profile (TVP), is used to store certain types of vector maps. SDTS can also be used for raster information. Not much data is available in SDTS format at this time, nor do many software systems support it. However, it will be the foundation of the US National Spatial Data Infrastructure (NSDI). Its importance will increase as more NSDI data becomes available.

Topologically Integrated Geographic Encoding and Referencing Files (TIGER).
TIGER is an ASCII transfer format used by the US Census Bureau to store the street maps constructed for the 1990 census. It contains complete geographic coordinates and is line, not polygon, based (although polygons can be constructed from its attribute information). The most important attributes include street name and address information. TIGER does not contain display information. Maps of the entire US are available in TIGER format.

Vector Product Format (VPF)
VPF is a binary format used by the US Defense Mapping Agency. It is well documented and can be used as an internal format and as a transfer format. It carries geographic and attribute information but no display data. VPF files are sometimes referred to as VMAP products. The Digital Chart of the World (DCW) is published in this format.

Raster Formats
Raster files generally are used to store image information, such as scanned paper maps or aerial photographs. They are also used for data captured by satellite and other airborne imaging systems. Images from these systems are often referred to as remote-sensing data. Unlike other raster files, which express resolution in terms of cell size and dots per inch (dpi), resolution in remotely sensed images is expressed in meters, which indicates the size of the ground area covered by each cell.
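The link between scanning resolution in dpi and ground resolution in meters is simple arithmetic: one pixel covers 1/dpi inches of paper, and the map scale converts paper distance to ground distance. The sketch below works this through. The function name and the 300 dpi, 1:24,000 example are assumptions chosen for illustration.

```python
METRES_PER_INCH = 0.0254


def scanned_cell_size_m(dpi, scale_denominator):
    """Ground distance in metres covered by one pixel of a scanned paper map."""
    paper_size_m = METRES_PER_INCH / dpi      # width of one pixel on the paper
    return paper_size_m * scale_denominator   # same width on the ground


# Hypothetical case: a 1:24,000 map sheet scanned at 300 dpi
cell = scanned_cell_size_m(300, 24000)
print(f"Each pixel covers about {cell:.2f} m on the ground")
```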

Some common raster formats are described below and summarized in Table 2.

Arc Digitized Raster Graphics (ADRG).
ADRG is a format used by the US military to store raster images of paper maps.

Band Interleaved by Line (BIL), Band Interleaved by Pixel (BIP), and Band Sequential (BSQ).
BIL, BIP, and BSQ are formats produced by remote-sensing systems. The primary difference among them is the technique used to store brightness values captured simultaneously in each of several colors or spectral bands.

Digital Elevation Model (DEM).
DEM is a raster format used by the USGS to record elevation information. Unlike other raster file formats, DEM cells do not represent color brightness values, but rather the elevations of points on the earth’s surface.

PC Paintbrush Exchange (PCX).
PCX is a common raster format produced by most scanners and personal computer (PC) drawing programs.

Spatial Data Transfer Standard (SDTS).
As was indicated under vector formats above, SDTS is a general-purpose format designed to transfer geographic information. One SDTS variant is the raster profile, designed as a standard format for transferring raster data. However, this protocol has not as yet been finalized.


Tagged Image File Format (TIFF).
Like PCX, TIFF is a common raster format produced by PC drawing programs and scanners.
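The difference among the BIL, BIP, and BSQ layouts described above is only the order in which the same brightness values are written to the file. A minimal sketch, using a made-up two-band image of two rows by two columns (the function names are assumptions):

```python
def to_bsq(bands):
    """Band sequential: all of band 1, then all of band 2, and so on."""
    return [v for band in bands for row in band for v in row]


def to_bil(bands):
    """Band interleaved by line: row 1 of every band, then row 2 of every band."""
    n_rows = len(bands[0])
    return [v for r in range(n_rows) for band in bands for v in band[r]]


def to_bip(bands):
    """Band interleaved by pixel: every band value for pixel 1, then pixel 2."""
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    return [band[r][c] for r in range(n_rows) for c in range(n_cols) for band in bands]


# Hypothetical image: two spectral bands, each 2 rows x 2 columns
band_1 = [[1, 2],
          [3, 4]]
band_2 = [[5, 6],
          [7, 8]]
image = [band_1, band_2]

bsq = to_bsq(image)   # [1, 2, 3, 4, 5, 6, 7, 8]
bil = to_bil(image)   # [1, 2, 5, 6, 3, 4, 7, 8]
bip = to_bip(image)   # [1, 5, 2, 6, 3, 7, 4, 8]
```

Software reading such a file has to be told which layout was used, along with the number of rows, columns, and bands, because the raw values alone do not reveal it.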

vector transfer formats (ESRI shp, HPGL, DXF, NTF, DLG, TIGER, SDTS), raster transfer formats (GIF, JPEG, DEM)

electronic data transfer

input: electronic data transfer

free GIS data - The GIS Data Depot

Spatial Data on the Internet
what will the data cost?
methods of attribute data checking
1. impossible values
2. extreme values
3. internal consistency
4. scattergrams
5. trend surfaces
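The first two checks in the list above are easy to automate. The sketch below flags impossible values with a simple valid range and extreme values with a median-based rule. The tree-height records, the valid range of 0 to 120 m, and the cut-off of five median absolute deviations are all assumptions for illustration. Other outlier rules are equally reasonable.

```python
from statistics import median

# Hypothetical attribute table: forest stand ID and mean tree height in metres
stands = [
    {"id": 1, "height_m": 12.1}, {"id": 2, "height_m": 14.3},
    {"id": 3, "height_m": 13.8}, {"id": 4, "height_m": -2.0},
    {"id": 5, "height_m": 15.0}, {"id": 6, "height_m": 13.2},
    {"id": 7, "height_m": 98.0}, {"id": 8, "height_m": 14.1},
]


def impossible_values(records, field, low, high):
    """IDs of records whose value lies outside the physically valid range."""
    return [r["id"] for r in records if not (low <= r[field] <= high)]


def extreme_values(records, field, k=5.0):
    """IDs of records further than k median absolute deviations from the median."""
    values = [r[field] for r in records]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [r["id"] for r in records if abs(r[field] - med) > k * mad]


impossible = impossible_values(stands, "height_m", 0.0, 120.0)
plausible = [r for r in stands if r["id"] not in impossible]
extreme = extreme_values(plausible, "height_m")
```

An extreme value is not necessarily wrong. It is a record worth checking against the source document.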

Adding Attribute Data to Spatial Datasets

Introduction to PLTS for ArcGIS
Attributes checked before feature is created. • Two methods of validation ... GIS Data ReViewer. Running Data Checks
Spatial analysis
Errors
Missing entities
Duplicate entities
Mislocated entities
Missing labels - unidentified polygons
Duplicate labels - two labels for same polygon
Artefacts of digitizing - undershoots, overshoots, wrongly placed nodes, loops and spikes, gaps, overlapping areas

topological mismatch between data in different projections


Tech Contemp Environ

Ch 5 - Data input and editing

More meat for helping with your projects…

How we get the data into the computer, known as "data encoding"

"GIS without data is like a car without fuel" (good analogy)

original data can come from a variety of sources - paper maps, hardcopy spreadsheets, air photos, digital images, digital data…..list goes on

but there are usually problems getting the data ready to use:

• Need common projection for the spatial data
• May need to generalize from complex to simple
• May have to add your own new data to existing data
• May have to prepare new maps

I. Methods of data input

One thing for sure - must get your data into some type of digital format, and in a digital format that the GIS can use


4 basic ways to get your data into correct digital format:
1. keyboard entry ("keycoding")
2. manual digitizing
3. automatic digitizing (incl. "scanning")
4. digital conversion


1. keycoding used to enter attribute data from paper to the computer. Once entered, use the unique ID number (Primary key) technique to join the attribute data to spatial locations.

2. Manual digitizing - quickly being replaced by on-screen digitizing. Once a very important component of GIS data entry.








3. Automatic digitizing (primarily scanning) - almost all scanners operate as raster devices….they have resolutions of ___ dots per inch (variable). they act to automatically encode or digitize each pixel. Remember your exam question asking you to encode the lake, wetland, and dryland? You were acting like a scanner, encoding each pixel….

bring up a color IR image of Fairview Lake, watch the computer slow down……raster data sets can take up large amts of space.



4. Digital conversion/transfer - as tough as the manual methods can be, no wonder that folks will try to find digital data anywhere they can. But you have to get the digital data in a format that will work….example….remember how you saved spreadsheet data as a .txt file, not in an Excel format.
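The unique ID (primary key) technique mentioned under keycoding, combined with the text-file export mentioned under digital conversion, can be sketched as follows. The attribute table, the feature coordinates, and the column names are all made up for the example.

```python
import csv
import io

# Hypothetical attribute table saved from a spreadsheet as tab-delimited text
ATTRIBUTE_TXT = "ID\tLANDUSE\tOWNER\n101\tforest\tstate\n103\twetland\tprivate\n"

# Hypothetical digitized point features: unique ID -> (x, y) coordinates
features = {
    101: (512300.0, 4431200.0),
    102: (512450.0, 4431350.0),
    103: (512600.0, 4431100.0),
}


def read_attributes(text):
    """Index the attribute rows by their primary key (the ID column)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {int(row["ID"]): row for row in reader}


def join_attributes(features, attributes):
    """Attach the matching attribute row (or None) to every spatial feature."""
    return {fid: {"xy": xy, "attributes": attributes.get(fid)}
            for fid, xy in features.items()}


joined = join_attributes(features, read_attributes(ATTRIBUTE_TXT))
unmatched = [fid for fid, rec in joined.items() if rec["attributes"] is None]
```

Feature 102 has no row in the text file, so it ends up in `unmatched`. That is the kind of gap that has to be chased down during data editing.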


Some data can be downloaded directly into a GIS format, like from GPS receivers or from total stations. This is very convenient.

"finding out what data exist, how much they will cost, where they can be found, and the format in which they are available are some of the most challenging stages…in development" what a true quote!!


II. Data editing

So you have errors in your data - now what do you do?
A. detection and correction
errors from 3 main areas:
1. source data
2. encoding errors
3. data transfer / conversion

1. source data
attribute errors not too tough to spot
spatial errors can be tough to spot

2. encoding errors
lots of them in vector, assoc with manual digitizing (Fig 5.3)
often can use editing packages to fix these

3. data transfer errors

re-projection, transformation, and generalization

• critical that all data be in same projection - things won't overlay correctly otherwise…

book example of Mercator cylindrical proj of coastline overlain upon Albers conic proj of census boundaries - won't fit properly

be careful - diff data sources often use diff coord systems - eg, NJ DEP is in state plane, TIGER data from ESRI in lat-long (I think)


• scale diffs from one data source to another not a big problem, but rule of thumb important here - "accuracy of final GIS output is only as good as the worst input data" - so scale can become somewhat of an issue if very general data is used with very accurate data…the general data will draw down the accuracy of the final product

• edge matching - a problem when joining maps of different vintages - Fig 5.5 shows what edge matching is all about
• rubber sheeting - often used when distorted air photos must be rectified to higher accuracy….certain points on the photo are known as control points, because they can be tied to known, accurate map features like road intersections, etc.
note that GPS can play valuable role here, because you can get accurate spatial control from the receiver right on the spots on the air photos, then you can rectify the air photo using the GPS control, rather than rectifying to a map
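The control point idea can be sketched with the simplest possible case: a single affine transformation fitted exactly through three control points. True rubber sheeting stretches the image locally and uses many more points, so this is a simplification and not what any particular package does. The pixel and ground coordinates below are invented, standing in for GPS positions collected at three spots visible on the photo.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3 x 3 linear system by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def fit_affine(photo_pts, ground_pts):
    """Return a function mapping photo (col, row) to ground (x, y)."""
    m = [[c, r, 1.0] for c, r in photo_pts]
    ax = solve3(m, [x for x, _ in ground_pts])
    ay = solve3(m, [y for _, y in ground_pts])

    def transform(col, row):
        return (ax[0] * col + ax[1] * row + ax[2],
                ay[0] * col + ay[1] * row + ay[2])

    return transform


# Hypothetical control points: photo pixel positions and GPS ground positions
photo = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
ground = [(500000.0, 4500000.0), (500500.0, 4500000.0), (500000.0, 4499500.0)]

to_ground = fit_affine(photo, ground)
centre = to_ground(500.0, 500.0)
```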

III. Toward an integrated database
Many different layers from many diff sources need to be combined
Table 5.4 is an example



Conclusions - 50 to 80% of GIS project time is eaten up with data encoding, editing….essentially building the database.

Your projects will reflect this same "time alligator" the minute you have to start incorporating your own data with someone else's…..but this effort is critical to ensure good quality data for the analyses that will follow….

QUALITY CONTROL OF SPATIAL DATA

DIGITAL ORTHOPHOTOGRAPHY AND GIS

Spatial error estimation techniques for seabed mapping (Doctorate)
The aim of this research is to improve Geographical Information Systems (GIS) as a tool for marine habitat mapping by developing techniques to model and assess uncertainty in seabed vector data.
GIS is being utilised as a fundamental tool for seabed habitat mapping. The current methodologies used for data collection, classification, and communication differ so widely that seabed data are virtually incompatible between states. There is a need for the development of a common spatial framework of classification, for all mediums of marine spatial data collection, to bring GIS one step closer to becoming a more useful common tool for fine scale marine habitat mapping and reliable decision making.
The distribution of errors throughout the spatial habitat data will be explored. The relationship of the model design for survey sampling to uncertainty and reliability will be investigated in reference to the accurate positioning of habitat boundaries and classification of points. Furthermore, the integration of single-beam acoustics and video into sampling designs, to determine the smallest scale at which marine habitat data can be accurately represented, will be validated.
Finally, this research will look at how the results are analysed in a spatial context and how the 'uncertainty' information is communicated for scientific and social decision-making purposes.

Data sources and errors - The GIS Primer - Data Sources
Data Editing and Quality Assurance

Data editing and verification is in response to the errors that arise during the encoding of spatial and non-spatial data. The editing of spatial data is a time consuming, interactive process that can take as long, if not longer, than the data input process itself.

Several kinds of errors can occur during data input. They can be classified as :

Incompleteness of the spatial data.
This includes missing points, line segments, and/or polygons.

Locational placement errors of spatial data.
These types of errors usually are the result of careless digitizing or poor quality of the original data source.

Distortion of the spatial data.
This kind of error is usually caused by base maps that are not scale-correct over the whole image, e.g. aerial photographs, or from material stretch, e.g. paper documents.

Incorrect linkages between spatial and attribute data.
This type of error is commonly the result of incorrect unique identifiers (labels) being assigned during manual key in or digitizing. This may involve the assigning of an entirely wrong label to a feature, or more than one label being assigned to a feature.

Attribute data is wrong or incomplete.
Often the attribute data does not match exactly with the spatial data. This is because they are frequently from independent sources and often different time periods. Missing data records or too many data records are the most common problems.


The identification of errors in spatial and attribute data is often difficult. Most spatial errors become evident during the topological building process. The use of check plots to clearly determine where spatial errors exist is a common practice. Most topological building functions in GIS software clearly identify the geographic location of the error and indicate the nature of the problem. Comprehensive GIS software allows users to graphically walk through and edit the spatial errors. Others merely identify the type and coordinates of the error. Since this is often a labour intensive and time consuming process, users should consider the error correction capabilities very important during the evaluation of GIS software offerings.


A variety of common data problems occur in converting data into a topological structure. These stem from the original quality of the source data and the characteristics of the data capture process. Usually data is input by digitizing. Digitizing allows a user to trace spatial data from a hard copy product, e.g. a map, and have it recorded by the computer software. Most GIS software has utilities to clean the data and build a topologic structure. If the data is unclean to start with, for whatever reason, the cleaning process can be very lengthy. Interactive editing of data is a distinct reality in the data input process.

Experience indicates that in the course of any GIS project 60 to 80% of the time required to complete the project is involved in the input, cleaning, linking, and verification of the data.

The most common problems that occur in converting data into a topological structure include :

slivers and gaps in the line work;
dead ends, also called dangling arcs, resulting from overshoots and undershoots in the line work; and
bow ties or weird polygons from inappropriate closing of connecting features.

Of course, topological errors only exist with linear and areal features. They become most evident with polygonal features. Slivers are the most common problem when cleaning data. Slivers frequently occur when coincident boundaries are digitized separately, e.g. once each for adjacent forest stands, once for a lake and once for the stand boundary, or after polygon overlay. Slivers often appear when combining data from different sources, e.g. forest inventory, soils, and hydrography. It is advisable to digitize data layers with respect to an existing data layer, e.g. hydrography, rather than attempting to match data layers later. A proper plan and definition of priorities for inputting data layers will save many hours of interactive editing and cleaning.

Dead ends usually occur when data has been digitized in a spaghetti mode, or without snapping to existing nodes. Most GIS software will clean up undershoots and overshoots based on a user-defined tolerance, e.g. distance. The definition of an inappropriate distance often leads to the formation of bow ties or weird polygons during topological building. Tolerances that are too large will force arcs to snap to one another that should not be connected. The result is small polygons called bow ties. The definition of a proper tolerance for cleaning requires an understanding of the scale and accuracy of the data set.
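How a tolerance controls the clean-up can be shown with a minimal sketch: each dangling endpoint is moved to the nearest existing node only if that node lies within the tolerance. The coordinates and the function name are hypothetical, and real GIS cleaning routines do considerably more than this.

```python
import math


def snap_endpoints(dangles, nodes, tolerance):
    """Move each dangling endpoint to its nearest node if within tolerance."""
    snapped = []
    for px, py in dangles:
        nearest = min(nodes, key=lambda n: math.hypot(n[0] - px, n[1] - py),
                      default=None)
        if nearest is not None and math.hypot(nearest[0] - px, nearest[1] - py) <= tolerance:
            snapped.append(nearest)
        else:
            snapped.append((px, py))
    return snapped


# Hypothetical line work: two existing nodes and two dangling arc ends
nodes = [(0.0, 0.0), (10.0, 0.0)]
dangles = [(9.8, 0.1),   # an undershoot just short of the node at (10, 0)
           (4.0, 3.0)]   # a genuine dead end that should stay where it is

sensible = snap_endpoints(dangles, nodes, tolerance=0.5)
too_large = snap_endpoints(dangles, nodes, tolerance=6.0)
```

With the tolerance of 0.5 only the undershoot is corrected. With the tolerance of 6.0 the genuine dead end is also dragged onto the node at (0, 0), which is the over-snapping problem described above.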

The other problem that commonly occurs when building a topologic data structure is duplicate lines. These usually occur when data has been digitized or converted from a CAD system. The lack of topology in these types of drafting systems permits the inadvertent creation of elements that are exactly duplicate. However, most GIS packages afford automatic elimination of duplicate elements during the topological building process. Accordingly, it may not be a concern with vector based GIS software. Users should be aware of the duplicate element that retraces itself, e.g. a three-vertex line where the first point is also the last point. Some GIS packages do not identify these feature inconsistencies and will build such a feature as a valid polygon. This is because the topological definition is mathematically correct; however, it is not geographically correct. Most GIS software will provide the capability to eliminate bow ties and slivers by means of a feature elimination command based on area, e.g. polygons less than 100 square metres. The ability to define custom topological error scenarios and provide for semi-automated correction is a desirable capability for GIS software.
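Area-based elimination can be sketched in a few lines. The polygon area is computed from the vertex coordinates with the shoelace formula, and anything below the threshold is dropped. The three polygons are invented: a normal stand, a long thin sliver, and a line that retraces itself. The 100 square metre threshold is taken from the example in the text.

```python
def polygon_area(ring):
    """Area of a polygon from its vertex coordinates (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def eliminate_small(polygons, min_area):
    """Keep only the polygons whose area reaches the threshold."""
    return {name: ring for name, ring in polygons.items()
            if polygon_area(ring) >= min_area}


# Hypothetical polygons, coordinates in metres
polygons = {
    "stand": [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)],
    "sliver": [(0.0, 0.0), (200.0, 0.0), (200.0, 0.2), (0.0, 0.3)],
    "retrace": [(0.0, 0.0), (50.0, 50.0), (0.0, 0.0)],
}

kept = eliminate_small(polygons, min_area=100.0)
```

The retracing line has an area of exactly zero, so an area test removes it even in a package that would otherwise build it as a valid polygon.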

The adjoining figure illustrates some typical errors described above. Can you spot them? They include undershoots, overshoots, bow ties, and slivers. Most bow ties occur when inappropriate tolerances are used during the automated cleaning of data that contains many overshoots. This particular set of spatial data is a prime candidate for numerous bow tie polygons.

Attribute Data Errors

The identification of attribute data errors is usually not as simple as that of spatial errors. This is especially true if these errors are attributed to the quality or reliability of the data. Errors as such usually do not surface until later on in the GIS processing. Solutions to these types of problems are much more complex and often do not exist entirely. It is much more difficult to spot errors in attribute data when the values are syntactically good, but incorrect.

Simple errors of linkage, e.g. missing or duplicate records, become evident during the linking operation between spatial and attribute data. Again, most GIS software contains functions that check for and clearly identify problems of linkage during attempted operations. This is also an area of consideration when evaluating GIS software.
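A linkage check of the kind described above amounts to comparing the unique identifiers on the spatial side with those in the attribute table. A minimal sketch follows, in which the identifiers and the report keys are assumptions:

```python
from collections import Counter


def linkage_report(feature_ids, attribute_ids):
    """Compare feature IDs with attribute-record IDs and list the mismatches."""
    counts = Counter(attribute_ids)
    features = set(feature_ids)
    return {
        "features_without_record": sorted(features - set(counts)),
        "records_without_feature": sorted(set(counts) - features),
        "duplicate_records": sorted(k for k, n in counts.items() if n > 1),
    }


# Hypothetical identifiers: polygon labels and the keys found in the attribute file
polygon_labels = [1, 2, 3, 4]
record_keys = [1, 2, 2, 4, 9]

report = linkage_report(polygon_labels, record_keys)
```

Here polygon 3 has no attribute record, record 9 has no polygon, and key 2 appears twice.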



Data Verification

Six clear steps stand out in the data editing and verification process for spatial data. These are :

Visual review.
This is usually by check plotting.

Cleanup of lines and junctions.
This process is usually done by software first and interactive editing second.

Weeding of excess coordinates.
This process involves the removal of redundant vertices by the software for linear and/or polygonal features.

Correction for distortion and warping.
Most GIS software has functions for scale correction and rubber sheeting. However, the distinct rubber sheet algorithm used will vary depending on the spatial data model, vector or raster, employed by the GIS. Some raster techniques may be more intensive than vector based algorithms.

Construction of polygons.
Since the majority of data used in GIS is polygonal, the construction of polygon features from lines/arcs is necessary. Usually this is done in conjunction with the topological building process.

The addition of unique identifiers or labels.
Often this process is manual. However, some systems do provide the capability to automatically build labels for a data layer.


These data verification steps occur after the data input stage and prior to or during the linkage of the spatial data to the attributes. Data verification ensures the integrity between the spatial and attribute data. Verification should include some brief querying of attributes and cross checking against known values.
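The weeding step can be illustrated with the Douglas-Peucker line simplification algorithm, one widely used way of removing redundant vertices. Whether a given GIS package uses this algorithm or another is not stated in the text, so treat it as an assumption. The sample line and the 0.1 unit tolerance are invented.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def weed(line, tolerance):
    """Drop vertices that lie within tolerance of the simplified line."""
    if len(line) < 3:
        return list(line)
    # Find the vertex furthest from the straight line joining the two ends
    dmax, index = 0.0, 0
    for i in range(1, len(line) - 1):
        d = point_segment_distance(line[i], line[0], line[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        return [line[0], line[-1]]
    # Keep that vertex and simplify the two halves separately
    left = weed(line[:index + 1], tolerance)
    right = weed(line[index:], tolerance)
    return left[:-1] + right


# Hypothetical digitized line with small wobbles along two straight stretches
digitized = [(0, 0), (1, 0.05), (2, -0.04), (3, 0), (4, 2), (5, 2.02), (6, 2)]
weeded = weed(digitized, tolerance=0.1)
```

Seven vertices are reduced to four, and the two real bends in the line are kept.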

Snapping tolerance
a method in which a small search distance, the "snapping tolerance", is defined around a feature, and only data falling within that distance are moved to the correct place
Understanding Topology and Shapefiles

Topology and Geocoding

Picture Set for editing
mismatch - do reprojection, transformation, generalization
"Rubber-Sheet", edge matching, gaps

"Rubber-Sheet"

Scanning Technology

Spatial Data ESRI

Spatial Data

Measurements in GIS – lengths, perimeters and areas

Measurements in GIS

Measurements in GIS and maps (photo set)
Analysis terminology:
Entity - An individual point, line, or area in GIS database
Attribute - Data about an entity
Feature - an object in the real world to be encoded in GIS database
Data layer - a data set for the area
Image - a data layer in the raster GIS
Cell - an individual pixel
Function or operation - a data analysis procedure performed by the GIS
Algorithm - the computer implementation of a sequence of actions designed to solve a problem

Reclassification - "cells with values = forestry (value 10) should take the new value of 1; cells with values not = forestry should take the new value of 0"
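A reclassification rule like this one is a cell-by-cell recoding of the raster. The sketch below assumes that forestry cells (value 10) become 1 and every other cell becomes 0, which gives a simple forest / not forest layer. The small land use grid, the other class codes, and the function name are made up.

```python
FORESTRY = 10


def reclassify(raster, target, match_value=1, other_value=0):
    """Recode every cell: match_value where it equals target, else other_value."""
    return [[match_value if cell == target else other_value for cell in row]
            for row in raster]


# Hypothetical land use raster; 10 = forestry, 3 and 4 = other classes
landuse = [
    [10, 10, 3],
    [4, 10, 3],
    [4, 4, 10],
]

forest_mask = reclassify(landuse, FORESTRY)
forest_cells = sum(sum(row) for row in forest_mask)
```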

Boolean operators - http://en.mimi.hu/gis/boolean_operators.html

GIS Analysis: Measurement. Distances, lengths, perimeters, areas

Understanding GIS Queries

Query languages

GIS Management - 12 Queries About GIS

Queries a GIS Can Answer:
WHAT exists here
e.g. forest attributes, municipal ownership
WHERE are specific conditions
e.g. all forest stands over 10 metres high, all houses owned by person x
WHAT has changed(over time)
e.g. areas harvested between then and now, houses increased in price by > 50%
HOW are patterns related
e.g. harvested watersheds and stream water quality, traffic accidents and road surfaces
WHAT IF .. (modelling)
e.g. climate warmed by 2 degrees (habitats), avalanche took out section of road (network)

Reclassification Functions
see http://educationally.narod.ru/gis37photoalbum.html
Value and Position

Size and Contiguity

Spatial Integrity and Boundary Configuration

Buffering in Geographic Information Systems

GIS data integration problem
In geospatial information systems (GIS), data integration is often a problem. Different systems may use different vocabularies to represent the same abstract concept, and different systems may express data values in different units of measure (UOM). This problem may be of interest to the Semantic Web community because it’s a different kind of semantic interoperability problem.

I believe GIS data integration is not much about building knowledge representation models for things in the world or developing logical inference for reasoning about properties of things. It’s about how to detect misalignment in data representation and align functional computation to produce accurate and consistent results. An instance of the GIS data integration problem is the management of coordinate reference system (CRS

E&P GIS: Integrating E&P Data and Applications

Geographic Information Systems

map overlay

SPATIAL INTERPOLATION

Surface analysis in GIS

Performing Surface Analysis Using ARC GIS Spatial Analyst

SURFACE ANALYSIS

ArcGIS Network Analyst
ArcGIS Network Analyst provides network-based spatial analysis including routing, travel directions, closest facility, service area, origin-destination cost matrix, and vehicle routing problem analysis. ArcGIS Network Analyst helps you dynamically model realistic network conditions, including turn restrictions, speed limits, height restrictions, and traffic conditions, at different times of the day.
VECTOR GIS CAPABILITIES
A. INTRODUCTION

B. SIMPLE DISPLAY AND QUERY

Display

Structured Query Language (SQL)

Boolean operators

SQL extensions for spatial queries

C. RECLASSIFY, DISSOLVE AND MERGE

Steps

Forestry example

City zoning example

D. TOPOLOGICAL OVERLAY

Point in polygon

Line on polygon

Polygon on polygon ("Polygon overlay")

Example

Spurious polygons

E. BUFFERING

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES
This unit would be illustrated well with a series of overlays of a real area which demonstrates the problems discussed here using simple examples.



UNIT 14 - VECTOR GIS CAPABILITIES

Compiled with assistance from Holly J. Dickinson, State University of New York at Buffalo


A. INTRODUCTION
- Analysis functions with vector GIS are not quite the same as with raster GIS
- More operations deal with objects
- Measures such as area have to be calculated from coordinates of objects, instead of counting cells
- Some operations are more accurate
- Estimates of area based on polygons more accurate than counts of pixels
- Estimates of perimeter of polygon more accurate than counting pixel boundaries on the edge of a zone
- Some operations are slower
- E.g. overlaying layers, finding buffers
- Some operations are faster
- E.g. finding path through road network
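The point that vector measures are calculated from object coordinates can be sketched as follows. This is a minimal illustration, assuming a simple (non-self-intersecting) polygon stored as a list of (x, y) vertices; the function names and the sample rectangle are hypothetical and not part of the original unit.

```python
import math

def polygon_area(coords):
    """Area of a simple polygon from its vertex coordinates (shoelace formula)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(coords):
    """Perimeter as the sum of the lengths of the boundary segments."""
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))

# Hypothetical 4 x 3 rectangle
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(rectangle))       # 12.0
print(polygon_perimeter(rectangle))  # 14.0
```

A raster estimate of the same area would depend on the cell size, whereas this calculation depends only on the accuracy of the digitized vertices.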

B. SIMPLE DISPLAY AND QUERY
Display
- Using points and "arcs" can display the locations of all objects stored
- Attributes and entity types can be displayed by varying colors, line patterns and point symbols
ARCMAP - Vector display
- May only want to display a subset of the data
- E.g. want to display areas of urban land use with some base map data
- Select all political boundaries and highways, but only areas that had urban land uses

- How would the user do this?
- E.g. one of the layers in a database is a "map" of land use, called USE
- Area objects on this layer have several attributes
- One attribute, called CLASS, identifies the area's land use
- For urban land use, it has the value "U"
- Need to extract boundaries for all areas that have CLASS="U"

Structured Query Language (SQL)
- Different systems use different ways of formulating queries
- Structured Query Language (SQL) is used by many systems
- SQL phrase structure:
SELECT <attribute name(s)> FROM <table> WHERE <condition statement>
- E.g. SELECT FROM USE WHERE CLASS="U"
- This selects only the objects for display - no attributes are retrieved by the query
- SQL examples using a list of student names:
- SELECT name FROM list (selects all names)
- SELECT name FROM list WHERE grade = "A" (selects names of students receiving an "A")
- SELECT name FROM list WHERE cumgrade > 3.0 (selects names of students with a cumulative gpa greater than 3.0)
- SQL operators:
- RELATIONAL: >, <, =, >=, <=
- ARITHMETIC: +, -, *, / (for use only on numeric fields, i.e. ratio / interval)
- BOOLEAN: AND, OR, NOT, XOR
Boolean operators
- used to combine conditions
- E.g. WHERE cumgrade > 3.0 AND grade = "A" (selects students satisfying both conditions only)
- Boolean operators can have a spatial meaning in GIS as well
- E.g. when two maps are overlaid, areas (polygons) that are superimposed have the "and" condition
- A spatial representation is used to illustrate Boolean operators in the study of logic, through the use of diagrams called Venn diagrams
- Thus GIS area overlay is a geographical instance of a Venn diagram
Overhead - Boolean operators
- "XOR" is the "exclusive or" - A xor B means A or B but not both
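The student-list queries above can be tried with Python's built-in sqlite3 module. This is only a sketch: the table contents are invented, the helper `names` is hypothetical, and string literals use single quotes as standard SQL expects. SQLite has no XOR keyword, so the exclusive-or case is written here by comparing the two conditions with `<>`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (name TEXT, grade TEXT, cumgrade REAL)")
conn.executemany(
    "INSERT INTO list VALUES (?, ?, ?)",
    [("Ann", "A", 3.6), ("Bob", "B", 3.2), ("Cy", "A", 2.8), ("Di", "C", 2.5)],
)

def names(where=""):
    """Run SELECT name FROM list with an optional WHERE clause."""
    sql = "SELECT name FROM list " + where + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

all_names = names()                                   # every student
a_students = names("WHERE grade = 'A'")               # relational operator
high_gpa = names("WHERE cumgrade > 3.0")
both = names("WHERE cumgrade > 3.0 AND grade = 'A'")  # Boolean AND
# Exclusive or: one condition holds, but not both
either_not_both = names("WHERE (cumgrade > 3.0) <> (grade = 'A')")

print(both)             # ['Ann']
print(either_not_both)  # ['Bob', 'Cy']
```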
SQL extensions for spatial queries
- Some systems allow specifically spatial queries to be handled under SQL
- E.g. WITHIN operator
- SELECT <objects> WITHIN <specific area>

- The criteria for these spatial searches may include searching within the radius of a point, within a bounding rectangle, or within an irregular polygon
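Two of the search criteria listed above, a radius around a point and a bounding rectangle, can be sketched in a few lines. The well names, coordinates and function names are hypothetical; a real system would use a spatial index rather than scanning every object, and the irregular polygon case needs a point-in-polygon test that is not shown here.

```python
import math

# Hypothetical point objects: identifier -> (x, y)
wells = {"w1": (2.0, 3.0), "w2": (8.0, 1.0), "w3": (5.0, 5.0)}

def within_radius(objects, center, radius):
    """Identifiers of point objects no farther than radius from center."""
    return sorted(k for k, p in objects.items() if math.dist(p, center) <= radius)

def within_rectangle(objects, xmin, ymin, xmax, ymax):
    """Identifiers of point objects inside an axis-aligned bounding rectangle."""
    return sorted(
        k for k, (x, y) in objects.items()
        if xmin <= x <= xmax and ymin <= y <= ymax
    )

print(within_radius(wells, (4.0, 4.0), 2.5))          # ['w1', 'w3']
print(within_rectangle(wells, 4.0, 0.0, 9.0, 6.0))    # ['w2', 'w3']
```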

C. RECLASSIFY, DISSOLVE AND MERGE
- Reclassify, dissolve and merge operations are used frequently in working with area objects
- these are used to aggregate areas based on attributes
- consider a soils map:
- we wish to produce a map of major soil types from a layer that has polygons based on much more finely defined classification scheme
Steps
Overhead - Reclassify, dissolve and merge
1. Reclassify areas by a single attribute or some combination
- E.g. reclassify soil areas by soil type only
2. Dissolve boundaries between areas of same type
- Delete the arc between two polygons if the relevant attributes are the same in both polygons
3. Merge polygons into large objects
- Recode the sequence of line segments that connect to form the boundary (i.e. rebuild topology)
- Assign new ID numbers to each new object
Forestry example
- consider a forestry GIS where the forest is divided into "stands", average size 10 ha:
- Each stand carries a list of attributes, including tree species and average tree age
- Attributes apply homogeneously to area of each stand
- Boundary occurs between stands whenever at least one attribute changes
- Problem: identify all harvestable areas of white spruce
- Assign new attribute “harvestable” to each stand
- Value = "y" if white spruce AND age > 50 years

- Value = "n" otherwise

- After assigning new attribute, all others can be dropped
- Now wish to identify harvestable areas, each of which may be a merger of several individual stands
- dissolve boundaries between polygons with same value of "harvestable" attribute
- merge polygons into larger objects
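The forestry example can be sketched at the attribute level. The code below assumes a hypothetical table of stands and a hypothetical list of stand pairs that share a boundary arc; it reclassifies each stand, then groups adjacent stands with the same "harvestable" value using a union-find structure. It does not rebuild geometry or topology, which a real GIS would also have to do in the merge step.

```python
# Hypothetical stands with their attributes
stands = {
    1: {"species": "white spruce", "age": 60},
    2: {"species": "white spruce", "age": 75},
    3: {"species": "white spruce", "age": 30},
    4: {"species": "jack pine", "age": 80},
    5: {"species": "white spruce", "age": 55},
}
# Hypothetical pairs of stands separated by a shared arc
adjacent = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 3)]

# Step 1: reclassify by the rule "white spruce AND age > 50"
harvestable = {
    sid: "y" if a["species"] == "white spruce" and a["age"] > 50 else "n"
    for sid, a in stands.items()
}

# Steps 2 and 3: dissolve arcs between like-valued neighbors, then merge
parent = {sid: sid for sid in stands}

def find(s):
    """Return the representative stand of the merged object containing s."""
    while parent[s] != s:
        parent[s] = parent[parent[s]]  # path compression
        s = parent[s]
    return s

for a, b in adjacent:
    if harvestable[a] == harvestable[b]:  # arc separates identical values
        parent[find(a)] = find(b)

groups = {}
for sid in stands:
    groups.setdefault(find(sid), []).append(sid)

# Each merged object: (harvestable value, member stand IDs)
merged = sorted((harvestable[m[0]], sorted(m)) for m in groups.values())
print(merged)  # [('n', [3, 4]), ('y', [1, 2]), ('y', [5])]
```

Stand 5 is harvestable but remains a separate object because it shares no boundary with stands 1 and 2.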
City zoning example
- Need to know how many individual land use zones have been created in the city and how these are distributed geographically
- Each land parcel in the city has a zoning attribute attached to it
- Dissolve boundaries between parcels if the zoning is the same
- Result can be a map showing large areas of similar zoning classes

D. TOPOLOGICAL OVERLAY
- suppose individual layers have planar enforcement (required in many systems, not all)
- When two layers are combined ("overlayed", "superimposed") the result must have planar enforcement as well
- New intersection must be calculated and created wherever two lines cross
- A line across an area object creates two new area objects
- Topological overlay is the general name for overlay followed by planar enforcement
- Relationships are updated for the new, combined map
- Result may be information about relationships (new attributes) for the old (input) maps rather than the creation of new objects
- E.g. overlay map of school districts on census tracts
- Result is map showing every school district/census tract combination

- For each combination, the database contains an area object

- however, concern may be with obtaining the number of overlapping census tracts as a new attribute of each school district rather than with new objects themselves

Point in polygon
Overhead - Overlay - Point in polygon
- Overlay point objects on areas, compute "is contained in" relationship
- Result is a new attribute for each point
- E.g. combine wells and planning districts, find district containing each well
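One common way to compute the "is contained in" relationship is the ray-casting (crossing number) test, sketched below. The unit does not say which algorithm any particular system uses, so this is an assumed technique; the districts, wells and function names are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count boundary crossings of a ray running east from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical planning districts and wells
districts = {
    "north": [(0, 5), (10, 5), (10, 10), (0, 10)],
    "south": [(0, 0), (10, 0), (10, 5), (0, 5)],
}
wells = {"w1": (2, 7), "w2": (6, 1), "w3": (12, 3)}

def containing_district(point):
    """Name of the first district containing the point, or None."""
    for name, polygon in districts.items():
        if point_in_polygon(point[0], point[1], polygon):
            return name
    return None

# The result is a new attribute for each point object
well_district = {w: containing_district(p) for w, p in wells.items()}
print(well_district)  # {'w1': 'north', 'w2': 'south', 'w3': None}
```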
Line on polygon
Overhead - Overlay - Line on polygon
- Overlay line objects on area objects, compute "is contained in" relationship
- Lines are broken at each area object boundary
- Number of output lines is greater than number of input lines
- Containing area is new attribute of each output line
- E.g. combine streams and counties, find county containing each stream segment
Polygon on polygon ("Polygon overlay")
Overhead - Overlay - polygon on polygon
- Overlay two layers of area objects
- Boundaries are broken at each intersection
- Number of output areas likely greater than the total number of input areas
- E.g. input watershed boundaries, county boundaries, output map of watershed/county combinations
- After overlay we can recreate either of the input layers by dissolving and merging based on the attributes contributed by the input layer
Example
Overhead - from Unit 13, Vector database and analysis steps
- Wish to find those areas that are the best land for timber harvesting
- After overlay, each original layer contributes attributes to the combined layer
- We get the final map by selecting the desired attributes of the combined layer
- SELECT FROM OVERLAY WHERE Species = "Jack pine" AND Soil = "C"
Spurious polygons
- During polygon overlay, many new and smaller polygons are created, some of which may not represent true spatial variations
Overhead - Sliver or spurious polygons (3 pages)
- physically overlay pages 1 and 2, page 3 shows resulting spurious polygons
- The small, invalid polygons are called spurious or sliver polygons and can be a major problem in polygon overlay
- spurious polygons arise when two lines are overlaid which are actually slightly different versions of the same line
- if the same line occurs on two input maps, the digitized versions may be slightly different
- in many cases the lines on the source maps have been compiled from different sources, but are nevertheless the same line on the ground
- E.g. a road may be part of a county boundary, also the boundary between two fields or two soil types or two vegetation types
- The problem cannot be removed by more careful digitizing - more points simply leads to more slivers
- Some GISs allow the user to set a tolerance value for deleting spurious polygons during overlay operations
- If the tolerance is set too high, some legitimate polygons may be deleted
- If set too low, some erroneous polygons will remain
- Deletion rules might also be based on shape, as spurious polygons tend to be long and thin
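A shape-based deletion rule can be sketched by combining an area tolerance with a compactness index. The index used here, 4*pi*area / perimeter^2 (1 for a circle, near 0 for long thin shapes), is one common choice and is an assumption, as are the function names and threshold values.

```python
import math

def area_and_perimeter(coords):
    """Shoelace area and boundary length of a simple polygon."""
    n = len(coords)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = coords[i], coords[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.dist((x1, y1), (x2, y2))
    return abs(twice_area) / 2.0, perimeter

def compactness(coords):
    """4*pi*A / P^2: close to 0 for long, thin polygons."""
    area, perimeter = area_and_perimeter(coords)
    return 4.0 * math.pi * area / perimeter ** 2

def is_sliver(coords, max_area, max_compactness):
    """Flag a polygon only if it is both small and long and thin."""
    area, _ = area_and_perimeter(coords)
    return area < max_area and compactness(coords) < max_compactness

thin_strip = [(0, 0), (100, 0), (100, 0.5), (0, 0.5)]  # area 50, very elongated
small_lot = [(0, 0), (5, 0), (5, 5), (0, 5)]           # area 25, compact

print(is_sliver(thin_strip, max_area=100, max_compactness=0.1))  # True
print(is_sliver(small_lot, max_area=100, max_compactness=0.1))   # False
```

An area tolerance alone would have deleted the small but legitimate lot, since its area is below that of the strip; the shape test keeps it.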

E. BUFFERING
- A buffer can be constructed around a point, line or area
- buffering creates a new area, enclosing the buffered object
Overhead - Buffering
- Applications in transportation, forestry, resource management
- Protected zone around lakes and streams
- Zone of noise pollution around highways
- Service zone around bus route (e.g. 300 m walking distance)
- Groundwater pollution zone around waste site
- Options available for raster, such as a "friction" layer, do not exist for vector
- buffering is much more difficult in vector from the point of view of the programmer
- Sometimes, width of the buffer can be determined by an attribute of the object
- E.g. buffering residential buildings away from a street network:
- three types of street (1, 2, 3 or major, secondary, tertiary) with the setbacks being 600 feet from a major street, 200 feet from a secondary street, and only 100 feet from a tertiary street

- Problems with buffer operations may occur when buffering very convoluted lines or areas
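The attribute-driven setback example can be sketched as a distance test rather than as construction of the buffer polygon itself, which is the harder programming problem mentioned above. The street geometry, the coordinate units (feet) and the function names are hypothetical; the setback widths are the ones given in the example.

```python
import math

# Setback in feet by street type (1 = major, 2 = secondary, 3 = tertiary)
SETBACK_FT = {1: 600.0, 2: 200.0, 3: 100.0}

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(p, a)
    # Position of the closest point along the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))

# Hypothetical street segments, each carrying its type as an attribute
streets = [
    {"type": 1, "start": (0, 0), "end": (1000, 0)},
    {"type": 3, "start": (0, 0), "end": (0, 1000)},
]

def inside_setback(point):
    """True if the point lies inside the buffer of any street."""
    return any(
        distance_to_segment(point, s["start"], s["end"]) <= SETBACK_FT[s["type"]]
        for s in streets
    )

print(inside_setback((500, 400)))  # True: 400 ft from the major street
print(inside_setback((150, 700)))  # False: outside both setbacks
print(inside_setback((50, 900)))   # True: 50 ft from the tertiary street
```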

REFERENCES
Documentation for ARC/INFO (user manuals, Understanding GIS) provides an overview of vector GIS functionality for a commonly available system.

Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resources Assessment, Clarendon, Oxford. Chapter 5 on data analysis.

Lusardi, Frank, 1988. The Database Expert's Guide to SQL, McGraw-Hill Book Co., New York. Good introduction to Structured Query Language.

SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS
A. INTRODUCTION

B. ANALYSIS OF ONE CLASS OF OBJECTS

Using attributes

Using locational information

C. ANALYSIS OF OBJECT PAIRS

D. ANALYSIS OF MORE THAN ONE CLASS OF OBJECTS

Shortest path example

What spatial objects are required?

Spatial interaction example

E. ANALYSIS WHICH DEFINES NEW OBJECTS

Buffer example

Street noise example

Trade area example

Polygon overlay example

F. GIS ANALYSIS FUNCTIONS

Measure

Coordinate transformation

Generate objects

Select a subset of objects

Modify attributes of objects

Dissolve and merge area objects

Generalize or smooth lines

Compute statistics for a set of objects

Topological overlay

Operations on surfaces

Network analysis

Input and output management

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES
Having established a basis of the fundamental concepts in GIS data structures, this unit begins a large module looking at how GIS can be used. First we look at how spatial relationships can be analyzed and then present a summary of the range of functions that fall within present GIS capabilities.


UNIT 15 - SPATIAL RELATIONSHIPS IN SPATIAL ANALYSIS
A. INTRODUCTION
Review
- the types of spatial objects are points, lines, areas, raster cells (Unit 10)
- these object types are digital representations of phenomena
- are defined by their dimensionality
- are sometimes further subdivided
- e.g. line objects are divided into chains, strings etc. in DCDSTF

- an entity type is a type of phenomenon, e.g. church, city, highway, lake
- the same entity type may be represented by different types of objects at different scales
- e.g. a city may be a point at one scale, an area at another

- in the database, several different types of entities may be represented by the same type of object
- e.g. points may represent both cities and churches
- an object class is a group of objects of the same type, representing the same type of entity
- e.g. cities and churches are different classes of the same type of object (point)
- the number and meaning of attributes is the same for all objects in a class
- e.g. church: denomination, capacity, date of construction
- e.g. city: name, population, date of charter
- object attributes may use various measurement scales (i.e. nominal, ordinal, interval, ratio) - see Unit 6
- we think of a class of objects and its attributes as a table with rows corresponding to objects and columns corresponding to attributes
- classes of objects can be grouped into layers
- sometimes only one per layer, depending on the system
- the power of a GIS comes from its ability to store relationships among and between objects - see Unit 12
- relationships can be between objects of the same class
- more often between objects of different classes
- relationships can identify object pairs which have their own attributes
- using this framework of spatial objects and relationships, the range of analysis possible with a GIS is explored
B. ANALYSIS OF ONE CLASS OF OBJECTS
Using attributes
- need only a single attribute table
- might be any object class
- example using city neighborhoods:
- attributes include:
- population count: ratio scale

- household count: ratio scale

- average income: $000s per household, ratio scale

- name: nominal scale

- average household expenditure on automobile purchases/year: $000s, ratio scale

- GIS operations on the attribute table might include:
- very primitive forms of analysis or enquiry are possible
- list neighborhoods by average income
- simple data retrieval, print table

- list all neighborhoods with average income greater than $40,000
- select records satisfying criterion, print table

- compute the mean expenditure on automobile purchases
- requires weighting by household count in each area

- compute and print result

- look for relationship between average income and average expenditure on automobiles
- retrieve data and plot a graph

- all of these are capabilities of standard databases, e.g. dBASE III
- none use GIS capabilities, no access to spatial data (locations) required
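The attribute-table operations listed above can be sketched with plain Python structures. The neighborhood names and figures are invented for illustration; the point of interest is that the mean expenditure must be weighted by household count, which gives a different answer from the simple average of the neighborhood values.

```python
# Hypothetical attribute table: one row per neighborhood ($000s per household)
neighborhoods = [
    {"name": "Eastside", "households": 1000, "income": 35.0, "auto_spend": 2.0},
    {"name": "Hilltop", "households": 3000, "income": 48.0, "auto_spend": 4.0},
    {"name": "Riverside", "households": 1000, "income": 42.0, "auto_spend": 3.0},
]

# List neighborhoods by average income
by_income = [n["name"] for n in sorted(neighborhoods, key=lambda n: n["income"])]

# Select records with average income greater than $40,000
over_40 = [n["name"] for n in neighborhoods if n["income"] > 40.0]

# Mean expenditure on automobiles, weighted by household count
total_households = sum(n["households"] for n in neighborhoods)
weighted_mean = (
    sum(n["households"] * n["auto_spend"] for n in neighborhoods) / total_households
)
unweighted_mean = sum(n["auto_spend"] for n in neighborhoods) / len(neighborhoods)

print(by_income)        # ['Eastside', 'Riverside', 'Hilltop']
print(over_40)          # ['Hilltop', 'Riverside']
print(weighted_mean)    # 3.4
print(unweighted_mean)  # 3.0
```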
Using locational information
- make a map of average household income, shading each neighborhood accordingly
- requires locational information to plot outline of neighborhood
- shading might be determined by a function of more than one attribute, e.g. the ratio of household expenditure on automobiles to household income
- this type of capability is offered by automated mapping packages
- compute the area of each neighborhood and store it as a new attribute
- area computed from locational information (digitized outline of neighborhood)
- useful in making meaningful maps
- e.g. map population density as population divided by area rather than simply mapping population over variable sized areas

- other similar measures that can be computed include perimeter, centroid location, distance, e.g. from downtown
C. ANALYSIS OF OBJECT PAIRS
- e.g. pairs formed from each combination of neighborhoods, including neighborhoods paired with themselves
- with 5 neighborhoods have 15 combinations
- with n neighborhoods have n(n+1)/2 combinations
- example attributes:
- distance

- number of commuters in each direction

- time to travel by public transit

- draw a map of interactions
overhead - Net flows between states
- map becomes exceedingly complex if all pairs are shown
- may need to show only most important flows, e.g. to downtown
- analyze commuter trips by time of travel by public transit
- e.g. how many take over 20 mins, 40 mins, 1 hour
- produces results in tables
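The pair count and the travel-time tabulation can be sketched as follows. The neighborhood labels and the pair attributes (transit minutes and commuter counts) are hypothetical, and only a few of the 15 pairs are given attributes to keep the example short.

```python
from itertools import combinations_with_replacement

neighborhoods = ["A", "B", "C", "D", "E"]

# Every unordered pair, including each neighborhood paired with itself
pairs = list(combinations_with_replacement(neighborhoods, 2))
n = len(neighborhoods)
print(len(pairs), n * (n + 1) // 2)  # 15 15

# Hypothetical attributes of some object pairs
pair_attributes = {
    ("A", "A"): {"minutes": 10, "commuters": 500},
    ("A", "B"): {"minutes": 25, "commuters": 300},
    ("A", "C"): {"minutes": 45, "commuters": 120},
    ("B", "C"): {"minutes": 15, "commuters": 200},
    ("A", "E"): {"minutes": 70, "commuters": 40},
}

def commuters_over(threshold_minutes):
    """Total commuters on pairs whose transit time exceeds the threshold."""
    return sum(
        a["commuters"] for a in pair_attributes.values()
        if a["minutes"] > threshold_minutes
    )

# Results as a small table: threshold in minutes -> commuters
table = {t: commuters_over(t) for t in (20, 40, 60)}
print(table)  # {20: 460, 40: 160, 60: 40}
```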
D. ANALYSIS OF MORE THAN ONE CLASS OF OBJECTS
- one of the major strengths of GIS analysis
Shortest path example
- find the shortest path through a street network between two places
- useful for fire truck dispatching, cabs, delivery vehicles
- navigation systems are being developed for mounting in vehicles
- able to display current locations and map of surrounding streets, follow vehicle's path on map

- able to compute recommended route to given destination

What spatial objects are required?
- links in the network
- attributes include length
- also factors such as traffic counts, congestion, number of lanes, average speed are important in finding optimum route
- nodes in the network
- intersections allow route to move from one link to another using link-node relationships
- important attributes include presence of traffic light, overpass or underpass
- what about turn restrictions?
- turn restrictions are not attributes of links or nodes
- e.g. "no left turn" is an attribute of a pair of links - cannot turn from link A to link B

- thus a link-link object pair is needed (see Unit 12 for more on object pairs)
- some systems define a "turn table" which is equivalent to this link-link object pair

- what about stop signs?
- stop signs are not attributes of nodes but are determined by the direction of entering the node, irrespective of the exit link
- thus a link-node object pair is needed
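The role of the link-link "turn table" can be sketched with a shortest path search whose states are links rather than nodes, so that a forbidden turn from one link to another can be excluded. This is an assumed approach (Dijkstra's algorithm applied to links), not a description of any particular system; the network, link IDs and function name are hypothetical, links are treated as one-way, and the case where origin and destination coincide is not handled.

```python
import heapq

# Hypothetical directed links: id -> (from node, to node, length)
links = {
    "a": ("N1", "N2", 100.0),
    "b": ("N2", "N4", 100.0),
    "c": ("N2", "N3", 150.0),
    "d": ("N3", "N4", 150.0),
}
# Turn table: (incoming link, outgoing link) pairs that are not allowed
no_turn_from_a_to_b = {("a", "b")}

def shortest_path(origin, destination, links, forbidden_turns):
    """Return (total length, list of link ids), or None if no route exists."""
    out_links = {}
    for lid, (start, _end, _length) in links.items():
        out_links.setdefault(start, []).append(lid)
    # Queue entries: (cost so far, last link traversed, links used so far)
    queue = [(links[lid][2], lid, [lid]) for lid in out_links.get(origin, [])]
    heapq.heapify(queue)
    settled = set()
    while queue:
        cost, last, path = heapq.heappop(queue)
        if last in settled:
            continue
        settled.add(last)
        end_node = links[last][1]
        if end_node == destination:
            return cost, path
        for nxt in out_links.get(end_node, []):
            if nxt not in settled and (last, nxt) not in forbidden_turns:
                heapq.heappush(queue, (cost + links[nxt][2], nxt, path + [nxt]))
    return None

print(shortest_path("N1", "N4", links, set()))                # (200.0, ['a', 'b'])
print(shortest_path("N1", "N4", links, no_turn_from_a_to_b))  # (400.0, ['a', 'c', 'd'])
```

Because the restriction is attached to a pair of links, it cannot be expressed by changing the attributes of node N2 or of link a or b alone.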
Spatial interaction example
- useful for prediction of customer behavior
handout - Spatial interaction data
- given 3 shopping centers, represented as points
- attributes: parking spaces, number of stores, quality of signage, age of construction
- given 5 neighborhoods, represented as areas
- attributes: population count, average household income, average age
- given information on # of shopping visits by neighborhood by shopping center (information gathered by street interviews)
- these are attributes of neither shopping centers nor neighborhoods, but of the object pairs
- produces 15 object pairs plus attributes, including distance
- commonly used model in this situation is a "Spatial Interaction Model" (SIM) - requires information on:
- neighborhoods (from attributes of area objects), e.g. average household income
- shopping centers (from attributes of point objects), e.g. number of parking spaces
- spatial behavior (from attributes of object pairs)
- e.g. for each object pair, divide number of trips to appropriate shopping center by population of appropriate neighborhood, plot against distance
handout - Spatial interaction modeling
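The per-pair calculation described above (trips divided by neighborhood population, set against distance) can be sketched as below. To stay short, the sketch uses a reduced, invented data set of 2 neighborhoods and 2 shopping centers rather than the 5 and 3 of the example; neighborhoods are represented by hypothetical centroid coordinates, and fitting an actual spatial interaction model is not attempted.

```python
import math

# Hypothetical area objects (with centroids) and point objects
neighborhoods = {
    "N1": {"population": 2000, "xy": (0.0, 0.0)},
    "N2": {"population": 5000, "xy": (6.0, 8.0)},
}
centers = {"C1": (3.0, 4.0), "C2": (6.0, 0.0)}

# Attributes of the object pairs: shopping visits from street interviews
visits = {("N1", "C1"): 400, ("N1", "C2"): 100, ("N2", "C1"): 500, ("N2", "C2"): 250}

# For each pair: (distance, visits per head of neighborhood population)
pair_stats = {}
for (nbhd, center), count in visits.items():
    distance = math.dist(neighborhoods[nbhd]["xy"], centers[center])
    rate = count / neighborhoods[nbhd]["population"]
    pair_stats[(nbhd, center)] = (distance, rate)

# Points that would be plotted: visit rate against distance
for pair, (distance, rate) in sorted(pair_stats.items(), key=lambda kv: kv[1][0]):
    print(pair, distance, rate)
```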
E. ANALYSIS WHICH DEFINES NEW OBJECTS
- many GIS operations produce new spatial objects from old ones
- may be same or different type, e.g. points producing points or points producing areas
- new objects may have attributes of the old objects which created them
Buffer example
- build a buffer zone (area) around a stream network (a layer of line objects)
- stream layer has attributes of each stream link, including ID, discharge, length, depth
- buffer operation creates area objects
- may be:
(a) an object for each link

(b) one merged object for the entire network

- attributes of the new area object:
- in case (a) - length, ID, discharge, depth of channel in the buffer object (from link attributes)
- in case (b) - total length
Street noise example
- street is a line object with attribute "traffic count"
- apply an equation to convert "traffic count" attribute to "noise level"
- build a buffer of 500 m around line
- attach noise level attribute to the new area object
- further development:
- includes houses as point objects
- identify all houses lying in buffer ("point in polygon" operation)
- attach noise level attribute to all houses lying inside area object
- produce list of all such houses, generate mailing labels from database and mail announcements of meeting to protest noise
Trade area example
- given list of customers of shopping center, with home locations (point objects)
- create new attribute for each point giving distance to shopping center
- calculate average distance from all points
- find the "trade area" of the shopping center
- e.g. draw a circle with radius equal to the average distance to all customers
- produces an area object
- attach count of customers within trade area as an attribute of the new object
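The trade area steps can be sketched directly. The shopping center location and customer coordinates are hypothetical, and the new area object is represented simply as a center, a radius and its attached attribute.

```python
import math

shopping_center = (0.0, 0.0)
# Hypothetical customer home locations (point objects)
customers = [(3.0, 4.0), (0.0, 1.0), (6.0, 8.0), (0.0, -4.0)]

# New attribute for each point: distance to the shopping center
distances = [math.dist(c, shopping_center) for c in customers]

# Radius of the trade area: average distance over all customers
radius = sum(distances) / len(distances)

# New area object (a circle) with the customer count as an attribute
trade_area = {
    "center": shopping_center,
    "radius": radius,
    "customers_inside": sum(1 for d in distances if d <= radius),
}
print(trade_area)
# {'center': (0.0, 0.0), 'radius': 5.0, 'customers_inside': 3}
```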
Polygon overlay example
- perhaps the most important operation in GIS
- given two classes of area objects
- e.g. two maps for the same area, one showing soil types, the other vegetation zones
- "overlay" the two classes of objects creating a new set of area objects
- every new area object has two sets of attributes - soil type (copied from the soil map) and vegetation (copied from the vegetation map)
F. GIS ANALYSIS FUNCTIONS
- functions should be defined independently of technical issues, understandable by users with little technical knowledge of GIS, independently of data model
- e.g. "buffer" - does not depend on choice of raster or vector, or require knowledge of technical detail
- functions are used to translate needs into specific GIS operations
- list of available functions is outgrowth of past GIS user needs
- emphasis on resource management applications because of strength of that market sector in last 10 years
- however, there is no consensus on the possible domain of GIS, the total set of possible functions
- some GIS claim as many as 1,000 commands
- since functions and operations are defined at a higher level, each function may require several commands
overhead - GIS Analysis Functions
Measure
- results become attribute of objects
- measure length of line object
- measure area or perimeter of area object
Coordinate transformation
- results in new coordinates for points
- register map to control points, transform coordinates accordingly
- change projection, scale, coordinate system (e.g. lat/long, State Plane, military grid)
Generate objects
- by user input, e.g. mouse, digitizer tablet
- area, line, point objects
- circle around point, e.g. for query
- grid cell net, lat/long graticule
- from existing objects in the database
- buffer zones or "corridors" around points, lines, areas
- areas around points by assigning everywhere to the nearest point, producing polygons (Thiessen, Voronoi or Dirichlet polygons) - e.g. to create "trade areas"
- representative points in the middle of each area object (centroids)
Select a subset of objects
- based on attributes, or regions, or "window"
Modify attributes of objects
- by user input, e.g. keyboard
- by arithmetic based on existing attributes, e.g. find density
- by rules using relational and Boolean operators
- e.g. if white spruce and age > 50 years then new attribute is "y"
Dissolve and merge area objects
- generates new, fewer objects
Generalize or smooth lines
- reduce the complexity of a line or area boundary, or smooth it, or reduce the number of digitized points needed to represent it ("weed")
- note: generalization is a very complex topic which is covered in detail in Unit 48
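As one illustration of "weeding", the sketch below uses the Douglas-Peucker algorithm, a widely used point-reduction method. The unit does not name a specific algorithm, so this choice is an assumption, as are the function names, the sample line and the tolerance value.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def weed(points, tolerance):
    """Douglas-Peucker: drop points closer than tolerance to the trend line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]  # all intermediate points can be weeded
    # Keep the farthest point and simplify each side of it separately
    left = weed(points[: index + 1], tolerance)
    right = weed(points[index:], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
print(weed(line, 0.5))  # [(0, 0), (2, 0), (3, 3), (4, 0)]
print(weed(line, 5.0))  # [(0, 0), (4, 0)]
```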
Compute statistics for a set of objects
- count them
- total or average a selected attribute
- compute statistical indices, e.g. standard deviation, correlation
Topological overlay
- point in polygon
- line on polygon
- polygon overlay
- sliver polygon removal
Operations on surfaces
- mostly for topographic surfaces
- recall there are several methods for digital representation, e.g. digitized contours, grid of heights (DEM), mosaic of triangles (TIN) - Unit 11
- estimate height at a point
- find profile of surface along a line, e.g. a stream profile
- compute contours (line objects) from grid of heights, and vice versa
- compute grid of slopes, aspects
- find area objects of slope or aspect categories, e.g. slope <5%
- find watershed boundaries from DEM
- find the area visible from a point (viewshed)
Network analysis
- many types of analysis can be carried out on networks, for transportation planning, utility management, airline scheduling, navigation
- find shortest path through the network between selected points
- determine whether one point on a stream network is downstream or upstream of another
- find the parts of the network which can be reached within a given travel time from a selected point
Input and output management
- applications often consist of existing analytical packages running in conjunction with a GIS
- GIS does the "housekeeping" - handling data input and providing advanced output capabilities
- analytical package solves the problem
REFERENCES
Goodchild, M.F., 1988. "A spatial analytical perspective on GIS," International Journal of Geographical Information Systems 1:327-34. Examines the relationship between spatial analysis and GIS and discusses key issues.

Goodchild, M.F., 1988. "Towards an enumeration and classification of GIS functions," Proceedings, IGIS: The Research Agenda. NASA, Washington, DC, II:67-77. Develops categories of analy

GRAPHIC OUTPUT DESIGN ISSUES
A. INTRODUCTION

B. LABEL PLACEMENT

Imhof's basic rules

Overposting

Polygon labeling

Some simple methods

C. PRINCIPLES OF GRAPHICAL EXCELLENCE

Graphical excellence

D. DESIGN OF GRAPHIC OUTPUT

Scale

Base map

General graphic design

Screen display

Scene generation

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES
This unit introduces some fundamental concepts of graphic design. If you wish to do a more thorough job of this, consider combining this unit with the material in Unit 49.


UNIT 17 - GRAPHIC OUTPUT DESIGN ISSUES
A. INTRODUCTION
- previous unit described technical aspects of GIS output
- much GIS output is in the form of hard copy maps or graphic displays
- design of graphic output is critical if information is to be conveyed effectively to the user
- graphic output from GIS is often poorly designed
- e.g. colors used randomly without appropriate scaling

- conventional scale of colors used to display elevation on standard atlas maps has been optimized over centuries of cartographic experience

- design can benefit from principles of map design developed in cartography
- screen display introduces a new set of issues because of greater capabilities compared with paper maps
- also see more general treatment of visualization of spatial data in Unit 49
Topics covered
- technical issues of label placement
- general principles of graphical excellence
- introduction to principles of map design
B. LABEL PLACEMENT
- features shown on maps and displays can be differentiated and identified in various ways:
- symbols, e.g. church, bridge
- colors
- sizes
- labels
- labels provide the greatest flexibility to attach descriptions to point, line and area features
- names of administrative divisions, lakes, rivers etc.
- elevations of contours, spot heights
- highway numbers
- in cartography, positioning labels is a complex and sophisticated process
- there have been few attempts to write down the rules used (Imhof, 1975 is a well-known exception)
- it has proven difficult to emulate these rules in automated map production or GIS
- positioning labels on screen displays is especially difficult because of low resolution (e.g. 640 by 480 pixels), and the importance of speed
- by comparison, a plotted map may have an effective resolution of 300 dots per inch, and an hour computing time may be acceptable

Imhof's basic rules
- names on maps should:
- be legible
- be easily associated with the features they describe
- not overlap other map contents
- be placed so as to show the extent of the feature
- reflect the hierarchy of features by the use of different font sizes
- not be densely clustered nor evenly dispersed
- it may not be possible to satisfy all of these rules perfectly
- the best solution will balance conflicting objectives, e.g. need to associate name with feature vs. need to avoid overlap of contents
- label placement is a complex problem because of the vast number of possible positions that have to be searched and the number of conflicting objectives
- two labelling problems are particularly significant in automated mapping and GIS:
Overposting
- when features are densely packed on a map or screen, it is difficult to keep labels separated
- labels may overlap (overposting)
- labels must be positioned to avoid overposting, but without destroying the eye's ability to associate labels with appropriate features
- e.g. point features
- optimum position for a label is above and to the right
- below and to the right is less acceptable
- least acceptable positions are to the left
- label can be turned (non-horizontal) if necessary, but only by a small amount
- overposting is a problem because the computer must search a vast number of possible positions
- in practice, must limit the number of positions somehow
- some solutions define a fixed number of possible absolute positions, like a raster
- other solutions define a fixed number of positions relative to the feature
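The idea of a fixed number of positions relative to the feature can be sketched with a simple greedy procedure: each point label tries its candidate positions in order of preference and takes the first one that does not overlap a label already placed. This is only one possible strategy and the names, label size and gap are hypothetical; it ignores overlap with other map contents and simply omits a label when no candidate is free.

```python
def candidates(x, y, width, height, gap=1.0):
    """Label boxes (xmin, ymin, xmax, ymax) in decreasing order of preference."""
    return [
        (x + gap, y + gap, x + gap + width, y + gap + height),    # above right
        (x + gap, y - gap - height, x + gap + width, y - gap),    # below right
        (x - gap - width, y + gap, x - gap, y + gap + height),    # above left
        (x - gap - width, y - gap - height, x - gap, y - gap),    # below left
    ]

def overlaps(a, b):
    """True if two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def place_labels(points, width, height):
    """Greedy placement: name -> (index of position used, label box)."""
    placed = {}
    for name, (x, y) in points.items():
        for position, box in enumerate(candidates(x, y, width, height)):
            if not any(overlaps(box, other) for _, other in placed.values()):
                placed[name] = (position, box)
                break
    return placed

points = {"Alpha": (0, 0), "Beta": (5, 2), "Gamma": (100, 100)}
placed = place_labels(points, width=20, height=6)
for name, (position, box) in placed.items():
    print(name, position, box)
# Alpha and Gamma get position 0 (above right); Beta falls back to position 1
```

A greedy pass depends on the order in which features are processed, which is one reason practical systems search more widely or weigh conflicting objectives.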
Polygon labeling
- labelling polygons has become notorious within automated mapping as a difficult and challenging programming problem
- the label should be central to the feature, may be reoriented or curved to fit the feature
- in some cases the label may be connected with the feature by an arrow
Some simple methods
1. label centered on the polygon centroid
- problems:
- centroid may lie outside the polygon

- a long label may have to be multi-line to fit inside

- solution fails to meet Imhof's criterion of showing the extent of the feature

2. variable rectangle positioned inside the polygon
- search for feasible positions for a rectangle wholly enclosed within the polygon
- ratio of width to height should be as high as possible
- solution will not curve the label to fit the feature
- largest enclosed rectangle may be in an inappropriate part of the polygon
3. Skeleton
- shrink the polygon by moving its edges inward at a uniform rate
- the vertices trace out a network known as the skeleton (discussed in more detail in Unit 33)
- position the label along the central part of the skeleton
- best for polygons like Florida which require curved labels
- practical labeling methods use combinations of rules for different shapes, sizes of polygons
- many developers have used the term expert system to describe label placement software
- an expert system works with complex sets of rules in a rule base
- the objective of the expert system is to emulate the complex decision process of a cartographer
C. PRINCIPLES OF GRAPHICAL EXCELLENCE
- some very broad principles apply to the design of graphic output in general (includes graphics and charts)
- the following discussion relies heavily on Tufte (1983)
Graphical excellence
- gives the viewer the greatest number of ideas, in the shortest time, with the least ink, in the smallest place
- maximize the data/ink ratio
- erase non-data ink
- erase redundant data-ink
- revise and edit the graphic
- it is difficult to get a good graphic first time around

- mobilize every graphical element, perhaps several times over, to show the data
- maximize data density and the number of data entries shown, within reason
- if the nature of the data suggests the shape of the graphic, follow that suggestion - otherwise, move toward horizontal graphics about 50% wider than tall
D. DESIGN OF GRAPHIC OUTPUT
- for GIS, graphic output must show:
- features appropriately symbolized or labeled
- objects computed by the GIS, e.g. buffer zones
- relationships
- it may be difficult to display the results of some forms of GIS data analysis because of the constraints of 2D display, e.g.:
- 3D data
- interaction data (migration, flows of goods)
- global data
Scale
- the scale of output should be consistent with input scale
- e.g. inappropriate to digitize from 1:1,000,000 map, display at 1:24,000 because data will not be sufficiently accurate
- also inappropriate to digitize at 1:24,000, display at 1:1,000,000 without adequate generalization
- features will be too dense, too detailed

- scale on a CRT screen is as important as on a plotted map
- in principle a spatial database is "scale-free", but in practice scale is a crude indicator of data accuracy
- GISs should record and track scale in the database, but do not
Base map
- to be useful, a map must include information for visual locational reference
- output of computed information alone is rarely useful
- need base map features as well

- e.g. map of cuttable forest stands
- needs to show locations of roads, watersheds, streams and lakes, besides cuttable stands, so user can locate stands on the ground, make decisions based on correct spatial context

- particularly important in raster systems
- display of a single layer is rarely useful without some form of basemap for locational reference

- basemap information will normally be vector, or at higher resolution than the raster

- this will be difficult if the raster system does not have vector capabilities

- input of basemap information can be expensive
- difficult to justify digitizing of data just to support interpretation of graphic output
- can plot output on top of pre-printed base map
- avoids need to digitize base map information
- base map must be accurately registered
- some GIS support this function

General graphic design
- often desirable to create good-looking finished product
- e.g. as part of professional report, presentation
- undesirable to have map look "computer-produced", excessively abstract or schematic
- high cost of providing cosmetic output functions in GIS
- e.g. map border neatlines, symbols, north arrows, legends
- complexity of programming for these features may be much greater than for analytic functions
- time to plot these features may be high, particularly for pen plotters
- some GIS map products are now almost indistinguishable in quality from manual cartography
- is appearance really important in a map drawn to support decision-making?
- GIS output maps are to be used directly, not destined for walls or map libraries
- should GIS products be simple, schematic, avoid high cost of manual cartographic quality?
- marketplace seems to say "no"
Screen display
- issues are different here because screen is:
- smaller, lower resolution than a printed or plotted map
- more flexible
- zoom, pan, interaction with user, animation, use of color

- principles of design of screen displays are still poorly developed
- black background or white?
- affects perception of color

- tradition (PC and mainframe terminals) is black background, Mac and many workstations use white

- hard copy map must display as much information as possible to satisfy possible user requirements
- because system is interactive, screen can display limited information but provide for access to more
- e.g. user "clicks" on or "picks" an object with a mouse, accesses lengthy text description

- access to an object's attributes is not limited by constraints of static display

Scene generation
- maps show geographic variation using symbols, objects, other abstractions of reality
- GISs do not have to do this - why not show a picture of the reality? - artist's impression?
- scene generation is set of techniques for simulating real physical appearance
- e.g. GIS is used to plan a ski area on a mountain which is currently forested
- plan could be shown as a map, with contours, green tint for remaining forest, line objects for ski lifts
- scene generation would show oblique perspective view, cover hill with trees of varying height
- current technology allows appearance of trees to be varied depending on species, age
- we are still some way from having hardware fast enough to do this in "real time"
REFERENCES
Freeman, H. and J. Ahn, 1984. "AUTONAP - an expert system for automatic map name placement," Proceedings, First International Symposium on Spatial Data Handling, Zurich.

Imhof, E., 1975. "Positioning names on maps," The American Cartographer 2(2):128-44.

Robinson, A.H., R.D. Sale, J.L. Morrison and P.C. Muehrcke, 1984. Elements of Cartography, 5th edition, Wiley, New York. Excellent source of map design principles.

Tufte, E.R., 1983. The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT. Contains numerous examples of graphical excellence (and its opposite) in map design.

Zoraster, S., 1986. "Integer programming applied to the map label placement problem," Cartographica 23(3):16-27.

A session on automatic names placement at AutoCarto 9, Baltimore, April, 1989 provides several reviews of the use of expert systems for map design:

Doerschler, J.S., and H. Freeman, "An expert system for dense-map name placement," pp. 215-224.

Ebinger, L.R., and A.M. Goulette, "Automated names placement in a non-interactive environment," pp. 205-214.

Johnson, D.S., and U. Basoglu, "The use of artificial intelligence in the automated placement of cartographic names," pp. 225-230.

Jones, C.B., and A.C. Cook, "Rule-based cartographic name placement with Prolog," pp. 231-240.

THE VECTOR OR OBJECT GIS
B. "ARCS"

Storing areas

C. DATABASE CREATION

Building topology

Editing

Relationship between digitizing and editing

Edgematching

D. ADDING ATTRIBUTES

E. EXAMPLE ANALYSIS USING VECTOR GIS

Objective

Procedure

Result

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES
This unit begins a two part introduction to vector GIS. We have placed these units here since we feel this discussion benefits from an understanding of the previous introduction to spatial data concepts in Units 10 to 12. However, with a little revision, it is possible to move this module so that it follows Units 4 and 5 on raster GIS.


UNIT 13 - THE VECTOR OR OBJECT GIS

Compiled with assistance from Holly Dickinson, State University of New York at Buffalo
A. INTRODUCTION
Vector data model
- based on vectors (as opposed to space-occupancy raster structures)
- fundamental primitive is a point
- objects are created by connecting points with straight lines
- some systems allow points to be connected using arcs of circles
- areas are defined by sets of lines
- the term polygon is synonymous with area in vector databases because of the use of straight-line connections between points
overhead - Example of vector GIS data
- very large vector databases have been built for different purposes
- vector tends to dominate in transportation, utility, marketing applications
- raster and vector both used in resource management applications

B. "ARCS"
- when planar enforcement is used, area objects in one class or layer cannot overlap and must exhaust the space of a layer
- every piece of boundary line is a common boundary between two areas
- the stretch of common boundary between two junctions (nodes) has various names
- edge is favored by graph theorists, "vertex" for the junctions
- chain is the word officially sanctioned by the US National Standard
- arc is used by several systems
- arcs have attributes which identify the polygons on either side
- these are referred to as "left" and "right" by reference to the sequence in which the arc is coded
- arcs (chains/edges) are fundamental in vector GIS
Storing areas
- two ways of storing areas:
- polygon storage
- every polygon is stored as a sequence of coordinates
- although most boundaries are shared between two adjacent areas, all are input and coded twice, once for each adjacent polygon
- the two different versions of each internal boundary line may not coincide
- difficult to do certain operations, e.g. dissolve boundaries between neighboring areas and merge them
- used in some current GISs, many automated mapping packages
- arc storage
- every arc is stored as a sequence of coordinates
- areas are built by linking arcs
- only one version of each internal shared boundary is input and stored
- used in most current vector-based GISs
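The arc storage idea can be sketched in a few lines. This is a minimal illustration, not the structure used by any particular GIS: the `Arc` class, the polygon IDs and the convention that 0 means "outside" are all assumptions made for the example. Two adjacent unit squares share one boundary, which is stored once and carries the IDs of the polygons on its left and right.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    arc_id: int
    coords: list   # (x, y) points in the order the arc was coded
    left: int      # polygon on the left of the coding direction (0 = outside, assumed)
    right: int     # polygon on the right of the coding direction


# Polygon 1 is the square from x=0 to x=1, polygon 2 the square from x=1 to x=2.
arcs = [
    Arc(1, [(1, 0), (0, 0), (0, 1), (1, 1)], left=0, right=1),  # outer edge of polygon 1
    Arc(2, [(1, 0), (1, 1)], left=1, right=2),                  # shared boundary, stored once
    Arc(3, [(1, 1), (2, 1), (2, 0), (1, 0)], left=0, right=2),  # outer edge of polygon 2
]


def arcs_of(polygon_id, arcs):
    """Areas are built by linking the arcs that name the polygon on either side."""
    return [a.arc_id for a in arcs if polygon_id in (a.left, a.right)]


def dissolve(merge_ids, new_id, arcs):
    """Merge polygons: drop arcs that separate two merged polygons, relabel the rest."""
    result = []
    for a in arcs:
        if a.left in merge_ids and a.right in merge_ids:
            continue  # this boundary lies inside the merged area and disappears
        left = new_id if a.left in merge_ids else a.left
        right = new_id if a.right in merge_ids else a.right
        result.append(Arc(a.arc_id, a.coords, left, right))
    return result


print(arcs_of(1, arcs))
print([a.arc_id for a in dissolve({1, 2}, 3, arcs)])
```

With polygon storage the same dissolve would require finding and matching two separately digitized copies of the shared boundary, which is why the operation is described above as difficult in that scheme.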
overhead - Database creation process

C. DATABASE CREATION
- database creation involves several stages:
- input of the spatial data
- input of the attribute data
- linking spatial and attribute data
- spatial data is entered via digitized points and lines, scanned and vectorized lines or directly from other digital sources
- once the spatial data has been entered, much work is still needed before it can be used
Building topology
- once points are entered and geometric lines are created, topology must be "built"
- this involves calculating and encoding relationships between the points, lines and areas
- this information may be automatically coded into tables of information in the database
overhead - Example of "built" topology
Editing
- during this topology generation process, problems such as overshoots, undershoots and spikes are either flagged for editing by the user or corrected automatically
- automatic editing involves the use of a tolerance value which defines the width of a buffer zone around objects within which adjacent objects should be joined
- tolerance value is related to the precision with which locations can be digitized
diagram




- these edit procedures include such functions as snap, move, delete, split, join, etc.
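Automatic editing with a tolerance value can be sketched as follows. This is a simplified illustration of the snap function only, assuming that lines are lists of (x, y) tuples and that only line endpoints are candidates for joining; the function name and sample coordinates are hypothetical.

```python
import math


def snap_endpoints(lines, tolerance):
    """Join line endpoints that lie within `tolerance` of an already accepted node."""
    nodes = []      # node positions accepted so far
    snapped = []
    for line in lines:
        new_line = list(line)
        for idx in (0, -1):                       # first and last point of the line
            x, y = new_line[idx]
            for nx, ny in nodes:
                if math.hypot(x - nx, y - ny) <= tolerance:
                    new_line[idx] = (nx, ny)      # move the endpoint onto the node
                    break
            else:
                nodes.append((x, y))              # nothing close enough: new node
        snapped.append(new_line)
    return snapped


lines = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.2, 0.1), (10.0, 10.0)],   # starts about 0.22 units away from (10, 0)
    [(10.0, 10.0), (0.0, 0.05)],   # ends 0.05 units away from (0, 0)
]

for line in snap_endpoints(lines, tolerance=0.5):
    print(line)
```

A tolerance that is too small leaves the gaps open, while one that is too large can collapse short lines or join objects that should stay separate, which is why the value is tied to digitizing precision.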
Relationship between digitizing and editing
- digitizing and editing are complementary activities
- poor digitizing leads to much need for editing
- good digitizing can avoid most need for editing
- both can be very labor-intensive
- the process used to digitize area objects can affect the need for later editing:
- in "blind" digitizing all linework is digitized once as "noodles" in any order
- it is unlikely that the building and cleaning operations will be able to automatically sort out area objects unambiguously from the resulting jumble
diagram




- some systems require the user to identify junctions between digitized "noodles" explicitly
- usually by touching a special button on the cursor
diagram




- mistakes in building topology are less likely
- some systems require the user to digitize each individual arc/chain separately
diagram




- much easier to sort out polygons - less need for editing
- some systems support the building of topology "on the fly"
- the system searches constantly for complete area objects as digitizing proceeds
- the user is informed by a sound or by blinking as soon as the object is detected
Edgematching
- compares and adjusts features along the edges of adjacent map sheets
- some edgematches merely move objects into alignment
- others "join" the pieces together logically - conceptually they become one object
- the user "sees" no interruption
- an edgematched database is "seamless" - the sheet edges have disappeared as far as the user is concerned

D. ADDING ATTRIBUTES
- once the objects have been formed by building topology, attributes can be keyed in or imported from other digital databases
- once added to the database, attributes must be linked to the different objects
- attributes can be linked by pointing to the appropriate object on the screen and coding its corresponding object ID into the attribute table
- unlike many raster GIS systems, attribute data is stored and manipulated in entirely separate ways from the locational data
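Linking through an object ID can be illustrated with two separate structures, one for the spatial objects and one for the attribute table. The IDs, field names and values below are hypothetical; the point is only that the two are stored apart and joined through the shared ID.

```python
# Spatial objects formed by building topology, keyed by object ID (hypothetical values).
polygons = {
    101: {"area_ha": 12.4},
    102: {"area_ha": 7.9},
    103: {"area_ha": 20.1},
}

# Attribute table kept separately; each row carries the ID of the object it describes.
attribute_table = [
    {"object_id": 101, "species": "Jack pine"},
    {"object_id": 103, "species": "Black spruce"},
]


def link(polygons, attribute_table):
    """Join attribute rows to objects by ID and report objects left without attributes."""
    by_id = {row["object_id"]: row for row in attribute_table}
    linked, unlinked = {}, []
    for object_id, geometry in polygons.items():
        if object_id in by_id:
            linked[object_id] = {**geometry, **by_id[object_id]}
        else:
            unlinked.append(object_id)
    return linked, unlinked


linked, unlinked = link(polygons, attribute_table)
print(linked[101])
print("objects still needing attributes:", unlinked)
```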

E. EXAMPLE ANALYSIS USING VECTOR GIS
- compare with example analysis in Unit 4 (The Raster GIS)
Objective
- identify areas suitable for logging
- an area is suitable if it satisfies the following criteria:
- is Jack pine (Black Spruce are not valuable)
- is well drained (poorly drained and waterlogged terrain cannot support equipment, logging causes unacceptable environmental damage)
- is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality)
Procedure
overhead - Vector database
- database consists of three layers
- note: polygons do not entirely fill the space in each case
- hence, areas not included fall in polygon ID 0

overhead - Analysis steps
- buffer hydrography out to 500 m
- merge buffer and lake
- extract Jack pine polygons (species = Jack pine)
- extract drained soil polygons (drainage = 2, therefore soil = A)
- overlay buffer, Jack pine and soil polygons

- build topology

- extract polygons not in the buffer but in others (buffer = n, Jack pine = y, drainage = y)
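The final extraction step amounts to a selection on the attributes that the overlay attaches to each resulting polygon. The sketch below assumes the overlay has already been computed and represents its output as a list of records; the record layout and sample values are hypothetical, and drainage code 2 is taken to mean well drained, as in the steps above.

```python
# Hypothetical output of the overlay step: one record per resulting polygon.
overlay = [
    {"id": 1, "in_buffer": False, "species": "Jack pine", "drainage": 2},
    {"id": 2, "in_buffer": True, "species": "Jack pine", "drainage": 2},
    {"id": 3, "in_buffer": False, "species": "Black spruce", "drainage": 2},
    {"id": 4, "in_buffer": False, "species": "Jack pine", "drainage": 1},
]

WELL_DRAINED = 2  # drainage code used in the analysis steps (assumed meaning)


def is_loggable(record):
    """buffer = n, Jack pine = y, drainage = y"""
    return (
        not record["in_buffer"]
        and record["species"] == "Jack pine"
        and record["drainage"] == WELL_DRAINED
    )


loggable_ids = [record["id"] for record in overlay if is_loggable(record)]
print(loggable_ids)
```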

Result
- loggable area shown in final map




REFERENCES
Beard, M.V. and N.R. Chrisman, 1988. "Zipping: a locational approach to edgematching," The American Cartographer 15:163-72. Describes a solution to the edgematching problem.

Chrisman, N.R., 1990. "Deficiencies of sheets and tiles: building sheetless databases," International Journal of Geographical Information Systems 4:157-67. A more general discussion of building edgematched databases.

ESRI, 1990. Understanding GIS: The ARC/INFO Way, ESRI, Redlands, CA. A general introductory tutorial for ARC/INFO, a well-known contemporary GIS.

Tomlinson, R.F., H.W. Calkins and D.F. Marble, 1976. Computer Handling of Geographical Data. UNESCO Press, Paris. Excellent semi-technical description of CGIS, an early vector-based system.

WHAT IS GIS?
B. CONTRIBUTING DISCIPLINES AND TECHNOLOGIES

Geography

Cartography

Remote Sensing

Photogrammetry

Surveying

Geodesy

Statistics

Operations Research

Computer Science

Mathematics

Civil Engineering

C. MAJOR AREAS OF PRACTICAL APPLICATION

Street network-based

Natural resource-based

Land parcel-based

Facilities management

D. GIS AS A SET OF INTERRELATED SUBSYSTEMS

Data Processing Subsystem

Data Analysis Subsystem

Information Use Subsystem

Management Subsystem

MAPS and MAP ANALYSIS
WHAT IS A MAP?
Definition

Maps show more than the Earth's surface

Cartographic abstraction

Types of maps

Thematic maps in GIS

Line maps versus photo maps

Characteristics of maps

The concept of scale

Map projections

C. WHAT ARE MAPS USED FOR?
Data display

Data stores

Spatial indexes

Data analysis tool

D. THE USE OF MAPS FOR INVENTORY AND ANALYSIS
Measuring land use change

Landscape architecture

E. AUTOMATED AND COMPUTER-ASSISTED CARTOGRAPHY
Changeover to computer mapping

Advantages of computer cartography

Disadvantages of computer cartography

GIS and Computer Cartography

F. GIS COMPARED TO MAPS
Data stores

Data indexes

Data analysis tools

Data display tools

COMPUTATIONAL BASICS FOR GIS
B. COMPUTER DATA

Binary notation

Bits and bytes

ASCII coding system

C. COMPUTER HARDWARE

Central processing unit (CPU)

Memory

Peripherals

Networks

D. DATA STORAGE

Storage media

Fixed disks

Dismountable devices

Volumes

Files

E. SOFTWARE

Programs

Operating systems

Compilers and languages

Applications programs

F. EDITORS AND WORD PROCESSORS

G. DATABASES

Functions of a database

Three types of database

H. SPREADSHEETS

I. STATISTICAL PACKAGES

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES

This unit provides a brief introduction to computer hardware and software. We have included this unit to help those who are teaching students with no computer background. However, any introductory course in the use of micro-computers is likely to have covered this material already. Binary notation is introduced here. A knowledge of the binary numbering system and conversion to decimal is needed only for Units 35, 36 and 37 but it is useful for students to be aware of this fundamental topic.

UNIT 3 - INTRODUCTION TO COMPUTERS FOR GIS
A. INTRODUCTION
- The environment in which a GIS operates is defined by:
Hardware

The machinery, including:
- A host computer

Ranging from a stand-alone microcomputer, through a range of client-server configurations to a large network supporting many users, or in special cases supercomputer centers

- Several devices for handling input and output
Software

- The programs that tell the computer what to do (applications)

- The data the programs will use

- This unit provides a brief overview of computer hardware and software so that students will have a basic understanding of how computers operate and will recognize some of the common computer terminology

- Important topics are covered in greater detail in later units

B. COMPUTER DATA
- Computer data is coded, manipulated and stored by use of an exclusive two-state condition
- In the English language such two-state forms of information can include yes/no, on/off, open/closed, hole/no hole

- In simple electronic terms this two-state condition can be translated for the computer into "switch open/switch closed", meaning "there is electricity passing through the circuit/there is no electricity passing through the circuit"

- Note that one of the two exclusive states always exists

If one switch provides two different data values, how much data can we obtain from two switches?
Four - there are four combinations of open and closed switches


Binary notation
- in computer terminology, this two state condition is represented in binary notation by the use of 1s and 0s
- thus, two switches produce four codes - 00, 01, 10, 11
- three switches produce eight codes - 000, 001, 010, 011, 100, 101, 110, 111
- in mathematical terms:
- 1 binary digit provides 2^1 = 2 alternatives
- 2 binary digits provide 2^2 = 4 alternatives
- 3 binary digits provide 2^3 = 8 alternatives
- 8 binary digits provide 2^8 = 256 alternatives
THE POWER OF 2
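The doubling of alternatives with each added binary digit can be checked by enumerating the codes directly. This is a small illustrative sketch; the function name `codes` is an assumption of the example.

```python
from itertools import product


def codes(n_bits):
    """All distinct on/off patterns for n_bits switches, written as strings of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


for n in (1, 2, 3, 8):
    print(n, "binary digits ->", len(codes(n)), "alternatives")

print(codes(2))
```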
Bits and bytes
- Each binary digit is called a bit
- the complexity of computer circuitry is described in terms of the number of bits that can be transmitted simultaneously
- this is determined by the number of wires that run parallel to one another on the circuit-boards
- current PCs use 8, 16 and 32 bit paths
- a group of 8 bits is called a byte
- bytes are the standard unit of measurement of computer data
http://www.geog.ucsb.edu/%7Ekclarke/G128/Lecture05.html
http://www.ncgia.ucsb.edu/education/curricula/giscc/units/u037/u037_f.html
ASCII Coding system
American Standard Code for Information Interchange
- to maximize efficiency, most computers store data in their own internal formats
- however, transfer of data requires the use of standard codes which are understood by all systems
- the most successful standard is ASCII (pronounced ass-key)
- ASCII originated well before computer communication as a code for Teletypes
- ASCII assigns the numbers 0 through 127 to 128 characters, including the upper and lower case alphabets, numerals 0 through 9 and various special characters
- 128 different patterns can be generated using 7 bits in different combinations of on and off
- Any ASCII character can therefore be coded with 7 bits
- in practice, 8 bits (one byte) are used, the extra bit may be used to extend the code to 128 extra characters, or it simply may be redundant


BINARY NOTATION

- by using binary notation, these codes can be converted into decimal numbers
- counting from the right, the 8 bits are numbered 0 through 7, and signify as follows:
Bit: 7 6 5 4 3 2 1 0
128s 64s 32s 16s 8s 4s 2s units
- e.g. the combination 01010101 is
no 128s, one 64, no 32s, one 16, no 8s, one 4, no 2s and one unit
i.e. 64+16+4+1 = 85
- In the ASCII code system, code number 85 is an upper case U
Thus to store a U, the system stores a byte with the bit pattern 01010101
- In ASCII code, characters 0 through 31 often perform special functions
- E.g. character 7, 00000111, is the BEL character and rings a bell if received by many terminals or devices
- E.g. character 12, 00001100, is the FF character and produces a form feed (new page) if received by many printers
- Computer files which contain information coded in ASCII are easily transferred and processed by different computers and programs
- Files are often called "ASCII" or "text" or "coded" files
- ASCII characters are the dominant basis for communication between different systems, and communication with peripherals
- Files which are not ASCII are often coded in "binary" and generally can be processed or understood only by specific programs
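The conversion from a bit pattern to a decimal number and then to an ASCII character can be reproduced in a few lines. The helper name `bits_to_decimal` is an assumption of this sketch; it adds up the place values (128s down to units) of the bits that are set.

```python
def bits_to_decimal(bits):
    """Sum the place values of the bits that are set, counting from the right."""
    total = 0
    for position, bit in enumerate(reversed(bits)):  # position 0 is the units bit
        if bit == "1":
            total += 2 ** position
    return total


pattern = "01010101"
value = bits_to_decimal(pattern)
print(value, chr(value))            # decimal value and the ASCII character it codes
print(format(ord("U"), "08b"))      # from the character back to its 8-bit pattern
```

The same function applied to 00000111 and 00001100 gives 7 and 12, the BEL and FF characters mentioned above.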


C. COMPUTER HARDWARE
- Computers consist of several different hardware components
Central processing unit (CPU)
- The central processing unit is the essential component of a computer because it is the part that executes the programs and controls the operation of all the hardware
- Powerful computers may have several processors handling different tasks, although there will need to be one or more central processing units controlling the flow of instructions and data through the subsidiary processors

- CPUs of PCs are based on a series of processors or "chips" from Intel, or other vendors (Cyrix)

- High powered machines use the Pentium 3 & 4 chips

32 bit processor - Up to 2GHz and 4 gigabytes of main memory

- Macintosh CPUs are based on the 68000 series of chips from Motorola

The Power Mac G5 is currently the world’s fastest personal computer with a 64-bit processor — which means it can use up to 8 gigabytes of main memory.

http://keene.home.texas.net/macsoftware.html

http://www.geog.uni-hannover.de/grass/

Memory
- Memory stores input for and output from the CPU as well as the instructions that are followed by the CPU
The amount stored is measured in bits, bytes, Kbytes (K, Kb, 10^3 bytes), Megabytes (Mb, 10^6 bytes), Gigabytes (Gb, 10^9 bytes), Terabytes (Tb, 10^12 bytes)
The Earth Observing System (EOS) satellite generates 17 Terabytes of data per day.

- There are two kinds of memory:
- MAIN MEMORY (or internal or primary memory) is essential for the operation of the computer; all data and instructions must be in main memory before they can be processed by the computer
- Most costly memory
- In the form of microchips integrated with the computer's central processor
- Fastest access - any byte can be accessed equally rapidly (random access, hence it is called RAM)
- Temporary - since data and instructions are stored in main memory as electrical voltages, power failures cause the loss of all data in main memory
- Ranges from several hundred Megabytes to 10 Gigabytes for a typical PC to many Terabytes for high-end servers
- SECONDARY MEMORY (or auxiliary memory or secondary storage) is used for large, permanent or semi-permanent files
- GIS programs and data generally require very large amounts of storage
- Data storage is covered after this overview of the components of computers

Peripherals
- Peripherals refer to all the other devices attached to computers that handle input and output
- Input devices include keyboards, mice, trackballs, digitizers, and disk drives
- Output devices include screens, printers, and plotters
- Those devices important to GIS are examined in later lessons

Networks
Many computers are linked to share data and resources (hardware and software)
Client-Server architecture
Connection protocols - proprietary (E.g. Microsoft Network, Novell), TCP/IP
WAN (Wide area networks) such as the Internet
LAN (Local area networks) provide specific resources to a group of users.



LOCATION-BASED SERVICES

Triggered by location, accessed by mobile devices – cellular phones, PDAs, etc.

Provide context-based information: directions, routes, traffic conditions, advertising, sights, games, etc.

www.whereonearth.com

O2 Traffic Line

Web-based GIS
Internet Map Server
Web GIS sites
D. DATA STORAGE
Storage media
- Computers can use several different media for storing information
- needed to store both raw data and programs

- media differ by

- storage capacity

- speed of access

- permanency of storage

- mode of access

- cost

Fixed disks
- Most costly memory next to main/internal memory is fixed disk memory

- Ranges from 700 - 8000 Megabytes for typical PC to hundreds of Gigabytes in large "disk farms" RAID systems

- Random access but slower than internal memory

- Permanent (i.e. does not disappear when power is turned off), though data can be erased and modified

Dismountable devices
- dismountable devices can be removed for storage or shipping, include:
- removable hard drives, memory sticks, flash cards, ZIP drives (250 Mb), floppy diskettes (1.44 Megabytes for PC) - random access
- removable hard drives, e.g. Zip™ and Jaz™ drives, 100 Mbyte - 1 Gigabyte

- magnetic tapes and cartridges
- 10s to 100s Megabytes for standard tape

- Access is sequential, not random

- Can take minutes to reach a particular set of data on the tape, depending on where it is stored

- Compact Disks (CDs): random access, 600 Megabytes per CD; read-only memory (ROM), recordable (WORM), rewritable (WMRM)
- Digital Versatile (Video) Disk (DVD) 17Gbyte random access, access speeds close to CD-ROM


Volumes
- a volume is a single tape, CD, diskette or fixed disk, i.e. a physical unit of storage

Files
- a file is a logical collection of data - a table, document, program, map
- many files can be stored on a single volume
- files are given names
- the rules for naming files vary among types of systems
- the computer operating system keeps track of files stored in a volume by using a table called a directory
- files are identified in the directory by name, size, date of creation and often type of contents
- files are often organized into subdirectories so that the user can group files under specific topics
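The directory information an operating system keeps about files can be inspected from a program. The sketch below uses Python's standard library on a temporary directory standing in for a volume; the file and subdirectory names are hypothetical, and the timestamp shown is the time of last modification, since a creation date is not available in the same way on every system.

```python
import os
import tempfile
from datetime import datetime


def list_directory(path):
    """Return name, size, modification time and a rough type for each directory entry."""
    entries = []
    for entry in os.scandir(path):
        info = entry.stat()
        entries.append({
            "name": entry.name,
            "size": info.st_size,
            "modified": datetime.fromtimestamp(info.st_mtime),
            # type of contents is guessed from the extension, as many systems do
            "type": "directory" if entry.is_dir() else os.path.splitext(entry.name)[1] or "file",
        })
    return sorted(entries, key=lambda e: e["name"])


with tempfile.TemporaryDirectory() as volume:
    os.mkdir(os.path.join(volume, "maps"))              # a subdirectory grouping files
    with open(os.path.join(volume, "roads.txt"), "w") as f:
        f.write("ASCII")                                # a five-byte text file
    listing = list_directory(volume)

for item in listing:
    print(item["name"], item["size"], item["type"])
```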

E. SOFTWARE
Programs
- a program is a sequence of related instructions, performed one step at a time by the CPU to accomplish some task
- programs determine how computers respond to input, what will be displayed and output
- there are three types of programs: operating systems, language interpreters and compilers and applications programs
Operating systems
- an operating system (OS) is the software which controls the operation of the computer from the moment it is turned on or "booted"
- the OS controls all input and output to and from the peripherals as well as the operation of other programs
- allows the user to work with and manage files without knowing specifically how the data is stored and retrieved
- in multi-user systems, operating systems manage user access to the processor and peripherals and schedule jobs
- common operating systems include:
- IBM PCs and clones use MS-Windows or Windows NT
- Apple maintains its own operating system
- UNIX (and similar operating systems such as LINUX) is a common operating system for workstations
- networks commonly use proprietary operating systems developed by their manufacturers
- although functions performed by operating systems are similar, it can be very difficult to move files or software from one to another
- many software packages run under only one operating system, or have substantially different versions for different operating systems
Compilers and languages
- since computers operate on electricity and binary operations, all instructions executed by computers must be provided to the CPU in machine code
- however, humans do not have to interact with computers at this level
- programs can be written in very specialized languages, called assemblers, which allow programmers to take advantage of the specific capabilities of particular machines by addressing the basic operations directly
- these languages are very cryptic and very difficult to use
- they are also system specific and cannot be transported from one type of computer to another
- most programs are created using standard high level languages such as C, C++, VISUAL BASIC, FORTRAN, etc., which are common across most computer systems, from micro to network
- such programs are referred to as source code
- these languages generally use English words and familiar mathematical structure
- a compiler is a program designed to convert a program written in a high level language to the machine instructions of a specific computing system or "platform"
- the output of a C compiler for the IBM PC has almost nothing in common with the output of a C compiler for a network computer
- although high level languages are generally used in the development of application packages such as GIS, the software is normally compiled for specific platforms before distribution to the public
- this is done to protect the commercial interests of the developer
Applications programs
- applications programs are programs used for all purposes other than performing operating system chores or writing other programs
- includes GIS, word processors, spreadsheets, statistics packages and graphics programs, airline reservation systems, payroll systems
F. EDITORS AND WORD PROCESSORS
- are packages designed to modify or edit the contents of files
- are most often used to edit written text or programs
- editing and creation of files of numerical data is best done with the special purpose editors found in database packages or spreadsheets (see sections G and H)
- editors and word processors are usually WYSIWYG ("what you see is what you get")
- the screen shows a picture of the contents of the file at all times
- well-known word processors for the IBM PC include Wordstar, WordPerfect and Microsoft Word
- linkage to a printer is essential so that the user can obtain "hard copy" of a file's contents
- an editor is the most important system to learn after the operating system
- it is difficult to make much effective use of a system without one
G. DATABASES
- are packages designed to create, edit, manipulate and analyze data
- to be suitable for a database, the data must consist of records which provide information on individual cases, people, places, features, etc.
- each record may contain several fields each of which contains one item of information
- the number and interpretation of the fields must be constant for each class of records
- e.g. each record in the class of "streets" may contain fields for name, length, surface, type.

- field contents can be of many types - numeric or text, fixed or variable length
- there can be several classes of records in a database
- e.g. an airline reservation database might have the following classes of records and associated items:
passengers: name, phone, flight numbers

aircraft: type, registration number, number of seats

crew: names of pilot, copilot, cabin crew, home city

flight: number, departure and arrival times, aircraft

Functions of a database
- creating and editing records, using customized screens
- printing reports (summaries of groups of records), using customized report forms, including subtotals and totals
- selecting records based on user-specified rules
- updating records based on new information
- linking records, e.g. to determine arrival time for a passenger by linking the passenger's record with the correct flight record
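The record-linking function can be sketched with the airline example, using the SQLite engine that ships with Python. The table layout, flight numbers, names and times are hypothetical; only the idea of linking a passenger record to a flight record comes from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flight (number TEXT PRIMARY KEY, departs TEXT, arrives TEXT, aircraft TEXT);
    CREATE TABLE passenger (name TEXT, phone TEXT, flight_number TEXT);
    INSERT INTO flight VALUES ('XY100', '08:15', '10:40', 'N12345');
    INSERT INTO flight VALUES ('XY200', '13:00', '16:25', 'N67890');
    INSERT INTO passenger VALUES ('A. Smith', '555-0100', 'XY200');
""")


def arrival_time(conn, passenger_name):
    """Link the passenger's record to the matching flight record and return its arrival time."""
    row = conn.execute(
        """SELECT f.arrives
           FROM passenger AS p
           JOIN flight AS f ON f.number = p.flight_number
           WHERE p.name = ?""",
        (passenger_name,),
    ).fetchone()
    return row[0] if row else None


print(arrival_time(conn, "A. Smith"))
```

The passenger record stores only the flight number; the arrival time lives in the flight record and is found through the link, so it needs to be updated in one place only.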
Types of database
- Network, hierarchical, relational and Object-Oriented are different ways of modeling data within a database
- Although all four are used, the relational model has been most successful within GIS
- it is discussed at length later in the course
- well-known relational database management systems (RDBMSs) include dBase, Oracle, Info
- many of these have been used in specific GISs
- many databases use the same language, SQL (Structured Query Language), for formulating queries
H. SPREADSHEETS
- are systems which allow the user to work with numerical data in tabular form
- column and row totals, percentages etc. are automatically updated as data items are changed
- Lotus 1-2-3 is a well-known spreadsheet for the IBM PC
I. STATISTICAL PACKAGES
- offer a range of types of statistical analysis
- data is primarily numerical
- may include:
- database functions, such as editing, printing reports
- capabilities for graphic output, particularly graphs but many also produce maps
- S-plus is a commonly available statistical package; other common packages are SAS, SPSS and BMD
- available over a wide range of operating systems
- some have been "ported" to (rewritten for) the IBM PC
- numerous other packages have been developed specifically for the PC environment

REFERENCES
Maguire, D.J., 1989. Computers in Geography, John Wiley and Sons, Inc., New York.

Current reviews and comparisons of different hardware and software are published frequently, particularly for the PC environment in magazines such as Byte and PC Magazine.

Numerous texts are available at various levels of sophistication for operating systems, editors, compilers and common applications programs.

EXAM AND DISCUSSION QUESTIONS
1. Compare the data storage needs of (a) the data which will be transmitted by the EOS satellites of the 1990s, which generate approximately 1 Terabyte/day, (b) the US Bureau of the Census's TIGER files of street networks, which amount to about 10 Gigabytes and are updated every 10 years, and (c) a database of 100 Megabytes created for use in a one-time environmental impact study

2. "User expectations about data volumes rise at least as rapidly as the capacity of available storage devices". Discuss.

3. Why do you think the computer industry has been unable to agree on a common operating system? or single source language?

4. Describe the functional differences between databases, spreadsheets and statistical packages. Which would be more useful for (a) research in a university department, (b) administrative record-keeping in a small business, (c) personal budget planning?

THE RASTER GIS
A. THE DATA MODEL

B. CREATING A RASTER

Cell by cell entry

Digital data

C. CELL VALUES

Types of values

One value per cell

D. MAP LAYERS

Resolution

Orientation

Zones

Value

Location

E. EXAMPLE ANALYSIS USING A RASTER GIS

Objective

Procedure

Result

Operations used

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES
Although most of the material in this Curriculum is designed to be as independent as possible from specific data models, it is necessary to deal with this basic concept early so that students can start hands-on exercises with a GIS program. Following Unit 5, we return to the more fundamental concepts and do not address specific vector GIS issues until Units 13 and 14. There are several other places these topics could be placed in a course sequence. We have tried to make Units 4 and 5 as independent as possible so that you can move them within the Curriculum relatively easily.


UNIT 4 - THE RASTER GIS

Compiled with assistance from Dana Tomlin, The Ohio State University
A. THE DATA MODEL
- geographical variation in the real world is infinitely complex
- the closer you look, the more detail you see, almost without limit
- it would take an infinitely large database to capture the real world precisely
- data must somehow be reduced to a finite and manageable quantity by a process of generalization or abstraction
- geographical variation must be represented in terms of discrete elements or objects
- the rules used to convert real geographical variation into discrete objects are the data model
- Tsichritzis and Lochovsky (1977) define a data model as "a set of guidelines for the representation of the logical organization of the data in a database... (consisting) of named logical units of data and the relationships between them."
- current GISs differ according to the way in which they organize reality through the data model
- each model tends to fit certain types of data and applications better than others
- the data model chosen for a particular project or application is also influenced by:
- the software available
- the training of the key individuals
- historical precedent
- there are two major choices of data model - raster and vector
overhead - Major GIS data models
- raster model divides the entire study area into a regular grid of cells in specific sequence
- the conventional sequence is row by row from the top left corner
- each cell contains a single value
- is space-filling since every location in the study area corresponds to a cell in the raster
- one set of cells and associated values is a layer
- there may be many layers in a database, e.g. soil type, elevation, land use, land cover

- vector model uses discrete line segments or points to identify locations
- discrete objects (boundaries, streams, cities) are formed by connecting line segments
- vector objects do not necessarily fill space, not all locations in space need to be referenced in the model
- a raster model tells what occurs everywhere - at each place in the area
- a vector model tells where everything occurs - gives a location to every object
- conceptually, the raster models are the simplest of the available data models
- therefore, we begin our examination of GIS data and operations with the raster model and will consider vector models after the fundamental concepts have been introduced.
B. CREATING A RASTER
- consider laying a grid over a geologic map
- create a raster by coding each cell with a value that represents the rock type which appears in the majority of that cell's area
- when finished, every cell will have a coded value
overhead - Creating a raster
- this illustrates a more complex example
- in most cases the values that are to be assigned to each cell in the raster are written into a file, often coded in ASCII
- this file can be created manually by using a word processor, database or spreadsheet program or it can be created automatically
- then it is normally imported into the GIS so that the program can reformat the data for its specific processing needs
- there are several methods for creating raster databases
Cell by cell entry
- direct entry of each layer cell by cell is simplest
- entry may be done within the GIS or into an ASCII file for importing
- each program will have specific requirements
overhead - Typical ASCII file formats used in importing
- the process is normally tedious and time-consuming
- layer can contain millions of cells
- average Landsat image is around 7.4 x 10^6 pixels, average TM scene is about 34.9 x 10^6 pixels
- run length encoding can be more efficient
- values often occur in runs across several cells
- this is a form of spatial autocorrelation - tendency for nearby things to be more similar than distant things
- data entered as pairs, first run length, then value
e.g. the array
0 0 0 1 1
0 0 1 1 1
0 0 1 1 1
0 1 1 1 1
would be entered as 3 0 2 1 2 0 3 1 2 0 3 1 1 0 4 1
- this is 16 items to enter, instead of 20
- in this case the saving is 20%, but much higher savings occur in practice

- imagine a database of 10,000,000 cells and a layer which records the county containing each pixel
- suppose there are only two counties in the area covered by the database
- each cell can have one of only two values so the runs will be very long
- only some GISs have the capability to use run length encoded files
- note: Units 35 and 36 cover run length encoding and other aspects of raster storage in more detail
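
The run length scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the file format of any particular GIS: the function names `rle_encode` and `rle_decode` are hypothetical, and runs are assumed to stop at the end of each row, as they do in the worked example.

```python
def rle_encode(grid):
    """Run length encode a raster row by row from the top left corner.

    Returns a flat list of pairs: run length first, then value.
    Assumption: runs are not carried across row boundaries.
    """
    encoded = []
    for row in grid:
        run_value = row[0]
        run_length = 1
        for value in row[1:]:
            if value == run_value:
                run_length += 1
            else:
                encoded.extend([run_length, run_value])
                run_value, run_length = value, 1
        encoded.extend([run_length, run_value])
    return encoded


def rle_decode(encoded, ncols):
    """Expand (run length, value) pairs back into rows of ncols cells."""
    cells = []
    for i in range(0, len(encoded), 2):
        run_length, value = encoded[i], encoded[i + 1]
        cells.extend([value] * run_length)
    return [cells[i:i + ncols] for i in range(0, len(cells), ncols)]


# The 4 x 5 array from the text
grid = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
]
encoded = rle_encode(grid)
restored = rle_decode(encoded, ncols=5)
```

Here `encoded` holds the 16 items listed in the text, and decoding them restores the original 20 cells.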
Digital data
- much raster data is already in digital form, as images, etc.
- however, resampling will likely be needed in order that pixels coincide in each layer
- because remote sensing generates images, it is easier to interface with a raster GIS than any other type
- elevation data is commonly available in digital raster form from agencies such as the US Geological Survey
C. CELL VALUES
Types of values
- the type of values contained in cells in a raster depends upon both the reality being coded and the GIS
- different systems allow different classes of values, including:
overhead - Raster data values
- whole numbers (integers)
- real (decimal) values
- alphabetic values
- many systems only allow integers; others allow different types but restrict each separate raster layer to a single kind of value
- if systems allow several types of values, e.g. some layers numeric, some non-numeric, they should warn the user against doing unreasonable operations
- e.g. it is unreasonable to try to multiply the values in a numeric layer with the values in a non-numeric layer
- integer values often act as code numbers, which "point" to names in an associated table or legend
- e.g. the first example might have the following legend identifying the name of each soil class:
0 = "no class"
1 = "fine sandy loam"
2 = "coarse sand"
3 = "gravel"
One value per cell
- each pixel or cell is assumed to have only one value
- this is often inaccurate - the boundary of two soil types may run across the middle of a pixel
- in such cases the pixel is given the value of the largest fraction of the cell, or the value of the middle point in the cell
- note, however, a few systems allow a pixel to have multiple values
- the NARIS system developed at the University of Illinois in the 1970s allowed each pixel to have any number of values and associated percentages
- e.g. 30% a, 30% b, 40% c
D. MAP LAYERS
- the data for an area can be visualized as a set of maps, or layers
- a map layer is a set of data describing a single characteristic for each location within a bounded geographic area
- only one item of information is available for each location within a single layer - multiple items of information require multiple layers
- on the other hand, a topographic map can show multiple items of information for each location, within limits
- e.g. elevation (contours), counties (boundaries), roads, railroads, urbanized areas (grey tint)

- these would be 5 layers in a raster GIS

- typical raster databases contain up to a hundred layers
- each layer (matrix, lattice, raster, array) typically contains hundreds or thousands of cells
- important characteristics of a layer are its resolution, orientation and zone(s)
Resolution
- in general, resolution can be defined as the minimum linear dimension of the smallest unit of geographic space for which data are recorded
- in the raster model the smallest units are generally rectangular (occasionally systems have used hexagons or triangles)
- these smallest units are known as cells or pixels
- note: high resolution refers to rasters with small cell dimensions
- high resolution means lots of detail, lots of cells, large rasters, small cells
Orientation
- the angle between true north and the direction defined by the columns of the raster
Zones
- each zone of a map layer is a set of contiguous locations that exhibit the same value
- these might be:
- ownership parcels
- political units such as counties or nations
- lakes or islands
- individual patches of the same soil or vegetation type
overhead - Example raster database
- there is considerable confusion over terms here
- other terms commonly used for this concept are patch, region, polygon
- each of these terms, however, has different meanings to individual users and different definitions in specific GIS packages
- in addition, there is a need for a second term which refers to all individual zones that have the same characteristics
- class is often used for this concept
diagram



- note that not all map layers will have zones; cell contents may vary continuously over the region, making every cell's value unique
- e.g. satellite sensors record a separate value for reflection from each cell
- major components of a zone are its value and location(s)
Value
- is the item of information stored in a layer for each pixel or cell
- cells in the same zone have the same value
Location
- generally location is identified by an ordered pair of coordinates (row and column numbers) that unambiguously identify the location of each unit of geographic space in the raster (cell, pixel, grid cell)
- usually the true geographic location of one or more of the corners of the raster is also known
E. EXAMPLE ANALYSIS USING A RASTER GIS
Objective
- identify areas suitable for logging
- an area is suitable if it satisfies the following criteria:
- is Jackpine (Black Spruce are not valuable)
- is well drained (poorly drained and waterlogged terrain cannot support equipment, logging causes unacceptable environmental damage)
- is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality)
Procedure
overheads - Example project steps (1 page) and details (3 pages)
- recode layer 2 as follows, creating layer 4
- y if value 2 (Jackpine)
- n if other value
- recode layer 3 as follows, creating layer 5
- y if value 2 (good)
- n if other value
- spread the lake on layer 1 by one cell (500 m), creating layer 6
- recode the spread lake on layer 6 as follows, creating layer 7
- n if in spread lake
- y if not
- overlay layers 4 and 5 to obtain layer 8, coding as follows
- y if both 4 and 5 are y
- n otherwise
- overlay layers 7 and 8 to obtain layer 9, coding as follows
- y if both 7 and 8 are y
- n otherwise
Result
- the loggable cells are y on layer 9
Operations used
- recode
- overlay
- spread
- we could have achieved the same result using the operations in other sequences, or by combining recode and overlay operations
- e.g. overlay layers 2 and 3, coding as follows
- y if layer 2 is 2 and layer 3 is 2, n otherwise

- this would replace two recodes and an overlay

- e.g. some systems allow layers to be overlaid 3 or more at a time
- the names given to operations vary from system to system, but most of the operations themselves are common across systems
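
The logging example can be traced end to end with a small sketch. Everything concrete here is an assumption made for illustration: the 4 x 4 layers are invented, the helper names `recode`, `overlay` and `spread` are hypothetical rather than commands of any particular GIS, cells are taken to be 500 m across, and "spread by one cell" is interpreted as reaching the four edge-sharing neighbors only.

```python
def recode(layer, rule):
    """Local operation on one layer: apply rule to every cell value."""
    return [[rule(v) for v in row] for row in layer]


def overlay(a, b, rule):
    """Local operation on two layers: combine values cell by cell."""
    return [[rule(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def spread(layer, target):
    """Grow cells holding target by one cell (edge-sharing neighbors only)."""
    rows, cols = len(layer), len(layer[0])
    out = [row[:] for row in layer]
    for r in range(rows):
        for c in range(cols):
            if layer[r][c] == target:
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = target
    return out


# Invented input layers: 1 = lake; species 2 = Jackpine; drainage 2 = good
layer1 = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
layer2 = [[2, 2, 2, 1], [2, 2, 2, 1], [2, 2, 2, 2], [1, 1, 2, 2]]
layer3 = [[2, 2, 2, 2], [2, 2, 1, 2], [2, 2, 2, 2], [2, 2, 2, 1]]


def both(x, y):
    return "y" if x == "y" and y == "y" else "n"


layer4 = recode(layer2, lambda v: "y" if v == 2 else "n")   # Jackpine
layer5 = recode(layer3, lambda v: "y" if v == 2 else "n")   # well drained
layer6 = spread(layer1, target=1)                           # lake plus 500 m
layer7 = recode(layer6, lambda v: "n" if v == 1 else "y")   # away from water
layer8 = overlay(layer4, layer5, both)
layer9 = overlay(layer7, layer8, both)                      # loggable cells

# Shortcut from the text: one overlay replaces two recodes and an overlay
shortcut = overlay(layer2, layer3,
                   lambda s, d: "y" if s == 2 and d == 2 else "n")
```

With these inputs `shortcut` equals `layer8`, which is the point made above about combining recode and overlay operations.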

RASTER GIS CAPABILITIES
A. INTRODUCTION

B. DISPLAYING LAYERS

a) Basic display

b) Other types of display

C. LOCAL OPERATIONS

a) Recoding

b) Overlaying layers

D. OPERATIONS ON LOCAL NEIGHBORHOODS (FOCAL - Tomlin)

a) Filtering

b) Slopes and aspects

E. OPERATIONS ON EXTENDED NEIGHBORHOODS

a) Distance

b) Buffer zones

c) Visible area or "viewshed"

F. OPERATIONS ON ZONES (GROUPS OF PIXELS)

a) Identifying zones

b) Areas of zones

c) Perimeter of zones

d) Distance from zone boundary

e) Shape of zone

G. COMMANDS TO DESCRIBE CONTENTS OF LAYERS

a) One layer

b) More than one layer

c) Zones on one layer

H. ESSENTIAL HOUSEKEEPING

REFERENCES

EXAM AND DISCUSSION QUESTIONS

NOTES
This unit continues the overview of raster GIS. If possible, we suggest that you replace and/or supplement the graphics provided with this unit with graphics generated by the raster program your students will be using in their labs. Alternatively, the best way to illustrate this unit may be through the use of a laboratory demonstration.

Consider providing handouts to the students that summarize the commands for the raster GIS program you will be using in labs. Check your program's manual for a command summary or do a screen dump of the appropriate help screen if there is one.

UNIT 5 - RASTER GIS CAPABILITIES
Compiled with assistance from Micha Pazner, University of Manitoba

A. INTRODUCTION
A raster GIS must have capabilities for:
- Input of data
- Various housekeeping functions
- Operations on layers, like those encountered in the previous unit - recode, overlay and spread
- Integration with vector GIS operations
- Output of data and results
- The range of possible functions is enormous; current raster GISs only scratch the surface
- Because the range is so large, some have tried to organize functions into a consistent scheme, but no scheme has been widely accepted yet
- The unit covers a selection of the most useful and common functions
- Each raster GIS uses different names for the functions
IDRISI is a commonly used and powerful raster-based GIS developed by Dr. Eastman at Clark University in Worcester, Massachusetts. It is now a commercial success.

IDRISI TUTORIAL on line at Univ. British Columbia

ArcGIS Spatial Analyst provides tools for raster-based spatial modeling and analysis. Typical uses:
- Find suitable locations
- Calculate the accumulated cost of traveling from one point to another
- Perform land use analysis
- Predict fire risk
- Analyze transportation corridors
- Determine pollution levels
- Perform crop yield analysis
- Determine erosion potential
- Perform demographic analysis
- Conduct risk assessments
- Model and visualize crime patterns

ESRI Clips
B. DISPLAYING LAYERS
Basic display
- The simplest type of values to display are integers
- On a color display each integer value can be assigned a unique color
- There must be as many colors as integers
- If the values have a natural order we will want the sequence of colors to make sense
- E.g. elevation is often shown on a map using the sequence blue-green-yellow-brown-white for increasing elevation
- There should be a legend explaining the meaning of each color
- The system should generate the legend automatically based on the descriptions of each value stored with the data layer
IDRISI TUTOR display.htm - Simple display (IDRISI)
- On a dot matrix or laser printer shades of grey can be generated by varying the density of dots
- If there are too many values for the number of colors, may have to recode the layer before display
Other types of display
- It may be appropriate to display the data as a surface
- Contours can be "threaded" through the pixels along lines of constant value
- The searching operation for finding contours is computer-intensive so may be slow
- The surface can be shown in an oblique, perspective view
FIGURE - Perspective view
- This can be done by drawing profiles across the raster with each profile offset and hidden lines removed
- The surface might be colored using the values in a second layer (a second layer can be "draped" over the surface defined by the first layer)
- The result can be very effective
- Fly-overs: "LA The Movie" was produced by the Jet Propulsion Laboratory by draping a Landsat image of Los Angeles over a layer of elevations, then simulating the view from a moving aircraft. We have come a long way in 15 years: Google Earth and Microsoft's Virtual Earth are now commonplace.

- These operations are also computer-intensive because of the calculations necessary to simulate perspective and remove hidden lines

C. LOCAL OPERATIONS
- Produce a new layer from one or more input layers
- The value of each new pixel is defined by the values of the same pixel on the input layer(s)
- Neighboring or distant pixels have no effect
- Note: arithmetic operations make no sense unless the values have appropriate scales of measurement (see Unit 6)
- You cannot find the "average" of soil types 3 and 5, nor is soil 5 "greater than" soil 3

Recoding / reclassing
- using only one input layer
- Examples:
1. Assign a new value to each unique value on the input layer
- Useful when the number of unique input values is small

2. Assign new values by assigning pixels to classes or ranges based on their old values
- E.g. 0-499 becomes 1, 500-999 becomes 2, 1000 and above becomes 3
- Useful when the old layer has different values in each cell, e.g. elevation or satellite images

3. Sort the unique values found on the input layer and replace by the rank of the value
- E.g. 0, 1, 4, 6 on input layer become 1, 2, 3, 4 respectively
- Applications: assigning ranks to computed scores of capability, suitability etc.
- Some systems allow a full range of mathematical operations
- E.g. newvalue = (2*oldvalue + 3)^2
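
The three kinds of recoding listed above can be sketched as follows. The function names are hypothetical and the small layers are invented for illustration; treating a value of exactly 1000 as class 3 is also an assumption.

```python
from bisect import bisect_right


def recode_by_table(layer, table):
    """1. Assign a new value to each unique value on the input layer."""
    return [[table[v] for v in row] for row in layer]


def recode_by_ranges(layer, breaks):
    """2. Assign classes 1, 2, 3... according to the range each value falls in.

    breaks holds the lower bound of every class after the first.
    """
    return [[bisect_right(breaks, v) + 1 for v in row] for row in layer]


def recode_by_rank(layer):
    """3. Replace each value by its rank among the unique values present."""
    unique = sorted({v for row in layer for v in row})
    rank = {v: i + 1 for i, v in enumerate(unique)}
    return [[rank[v] for v in row] for row in layer]


soils = [[3, 5], [5, 3]]
elevation = [[120, 499], [500, 1500]]
scores = [[0, 4], [6, 1]]

suitable = recode_by_table(soils, {3: 1, 5: 0})
classes = recode_by_ranges(elevation, breaks=[500, 1000])
ranks = recode_by_rank(scores)
```

With these inputs, `classes` places 120 and 499 in class 1, 500 in class 2 and 1500 in class 3, and `ranks` turns the scores 0, 1, 4, 6 into 1, 2, 3, 4.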

Overlaying layers
- An overlay occurs when the output value depends on two or more input layers
- Many systems restrict overlay to two input layers only
- Examples:
1. Output value equals arithmetic average of input values
2. Output value equals the greatest (or least) of the input values
3. Layers can be combined using arithmetic operations
- X and Y are the input layers, Z is the output
- Some more examples:
Z = X + Y
Z = X * Y
Z = X / Y

4. Combination using logical conditions
- E.g. if Y > 0, then Z = Y, otherwise Z = X
- Note: in many raster packages logical conditions cannot be done directly from input layers
- must first create reclassified input images so that cells have 0 if they do not meet the condition and 1 if they do

Boolean logical operations on rasters (2 pages)
5. Assign a new value to every unique combination of input values
- E.g.
LAYER 1   LAYER 2   OUTPUT LAYER
1         A         1
1         B         2
2         A         3
2         B         4
etc.
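
The overlay examples above share one pattern: a rule applied to the values found at the same cell on two input layers. The sketch below shows that pattern; the names `overlay` and `combine` are hypothetical, the layers are invented, and numbering the unique combinations in sorted order is an assumption chosen so that the result matches the small table above.

```python
def overlay(x, y, rule):
    """Build an output layer from two inputs, one cell at a time."""
    return [[rule(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def combine(x, y):
    """Assign a new value to every unique combination of input values."""
    pairs = sorted({(a, b) for rx, ry in zip(x, y) for a, b in zip(rx, ry)})
    code = {pair: i + 1 for i, pair in enumerate(pairs)}
    return overlay(x, y, lambda a, b: code[(a, b)])


X = [[1, 2], [3, 4]]
Y = [[0, 1], [2, 0]]

average = overlay(X, Y, lambda a, b: (a + b) / 2)          # example 1
greatest = overlay(X, Y, max)                              # example 2
total = overlay(X, Y, lambda a, b: a + b)                  # example 3: Z = X + Y
conditional = overlay(X, Y, lambda a, b: b if b > 0 else a)  # example 4

layer1 = [[1, 1], [2, 2]]
layer2 = [["A", "B"], ["A", "B"]]
combined = combine(layer1, layer2)                         # example 5
```

Only combinations that actually occur receive an output value in this sketch; a real package may number them differently.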


D. OPERATIONS ON LOCAL NEIGHBORHOODS
- the value of a pixel on the new layer is determined by the local neighborhood of the pixel on the old layer
Filtering
- A filter operates by moving a "window" across the entire raster
- E.g. many windows are 3x3 cells
- The new value for the cell at the middle of the window is a weighted average of the values in the window
- By changing the weights we can produce two major effects:

- Smoothing -- a "low pass" filter, removes or reduces local detail
- Edge enhancement -- a "high pass" filter, exaggerates local detail

- Weights should add to 1
- Example filters:
1)
0.11 0.11 0.11
0.11 0.11 0.11
0.11 0.11 0.11
- Replaces each value by the simple unweighted average of it and its eight neighboring values
- Severely smoothes the spatial variation on the layer
2)
0.05 0.05 0.05
0.05 0.60 0.05
0.05 0.05 0.05
- Gives the pixel's old value 12 times the weight of its neighboring values
- Slightly smoothes the layer

3)
-0.1 -0.1 -0.1
-0.1 1.8 -0.1
-0.1 -0.1 -0.1
- Slightly enhances local detail by giving neighbors negative weights

Spatial filtering
- Filters can be useful in enhancing detail on images for input to GIS, or smoothing layers to expose general trends
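
A 3x3 filter of this kind can be sketched as below. The function name `apply_filter` is hypothetical, and leaving the edge cells unchanged is an assumption, since the text does not say how the window is handled at the border. The smoothing weights are written as exactly 1/9; the 0.11 shown above is that value rounded.

```python
def apply_filter(layer, weights):
    """Move a 3x3 window over the raster and return the weighted sums.

    Assumption: cells on the outer edge keep their original values
    because the window does not fit around them.
    """
    rows, cols = len(layer), len(layer[0])
    out = [[float(v) for v in row] for row in layer]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    total += weights[dr + 1][dc + 1] * layer[r + dr][c + dc]
            out[r][c] = total
    return out


low_pass = [[1 / 9] * 3 for _ in range(3)]             # filter 1: plain average
mild_low_pass = [[0.05, 0.05, 0.05],
                 [0.05, 0.60, 0.05],
                 [0.05, 0.05, 0.05]]                   # filter 2
high_pass = [[-0.1, -0.1, -0.1],
             [-0.1, 1.8, -0.1],
             [-0.1, -0.1, -0.1]]                       # filter 3

# A single spike of 9 surrounded by zeros
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = apply_filter(spike, low_pass)
softened = apply_filter(spike, mild_low_pass)
sharpened = apply_filter(spike, high_pass)
```

The low pass filter pulls the spike down toward its neighbors, while the high pass filter pushes it further away from them. Because each set of weights adds to 1, a layer with the same value everywhere passes through unchanged.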
Three examples from IDRISI

1. Smoothing - low pass filters

2. Edge enhancing, edge detecting - high pass filters

3. Directional filters - enhance or detect directional structures in the filtered images



Slopes and aspects
- If the values in a layer are elevations, we can compute the steepness of slopes by looking at the difference between a pixel's value and those of its adjacent neighbors
- The direction of steepest slope, or the direction in which the surface is locally "facing", is called its aspect
- Aspect can be measured in degrees from North or by compass points - N, NE, E etc. (Cyclic level of measurement)
- Slope and aspect are useful in analyzing vegetation patterns, computing energy balances and modeling erosion or runoff
- Aspect determines the direction of runoff
- This can be used to sketch drainage paths for runoff
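
One way to turn these ideas into numbers is sketched below. It is an illustration under stated assumptions, not the method of IDRISI or any other package: the function name `slope_aspect` is hypothetical, row 0 is taken to be the northern edge of the raster, only the four edge-sharing neighbors are used (central differences), and aspect is reported as the compass bearing of the downhill direction.

```python
import math


def slope_aspect(elev, r, c, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) at one cell.

    Assumptions: row 0 is the north edge, (r, c) is not on the raster edge,
    and elevations share the units of cell_size.
    """
    # Rate of change of elevation toward the east and toward the north
    dz_east = (elev[r][c + 1] - elev[r][c - 1]) / (2 * cell_size)
    dz_north = (elev[r - 1][c] - elev[r + 1][c]) / (2 * cell_size)

    gradient = math.hypot(dz_east, dz_north)
    slope = math.degrees(math.atan(gradient))
    if gradient == 0:
        return slope, None  # flat cell: aspect is undefined

    # The surface "faces" downhill, i.e. opposite to the gradient
    aspect = math.degrees(math.atan2(-dz_east, -dz_north)) % 360
    return slope, aspect


# Elevations rise 10 m per 10 m cell toward the east, so the slope faces west
rising_east = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
slope_w, aspect_w = slope_aspect(rising_east, 1, 1, cell_size=10)

# Elevations rise toward the north (row 0), so the slope faces south
rising_north = [[20, 20, 20], [10, 10, 10], [0, 0, 0]]
slope_s, aspect_s = slope_aspect(rising_north, 1, 1, cell_size=10)
```

Because aspect is cyclic, bearings just below 360 degrees and just above 0 degrees describe almost the same direction, which is why it cannot be averaged like an ordinary number.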

Computing Slope and Aspect in IDRISI with SURFACE

http://gis01.ame.umontreal.ca/APA/6237/idrtutor/s_tools4.htm



E. OPERATIONS ON EXTENDED NEIGHBORHOODS
Distance
- calculate the distance of each cell from a cell or the nearest of several cells
- each pixel's value in the new layer is its distance from the given cell(s)
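
A brute-force version of the distance operation is sketched below. The function name `distance_layer` is hypothetical, distances are straight-line distances between cell centers, and the layer is assumed to contain at least one target cell. Real packages use faster algorithms; this version only shows what the output layer contains.

```python
import math


def distance_layer(layer, target, cell_size=1.0):
    """Distance from every cell to the nearest cell holding target."""
    targets = [(r, c)
               for r, row in enumerate(layer)
               for c, v in enumerate(row) if v == target]
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in targets) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(layer)]


# One target cell in the top left corner
layer = [[1, 0, 0],
         [0, 0, 0]]
distances = distance_layer(layer, target=1)
```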

Buffer zones
- Buffers around objects and features are very useful GIS capabilities
- E.g. build a logging buffer 500 m wide around all lakes and watercourses
- Buffer operations can be visualized as spreading the object spatially by a given distance
- The result could be a layer with values:
1 if in original selected object
2 if in buffer
0 if outside object and buffer

- Applications include noise buffers around roads, safety buffers around hazardous facilities
- in many programs the buffer operation requires the user to first do a distance operation, then a reclassification of the distance layer
- The rate of spreading may be modified by another layer representing "friction"
- E.g. the friction layer could represent varying cost of travel
- This will affect the width of the buffer - narrow in areas of high friction, etc.
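
The two-step approach mentioned above (a distance operation followed by a reclassification) can be sketched as follows. The function name `buffer_layer` and the sample layer are assumptions, distance is measured between cell centers, and no friction layer is applied.

```python
import math


def buffer_layer(layer, target, width, cell_size):
    """Code cells 1 if in the selected object, 2 if in the buffer, 0 otherwise.

    Assumes the layer contains at least one cell holding target.
    """
    targets = [(r, c)
               for r, row in enumerate(layer)
               for c, v in enumerate(row) if v == target]
    out = []
    for r, row in enumerate(layer):
        out_row = []
        for c, value in enumerate(row):
            # Step 1: distance to the nearest cell of the object
            d = min(math.hypot(r - tr, c - tc) for tr, tc in targets) * cell_size
            # Step 2: reclassify the distance
            if value == target:
                out_row.append(1)
            elif d <= width:
                out_row.append(2)
            else:
                out_row.append(0)
        out.append(out_row)
    return out


# A one-cell lake with 500 m cells and a 500 m buffer
lake = [[0, 0, 0, 0],
        [1, 0, 0, 0]]
buffered = buffer_layer(lake, target=1, width=500, cell_size=500)
```

With a 500 m width and 500 m cells the diagonal neighbor, roughly 707 m away, falls outside the buffer.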

Visible area or "viewshed"
- Given a layer of elevations, and one or more viewpoints, compute the area visible from at least one viewpoint
- E.g. value = 1 if visible, 0 if not
- useful for planning locations of unsightly facilities such as smokestacks, or surveillance facilities such as fire towers, or transmission facilities
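
A simple line-of-sight test gives a feel for the viewshed calculation. This is a rough sketch under several assumptions, not the algorithm of any specific GIS: the names `is_visible` and `viewshed` are hypothetical, the sight line is sampled once per cell step, each sample takes the elevation of the nearest cell, and a cell is blocked only when intervening terrain rises strictly above the sight line.

```python
def is_visible(elev, viewer, target, eye_height=0.0):
    """True if no intervening cell rises above the viewer-to-target sight line."""
    (vr, vc), (tr, tc) = viewer, target
    z0 = elev[vr][vc] + eye_height
    steps = max(abs(tr - vr), abs(tc - vc))
    for k in range(1, steps):
        t = k / steps
        rr = round(vr + t * (tr - vr))
        cc = round(vc + t * (tc - vc))
        sight_line = z0 + t * (elev[tr][tc] - z0)
        if elev[rr][cc] > sight_line:
            return False
    return True


def viewshed(elev, viewpoints, eye_height=0.0):
    """Value 1 if a cell is visible from at least one viewpoint, 0 if not."""
    return [[1 if any(is_visible(elev, vp, (r, c), eye_height)
                      for vp in viewpoints) else 0
             for c in range(len(row))]
            for r, row in enumerate(elev)]


# A single row of elevations with a ridge (20) in the middle
profile = [[10, 0, 20, 0, 5]]
from_west = viewshed(profile, [(0, 0)])
from_both = viewshed(profile, [(0, 0), (0, 4)])
```

From the western end the ridge itself is visible but hides everything behind it; adding a second viewpoint at the eastern end makes every cell visible from at least one of the two.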

F. OPERATIONS ON ZONES (GROUPS OF PIXELS)
Identifying zones
- By comparing adjacent pixels, identify all patches or zones having the same value
- Give each such patch or zone a unique number
- Set each pixel's value to the number of its patch or zone
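
Zone identification can be sketched with a flood fill. The function name `identify_zones` is hypothetical, and the sketch assumes that only cells sharing an edge are adjacent; whether diagonal neighbors also count varies from system to system.

```python
def identify_zones(layer):
    """Give each patch of edge-connected cells with equal values a unique number."""
    rows, cols = len(layer), len(layer[0])
    zones = [[0] * cols for _ in range(rows)]  # 0 means not yet numbered
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if zones[r][c]:
                continue
            next_id += 1
            value = layer[r][c]
            zones[r][c] = next_id
            stack = [(r, c)]
            while stack:
                cr, cc = stack.pop()
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and zones[nr][nc] == 0
                            and layer[nr][nc] == value):
                        zones[nr][nc] = next_id
                        stack.append((nr, nc))
    return zones


layer = [[1, 1, 2],
         [2, 1, 2],
         [2, 2, 1]]
zones = identify_zones(layer)
```

The cell in the bottom right corner has the same value as the zone in the top left but touches it only diagonally, so under the edge-sharing assumption it becomes a separate zone.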

Areas of zones
- Measure the area of each zone and assign this value to each pixel instead of the zone's number
- Alternatively output may be in the form of a summary table sent to the printer or a file

Perimeter of zones
- Measure the perimeter of each zone and assign this value to each pixel instead of the zone's number
- Alternatively output may be in the form of a summary table sent to the printer or a file
- Length of perimeter is determined by summing the number of exterior cell edges in each zone
- Note: the values calculated in both area and perimeter are highly dependent upon the orientation of objects (zones) with respect to the orientation of the grid
Overhead - Area and perimeter functions in rasters
- However, if boundaries in the study area do not have a dominant orientation such errors may cancel out
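
Area and perimeter as described above can be computed from a layer of zone numbers. In this sketch the function name `zone_area_perimeter` is hypothetical, the output is a summary table rather than a new layer, and an exterior edge is taken to be any cell edge that borders a different zone or the outside of the raster.

```python
def zone_area_perimeter(zones, cell_size=1.0):
    """Return {zone number: (area, perimeter)} for a layer of zone numbers."""
    rows, cols = len(zones), len(zones[0])
    cells, edges = {}, {}
    for r in range(rows):
        for c in range(cols):
            z = zones[r][c]
            cells[z] = cells.get(z, 0) + 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or zones[nr][nc] != z:
                    edges[z] = edges.get(z, 0) + 1
    return {z: (cells[z] * cell_size ** 2, edges[z] * cell_size)
            for z in cells}


zones = [[1, 1, 2],
         [3, 1, 2],
         [3, 3, 4]]
summary = zone_area_perimeter(zones)
```

Zone 1 is an L-shaped patch of three cells, so it has an area of 3 cell units and 8 exterior cell edges.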

Distance from zone boundary
- Measure the distance from each pixel to the nearest part of its zone boundary, and assign this value to the pixel
- Boundary is defined as the pixels which are adjacent to pixels of different values

Shape of zone
- Measure the shape of the zone and assign this to each pixel in the zone
- One of the most common ways to measure shape is by comparing the perimeter length of a zone to the square root of its area
- by dividing this number by 3.54 (approximately 2 times the square root of pi, which is the value of the ratio for a circle) we get a measure which ranges from 1 for a circle (the most compact shape possible) to 1.13 for a square to large numbers for long, thin, wiggly zones
- Commands like this are important in landscape ecology
- Helpful in studying the effects of geometry and spatial arrangement of habitat
- E.g. size and shape of woodlots on the animal species they can sustain

- E.g. value of linear park corridors across urban areas in allowing migration of animal species
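
The shape measure described above can be checked numerically. The function name `shape_index` is hypothetical, and the sketch uses the exact constant 2 * sqrt(pi), which rounds to the 3.54 quoted in the text.

```python
import math


def shape_index(perimeter, area):
    """Perimeter divided by the square root of area, scaled so a circle gives 1."""
    return perimeter / math.sqrt(area) / (2 * math.sqrt(math.pi))


circle = shape_index(perimeter=2 * math.pi * 5, area=math.pi * 5 ** 2)  # radius 5
square = shape_index(perimeter=4 * 10, area=10 * 10)                    # side 10
strip = shape_index(perimeter=2 * (100 + 1), area=100 * 1)              # 100 by 1
```

The index does not depend on the size of the zone, only on its form. In a raster, where perimeter is counted along cell edges, a rounded zone may score somewhat higher than its smooth geometric counterpart.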



G. COMMANDS TO DESCRIBE CONTENTS OF LAYERS
- Important to have ways of describing a layer's contents
- Particularly new layers created by GIS operations
- Particularly in generating results of analysis

One layer
- generate statistics on a layer
- e.g. mean, median, most common value, other statistics
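
A summary of a single layer can be sketched with Python's standard `statistics` and `collections` modules. The function name `describe_layer` and the sample layer are assumptions. The mean and median are only meaningful when the cell values have an appropriate scale of measurement, whereas the most common value can be reported for any layer.

```python
import statistics
from collections import Counter


def describe_layer(layer):
    """Return simple statistics for all cell values on one layer."""
    values = [v for row in layer for v in row]
    return {
        "cells": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "most_common": Counter(values).most_common(1)[0][0],
    }


layer = [[1, 2, 2],
         [3, 2, 8]]
stats = describe_layer(layer)
```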

More than one layer
- Compare two maps statistically
- E.g. is pattern on one map related to pattern on the other?
- E.g. chi-square test, regression, analysis of variance

Zones on one layer
- Generate statistics for the zones on a layer
- E.g. largest, smallest, number, mean area

H. ESSENTIAL HOUSEKEEPING
- List available layers
- Input, copy, rename layers
- Import and export layers to and from other systems
- Other raster GIS
- Input of images from remote sensing system
- Other types of GIS
- Identify resolution, orientation
- "Resample"
- Changing cell size, orientation, portion of raster to analyze
- Change colors
- Provide help to the user
- Exit from the GIS (the most important command of all!)
CARTOGRAPHIC MODELING EXAMPLE
Harvard Graduate School of Design

REFERENCES
Berry, J.K., 1987. "Fundamental operations in computer-assisted map analysis," International Journal of Geographical Information Systems 1:119-136. Describes a logical and consistent way of classifying and grouping raster GIS functions.

Burrough, P.A., 1986. Principles of Geographical Information Systems for Land Resource Assessment, Clarendon, Oxford. Chapter 5 is a comprehensive review of raster GIS.

Star, J.L. and J.E. Estes, 1990. Geographic Information Systems: An Introduction, Prentice Hall. A comprehensive text on GIS, with excellent treatment of raster systems.

Tomlin, C.D., 1990. Geographic Information Systems and Cartographic Modeling, Prentice-Hall, Englewood Cliffs, NJ. A comprehensive approach to analysis and modeling using raster systems - an excellent introduction to GIS-based