Data Input

B. DIGITIZERS
- digitizers are the most common device for extracting spatial information from maps and photographs
- the map, photo, or other document is placed on the flat surface of the digitizing tablet
Hardware
- the position of an indicator as it is moved over the surface of the digitizing tablet is detected by the computer and interpreted as pairs of x,y coordinates
CALCOMP
- the indicator may be a pen-like stylus or a cursor (a small flat plate the size of a hockey puck with a cross-hair)
- frequently, there are control buttons on the cursor which permit control of the system without having to turn attention from the digitizing tablet to a computer terminal
- digitizing tablets can be purchased in sizes from 25x25 cm to 200x150 cm, at approximate costs from $500 to $5,000
- early digitizers (ca. 1965) were backlit glass tables
- a magnetic field generated by the cursor was tracked mechanically by an arm located behind the table
- the arm's motion was encoded, coordinates computed and sent to a host processor
- some early low-cost systems had mechanically linked cursors - the free-cursor digitizer was initially much more expensive
- the first solid-state systems used a spark generated by the cursor and detected by linear microphones
- problems with errors generated by ambient noise
- contemporary tablets use a grid of wires embedded in the tablet to generate a magnetic field which is detected by the cursor
- accuracies are typically better than 0.1 mm
- this is better than the accuracy with which the average operator can position the cursor
- functions for transforming coordinates are sometimes built into the tablet and used to process data before it is sent to the host
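To put the 0.1 mm tablet accuracy in context, it helps to convert it to a distance on the ground. The sketch below does that for a few map scales. The scales chosen and the function name `ground_error_m` are illustrative assumptions, not values from these notes.

```python
def ground_error_m(tablet_error_mm, scale_denominator):
    """Convert a positional error on the map sheet to metres on the ground.

    1 mm on a 1:N map corresponds to N mm on the ground.
    """
    return tablet_error_mm * scale_denominator / 1000.0


# Assumed example scales; 0.1 mm is the tablet accuracy quoted above.
for denominator in (24_000, 100_000, 250_000):
    error = ground_error_m(0.1, denominator)
    print(f"1:{denominator}: {error:.1f} m on the ground")
```

Under these assumptions, 0.1 mm on a 1:24,000 sheet is about 2.4 m on the ground, so operator positioning and source-map error usually matter more than the tablet itself.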
The digitizing operation
- the map is affixed to a digitizing table
- three or more control points ("reference points", "tics", etc.) are digitized for each map sheet
- these will be easily identified points (intersections of major streets, major peaks, points on coastline)
- the coordinates of these points will be known in the coordinate system to be used in the final database, e.g. lat/long, State Plane Coordinates, military grid
- the control points are used by the system to calculate the necessary mathematical transformations to convert all coordinates to the final system
- the more control points, the better
- digitizing the map contents can be done in two different modes:
- in point mode, the operator identifies the points to be captured explicitly by pressing a button
- in stream mode points are captured at set time intervals (typically 10 per second) or on movement of the cursor by a fixed amount
- advantages and disadvantages:
- in point mode the operator selects points subjectively
- two operators working in point mode will not code a line in the same way
- stream mode generates large numbers of points, many of which may be redundant
- stream mode is more demanding on the user while point mode requires some judgement about how to represent the line
- most digitizing is currently done in point mode
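The control-point step described above can be sketched as a least-squares fit. The notes do not say which transformation a given system uses; the example below assumes a six-parameter affine transformation, which is one common choice, and solves it with plain Python. All function names and the sample coordinates are hypothetical.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(tablet_pts, map_pts):
    """Fit X = a*x + b*y + c and Y = d*x + e*y + f by least squares."""
    if len(tablet_pts) < 3 or len(tablet_pts) != len(map_pts):
        raise ValueError("need at least three matched control points")
    rows = [(x, y, 1.0) for x, y in tablet_pts]
    # Normal equations: (A^T A) p = A^T b, solved once per output axis.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in (0, 1):
        atb = [sum(r[i] * m[axis] for r, m in zip(rows, map_pts)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs


def apply_affine(coeffs, point):
    """Convert one tablet coordinate pair to the final coordinate system."""
    (a, b, c), (d, e, f) = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical control points: tablet cm -> projected metres.
tablet = [(0, 0), (10, 0), (0, 10), (10, 10)]
ground = [(500000, 4000000), (502400, 4000000),
          (500000, 4002400), (502400, 4002400)]
transform = fit_affine(tablet, ground)
print(apply_affine(transform, (5, 5)))
```

With more than three control points the system is overdetermined, so the fit averages out small errors in individual points. This is one reason why more control points are better.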
Problems with digitizing maps
- arise since most maps were not drafted for the purpose of digitizing
- paper maps are unstable: each time the map is removed from the digitizing table, the reference points must be re-entered when the map is affixed to the table again
- if the map has stretched or shrunk in the interim, the newly digitized points will be slightly off in their location when compared to previously digitized points
- errors occur on these maps, and these errors are entered into the GIS database as well
- the level of error in the GIS database is directly related to the error level of the source maps
- maps are meant to display information, and do not always accurately record locational information
- for example, when a railroad, stream and road all go through a narrow mountain pass, the pass may actually be depicted wider than its actual size to allow for the three symbols to be drafted in the pass
- discrepancies across map sheet boundaries can cause discrepancies in the total GIS database
- e.g. roads or streams that do not meet exactly when two map sheets are placed next to each other
- user error causes overshoots, undershoots (gaps) and spikes at intersections of lines
diagram

- user fatigue and boredom
- for a complete discussion of the manual digitizing process, see Marble et al., 1984
Editing errors from digitizing
- some errors can be corrected automatically
- small gaps at line junctions
- overshoots and sudden spikes in lines
- error rates depend on the complexity of the map, and are high for small-scale, complex maps
- these topics are explored in greater detail in later Units
- Unit 13 looks at the process of editing digitized data
- Units 45 and 46 discuss digitizing error
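As an illustration of how small gaps at line junctions might be closed automatically, the sketch below snaps each line endpoint to the nearest vertex of another line when it lies within a tolerance. This is a simplified assumption about how such a function could work, not a description of any particular GIS. Production software typically also snaps to the interior of segments and trims overshoots. The name `snap_endpoints` is hypothetical.

```python
import math


def snap_endpoints(lines, tolerance):
    """Return copies of the lines with endpoints snapped to nearby vertices.

    Each line is a list of (x, y) tuples. Only the first and last vertex of
    a line may move, and only onto a vertex of a different line.
    """
    snapped = [list(line) for line in lines]
    for i, line in enumerate(snapped):
        for end in (0, -1):
            px, py = line[end]
            best, best_dist = None, tolerance
            # Search the already-updated lines so that two nearby
            # endpoints converge on one location instead of swapping.
            for j, other in enumerate(snapped):
                if j == i:
                    continue
                for vx, vy in other:
                    dist = math.hypot(vx - px, vy - py)
                    if dist <= best_dist:
                        best, best_dist = (vx, vy), dist
            if best is not None:
                line[end] = best
    return snapped


# A 0.1-unit undershoot between two digitized lines (made-up coordinates).
lines = [[(0, 0), (9.9, 0)], [(10, 0), (10, 5)]]
print(snap_endpoints(lines, tolerance=0.2))
```

The tolerance has to be chosen with care: too small and gaps remain, too large and features that are genuinely separate get joined.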
Digitizing costs
- a common rule of thumb in the industry is one digitized boundary per minute
- e.g. it would take 99/60 = 1.65 hours to digitize the boundaries of the 99 counties of Iowa
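The rule of thumb above is easy to turn into a small estimating function. The function name and the rate parameter are illustrative; only the one-boundary-per-minute rate and the Iowa example come from the notes.

```python
def digitizing_hours(boundary_count, boundaries_per_minute=1.0):
    """Estimate digitizing time in hours from a boundaries-per-minute rate."""
    minutes = boundary_count / boundaries_per_minute
    return minutes / 60.0


# The Iowa example: 99 county boundaries at one boundary per minute.
print(f"{digitizing_hours(99):.2f} hours")
```

The estimate covers digitizing time only; set-up, control-point entry and editing are extra.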
C. SCANNERS
Video scanner
HP SCANJET
- essentially television cameras, with appropriate interface electronics to create a computer-readable dataset
- available in either black and white or color
- extremely fast (scan times of under 1 second)
- relatively inexpensive ($100 - $6,000)
- produce a raster array of brightness (or color) values, which are then processed much like any other raster array
- typical data arrays from video scanners are of the order of 250 to 1000 pixels on a side
- typically have poor geometrical and radiometrical characteristics, including various kinds of spatial distortions and uneven sensitivity to brightness across the scanned field
- video scanners are difficult to use for map input because of problems with distortion and interpretation of features
Electromechanical scanner
Contex
- unlike the video scanning systems, electromechanical systems are typically more expensive ($10,000 to $100,000) and slower, but can create better quality products
- one common class of scanners involves attaching the graphic to a drum
- as the drum rotates about its axis, a scanner head containing a light source and photodetector reads the reflectivity of the target graphic and, by digitizing this signal, creates a single column of pixels from the graphic
- the scanner head moves along the axis of the drum to create the next column of pixels, and so on through the entire scan
- compare the action of a lathe in a machine shop
- this controls distortion by bringing the single light source and detector to position on a regular grid of locations on the graphic
- systems may have a scan spot size of as little as 25 micrometers, and be able to scan graphics of the order of 1 meter on a side
- an alternative mechanism involves an array of photodetectors which extract data from several rows of the raster simultaneously
- the detector moves across the document in a swath
- when all the columns have been scanned, the detector moves to a new swath of rows
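The spot size and document size quoted above imply very large rasters. The sketch below works out the pixel counts. The one-byte-per-pixel storage figure is an assumption made for illustration, and the function name is hypothetical.

```python
def raster_dimensions(width_mm, height_mm, spot_size_um):
    """Return (columns, rows) for a document scanned at a given spot size."""
    spot_mm = spot_size_um / 1000.0
    return round(width_mm / spot_mm), round(height_mm / spot_mm)


# A 1 m x 1 m graphic scanned with a 25 micrometer spot.
cols, rows = raster_dimensions(1000, 1000, 25)
pixels = cols * rows
# Assumption: one byte per pixel, uncompressed.
print(f"{cols} x {rows} pixels, about {pixels / 1e9:.1f} GB at 1 byte/pixel")
```

By comparison, a video scanner array of 1000 pixels on a side holds only one million pixels.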
Requirements for scanning
- documents must be clean (no smudges or extra markings)
- lines should be at least 0.1 mm wide
- complex line work provides greater chance of error in scanning
- text may be accidentally scanned as line features
- contour lines cannot be broken with text
- automatic feature recognition is not easy (two contour lines vs. road symbols)
diagram
- special symbols (e.g. marsh symbols) must be recognized and dealt with
- if good source documents are available, scanning can be an efficient time saving mode of data input
D. CONVERSION FROM OTHER DIGITAL SOURCES
- involves transferring data from one system to another by means of a conversion program
- more and more data is becoming available on magnetic media
- USGS digital cartographic data (DLGs - Digital Line Graphs)
- WV Tech Center Mineral Lands Mapping Project
- digital orthophotographs (DOQQ)
- digital elevation models (DEMs)
- TIGER and other census related data
- data from CAD/CAM systems (AutoCAD, DXF)
- data from other GIS
- these data generally are supplied in digital format - on-line, WWW, or media that must be read into the computer
- CD-ROM is popular for this purpose
- provides better standards
- CD-ROM hardware is much less expensive: CD-ROM drive $500, tape drive $14,000
- FTP over a network for smaller datasets (<5 Mbyte)
- Iomega ZIP or JAZ drives
- DAT tape drives, 4 and 8 mm
Automated Surveying
- directly determines the actual horizontal and vertical positions of objects
- two kinds of measurements are made: distance and direction
- traditionally, distance measuring involved pacing, chains and tapes of various materials
- direction measurements were made with transits and theodolites
- modern surveyors have a number of automated tools to make distance and direction measurements easier
Leica Total Stations
- electronic systems measure distance using the time of travel of beams of light or radio waves
- by measuring the round-trip time of travel, from the observing instrument to the object in question and back, we can use the relationship (d = v x t), with t taken as half the round-trip time, to determine the distance
- an instrument based on timing the travel of a pulse of infrared light can measure distances on the order of 10 km with a standard deviation of +/- 15 mm
- the total station (cost about $30,000) captures distance and direction data in digital form
- the data is downloaded to a host computer at the end of each session for direct input to GIS and other programs
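The time-of-travel relationship can be sketched as follows. The example uses the speed of light in a vacuum and ignores atmospheric corrections, which is a simplifying assumption, and the function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # in a vacuum


def distance_from_round_trip(round_trip_s, velocity=SPEED_OF_LIGHT_M_PER_S):
    """One-way distance in metres from a round-trip travel time in seconds."""
    # The pulse covers the distance twice, so halve the total path length.
    return velocity * round_trip_s / 2.0


# Round-trip time for a 10 km line of sight.
t_10km = 2 * 10_000 / SPEED_OF_LIGHT_M_PER_S
print(f"round trip for 10 km: {t_10km * 1e6:.1f} microseconds")
print(f"recovered distance: {distance_from_round_trip(t_10km):.3f} m")

# Timing change that corresponds to 15 mm of distance.
t_15mm = 2 * 0.015 / SPEED_OF_LIGHT_M_PER_S
print(f"15 mm corresponds to {t_15mm * 1e9:.2f} nanoseconds")
```

A 15 mm change in distance corresponds to roughly a tenth of a nanosecond of round-trip time, which shows how fine the timing must be.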
Global Positioning System (GPS)
- a new tool for determining accurate positions on the surface of the earth
- computes positions from signals received from a series of satellites (NAVSTAR)
TRIMBLE
- the constellation consists of 24 satellites ("birds"), with some serving as "spares"
- depends on precise information about the orbits of the satellites
- a radio receiver with appropriate electronics is connected to a small antenna, and depending on the method used, in one hour to less than 1 second, the system is able to determine its location in 3-D space
- developed and operated by the US armed forces, but access is generally available and civilian interest is high
- particularly valuable for establishing accurate positional control in remote areas
- current GPS receivers cost about $5,000 to $15,000 (mid 1990) but costs will decline rapidly
- railroad companies are using GPS to create the first accurate survey of the US rail network and to track train positions
- recently, the use of GPS has resulted in corrections to the elevations of many of the world's peaks, including Mont Blanc and K2
- current GPS positional accuracies are on the order of 5 to 10 m with standard equipment and as small as 1 cm with "survey grade" receivers
- accuracy will continue to improve as more satellites are placed in orbit and experts fine tune the software and hardware
- GPS accuracy is already as good as the largest scale base mapping available for the continental US
E. CRITERIA FOR CHOOSING MODES OF INPUT
- the type of data source
- images favor scanning
- maps can be scanned or digitized
- the database model of the GIS
- scanning easier for raster, digitizing for vector
- the density of data
- dense linework makes for difficult digitizing
- expected applications of the GIS implementation

F. RASTERIZATION AND VECTORIZATION
Rasterization of digitized data
- for some data, entry in vector form is more efficient, followed by conversion to raster
- we might digitize the county boundary in vector form by
- mounting a map on a digitizing table
- capturing the locations of points along the boundary
- assuming that the points are connected by straight line segments
- this may produce an ASCII file of pairs of xy coordinates which must then be processed by the GIS, or the output of the digitizer may go directly into the GIS
- the vector representation of the boundary as points is then converted to a raster by an operation known as vector-raster conversion
- the computer calculates which county each cell is in using the vector representation of the boundary and outputs a raster
- digitizing the boundary is much less work than cell by cell entry
- most raster GIS have functions such as vector-raster conversion to support vector entry
- many support digitizing and editing of vector data
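A minimal sketch of vector-raster conversion, assuming each county is a closed polygon and each cell takes the county that contains its centre. The ray-casting point-in-polygon test used here is one standard technique; the notes do not say which method any particular GIS uses. All names and the two-county example are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count boundary crossings of a ray going in +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, n_cols, n_rows, cell_size, x_min=0.0, y_min=0.0, nodata=0):
    """Assign each cell the code of the polygon containing its centre."""
    grid = []
    for row in range(n_rows):
        # Row 0 is the top (northernmost) row of the raster.
        y = y_min + (n_rows - row - 0.5) * cell_size
        grid_row = []
        for col in range(n_cols):
            x = x_min + (col + 0.5) * cell_size
            value = nodata
            for code, polygon in polygons.items():
                if point_in_polygon(x, y, polygon):
                    value = code
                    break
            grid_row.append(value)
        grid.append(grid_row)
    return grid


# Two made-up rectangular "counties" digitized as boundary points.
counties = {
    1: [(0, 0), (2, 0), (2, 4), (0, 4)],
    2: [(2, 0), (4, 0), (4, 4), (2, 4)],
}
for row in rasterize(counties, n_cols=4, n_rows=4, cell_size=1.0):
    print(row)
```

Only eight boundary points were entered here, yet sixteen cells were coded. The saving grows quickly as the cell size shrinks.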
Vectorization of scanned images
- for many purposes it is necessary to extract features and objects from a scanned image
- e.g. a road on the input document will have produced characteristic values in each of a band of pixels
- if the scanner has pixels of 25 microns = 0.025 mm, a line of width 0.5 mm will create a band 20 pixels across
- the vectorized version of the line will be a series of coordinate points joined by straight lines, representing the road as an object or feature instead of a collection of contiguous pixels
- successful vectorization requires a clean line scanned from media free of cluttering labels, coffee stains, dust etc.
- to create a sufficiently clean line, it is often necessary to redraft input documents
- e.g. the Canada Geographic Information System redrafted each of its approximately 10,000 input documents
- since the scanner can be color sensitive, vectorizing may be aided by the use of special inks for certain features
- although scanning is much less labor intensive, problems with vectorization lead to costs which are often as high as manual digitizing
- two stages of error correction may be necessary:
1. edit the raster image prior to vectorization
2. edit the vectorized features
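The sketch below illustrates the idea of reducing a band of pixels to a centre line. It checks the band-width arithmetic from the road example, then takes the midpoint of the set pixels in each raster row as a vertex. It assumes a clean binary image containing a single, roughly vertical line. Real vectorization software uses more general thinning and line-following methods, so treat this only as an illustration. The function names are hypothetical.

```python
def band_width_pixels(line_width_mm, pixel_size_mm):
    """Number of pixels spanned by a line of the given width."""
    return round(line_width_mm / pixel_size_mm)


def centerline_from_rows(raster, pixel_size=1.0):
    """Return one (x, y) vertex per row that contains line pixels.

    Coordinates are in image units: x to the right, y downward, with the
    origin at the top-left corner of the raster.
    """
    points = []
    for row_index, row in enumerate(raster):
        cols = [c for c, value in enumerate(row) if value]
        if cols:
            # Midpoint between the left edge of the first set pixel and
            # the right edge of the last one.
            mid_col = (cols[0] + cols[-1] + 1) / 2.0
            points.append((mid_col * pixel_size, (row_index + 0.5) * pixel_size))
    return points


print(band_width_pixels(0.5, 0.025))  # the 0.5 mm line at 25 microns

# A tiny made-up scan: a 3-pixel-wide band running down and to the right.
scan = [
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
print(centerline_from_rows(scan))
```

A stray mark in any row would shift that row's midpoint, which is why the raster image usually has to be cleaned before vectorization.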

G. INTEGRATING DIFFERENT DATA SOURCES
Formats
- many different format standards exist for geographical data
- some of these have been established by public agencies
- e.g. the USGS in cooperation with other federal agencies has developed SDTS (Spatial Data Transfer Standard) for geographical data
Clearinghouse
Federal Geographic Data Committee (FGDC)
METADATA standards
Search and retrieval of spatial data
Where is it? Who has it? How can I get it? In what form can I get it?
Framework
Standards for digital data development
Seven or eight major data layers
National On-line database
Regional Coordinators and Local Producers
- e.g. the Defense Mapping Agency (DMA) has developed the DIGEST data transfer standard
- some have been defined by vendors
- e.g. SIF (Standard Interchange Format) is an Intergraph standard for data transfer
- DXF for CAD/CAM data
- a good GIS can accept and generate datasets in a wide range of standard formats
Projections
- there are many ways of representing the curved surface of the earth on a flat map
- some of these map projections are very common, e.g. Mercator, Universal Transverse Mercator (UTM), Lambert Conformal Conic
- each state has a standard SPC (State Plane Coordinate system) based on one or more projections
- see Unit 27 for more on map projections
- a good GIS can convert data from o
