Describing Data Quality and Errors

[Figure: accuracy vs. precision diagrams: high accuracy with low precision, and high precision with low accuracy]

[Figure: common errors in digitizing point and line features]
Data quality issues
• Factors affecting data quality
• Types of GIS errors
• Methods to deal with errors
• Estimating degree of errors
Computer cartography
• Computer cartography (digital mapping) is the generation, storage and editing of maps using a computer.
• Spatial statistics
• Initially, developments were in areas such as measures of spatial distribution, three-dimensional analysis, network analysis and modeling techniques.
• CAD
• Computer-aided design (CAD) is used to enter and manipulate graphics data, whereas GIS is used to store and analyse spatial data. Traditional users of CAD used it for the production, maintenance and analysis of design graphics.
Causes of Errors
• Measurement errors: accuracy (e.g., altitude measurement or soil samples; usually related to instruments)
• Computational errors: precision (e.g., to how many decimal places is the data represented? See the sketch after this list.)
• Human error: error in using instruments, selecting scale, location of samples
• Data model representation errors
• Errors in derived data
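The effect of limited computational precision can be shown in a few lines of Python. This is a minimal sketch with made-up coordinate values; it simply shows how rounding a stored value, or repeating inexact floating-point arithmetic, introduces error that was never part of the measurement.

```python
# Minimal sketch (assumed values): limited precision as a source of computational error.
easting = 532481.637   # metres, the "true" measured coordinate
for decimals in (3, 1, 0):
    stored = round(easting, decimals)
    print(f"{decimals} decimals -> stored {stored}, error {abs(easting - stored):.3f} m")

# Repeated floating-point arithmetic also accumulates error:
total = sum([0.1] * 10)
print(total == 1.0)   # False: 0.1 has no exact binary representation
print(total)          # 0.9999999999999999
```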
What kinds of errors?
Factors:
• Age of data – collected at different times
• Areal coverage
• Map scale and resolution
• Density of observations
• Data formats and exchanges between formats
• Accessibility
Data quality issues: errors during data collection (accurate but imprecise, inaccurate but precise, inaccurate and imprecise, accurate and precise)
Sources of Error
• Errors arising from our understanding and modeling of reality
• The different ways in which people perceive reality can affect how they model the world using GIS.
• Errors in source data for GIS
• Survey data can contain errors due to mistakes made by people operating equipment, or due to technical problems with the equipment.
• Remotely sensed and aerial photography data can contain spatial errors if they are georeferenced incorrectly, and mistakes in classification and interpretation create attribute errors.
Perceptions of reality – mental mapping
• Your mental map of your home town or local area is your personal representation. You could commit this to paper as a sketch map if required.
• The figure on the next slide shows two sketch maps describing the location of Northallerton in the UK. They would be flawed as cartographic products, but are nevertheless a valid model of reality. Keates (1982) considers these maps to be personal, fragmentary, incomplete and frequently erroneous.
• In all cases, distance and shape would be distorted in inconsistent ways, and the final product would be influenced by the personality and experience of the ‘cartographer’.
Manual Digitizing
• Psychological errors: difficulties in perceiving the true centre of the line being digitized, and inability to move the cursor cross-hairs accurately along it.
• Physiological errors: these result from involuntary muscle spasms that give rise to random displacement.
• Line thickness: the thickness of lines on a map is determined by the cartographic generalization employed.
• Method of digitizing: point mode and stream mode.
Topological errors in vector data:
(a) effects of tolerance on topological cleaning;
(b) topological ambiguities in raster-to-vector conversion.
Vector to raster classification errors
Rasterization errors
Vector-to-raster conversion can cause an interesting assortment of errors in the resulting data, for example (a sketch of the grid-origin effect follows this list):
• Topological errors
• Loss of small polygons
• Effects of grid orientation
• Variations in grid origin and datum
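A minimal sketch of the grid-origin effect, using an assumed small rectangle and centre-point cell assignment (both invented for illustration): the same polygon, rasterized on the same cell size but with a shifted grid origin, yields a noticeably different area.

```python
import math

def rasterize_area(xmin, ymin, xmax, ymax, cell, origin_x, origin_y):
    """Count cells whose centre falls inside the rectangle; return their total area."""
    count = 0
    i0 = math.floor((xmin - origin_x) / cell) - 1
    i1 = math.ceil((xmax - origin_x) / cell) + 1
    j0 = math.floor((ymin - origin_y) / cell) - 1
    j1 = math.ceil((ymax - origin_y) / cell) + 1
    for i in range(i0, i1):
        for j in range(j0, j1):
            cx = origin_x + (i + 0.5) * cell   # cell centre x
            cy = origin_y + (j + 0.5) * cell   # cell centre y
            if xmin <= cx <= xmax and ymin <= cy <= ymax:
                count += 1
    return count * cell * cell

# the same rectangle, the same 1-unit cells, only the grid origin differs
print("true area            :", (3.4 - 0.1) * (2.8 - 0.1))                    # 8.91
print("origin at (0.0, 0.0) :", rasterize_area(0.1, 0.1, 3.4, 2.8, 1.0, 0.0, 0.0))
print("origin at (0.5, 0.5) :", rasterize_area(0.1, 0.1, 3.4, 2.8, 1.0, 0.5, 0.5))
```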
Topological errors in vector GIS:
(a) loss of connectivity and creation of false connectivity;
(b) loss of information.
Errors in data processing and analysis
• GIS operations that can introduce errors include the classification of data, the aggregation or disaggregation of area data, and the integration of data using overlay techniques.
• Where a certain level of spatial resolution or a certain set of polygon boundaries is required, data sets that are not mapped with these may need to be aggregated or disaggregated to the required level.
Attribute error due to processing
Attribute errors can result from positional error (such as the missing ‘hole’ feature in map A that is present as an island in map B). If one of the two maps being overlaid contains an error, a classification error will result in the composite map (polygon BA).
Project management and human error
• Measurement errors
• Typos/drawing errors
• Incorrect implementation errors
• Planning/coordination errors
• Incorrect use of devices
• Erroneous methodology
Geometry-related errors
• Rounding errors
• Processing errors
• Geometric coordinate transformation (see the sketch below)
• Map scanning, geometric approximations
• Vector to fine raster errors
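A minimal sketch of how rounding during a geometric coordinate transformation leaves a residual error; the point, rotation angle and storage precision are all assumed values chosen for illustration.

```python
import math

def rotate(x, y, angle_deg, decimals=None):
    """Rotate a point about the origin; optionally round the result to simulate limited storage precision."""
    a = math.radians(angle_deg)
    rx = x * math.cos(a) - y * math.sin(a)
    ry = x * math.sin(a) + y * math.cos(a)
    if decimals is not None:
        rx, ry = round(rx, decimals), round(ry, decimals)
    return rx, ry

x, y = 1234.567, 8901.234
fx, fy = rotate(x, y, 30, decimals=2)    # forward transform, stored to 2 decimals
bx, by = rotate(fx, fy, -30)             # inverse transform
# ideally zero, but rounding of the intermediate coordinates leaves a residual
print("residual error (m):", math.hypot(bx - x, by - y))
```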
Numerical modeling - propagating errors
• In numerical modeling and simulation, errors propagate through the multiplication of numerical error and initial measurement error.
• This presents additional challenges (a simple numerical illustration follows).
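A minimal numerical illustration of error propagation through multiplication, using assumed length and width measurements: the relative errors of the inputs roughly add in the product.

```python
# Minimal sketch: propagation of measurement error through multiplication.
length, width = 100.0, 50.0        # "true" values (assumed)
dl, dw = 2.0, 1.5                  # measurement errors (assumed)

area_true = length * width
area_meas = (length + dl) * (width + dw)

rel_l, rel_w = dl / length, dw / width
rel_area = (area_meas - area_true) / area_true
print(f"relative errors: length {rel_l:.3f}, width {rel_w:.3f}")
print(f"propagated relative error in area: {rel_area:.3f} (roughly the sum of the two)")
```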
How can errors be controlled?
Estimating the degree of error is an interesting area of GIS and computational science.
Methods to deal with errors
• Initial data: control the quality of measurements, develop standards, prevent human error.
• Data models: select the correct data model based on experience or model appropriateness; reduce errors during conversion from one model to another.
Finding and modeling errors in GIS
Checking for errors
• Probably the simplest means of checking for data errors is visual inspection.
• Various statistical methods can be employed to help pinpoint potential errors.
• Estimating the degree of error helps in controlling and correcting errors.
Estimating the degree of error - map digitizing errors
• Map digitizing: errors in digitizing a line or a geometric shape are estimated by studying the number of segments used for curve approximation and the map properties (distorted, rotated map).
• Perkal’s concept: there is an epsilon error band around a digitized line within which the data is considered to be correct (see the sketch below).
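A minimal sketch of the epsilon-band idea: a re-digitized point is accepted if its distance to the reference line is at most epsilon. The line coordinates and the epsilon value are assumed for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_epsilon(point, line, eps):
    """True if the point lies inside the epsilon band around the polyline."""
    return min(point_segment_distance(point, a, b) for a, b in zip(line, line[1:])) <= eps

reference_line = [(0, 0), (10, 0), (10, 10)]   # the "true" digitized line (assumed)
eps = 0.5                                      # assumed width of the epsilon band
print(within_epsilon((5, 0.3), reference_line, eps))   # True: inside the band
print(within_epsilon((5, 1.2), reference_line, eps))   # False: outside the band
```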
Estimating the degree of error – raster to vector graphics
• Statistical approaches for error estimation based on probabilities of error
• Estimating the size of the grid and its influence on data approximation (Switzer’s method)
• Double-conversion method for vector to raster conversion (Bregt’s method): convert the data twice and compare the differences (see the sketch below)
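A minimal sketch in the spirit of the double-conversion idea, with an assumed axis-aligned rectangle and centre-point cell assignment: the polygon is converted to a raster, and the area over which the raster version and the original polygon disagree serves as an estimate of the conversion error.

```python
def cell_overlap(cell_x, cell_y, cell, rect):
    """Area of overlap between one grid cell and the rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    ox = max(0.0, min(cell_x + cell, xmax) - max(cell_x, xmin))
    oy = max(0.0, min(cell_y + cell, ymax) - max(cell_y, ymin))
    return ox * oy

rect = (0.1, 0.1, 3.4, 2.8)          # original vector polygon (assumed rectangle)
cell = 1.0
disagreement = 0.0
for i in range(-1, 5):
    for j in range(-1, 4):
        cx, cy = i * cell, j * cell
        inside = (rect[0] <= cx + cell / 2 <= rect[2] and
                  rect[1] <= cy + cell / 2 <= rect[3])   # centre-point rasterization
        overlap = cell_overlap(cx, cy, cell, rect)
        # cells kept by rasterization contribute their non-polygon part,
        # cells dropped contribute their polygon part
        disagreement += (cell * cell - overlap) if inside else overlap

print("area of disagreement (error estimate):", round(disagreement, 2))
```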
Estimating the degree of error while processing data
• Errors associated with overlaying data and other spatial operations arise from the inexact representation of the original data and from processing errors.
• A measurement of agreement between the original and the overlaid polygons can be taken (McAlpine and Cook).
Estimating the degree of propagating errors
Two approaches to the modeling of errors in spatial data: epsilon modeling and Monte Carlo simulation.
• Epsilon modeling is based on a well-known method of line generalization (the epsilon band width).
• A Monte Carlo simulation approach has also been used to model the effects of error propagation in GIS overlays (see the sketch below).
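A minimal sketch of the Monte Carlo approach: the digitized vertices are perturbed with random positional error many times and the spread of a derived quantity (here the polygon area) is examined. The polygon coordinates and the 0.5-unit positional standard error are assumed values.

```python
import random
import statistics

def shoelace_area(pts):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))) / 2

polygon = [(0, 0), (100, 0), (100, 60), (0, 60)]   # digitized polygon (assumed)
sigma = 0.5                                        # assumed positional error

areas = []
for _ in range(1000):
    noisy = [(x + random.gauss(0, sigma), y + random.gauss(0, sigma)) for x, y in polygon]
    areas.append(shoelace_area(noisy))

print("true area :", shoelace_area(polygon))
print("mean area :", round(statistics.mean(areas), 1))
print("std. dev. :", round(statistics.stdev(areas), 1))
```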
Methods to deal with propagating errors
• Conversion errors – develop more reliable algorithms
• Numerical or propagating errors:
  • maintain consistency of data and results
  • utilize exact computation methods
2. Logical error detection of individual objects
Certain objects in a map database have logical relationships with other objects. For example, a parking lot should have an exit to a road; if a parking lot stands by itself without any entrance or exit, then there is a logical error. Consider a bridge: it could cross a stream, a river or another road, and its two sides should be connected to roads. This knowledge can be coded to check automatically whether there are any logical errors associated with each bridge. Such logical error detection associated with bridges is particularly useful in detecting errors in other attributes that are connected to bridges. Figure 2 illustrates the situation for a parking lot and a bridge. This method can be applied to any type of object whose relationship with other objects can be logically expressed.

[Figure: spatial inconsistency detected between a reservoir and a highway (HWY), and one way of correcting the error]
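A minimal sketch of such rule-based logical checking. The object names, types and connectivity table below are invented for illustration and are not a real database schema.

```python
objects = {
    "bridge_1": {"type": "bridge"},
    "bridge_2": {"type": "bridge"},
    "lot_1":    {"type": "parking_lot"},
    "road_a":   {"type": "road"},
    "road_b":   {"type": "road"},
    "stream_1": {"type": "stream"},
}
# which objects each object is connected to (hypothetical connectivity table)
connections = {
    "bridge_1": ["road_a", "road_b"],   # ok: roads on both sides
    "bridge_2": ["stream_1"],           # error: no road connection
    "lot_1":    [],                     # error: no entrance or exit
}

def check_logical_errors(objects, connections):
    """Flag objects whose connections violate simple logical rules."""
    errors = []
    for name, obj in objects.items():
        linked_types = [objects[n]["type"] for n in connections.get(name, [])]
        if obj["type"] == "bridge" and linked_types.count("road") < 2:
            errors.append(f"{name}: bridge not connected to roads on both sides")
        if obj["type"] == "parking_lot" and "road" not in linked_types:
            errors.append(f"{name}: parking lot has no exit to a road")
    return errors

for e in check_logical_errors(objects, connections):
    print(e)
```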
3. Attribute error identification through logical consistency checking among different map layers
In a spatial database, data are often organized in different map layers, and each map layer may be obtained from a different source. An attribute error on one map layer may not be detected unless it is compared with attribute data from other map layers. For example, a forest fire history map contains the distribution of burnt areas with an attribute giving the time of fire occurrence (e.g., Figure 3a). Are there any mistakes in the fire history records? Some such errors may be detected when the fire history map is overlaid onto an up-to-date forest cover map (e.g., Figure 3b), because fire history records can be checked against the current stage of forest restoration. In the example shown in Figure 3, the two fire occurrence times should obviously be exchanged, because they are not consistent with the age of the vegetation.

Knowing how the maps are made helps us to detect attribute errors. From Figure 4, it is obvious that the urban area determined from the DMSP city-light data is largely exaggerated. This is partly caused by the poor spatial resolution of the city-light data (1 km, resampled from the original 600 m) and by the less accurate city-light intensity thresholding algorithm applied in urban area detection. It can be considered the extreme end of over-commission of urban land in the mapped area: almost any area not included in the urban area determined from the city-light data is unlikely to be urban. The urban areas from the other two map sources are relatively consistent, except at the lower left corner, where a large tract of urban land is claimed only by the USGS source. Therefore, before any other verification, we are almost certain that it is an attribute error over that tract of land. The particular error in the USGS data layer was verified using road density and Landsat TM imagery.
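A minimal sketch of this kind of cross-layer consistency check, with invented polygon identifiers, fire years and stand ages: a polygon whose forest stand age exceeds the time since its recorded fire is flagged as a likely attribute error.

```python
CURRENT_YEAR = 2024

fire_history = {"poly_1": 1950, "poly_2": 1995}   # recorded year of fire per polygon (assumed)
forest_cover = {"poly_1": 20,   "poly_2": 70}     # stand age in years per polygon (assumed)

for poly, fire_year in fire_history.items():
    stand_age = forest_cover[poly]
    years_since_fire = CURRENT_YEAR - fire_year
    # vegetation cannot be older than the time elapsed since the recorded fire
    if stand_age > years_since_fire:
        print(f"{poly}: stand age {stand_age} exceeds {years_since_fire} years "
              f"since the recorded {fire_year} fire -> likely attribute error")
```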
Discussions
From the illustrations in the section above, it can be seen that inconsistency is a useful indicator of spatial data errors. Inconsistency may exist within a single map layer or among different map layers, and it can be detected automatically, provided there is good knowledge of the various characteristics of the spatial data. Inconsistency checking should cover at least four aspects: self-checking of data completeness, such as the various spatial and attribute components of an object represented in the database; spatial consistency among neighboring objects; multivariable (multi-attribute) consistency through comparison; and spatial consistency among multiple variables. The level of complexity of consistency checking is expected to increase in roughly that order. Some corrections of the spatial errors reflected by inconsistency can be done automatically, while for others it is more appropriate to correct errors or reduce uncertainties through an interactive process.

Like the spelling checker in word-processing software, an inconsistency checker can be envisioned for each database. It could be run in batch mode or in the background once new data are added to the database. Some detected inconsistencies would be corrected according to rules and highlighted, while others would be left uncorrected. All inconsistencies should be recorded to alert data analysts for the final correction decision, and special visualization tools can be used for inconsistency warning. A mechanism should be built for the database manager and data users to track changes made to the data, and to allow for reverse processing should an automatic correction turn out to be inappropriate.