Describing Data Quality and Errors http://educationally.narod.ru/giserrphotoalbum.html
Map Dictionary, Map Definition http://dictionary.reference.com/browse/map
Metadata Quality http://www.uwyo.edu/wygisc/metadata/quality.html
The GIS Book http://books.google.com/books?id=_C6oPvJ5S_EC&pg=RA3-PA223&lpg=RA3-PA223&dq=describing+data+quality+errors+gis&source=bl&ots=jl17KRYReW&sig=XB92A1kRoPbI3rQy0oywBUV593g&hl=en&ei=FCjaSoCWKZPKsAPsrb2RBg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CBUQ6AEwAg#v=onepage&q=&f=false
Error, Accuracy, and Precision http://www.colorado.edu/geography/gcraft/notes/error/error.html

Accuracy and precision http://en.wikipedia.org/wiki/Accuracy_and_precision
In the fields of engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to its actual (true) value. The precision of a measurement system, also called reproducibility or repeatability, is the degree to which repeated measurements under unchanged conditions show the same results.[1] Although the two words can be synonymous in colloquial use, they are deliberately contrasted in the context of the scientific method. Accuracy indicates proximity of measurement results to the true value; precision refers to the repeatability or reproducibility of the measurement. A measurement system can be accurate but not precise, precise but not accurate, neither, or both. For example, if an experiment contains a systematic error, then increasing the sample size generally increases precision but does not improve accuracy. Eliminating the systematic error improves accuracy but does not change precision. A measurement system is called valid if it is both accurate and precise. Related terms are bias (non-random or directed effects caused by a factor or factors unrelated to the independent variable) and error (random variability), respectively. The terminology is also applied to indirect measurements, that is, values obtained by a computational procedure from observed data.
Image results for accuracy precision http://images.google.com/images?hl=en&source=hp&q=accuracy+precision&um=1&ie=UTF-8&ei=dT_aSoPlAY-0sgOuq-yxCQ&sa=X&oi=image_result_group&ct=title&resnum=4&ved=0CBkQsAQwAw

The Science of Measurement: Accuracy vs. Precision http://honolulu.hawaii.edu/distance/sci122/SciLab/L5/accprec.html
The dictionary definitions of these two words do not clearly make the distinction as it is used in the science of measurement. Accurate means "capable of providing a correct reading or measurement." In physical science it means "correct." A measurement is accurate if it correctly reflects the size of the thing being measured. Precise means "exact, as in performance, execution, or amount." In physical science it means "repeatable, reliable, getting the same measurement each time." We can never make a perfect measurement. The best we can do is to come as close as possible within the limitations of the measuring instruments. Let's use a model to demonstrate the difference. Suppose you are aiming at a target, trying to hit the bull's eye (the center of the target) with each of five darts. Here are some representative patterns of darts in the target.
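To make the distinction concrete, here is a minimal Python sketch (not taken from the sources above; the true value, bias, and noise level are invented) that simulates repeated measurements with a systematic error plus random noise and reports accuracy and precision separately:

```python
# Simulated measurements: systematic error (bias) vs. random error (spread).
import random
import statistics

random.seed(42)

TRUE_VALUE = 100.0   # the "actual (true) value" of the quantity being measured
BIAS = 2.5           # systematic error: every reading is offset by this amount
NOISE_SD = 0.3       # random error: spread of individual readings

def measure(n):
    """Return n simulated readings of the same quantity."""
    return [TRUE_VALUE + BIAS + random.gauss(0.0, NOISE_SD) for _ in range(n)]

for n in (5, 50, 5000):
    readings = measure(n)
    mean = statistics.mean(readings)
    accuracy_error = abs(mean - TRUE_VALUE)                # closeness to the true value
    precision_sem = statistics.stdev(readings) / n ** 0.5  # repeatability of the estimate
    print(f"n={n:5d}  mean={mean:8.3f}  "
          f"accuracy error={accuracy_error:5.3f}  precision (SE of mean)={precision_sem:6.4f}")

# Larger samples tighten the precision figure, but the ~2.5-unit systematic bias
# remains: the system becomes precise without becoming accurate until the
# systematic error itself is removed.
```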
Measurement http://honolulu.hawaii.edu/distance/sci122/SciLab/L5/measure.html#Accuracy%20vs.%20Precision
Introduction: Measure for Measure. All experiments to be performed in this laboratory require one or more measurements. A measurement is defined as the ratio of the magnitude (how much) of any quantity to a standard value. The standard value is called a unit. A measurement of any kind requires both a magnitude and a unit. By their nature, measurements can never be done perfectly. Part of the error in making measurements may be due to the skill of the person making the measurement, but even the most skillful among us cannot make the perfect measurement. Basically this is because, no matter how small we make the divisions on our ruler (using distance as an example), we can never be sure that the thing we are measuring lines up perfectly with one of the marks. To put it another way, no matter how fine the measurement, there are always more decimal places that we must estimate. Therefore the judgment of the person doing the measurement plays a significant role in the accuracy and precision of the measurement. Typically we measure simple quantities of only three types: mass, length, and time. Occasionally we include temperature, electrical charge, or light intensity. It is amazing, but just about everything we know about the universe comes from measuring these six quantities. Most of our knowledge comes from measurements of mass, length, and time alone.

Error, Accuracy, and Precision http://www.colorado.edu/geography/gcraft/notes/error/error_f.html
1. The Importance of Error, Accuracy, and Precision
Until quite recently, people involved in developing and using GIS paid little attention to the problems caused by error, inaccuracy, and imprecision in spatial datasets. Certainly there was an awareness that all data suffers from inaccuracy and imprecision, but the effects on GIS problems and solutions were not considered in great detail. Major introductions to the field such as C. Dana Tomlin's Geographic Information Systems and Cartographic Modeling (1990), Jeffrey Star and John Estes's Geographic Information Systems: An Introduction (1990), and Keith Clarke's Analytical and Computer Cartography (1990) barely mention the issue. This situation has changed substantially in recent years. It is now generally recognized that error, inaccuracy, and imprecision can "make or break" many types of GIS project. That is, errors left unchecked can make the results of a GIS analysis almost worthless. The irony is that the problem of error derives from one of the greatest strengths of GIS. GIS gain much of their power from being able to collate and cross-reference many types of data by location. They are particularly useful because they can integrate many discrete datasets within a single system. Unfortunately, every time a new dataset is imported, the GIS also inherits its errors. These may combine and mix with the errors already in the database in unpredictable ways. One of the first thorough discussions of the problems and sources of error appeared in P.A. Burrough's Principles of Geographical Information Systems for Land Resources Assessment (1986). Now the issue is addressed in many introductory texts on GIS. The key point is that even though error can disrupt GIS analyses, there are ways to keep error to a minimum through careful planning and through methods for estimating its effects on GIS solutions. Awareness of the problem of error has also had the useful benefit of making GIS practitioners more sensitive to the limitations of GIS: they cannot be expected to reach impossibly accurate and precise solutions.
--------------------------------------------------------------------------------
2. Some Basic Definitions
It is important to distinguish from the start the difference between accuracy and precision:
1) Accuracy is the degree to which information on a map or in a digital database matches true or accepted values. Accuracy is an issue pertaining to the quality of data and the number of errors contained in a dataset or map. In discussing a GIS database, it is possible to consider horizontal and vertical accuracy with respect to geographic position, as well as attribute, conceptual, and logical accuracy. The level of accuracy required for particular applications varies greatly. Highly accurate data can be very difficult and costly to produce and compile.
2) Precision refers to the level of measurement and exactness of description in a GIS database. Precise locational data may measure position to a fraction of a unit. Precise attribute information may specify the characteristics of features in great detail. It is important to realize, however, that precise data--no matter how carefully measured--may be inaccurate. Surveyors may make mistakes, or data may be entered into the database incorrectly. The level of precision required for particular applications varies greatly. Engineering projects such as road and utility construction require very precise information measured to the millimeter or tenth of an inch. Demographic analyses of marketing or electoral trends can often make do with less, say to the closest zip code or precinct boundary. Highly precise data can be very difficult and costly to collect. Carefully surveyed locations needed by utility companies to record the locations of pumps, wires, pipes, and transformers cost $5-20 per point to collect.
High precision does not indicate high accuracy, nor does high accuracy imply high precision. But high accuracy and high precision are both expensive. Be aware also that GIS practitioners are not always consistent in their use of these terms; sometimes the terms are used almost interchangeably, and this should be guarded against. Two additional terms are used as well: Data quality refers to the relative accuracy and precision of a particular GIS database. These facts are often documented in data quality reports. Error encompasses both the imprecision of data and its inaccuracies.
--------------------------------------------------------------------------------
3. Types of Error
Positional error is often of great concern in GIS, but error can actually affect many different characteristics of the information stored in a database.
3.1. Positional accuracy and precision. This applies to both horizontal and vertical positions. Accuracy and precision are a function of the scale at which a map (paper or digital) was created. The mapping standards employed by the United States Geological Survey specify that, to meet horizontal accuracy requirements, 90 per cent of all measurable points must be within 1/30th of an inch of their true position for maps at a scale of 1:20,000 or larger, and within 1/50th of an inch for maps at scales smaller than 1:20,000.
Accuracy Standards for Various Scale Maps
1:1,200 ± 3.33 feet
1:2,400 ± 6.67 feet
1:4,800 ± 13.33 feet
1:10,000 ± 27.78 feet
1:12,000 ± 33.33 feet
1:24,000 ± 40.00 feet
1:63,360 ± 105.60 feet
1:100,000 ± 166.67 feet
This means that when we see a point on a map we have its "probable" location within a certain area. The same applies to lines.
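The tolerances in this table follow directly from the quoted 1/30-inch and 1/50-inch rules. A small sketch that reproduces the figures (only the unit conversions are added; the scales and rules come from the standard quoted above):

```python
# Reproduce the horizontal accuracy tolerances implied by the quoted USGS rule:
# 1/30 inch on the map for scales of 1:20,000 or larger, 1/50 inch for smaller
# scales, converted to feet on the ground.
SCALES = [1_200, 2_400, 4_800, 10_000, 12_000, 24_000, 63_360, 100_000]

def horizontal_tolerance_feet(scale_denominator):
    """Ground tolerance in feet implied by the map-accuracy rule quoted above."""
    map_error_inches = 1 / 30 if scale_denominator <= 20_000 else 1 / 50
    ground_error_inches = map_error_inches * scale_denominator
    return ground_error_inches / 12  # 12 inches per foot

for s in SCALES:
    print(f"1:{s:<7,}  ±{horizontal_tolerance_feet(s):7.2f} feet")
```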
Beware of the dangers of false accuracy and false precision, that is, reading locational information from a map at levels of accuracy and precision beyond those at which it was created. This is a very great danger in computer systems that allow users to pan and zoom at will to an infinite number of scales. Accuracy and precision are tied to the original map scale and do not change even if the user zooms in and out. Zooming in and out can, however, mislead the user into believing--falsely--that the accuracy and precision have improved.
3.2. Attribute accuracy and precision. The non-spatial data linked to location may also be inaccurate or imprecise. Inaccuracies may result from mistakes of many sorts. Non-spatial data can also vary greatly in precision. Precise attribute information describes phenomena in great detail. For example, a precise description of a person living at a particular address might include gender, age, income, occupation, level of education, and many other characteristics. An imprecise description might include just income, or just gender.
3.3. Conceptual accuracy and precision. GIS depend upon the abstraction and classification of real-world phenomena. The user determines what amount of information is used and how it is classified into appropriate categories. Sometimes users may use inappropriate categories or misclassify information. For example, classifying cities by voting behavior would probably be an ineffective way to study fertility patterns. Failing to classify power lines by voltage would limit the effectiveness of a GIS designed to manage an electric utility's infrastructure. Even if the correct categories are employed, data may be misclassified. A study of drainage systems may involve classifying streams and rivers by "order," that is, where a particular drainage channel fits within the overall tributary network. Individual channels may be misclassified if tributaries are miscounted. Yet some studies might not require such a precise categorization of stream order at all. All they may need is the location and names of all streams and rivers, regardless of order.
3.4. Logical accuracy and precision. Information stored in a database can be employed illogically. For example, permission might be given to build a residential subdivision on a floodplain unless the user compares the proposed plat with floodplain maps. Then again, building may be possible on some portions of a floodplain, but the user will not know unless variations in flood potential have also been recorded and are used in the comparison. The point is that information stored in a GIS database must be used and compared carefully if it is to yield useful results. GIS are typically unable to warn the user if inappropriate comparisons are being made or if data are being used incorrectly. Some rules for use can be incorporated in GIS designed as "expert systems," but developers still need to make sure that the rules employed match the characteristics of the real-world phenomena they are modeling.
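As a concrete illustration of encoding such a rule, here is a minimal sketch; the parcel records, flood-zone classes, and the rule itself are invented for illustration and are not part of the source module:

```python
# Hypothetical "expert system" style rule check: flag building proposals that
# fall inside mapped flood zones. All records and class names are invented.
parcels = [
    {"id": "P-101", "proposed_use": "residential subdivision", "flood_zone": "100-year"},
    {"id": "P-102", "proposed_use": "residential subdivision", "flood_zone": "none"},
    {"id": "P-103", "proposed_use": "park",                    "flood_zone": "100-year"},
]

RESTRICTED_ZONES = {"100-year", "floodway"}    # flood classes where building is disallowed
RESTRICTED_USES = {"residential subdivision"}  # proposed uses the rule applies to

def violates_floodplain_rule(parcel):
    """True if the proposed use conflicts with the recorded flood potential."""
    return (parcel["proposed_use"] in RESTRICTED_USES
            and parcel["flood_zone"] in RESTRICTED_ZONES)

for p in parcels:
    if violates_floodplain_rule(p):
        print(f"{p['id']}: proposed {p['proposed_use']} falls in a "
              f"{p['flood_zone']} flood zone -- review before approval")
```

The check is only as good as the attributes it compares: if flood potential has not been recorded, or has been recorded inconsistently, the rule cannot protect the user.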
Finally, it would be a mistake to believe that highly accurate and highly precise information is needed for every GIS application. The need for accuracy and precision will vary radically depending on the type of information coded and the level of measurement needed for a particular application. The user must determine what will work. Excessive accuracy and precision is not only costly but can cause considerable difficulties.
--------------------------------------------------------------------------------
4. Sources of Inaccuracy and Imprecision
There are many sources of error that may affect the quality of a GIS dataset. Some are quite obvious, but others can be difficult to discern. Few of these will be automatically identified by the GIS itself; it is the user's responsibility to prevent them. Particular care should be devoted to checking for errors because GIS are quite capable of lulling the user into a false sense of accuracy and precision unwarranted by the data available. For example, smooth changes in boundaries, contour lines, and the stepped changes of choropleth maps are "elegant misrepresentations" of reality. In fact, these features are often "vague, gradual, or fuzzy" (Burrough 1986). There is an inherent imprecision in cartography that begins with the projection process and its necessary distortion of some of the data (Koeln and others 1994), an imprecision that may continue throughout the GIS process. GIS users must recognize error and, importantly, determine what level of error is tolerable and affordable for their projects.
Burrough (1986) divides sources of error into three main categories:
Obvious sources of error.
Errors resulting from natural variations or from original measurements.
Errors arising through processing.
Generally, errors of the first two types are easier to detect than those of the third, because errors arising through processing can be quite subtle and may be difficult to identify. Burrough further divided these main groups into several subcategories.
4.1. Obvious Sources of Error
4.1.1. Age of data. Data sources may simply be too old to be useful or relevant to current GIS projects. Past collection standards may be unknown, non-existent, or not currently acceptable. For instance, John Wesley Powell's nineteenth-century survey data of the Grand Canyon lacks the precision of data that can be developed and used today. Additionally, much of the information base may have subsequently changed through erosion, deposition, and other geomorphic processes. Despite the power of GIS, reliance on old data may unknowingly skew, bias, or negate results.
4.1.2. Areal cover. Data on a given area may be completely lacking, or only partial levels of information may be available for use in a GIS project. For example, vegetation or soils maps may be incomplete at borders and transition zones and fail to accurately portray reality. Another example is the lack of remote sensing data in certain parts of the world due to almost continuous cloud cover. Uniform, accurate coverage may not be available, and the user must decide what level of generalization is necessary, or whether further collection of data is required.
4.1.3. Map scale. The ability to show detail in a map is determined by its scale. A map with a scale of 1:1,000 can illustrate much finer points of data than a smaller-scale map of 1:250,000. Scale restricts the type, quantity, and quality of data (Star and Estes 1990). One must match the appropriate scale to the level of detail required in the project. Enlarging a small-scale map does not increase its level of accuracy or detail.
4.1.4. Density of observations. The number of observations within an area is a guide to data reliability and should be known by the map user. An insufficient number of observations may not provide the level of resolution required to adequately perform spatial analysis and determine the patterns GIS projects seek to resolve or define.
A case in point: if the contour line interval on a map is 40 feet, resolution below this level is not reliably possible. Lines on a map are a generalization based on the interval of recorded data; thus the closer the sampling interval, the more accurate the portrayed data.
4.1.5. Relevance. Quite often the desired data regarding a site or area may not exist, and "surrogate" data may have to be used instead. A valid relationship must exist between the surrogate and the phenomenon it is used to study but, even then, error may creep in because the phenomenon is not being measured directly. A local example of the use of surrogate data is habitat studies of the golden-cheeked warbler in the Hill Country. It is very costly (and disturbing to the birds) to inventory these habitats through direct field observation. But the warblers prefer to live in stands of old-growth cedar, Juniperus ashei, and these stands can be identified from aerial photographs. The density of Juniperus ashei can therefore be used as a surrogate measure of the density of warbler habitat. But, of course, some areas of cedar may be uninhabited, or inhabited at very different densities; such variations will be missed when aerial photographs are used to tabulate habitat. Another example of surrogate data is the electronic signal from remote sensing that is used to estimate vegetation cover, soil types, erosion susceptibility, and many other characteristics. The data is being obtained by an indirect method. Sensors on the satellite do not "see" trees, but only certain digital signatures typical of trees and vegetation. Sometimes these signatures are recorded by satellites even when trees and vegetation are not present (false positives) or are not recorded when trees and vegetation are present (false negatives). Because of the cost of gathering on-site information, surrogate data is often substituted; the user must understand that variations may occur and that, although the underlying assumptions may be valid, the results may not necessarily be accurate.
4.1.6. Format. Methods of formatting digital information for transmission, storage, and processing may introduce error in the data. Conversion of scale and projection, changing from raster to vector format, and the resolution size of pixels are examples of possible areas for format error. Expediency and cost often require data to be reformatted to the "lowest common denominator" for transmission and use by multiple GIS. Multiple conversions from one format to another may create a ratchet effect similar to making copies of copies on a photocopy machine. Additionally, international standards for cartographic data transmission, storage, and retrieval are not fully implemented.
4.1.7. Accessibility. Accessibility to data is not equal. What is open and readily available in one country may be restricted, classified, or unobtainable in another. Prior to the break-up of the former Soviet Union, the kind of common highway map that is taken for granted in this country was considered classified information and was unobtainable by most people. Military restrictions, inter-agency rivalry, privacy laws, and economic factors may restrict data availability or the level of accuracy in the data.
4.1.8. Cost. Extensive and reliable data is often quite expensive to obtain or convert.
Initiating new collection of data may be too expensive for the benefits gained in a particular GIS project, and project managers must balance their desire for accuracy against the cost of the information. True accuracy is expensive and may be unaffordable.
4.2. Errors Resulting from Natural Variation or from Original Measurements. Although these error sources may not be as obvious, careful checking will reveal their influence on the project data.
4.2.1. Positional accuracy. Positional accuracy is a measurement of the variance between map features and the true position of the attribute (Antenucci and others 1991, p. 102). It is dependent on the type of data being used or observed. Map makers can accurately place well-defined objects and features such as roads, buildings, boundary lines, and discrete topographical units on maps and in digital systems, whereas less discrete boundaries such as vegetation or soil type may reflect the estimates of the cartographer. Climate, biomes, relief, soil type, drainage, and other features lack sharp boundaries in nature and are subject to interpretation. Faulty or biased field work, map digitizing and conversion errors, and scanning errors can all result in inaccurate maps for GIS projects.
4.2.2. Accuracy of content. Maps must be correct and free from bias. Qualitative accuracy refers to the correct labeling and presence of specific features. For example, a pine forest may be incorrectly labeled as a spruce forest, thereby introducing error that may not be known or noticeable to the map or data user. Certain features may be omitted from the map or spatial database through oversight, or by design. Other errors in quantitative accuracy may occur from faulty calibration of instruments used to measure specific features such as altitude, soil or water pH, or atmospheric gases. Mistakes made in the field or laboratory may be undetectable in the GIS project unless the user has conflicting or corroborating information available.
4.2.3. Sources of variation in data. Variations in data may be due to measurement error introduced by faulty observation, biased observers, or mis-calibrated or inappropriate equipment. For example, one cannot expect sub-meter accuracy with a hand-held, non-differential GPS receiver. Likewise, an incorrectly calibrated dissolved oxygen meter would produce incorrect values of oxygen concentration in a stream. There may also be natural variation in the data being collected, a variation that may not be detected during collection. As an example, salinity in Texas bays and estuaries varies during the year and is dependent upon freshwater influx and evaporation. If one were not aware of this natural variation, incorrect assumptions and decisions could be made, and significant error introduced into the GIS project. In any case, if the errors do not lead to unexpected results, their detection may be extremely difficult.
4.3. Errors Arising Through Processing. Processing errors are the most difficult for GIS users to detect: they must be specifically looked for, and finding them requires knowledge of the information and of the systems used to process it. These are subtle errors that occur in several ways, and they are therefore potentially more insidious, particularly because they can occur in multiple sets of data being manipulated in a GIS project.
4.3.1. Numerical errors. Different computers may not have the same capability to perform complex mathematical operations and may produce significantly different results for the same problem.
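A minimal sketch of this kind of numerical processing error, using nothing more than ordinary double-precision arithmetic (the values are illustrative only):

```python
# Two mathematically equivalent ways of totalling 1,000 raster cells of 0.1
# give slightly different floating-point results.
cell_value = 0.1

running_total = 0.0
for _ in range(1000):
    running_total += cell_value   # accumulate 0.1 a thousand times
direct_total = cell_value * 1000  # compute the same quantity in one step

print(running_total)                   # very close to, but not exactly, 100.0
print(direct_total)                    # 100.0 on typical IEEE-754 doubles
print(running_total == direct_total)   # False: rounding error has accumulated

# The discrepancy here is tiny, but repeated overlay, resampling, and conversion
# operations can accumulate such errors, and different processors or libraries
# may round differently, giving slightly different "answers" to the same problem.
```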
Burrough (1990) cites an example of number squaring that produced a 1,200% difference. Computer processing errors occur in rounding operations and are subject to the inherent limits of number manipulation by the processor. Another source of error may come from faulty processors, such as the mathematical flaw identified in Intel's Pentium(tm) chip: in certain calculations, the chip would yield the wrong answer. A major challenge is the accurate conversion of existing maps to digital form (Muehrcke 1986). Because computers must manipulate data in a digital format, numerical errors in processing can lead to inaccurate results. In any case, numerical processing errors are extremely difficult to detect, and identifying them may require a sophistication not present in most GIS workers or project managers.
4.3.2. Errors in topological analysis. Logic errors may cause incorrect manipulation of data and topological analyses (Star and Estes 1990). One must recognize that data is not uniform and is subject to variation. Overlaying multiple layers of maps can result in problems such as slivers, overshoots, and dangles. Variation in accuracy between different map layers may be obscured during processing, leading to the creation of "virtual data which may be difficult to distinguish from real data" (Sample 1994).
4.3.3. Classification and generalization problems. For the human mind to comprehend vast amounts of data, the data must be classified, and in some cases generalized, to be understandable. According to Burrough (1986, p. 137), about seven divisions of data is ideal and may be retained in human short-term memory. Defining class intervals is another problem area. For instance, tabulating causes of death for males aged 18-25 would probably yield significantly different results than a class interval of 18-40. Data is most accurately displayed and manipulated in small multiples. Defining a reasonable multiple and asking the question "compared to what?" is critical (Tufte 1990, pp. 67-79). Classification and generalization of attributes used in GIS are subject to interpolation error and may introduce irregularities in the data that are hard to detect.
4.3.4. Digitizing and geocoding errors. Processing errors occur during other phases of data manipulation such as digitizing and geocoding, overlay and boundary intersections, and rasterizing a vector map. Physiological errors of the operator, such as involuntary muscle contractions, may result in spikes, switchbacks, polygonal knots, and loops. Errors associated with damaged source maps, operator error while digitizing, and bias can be checked by comparing original maps with digitized versions. Other errors are more elusive.
--------------------------------------------------------------------------------
5. The Problems of Propagation and Cascading
The discussion has focused to this point on errors that may be present in single sets of data. GIS usually depend on comparisons of many sets of data. In a typical resource analysis problem, a variety of discrete datasets may have to be combined and compared to reach a solution. It is unlikely that the information contained in each layer is of equal accuracy and precision, and errors may also have been made compiling the information. If this is the case, the solution to the GIS problem may itself be inaccurate, imprecise, or erroneous. The point is that inaccuracy, imprecision, and error may be compounded in GIS that employ many data sources. There are two ways in which this compounding may occur.
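Before turning to the two mechanisms, a hedged sketch of how compounded error is often estimated in practice: if the positional errors of the layers are independent they are commonly combined as a root-sum-of-squares, while a simple sum gives a worst-case figure. These rules of thumb, and the layer names and values below, are illustrative assumptions rather than part of the source module:

```python
# Combining per-layer positional errors: independent (root-sum-of-squares)
# versus worst-case additive. Layer names and error values are invented.
import math

layer_errors_m = {
    "parcel boundaries": 1.5,   # RMS positional error of each layer, in metres
    "soils":             7.5,
    "flood zones":       5.0,
}

rss_error = math.sqrt(sum(e ** 2 for e in layer_errors_m.values()))
additive_error = sum(layer_errors_m.values())

print(f"Independent (root-sum-of-squares) estimate: {rss_error:.1f} m")
print(f"Worst-case additive estimate:               {additive_error:.1f} m")
# Every additional layer pushes both figures upward; the combined overlay can
# never be more accurate than its weakest input.
```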
5.1. Propagation. Propagation occurs when one error leads to another. For example, if a map registration point has been mis-digitized in one coverage and is then used to register a second coverage, the second coverage will propagate the first mistake. In this way, a single error may lead to others and spread until it corrupts data throughout the entire GIS project. To avoid this problem, use the largest-scale map available to register your points. Often propagation occurs in an additive fashion, as when maps of different accuracy are collated.
5.2. Cascading. Cascading means that erroneous, imprecise, and inaccurate information will skew a GIS solution when information is combined selectively into new layers and coverages. In a sense, cascading occurs when errors are allowed to propagate unchecked from layer to layer repeatedly. The effects of cascading can be very difficult to predict. They may be additive or multiplicative and can vary depending on how information is combined, that is, from situation to situation. Because cascading can have such unpredictable effects, it is important to test for its influence on a given GIS solution. This is done by calibrating a GIS database using techniques such as sensitivity analysis. Sensitivity analysis allows the user to gauge how, and how much, errors will affect solutions. Calibration and sensitivity analysis are discussed in Managing Error. It is also important to realize that propagation and cascading may affect horizontal, vertical, attribute, conceptual, and logical accuracy and precision.
--------------------------------------------------------------------------------
6. Beware of False Precision and False Accuracy!
GIS users are not always aware of the difficult problems caused by error, inaccuracy, and imprecision. They often fall prey to false precision and false accuracy, that is, they report their findings to a level of precision or accuracy that is impossible to achieve with their source materials. If locations on a GIS coverage are only measured to within a hundred feet of their true position, it makes no sense to report predicted locations in a solution to a tenth of a foot. That is, just because computers can store numeric figures to many decimal places does not mean that all of those decimal places are "significant." It is important for GIS solutions to be reported honestly and only to the level of accuracy and precision they can support. In practice this means that GIS solutions are often best reported as ranges or rankings, or presented within statistical confidence intervals. These issues are addressed in the module Managing Error.
--------------------------------------------------------------------------------
7. The Dangers of Undocumented Data
Given these issues, it is easy to understand the dangers of using undocumented data in a GIS project. Unless the user has a clear idea of the accuracy and precision of a dataset, mixing this data into a GIS can be very risky. Data that you have prepared carefully may be disrupted by mistakes someone else made. This brings up three important issues.
7.1. Ask or look for a data quality report when you borrow or purchase data. Many major governmental and commercial data producers work to well-established standards of accuracy and precision that are available publicly in printed or digital form. These documents will tell you exactly how maps and datasets were compiled, and such reports should be studied carefully.
Data quality reports are usually provided with datasets obtained from local and state government agencies or from private suppliers.
7.2. Prepare a data quality report for datasets you create. Your data will not be valuable to others unless you, too, prepare a data quality report. Even if you do not plan to share your data with others, you should prepare a report--just in case you use the dataset again in the future. If you do not document the dataset when you create it, you may end up wasting time later having to check it a second time. Use the data quality reports found above as models for documenting your dataset.
7.3. In the absence of a data quality report, ask questions about undocumented data before you use it. What is the age of the data? Where did it come from? In what medium was it originally produced? What is the areal coverage of the data? At what map scale was the data digitized? What projection, coordinate system, and datum were used in the maps? What was the density of observations used for its compilation? How accurate are positional and attribute features? Does the data seem logical and consistent? Do cartographic representations look "clean"? Is the data relevant to the project at hand? In what format is the data kept? How was the data checked? Why was the data compiled? What is the reliability of the provider?

PRECISION VERSUS ACCURACY http://www.chem.tamu.edu/class/fyp/mathrev/mr-sigfg.html

Managing Errors http://www.colorado.edu/geography/gcraft/notes/error/error_f.html
1. The Problems of Error, Accuracy, and Precision
Managing error in GIS datasets is now recognized as a substantial problem that needs to be addressed in the design and use of such systems. Failure to control and manage error can severely limit or invalidate the results of a GIS analysis. Please see the module Error, Accuracy, and Precision for an overview of the key issues.
--------------------------------------------------------------------------------
2. Setting Standards for Procedures and Products
No matter what the project, standards should be set from the start. Standards should be established for both the spatial and the non-spatial data to be added to the dataset. Issues to be resolved include the accuracy and precision to be invoked as information is placed in the dataset, conventions for naming geographic features, criteria for classifying data, and so forth. Such standards should be set both for the procedures used to create the dataset and for the final products. Setting standards involves three steps.
2.1. Establishing criteria that meet the specific demands of a project. Standards are not arbitrary; they should suit the levels of accuracy, precision, and completeness required to meet the demands of a project. The Federal government and many state governments have established standards that meet the needs of a wide range of mapping and GIS projects in their domains. Other users may follow these standards if they apply, but often the designer must carefully establish standards for particular projects. Picking arbitrarily high levels of precision, accuracy, and completeness simply adds time and expense. Picking standards that are too low means the project may not be able to reach its analytical goals once the database is compiled. Indeed, it is perhaps best to consider standards in the light of ultimate project goals. That is, how accurate, precise, and complete will a solution need to be? The designer can then work backward to establish standards for the collection and input of raw data.
Sensitivity analysis (discussed below) applied to a prototype can also help to establish standards for a project.
2.2. Training people involved to meet standards, including practice. The people who will be compiling and entering data must learn how to apply the standards to their work. This includes practice with the standards so that they learn to apply them as a natural part of their work. People working on the project should be given a clear idea of why the standards are being employed. If standards are enforced as a set of laws or rules without explanation, they may be resisted or subverted. If the people working on a project know why the standards have been set, they are often more willing to follow them and to suggest procedures that will improve data quality.
2.3. Testing that the standards are being employed throughout a project and are reached by the final products. Regular checks and tests should be employed throughout a project to make sure that standards are being followed. This may include the regular testing of all data added to the dataset or may involve spot checks of the materials. This allows the designer to pinpoint difficulties at an early stage and correct them.
Examples of data standards:
USGS, National Mapping Program Standards, http://nationalmap.gov/gio/standards/
Information on the Spatial Data Transfer Standard, http://mcmcweb.er.usgs.gov/sdts/
USGS Map Accuracy Standards, http://rockyweb.cr.usgs.gov/nmpstds/nmas.html
--------------------------------------------------------------------------------
3. Documenting Procedures and Products: Data Quality Reports
Standards for procedures and products should always be documented in writing or in the dataset itself. Data documentation should include information about how the data was collected and from what sources, how it was preprocessed and geocoded, how it was entered in the dataset, and how it is classified and encoded. On larger projects, one person or a team should be assigned responsibility for data documentation. Documentation is vitally important to the value and future use of a dataset. The saying is that an undocumented dataset is a worthless dataset. By and large, this is true. Without clear documentation a dataset cannot be expanded and cannot be used by other people or organizations now or in the future. Documentation is of critical importance in large GIS projects because the dataset will almost certainly outlive the people who created it. That is, GIS for municipal, state, and AM/FM applications are usually designed to last 50-100 years. The staff who entered the data may have long since retired when a question arises about the characteristics of their work. Written documentation is essential. Some projects actually place information about data quality and quality control directly in a GIS dataset as independent layers. An example of a data quality report is: Digital Elevation Model Standards, http://rockyweb.cr.usgs.gov/nmpstds/demstds.html
--------------------------------------------------------------------------------
4. Measuring and Testing Products
GIS datasets should be checked regularly against reality. For spatial data, this involves checking maps and positions in the field or, at least, against sources of high quality. A sample of positions can be resurveyed to check their accuracy and precision. The USGS employs a testing procedure to check on the quality of its digital and paper maps, as does the Ordnance Survey. Indeed, the Ordnance Survey continues periodically to test maps and digital datasets long after they have first been compiled. If too many errors crop up, or if the mapped area has changed greatly, the work is updated and corrected. Non-spatial attribute data should also be checked, either against reality or against a source of equal or greater quality. The particular tests employed will, of course, vary with the type of data used and its level of measurement; many different tests have been developed to test the quality of interval, ordinal, and nominal data. Both parametric and nonparametric statistical tests can be employed to compare true values (those observed "on the ground") with those recorded in the dataset. Cohen's Kappa provides just one example of the types of test employed, this one for nominal data. The following example shows how data on land cover stored in a database can be tested against reality. See Attribute Accuracy and Calculating Cohen's Kappa.
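A minimal sketch of such a test; the confusion matrix of mapped versus field-checked land cover is invented, but the kappa formula (observed agreement corrected for chance agreement) is the standard one:

```python
# Cohen's Kappa from a confusion matrix: rows = class recorded in the database,
# columns = class observed on the ground. Matrix values are invented.
classes = ["forest", "grassland", "urban"]

confusion = [
    [42,  5,  3],
    [ 6, 30,  4],
    [ 2,  3, 25],
]

n = sum(sum(row) for row in confusion)
observed = sum(confusion[i][i] for i in range(len(classes))) / n

row_totals = [sum(row) for row in confusion]
col_totals = [sum(confusion[i][j] for i in range(len(classes))) for j in range(len(classes))]
expected = sum(row_totals[i] * col_totals[i] for i in range(len(classes))) / (n * n)

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.3f}")
print(f"Chance agreement:   {expected:.3f}")
print(f"Cohen's Kappa:      {kappa:.3f}")  # 1.0 = perfect, 0.0 = no better than chance
```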
--------------------------------------------------------------------------------
5. Calibrating a Dataset to Ascertain How Error Influences Solutions
Solutions reached by GIS analysis should be checked or calibrated against reality. The best way to do this is to check the results of a GIS analysis against findings produced from completely independent calculations. If the two agree, then the user has some confidence that the data and the modeling procedure are valid. This process of checking and calibrating a GIS is often referred to as sensitivity analysis. Sensitivity analysis allows the user to test how variations in data and modeling procedure influence a GIS solution. What the user does is vary the inputs of a GIS model, or the procedure itself, to see how each change alters the solution. In this way, the user can judge quite precisely how data quality and error will influence subsequent modeling. This is quite straightforward with interval/ratio input data: the user tests to see how an incremental change in an input variable changes the output of the system. From this, the user can derive "marginal sensitivity" to an input and establish "marginal weights" to compensate for error. But sensitivity analysis can also be applied to nominal (categorical) and ordinal (ranked) input data. In these cases, data may be purposefully misclassified or misranked to see how such errors will change a solution. Sensitivity analysis can also be used during system design and development to test the levels of precision and accuracy required to meet system goals. That is, users can experiment with data of differing levels of precision and accuracy to see how they perform. If a test solution is not accurate or precise enough in one pass, the levels can be refined and tested again. Such testing of accuracy and precision is very important in large GIS projects that will generate large quantities of data. It is of little use (and tremendous cost) to gather and store data to levels of accuracy and precision beyond what is needed to meet a particular modeling need. Sensitivity analysis can also be useful at the design stage in testing the theoretical parameters of a GIS model. It is sometimes the case that a factor, though of seemingly great theoretical importance to a solution, proves to be of little value in solving a particular problem. For example, soil type is certainly important in predicting crop yields but, if soil type varies little in a particular region, it is a waste of time entering it into a dataset designed for this purpose.
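A minimal sketch of a one-at-a-time sensitivity test on an invented weighted-overlay suitability model; the layers, weights, and scores are illustrative assumptions, not part of the source module:

```python
# One-at-a-time sensitivity test: perturb one input layer at a time and watch
# how much a simple weighted-overlay "suitability" score changes.
baseline_layers = {            # layer -> (weight, score for the candidate site)
    "slope":        (0.40, 7.0),
    "soil type":    (0.30, 5.0),
    "road access":  (0.20, 8.0),
    "land cover":   (0.10, 6.0),
}

def suitability(layers):
    return sum(weight * score for weight, score in layers.values())

base = suitability(baseline_layers)
print(f"baseline suitability: {base:.2f}")

for name in baseline_layers:
    perturbed = dict(baseline_layers)
    weight, score = perturbed[name]
    perturbed[name] = (weight, score * 1.10)          # +10% error in this layer only
    change = suitability(perturbed) - base
    print(f"+10% error in {name:<12s} shifts the solution by {change:+.3f} "
          f"(marginal sensitivity ≈ {change / (0.10 * score):.2f})")
```

Layers whose perturbation barely moves the result are candidates for coarser (cheaper) data, or for omission altogether.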
Users can check on such situations by selectively removing certain data layers from the modeling process. If they make no difference to the solutions, then no further data entry needs to be made. To see how sensitivity analysis might be applied to a problem concerned with upgrading a municipal water system, go to the following section on Sensitivity Analysis. In closing this example, it is useful to note that the results were reported in terms of rankings. No single solution was optimal in all cases, and picking a single, best solution might be misleading. Instead, the sites are simply ranked by the number of situations in which each comes out ahead.
--------------------------------------------------------------------------------
6. Report Results in Terms of the Uncertainties of the Data
Too often GIS projects fall prey to the problem of false precision, that is, reporting results to a level of accuracy and precision unsupported by the intrinsic quality of the underlying data. Just because a system can store numeric solutions down to four, six, or eight decimal places does not mean that all of these are significant. Common practice allows users to carry results one decimal place below the level of measurement; beyond that, the remaining digits are meaningless. As examples of what this means, consider: population figures are reported in whole numbers (5,421, 10,238, etc.), meaning that calculations can be carried to one decimal place (a density of 21.5, a mortality rate of 10.3); if forest coverage is measured to the closest 10 meters, then calculations can be rounded to the closest 1 meter.
A second problem is false certainty, that is, reporting results with a degree of certitude unsupported by the natural variability of the underlying data. Most GIS solutions involve employing a wide range of data layers, each with its own natural dynamics and variability. Combining these layers can exacerbate the problem of arriving at a single, precise solution. Sensitivity analysis (discussed above) helps to indicate how much variations in one data layer will affect a solution, but GIS users should carry this lesson all the way to final solutions. These solutions are best reported in terms of ranges, confidence intervals, or rankings. In some cases, this involves preparing high, low, and mid-range estimates of a solution based upon the maximum, minimum, and average values of the data used in a calculation. You will notice that the case considered above, pertaining to an optimal site selection problem, reported its results in terms of rankings. Each site was optimal in certain confined situations, but only a couple proved optimal in more than one situation; the results rank the number of times each site came out ahead in terms of total cost. In situations where statistical analysis is possible, the use of confidence intervals is recommended. Confidence intervals establish the probability of a solution falling within a certain range (e.g., a 95% probability that a solution falls between 100 m and 150 m).
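A short sketch of reporting at the precision the inputs support, and as an interval rather than a single figure; the population, area, and model-run values are invented, and the 95% interval uses a simple normal approximation:

```python
# Report a density only one decimal place below whole-number counts, and a
# modelled result as a rough 95% interval from repeated runs. Values invented.
import statistics

population = 5421          # counted in whole persons
area_km2 = 252.3           # measured to 0.1 km^2
density = population / area_km2
print(f"Reported density: {round(density, 1)} persons/km^2 "
      f"(not {density!r})")   # one decimal place, not the full stored float

runs_m = [132.4, 128.9, 135.2, 130.7, 133.8, 129.5, 131.9, 134.1]  # model runs
mean = statistics.mean(runs_m)
sem = statistics.stdev(runs_m) / len(runs_m) ** 0.5
low, high = mean - 1.96 * sem, mean + 1.96 * sem     # normal approximation
print(f"Estimate: {mean:.0f} m, 95% interval roughly {low:.0f}-{high:.0f} m")
```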
National Map Accuracy Standards http://www.colorado.edu/geography/gcraft/notes/error/error_f.html
United States National Map Accuracy Standards: defines accuracy standards for published maps, including horizontal and vertical accuracy, the accuracy testing method, accuracy labeling on published maps, labeling when a map is an enlargement of another map, and basic information for map construction as to latitude and longitude boundaries. (1 p., 6 KB, PDF)

Error, Accuracy, and Precision: Full Table of Contents http://www.colorado.edu/geography/gcraft/notes/error/error_f.html
1. The Importance of Error, Accuracy, and Precision
2. Some Basic Definitions
3. Types of Error
3.1. Positional Accuracy and Precision
3.2. Attribute Accuracy and Precision
3.3. Conceptual Accuracy and Precision
3.4. Logical Accuracy and Precision
4. Sources of Inaccuracy and Imprecision
4.1. Obvious Sources of Error
4.1.1. Age of Data
4.1.2. Areal Cover
4.1.3. Map Scale
4.1.4. Density of Observations
4.1.5. Relevance
4.1.6. Format
4.1.7. Accessibility
4.1.8. Cost
4.2. Errors Resulting from Natural Variation or from Original Measurements
4.2.1. Positional Accuracy
4.2.2. Accuracy of Content
4.2.3. Sources of Variation in Data
4.3. Errors Arising Through Processing
4.3.1. Numerical Errors
4.3.2. Errors in Topological Analysis
4.3.3. Classification and Generalization Problems
4.3.4. Digitizing and Geocoding Errors
5. The Problems of Propagation and Cascading
5.1. Propagation
5.2. Cascading
6. Beware of False Precision and False Accuracy
7. The Dangers of Undocumented Data
7.1. Ask for a Data Quality Report
7.2. Prepare a Data Quality Report
7.3. Ask Questions about Undocumented Data
8. Principles of Managing Error

National Geospatial Program Standards http://www.colorado.edu/geography/gcraft/notes/error/error_f.html
Accuracy, Errors, and Standards http://www.colorado.edu/geography/gcraft/notes/error/error_f.html http://educationally.narod.ru/giserrphotoalbum.html
PRECISION VERSUS ACCURACY http://www.chem.tamu.edu/class/fyp/mathrev/mr-sigfg.html

Generalizing surficial geological maps for scale change: ArcGIS tools vs. cellular automata model http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V7D-4S0JN1D-1&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1052441407&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=db4d2d2559fbefcdfd04b504e84aaa3b
Article Outline: 1. Introduction; 2. Methodology (2.1. Input data; 2.2. ArcGIS procedure synthesis; 2.3. The CA algorithm and its application for geological map generalization; 2.4. Experimental work: 2.4.1. Measures used for map comparison and results evaluation, 2.4.2. Requirements development, 2.4.3. Technical requirements, 2.4.4. Experiment specifics); 3. Results and discussion (3.1. The CA algorithm vs. ArcGIS procedures; 3.2. Controlling generalization level; 3.3. Calibrating the generalized hydrographic network against a 1:250,000 topographic map; 3.4. Generalizing large maps); 4. Conclusions; Acknowledgements; Appendix A. ArcGIS functions and commands used in the generalization procedure; References
Cartographic Generalization of Geo-Spatial Data
Generalization has a long history in cartography as an art of creating maps for different scales and purposes. Cartographic generalization is the process of selecting and representing the information of a map in a way that adapts to the scale of the display medium of the map. In this way, every map has, to some extent, been generalized to match the criteria of display. This includes small-scale maps, which cannot convey every detail of the real world. Cartographers must decide and then adjust the content within their maps to create a suitable and useful map that conveys geospatial information within their representation of the world. Generalization is meant to be context-specific: correctly generalized maps are those that emphasize the most important map elements while still representing the world in the most faithful and recognizable way. The level of detail and importance in what remains on the map must outweigh the insignificance of the items that were generalized, so as to preserve the distinguishing characteristics of what makes the map useful and important.
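One classical line-generalization routine, the Douglas-Peucker algorithm, illustrates the idea: vertices that deviate from the overall trend of a line by less than a tolerance are dropped, so the line keeps its distinguishing shape at a coarser scale. A minimal sketch (the coordinates are invented, and this is only one of several generalization operators):

```python
# Douglas-Peucker line simplification: keep only vertices that deviate from the
# anchor-to-endpoint trend line by more than a chosen tolerance.
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    if (ax, ay) == (bx, by):
        return math.hypot(px - ax, py - ay)
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    return num / math.hypot(bx - ax, by - ay)

def douglas_peucker(points, tolerance):
    """Recursively keep only vertices that deviate more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

stream = [(0, 0), (1, 0.2), (2, 0.9), (3, 1.1), (4, 0.6), (5, 0.1), (6, 0)]
print(douglas_peucker(stream, tolerance=0.5))   # far fewer vertices, same overall shape
```

The choice of tolerance is itself a generalization decision: too small and nothing is simplified, too large and the distinguishing shape of the feature is lost.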
Cartography or mapmaking (in Greek chartis = map and graphein = write) is the study and practice of making maps or globes. Geomatics is the discipline of gathering, storing, processing, and delivering geographic information.
LINE GENERALIZATION http://www.geog.ubc.ca/courses/klink/gis.notes/ncgia/u48.html (70 units of GIS, compiled with assistance from David Cowen, University of South Carolina)
An overview of the Generalization tools http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=An_overview_of_the_Generalization_tools
Cartographic generalization http://en.wikipedia.org/wiki/Cartographic_generalization
Cartographic generalization http://wapedia.mobi/en/Cartographic_generalization (scale and generalization, scale-related generalization)

Mental Maps: making the invisible, visible http://web.ics.purdue.edu/~smatei/MentalMaps/resource.html
PROJECTS: Visible Past. Visible Past proposes a cross-platform, scalable environment (the Exploratorium) for collaborative social, geographic, and historical education and research. The Exploratorium will be deployed in a variety of settings, from the Web to fully immersive virtual reality environments. Educational activities can be formal (classroom teaching) or informal (conducted in a museum or a self-directed online learning setting). The specific goals of the Exploratorium concept are two-fold: 1) to create a set of tools for collecting, organizing, and disseminating knowledge in a collaborative manner at various scales and in various formats; and 2) to extend and refine a theoretical framework and methodological tools for prototyping and testing future research and learning applications and architectures that benefit from mental mapping and from 3D and location-aware applications. The heart of the Visible Past concept, the Exploratorium, is an information space built on top of a georeferenced wiki database that can be accessed through a variety of avenues: full-immersion 3D environments, Web interfaces, or Geographic Exploration Systems (GES) such as Google Earth or NASA's World Wind.
Mental Maps http://geography.about.com/cs/culturalgeography/a/mentalmaps.htm
A person's perception of the world is known as a mental map. A mental map is an individual's own internal map of their known world. Geographers like to learn about the mental maps of individuals and how they order the space around them.
This can be investigated by asking for directions to a landmark or other location, by asking someone to draw a sketch map of an area or describe that area, or by asking a person to name as many places (e.g. states) as possible in a short period of time. Do Maps Create or Represent Reality? Why Place Name Geography is Important and a Call to Action Mental Mapping: Psychological Space and Distance http://www.ncgia.ucsb.edu/cctp/units/geog_for_GIS/GC_20_11.html The World in Spatial Terms http://www.ncgia.ucsb.edu/education/curricula/giscc/units/u005/u005.html Human Cognition of the Spatial World http://www.ncgia.ucsb.edu/education/curricula/giscc/units/u006/u006_f.html Asking Geographic Questions http://www.ncgia.ucsb.edu/education/curricula/giscc/units/u007/u007_f.html Unit 130 - Process Modeling and Simulations http://www.ncgia.ucsb.edu/giscc/units/u130/u130.html Cartographic Representations http://www.esri.com/technology_trends/cartography/representations.html
Sources of error in GIS data:
• errors from cartographic data sources
• problems with remotely sensed imagery
• continuous data and fuzzy boundaries
• map scale and map measurements
• errors in data encoding: psychological errors (displacement, undershoot, polygonal knot or 'switch-back', spike, overshoot), physiological errors, registration error, line thickness, method of digitizing
• errors in data editing and conversion: topological errors in vector GIS (overshoot remains, unclosed gap, small polygon removed, gap closed), vector-to-raster classification error, rasterization error, effects of grid orientation, origin and datum
• errors in data processing and analysis
• errors in data output
Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VF4-4B7N7T9-1&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1052461132&_rerunOrigin=scholar.google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=6b72ceca694290434baca08203728bfc Remote sensing from airborne and spaceborne platforms provides valuable data for mapping, environmental monitoring, disaster management and civil and military intelligence. However, to explore the full value of these data, the appropriate information has to be extracted and presented in standard formats so it can be imported into geo-information systems and thus allow efficient decision processes. The object-oriented approach can contribute to powerful automatic and semi-automatic analysis for most remote sensing applications. Used in synergy with pixel-based or statistical signal processing methods, it exploits the rich information content. Here, we explain principal strategies of object-oriented analysis, discuss how the combination with fuzzy methods allows implementing expert knowledge and describe a representative example for the proposed workflow from remote sensing imagery to GIS. The strategies are demonstrated using the first object-oriented image analysis software on the market, eCognition, which provides an appropriate link between remote sensing imagery and GIS.
Author Keywords: object-oriented image analysis; remote sensing; multi-resolution segmentation; fuzzy classification; GIS Fuzzy representation of geographical boundaries in GIS http://www.ingentaconnect.com/content/tandf/tgisold/1996/00000010/00000005/art00005 Polygon boundaries on thematic maps are conventionally considered to be sharp lines representing abrupt changes of phenomena. However, in reality changes of environmental phenomena may also be partial or gradual. Indiscriminate use of sharp lines to represent different types of change creates a problem of boundary inaccuracy. Specifically, in the context of vector-based GIS, use of sharp lines to represent gradual or partial changes may cause misunderstanding of geographical information and reduce analysis accuracy. In this paper, the expressive inadequacy of the conventional vector boundary representation is examined. A more informative technique, the fuzzy representation of geographical boundaries, is proposed, in which boundaries describe not only the location but also the rate of change of environmental phenomena. Four methods of determining fuzzy boundary membership grades from different kinds of geographical data are described. An example of applying the fuzzy boundary technique to data analysis is presented and the advantages of the technique are discussed. Conflicts over shared rivers: Resource scarcity or fuzzy boundaries? http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VG2-4JXRX1K-1&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1052461691&_rerunOrigin=scholar.google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=d811daa3897a4be213f89b2aaedcba0f Countries that share rivers have a higher risk of military disputes, even when controlling for a range of standard variables from studies of interstate conflict. A study incorporating the length of the land boundary showed that the shared river variable is not just a proxy for a higher degree of interaction opportunity. A weakness of earlier work is that the existing shared rivers data do not distinguish properly between dyads where the rivers run mainly across the boundary and dyads where the shared river runs along the boundary. Dyads with rivers running across the boundary would be expected to give rise to resource scarcity-related conflict, while in dyads where the river forms the boundary conflict may arise because river boundaries are fluid and fuzzy. Using a new dataset on shared water basins and two measures of water scarcity, we test for the relevance of these two scenarios. Shared basins do predict an increased propensity for conflict in a multivariate analysis. However, we find little support for the fuzzy boundary scenario. Support for a scarcity theory of water conflict is somewhat ambiguous. Neither the number of river crossings nor the share of the basin upstream is significant. Dry countries have more conflict, but less so when the basin is large. Drought has no influence. The size of the basin, however, is significantly associated with conflict. Modernization theory receives some support in that development interacted with basin size predicts less conflict, and we find some evidence here for an environmental Kuznets curve. The importance of basin size suggests a possible 'resource curse' effect for water resources.
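To make the fuzzy-boundary idea above concrete, here is a minimal sketch (my own illustration, not the method of the paper cited) that assigns a membership grade near a boundary as a function of signed distance from the nominal boundary line, so the transition is gradual rather than abrupt; the transition width is an assumed parameter.

```python
import math

def fuzzy_membership(distance_to_boundary, transition_width=100.0):
    """Membership grade in the 'inside' class for a point at a signed
    distance from the nominal boundary (negative = inside, positive = outside).
    A logistic curve replaces the conventional sharp 0/1 step."""
    return 1.0 / (1.0 + math.exp(distance_to_boundary / (transition_width / 4.0)))

# Hypothetical signed distances (metres) of sample points from a soil-class boundary
for d in (-300, -100, -25, 0, 25, 100, 300):
    print(f"{d:>5} m  ->  membership {fuzzy_membership(d):.2f}")
```

A conventional sharp polygon boundary corresponds to the limiting case in which the transition width shrinks to zero.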
Keywords: River basin; Resource scarcity; Boundaries; Conflict Computer modeling in the water industry http://proceedings.esri.com/library/userconf/proc99/proceed/papers/pap615/p615.htm HurricaneMapping's Shapefile Manager - Users Guide http://www.hurricanemapping.com/support/shapefilemanager/UsersGuide.cfm Shapefile Manager is a lightweight desktop utility designed to help you easily and quickly download and display storm-related ESRI shapefiles and layers created by the HurricaneMapping.com site. It provides a visual interface showing plots of active and archived storms and allows you to auto-download selected active storm shapefiles passively, on a schedule of your choosing. If you wish to download archived storm shapefiles, the visual display shows you the correct advisory for your needs so you only download the necessary information. Creating and editing GIS Databases http://educationally.narod.ru/gismngphotoalbum.html Editing GIS Tutorial: errors in data editing and conversion http://webhelp.esri.com/arcgisdesktop/9.3/pdf/Editing_Tutorial.pdf Data Editing Techniques http://proceedings.esri.com/library/userconf/proc99/proceed/papers/pap142/p142.htm Diagnosing data conversion errors http://publib.boulder.ibm.com/infocenter/cicsts/v3r1/index.jsp?topic=/com.ibm.cics.ts31.doc/dfhws/tasks/dfhws_conversionerrors.htm Topological Error http://en.mimi.hu/gis/topological_error.html Topological error concerns the faithfulness of the data structure for a data set. This typically involves spatial data inconsistencies such as incorrect line intersections, duplicate lines or boundaries, or gaps in lines. These are referred to as spatial or topological errors. The Nature of Geographic Information http://www.innovativegis.com/basis/primer/nature.html Maps and Spatial Information The main method of identifying and representing the location of geographic features on the landscape is a map. A map is a graphic representation of where features are, explicitly and relative to one another. A map is composed of different geographic features represented as either points, lines, and/or areas. Each feature is defined both by its location in space (with reference to a coordinate system), and by its characteristics (typically referred to as attributes). Quite simply, a map is a model of the real world. The map legend is the key linking the attributes to the geographic features. Attributes, such as the species for a forest stand, are typically represented graphically by use of different symbology and/or color. For GIS, attributes need to be coded in a form in which they can be used for data analysis (Burrough, 1986). This implies loading the attribute data into a database system and linking it to the graphic features.
Accuracy and Quality http://www.innovativegis.com/basis/primer/nature.html#Data%20Accuracy%20and%20Quality Rasterization Errors http://www.gsd.harvard.edu/gis/manual/raster/index.htm Errors http://rostowskaja.narod.ru/giserrorphotoalbum.html http://educationally.narod.ru/giserror1photoalbum.html Effects of grid orientation - Vector to Raster conversion https://classshares.student.usp.ac.fj/GS301/lecture%20notes%20pdf/14%20Raster%20vs%20Vector.pdf
Choosing a data model and GIS package: some GIS packages are primarily vector or raster.
• Raster: GRASS, IDRISI, MOSS
• Vector: Intergraph, MapInfo (old versions)
• Integrated: ArcGIS
Some allow you to convert between types with ease. It is often convenient to use one model primarily but convert to the other for certain operations. Raster analysis is generally faster than vector analysis, while vector analysis provides more accurate and spatially correct results. To apply tools available in vector analysis to raster data, and vice versa, the data must be converted from one model to the other. Methods and techniques for conversion between vector and raster data are based on the fundamental features of the corresponding models.
Raster: resolution determined by pixel size; efficiently represents dense or continuous data (e.g. elevation).
Vector: resolution determined by the precision of coordinates; efficiently represents sparse or object-like data (e.g. buildings).
Many GIS have tools for automatically converting between raster and vector models. Vectorization = converting raster to vector (R2V); rasterization = converting vector to raster (V2R). Normally, some information is lost in the conversion process; consequently, converted data are less accurate than the original data.
Data conversion vs model conversion. Raster to vector approaches: on-screen digitizing (raster data to vector model); vectorization (raster data to vector data); vector data to vector model. Vector to raster approach: rasterization.
Conversion of vector data to raster data: (a) coded polygons; (b) a grid with the appropriate cell size overlaid on top of the polygons (dots represent the center of each grid cell); (c) each cell is assigned the attribute code of the polygon to which it belongs.
Data and model conversion in rasterization:
• Points are converted to single cells.
• Lines are converted to groups of cells oriented in a linear arrangement.
• Polygons are converted to zones.
To convert data from vector format to raster format, the user should set the following parameters: grid cell size, position, and grid orientation. Rasterization can then be reduced to the process of resolving which attribute of the vector feature should be labelled in the output grid cells; the whole process is mostly automated.
Rasterization steps (see the sketch below):
• Set up a raster with a specified cell size to cover the area extent of the vector data
• Assign all cell values as zeros
• Change the values of those cells that correspond to points, lines or polygon boundaries
• Fill the interiors of polygons by re-assigning the corresponding cell values
Vectorization steps:
• Line thinning
• Line extraction
• Topological reconstruction
Conversion of raster data to vector: (a) each raster cell is assigned an attribute value; (b) boundaries are set up between different attribute classes; (c) a polygon is created by storing x and y coordinates for the points adjacent to the boundaries.
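The rasterization steps listed above can be illustrated with a short, self-contained sketch. This is a minimal cell-center rasterizer for a single polygon (my own illustration, not the algorithm of any package named here); the cell size, origin, and test polygon are assumed values, and the point-in-polygon test is a standard ray-casting check.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: is the point (x, y) inside the polygon (list of vertices)?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize_polygon(poly, origin, cell_size, n_cols, n_rows, code=1):
    """Assign `code` to every cell whose center falls inside the polygon;
    all other cells keep the background value 0."""
    ox, oy = origin
    grid = [[0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for col in range(n_cols):
            cx = ox + (col + 0.5) * cell_size        # cell-center coordinates
            cy = oy + (row + 0.5) * cell_size
            if point_in_polygon(cx, cy, poly):
                grid[row][col] = code
    return grid

# Hypothetical polygon (map units) rasterized onto a 10 x 10 grid of 1-unit cells
polygon = [(2, 2), (8, 3), (7, 8), (3, 7)]
for row in reversed(rasterize_polygon(polygon, origin=(0, 0), cell_size=1.0,
                                      n_cols=10, n_rows=10)):
    print("".join(str(v) for v in row))
```

Choosing a different cell size, or shifting the grid origin or orientation, changes which cell centers fall inside the polygon; this is exactly how the grid-orientation, origin and datum errors discussed in these notes arise.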
Example of errors caused by conversion between raster and vector data models: the original river, after raster-to-vector conversion, appears to connect back into a loop. In some GIS packages, raster-to-vector conversion of grid layers can only produce polyline vector layers. Some GIS tools allow a preliminary raster separation (based on pixel colour) in order to preserve the topology inherent in the source data model.
Rasterization errors. Rasterization generally involves a loss of precision, and the precision loss is retained if the data are re-converted to vector. Vector-to-raster conversion can cause a number of errors in the resulting data, such as:
• topological errors
• loss of small polygons
• effects of grid orientation
• variations in grid origin and datum, etc.
Further problems with converting vector maps to a raster structure include:
• creation of stair-stepped boundaries
• small shifts in the position of objects
• deletion of small features
Raster to vector: challenges. Conversion from raster to vector varies in difficulty with the type of data: points and polygons are relatively easy (especially for classified remote sensing data), while lines are relatively difficult. ArcScan is an extension that helps to create vector maps from scanned paper maps; line vectorization is a difficult-to-program and error-prone process.
Raster to vector: pros and cons. Advantages:
• can be very fast and cost effective
• relatively inexpensive
• provides a very accurate representation of the analog map
Disadvantages:
• the analog map needs to be in pristine condition with minimal extra features and annotation
• false recognition of different features and text
• editing can be very labor intensive
Further problems with converting raster maps to a vector structure include:
• potentially massive data volumes
• difficulties in line generalization
• topological confusion
ArcGIS Spatial Analyst and ArcScan: ArcGIS tools for raster-vector conversion include raster-to-polygon conversion, contour generation, and surface interpolation from point data.
Key Terms: raster-to-vector conversion (R2V), vector-to-raster conversion (V2R), rasterization, vectorization, on-screen digitizing, conversion errors.
NASA - Data Processing and Analysis Errors in data processing and analysis http://heasarc.nasa.gov/docs/xmm/abc/node9.html http://heasarc.nasa.gov/docs/xmm/abc/node8.html http://heasarc.nasa.gov/docs/xmm/abc/node7.html http://heasarc.nasa.gov/docs/xmm/abc/node6.html http://heasarc.nasa.gov/docs/xmm/abc/node5.html http://heasarc.nasa.gov/docs/xmm/abc/node4.html http://heasarc.nasa.gov/docs/xmm/abc/node3.html http://heasarc.nasa.gov/docs/xmm/abc/node2.html http://heasarc.nasa.gov/docs/xmm/abc/node1.html 7.1 A Quick Look at What You Have 7.2 Rerunning the pipeline 7.3 Potentially useful tips for using the pipeline 7.3.1 A Nearby Bright Optical Source 7.3.2 A Nearby Bright X-ray Source 7.3.3 User-defined Source Coordinates 7.4 Examine and Filter the Data 7.4.1 An Introduction to the SAS GUI and xmmselect 7.4.2 Create and Display an Image 7.4.3 Create and Display a Light Curve 7.4.4 Generating the Good Time Interval (GTI) File 7.4.5 Applying the GTI 7.4.6 Creating the Response Matrices (RMFs) 7.5 Fitting a Spectral Model 7.5.1 Combining RGS1 and RGS2 Spectra 7.6 Approaches to Spectral Fitting 7.6.1 Spectral Rebinning 7.6.2 Maximum-Likelihood Statistics 7.7 Analysis of Extended Sources 7.7.1 Region masks 7.7.2 Fitting spectral models to extended sources 7.7.3 Model
limitations 7.8 In A Nutshell Thinking about GIS: geographic information system planning for managers http://books.google.com/books?id=X8XgSAJrJVUC&pg=PA39&lpg=PA39&dq=errors+in+data+output+gis&source=bl&ots=8dzOzBbIqi&sig=dJwea5dilxUravv2SuLrBB6Vclw&hl=en&ei=ddrfSsXnNIbQtgPVnPToCA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CA4Q6AEwAA#v=onepage&q=errors%20in%20data%20output%20gis&f=false Errors in Data output: Sources of Data http://www.innovativegis.com/basis/primer/sources.html Finding and modelling errors: http://mailer.fsu.edu/~xyang/documents/EnvironmentalGIS.pdf http://www.encyclopedia.com/doc/1G1-63691858.html The study of map error and map error propagation raises a distinct set of problems that go beyond traditional error analysis (Taylor 1982). Map data consist of attributes recorded at locations and, with the exception of lines of discontinuity such as shorelines and urban/rural boundaries, attribute values at adjacent locations are often similar (spatially correlated) because of the continuity of ground truth. The error processes that can contaminate map data also raise new problems. Attribute measurement error may not be independent between adjacent Checking for errors: Using GIS to check co-ordinates of genebank accessions http://www.diva-gis.org/docs/clean.pdf error modeling: Error propagation in environmental modelling with GIS http://books.google.com/books?hl=en&lr=&id=C_XWjSsboeUC&oi=fnd&pg=PP9&dq=error+modeling+gis&ots=QSyKrybgB3&sig=D0oB3nkHoYLsi9vzjAePTvQfhIk#v=onepage&q=&f=false error modeling http://books.google.com/books?hl=en&lr=&id=HL2O6J-XtLAC&oi=fnd&pg=PA3&dq=error+modeling+gis&ots=cQhC5fcA6V&sig=mYN3AyvcBG4nLQMZREycwfU4f2Q#v=onepage&q=error%20modeling%20gis&f=false error modeling in categorical data http://www.geog.ucsb.edu/~good/papers/168.pdf Improved Modeling of Elevation Error with Geostatistics http://www.springerlink.com/content/hp9g041xt82688j3/ The elevations recorded within digital models are known to be fraught with errors of sampling, measurement and interpolation. Reporting of these errors according to spatial data standards makes several implicit and unacceptable assumptions about the error: it has no spatial distribution, and it is statistically stationary across a region, or even a nation. The approach explored in this paper employs actual elevations measured in ground and aerial survey at higher precision than the elevations in the DEM and recorded on standard paper maps. These high precision elevations are digitized and used to establish the real statistical and spatial distribution of the error. Direct measurements could also have been taken in the field by GPS or any other means of high precision data collection. These high precision elevations are subtracted from values stored in the DEM for approximately the same locations. The distribution of errors specific to the DEM can then be explored, and can be used in the geostatistical method of conditional stochastic simulation to derive alternative realizations of the error modeled and so of the DEM. Multiple versions of the derived products can also be determined. This paper compares the results of using different methods of error modeling. The best method, which gives widely implementable and defensible results, is that based on conditional stochastic simulation. 
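The error-modelling idea above (derive an error distribution from higher-precision check points, then propagate it through to derived products) can be sketched in a few lines. The example below is a deliberately simplified Monte Carlo illustration using uncorrelated Gaussian noise, not the conditional stochastic simulation the paper recommends, and the DEM, error statistics, and slope routine are all made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 50 x 50 DEM (metres) and error statistics estimated from
# higher-precision check points (here: mean error 0.2 m, std dev 1.5 m).
dem = 100 + 5 * np.sin(np.linspace(0, 3, 50))[:, None] * np.cos(np.linspace(0, 3, 50))[None, :]
bias, sigma = 0.2, 1.5
cell = 30.0                                   # cell size in metres

def mean_slope(z, cell_size):
    """Mean slope (degrees) from simple finite differences."""
    dzdx = (z[:, 2:] - z[:, :-2]) / (2 * cell_size)
    dzdy = (z[2:, :] - z[:-2, :]) / (2 * cell_size)
    grad = np.sqrt(dzdx[1:-1, :] ** 2 + dzdy[:, 1:-1] ** 2)
    return np.degrees(np.arctan(grad)).mean()

# Propagate the elevation error through the slope calculation.
realizations = [mean_slope(dem + rng.normal(bias, sigma, dem.shape), cell)
                for _ in range(500)]

print(f"slope from the DEM as stored : {mean_slope(dem, cell):.3f} deg")
print(f"slope under simulated error  : {np.mean(realizations):.3f} "
      f"+/- {np.std(realizations):.3f} deg")
```

Because the noise here is spatially uncorrelated, it exaggerates the slope uncertainty; that is precisely why the paper argues for geostatistical simulation that honours the spatial structure of the error.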
SRTM resample with short distance-low nugget kriging http://portal.acm.org/citation.cfm?id=1451803 The shuttle radar topography mission (SRTM) was flown on the space shuttle Endeavour in February 2000, with the objective of acquiring a digital elevation model of all land between 60° north latitude and 56° south latitude, using interferometric synthetic aperture radar (InSAR) techniques. The SRTM data are distributed at a horizontal resolution of 1 arc-second (~30 m) for areas within the USA and at 3 arc-second (~90 m) resolution for the rest of the world. A resolution of 90 m can be considered suitable for small or medium-scale analysis, but it is too coarse for more detailed purposes. One alternative is to interpolate the SRTM data at a finer resolution; it will not increase the level of detail of the original digital elevation model (DEM), but it will lead to a surface with coherence of angular properties (i.e. slope, aspect) between neighbouring pixels, which is an important characteristic when dealing with terrain analysis. This work intends to show how the proper adjustment of variogram and kriging parameters, namely the nugget effect and the maximum distance within which values are used in interpolation, can be set to achieve quality results when resampling SRTM data from 3 arc-seconds to 1 arc-second. We present results for a test area in the western USA, including different adjustment schemes (changes in the nugget effect value and in the interpolation radius) and comparisons with the original 1 arc-second model of the area, with the national elevation dataset (NED) DEMs, and with other interpolation methods (splines and inverse distance weighted (IDW)). The basic concepts for using kriging to resample terrain data are: (i) working only with the immediate neighbourhood of the predicted point, due to the high spatial correlation of the topographic surface and the omnidirectional behaviour of the variogram at short distances; (ii) adding a very small random variation to the coordinates of the points prior to interpolation, to avoid punctual artifacts generated by predicted points with the same location as original data points; and (iii) using a small nugget effect value, to avoid smoothing that can obliterate terrain features. Drainages derived from the surfaces interpolated by kriging and by splines show good agreement with streams derived from the 1 arc-second NED, with correct identification of watersheds, even though a few differences occur in the positions of some rivers in flat areas. Although the 1 arc-second surfaces resampled by kriging and splines are very similar, we consider the results produced by kriging superior, since the spline-interpolated surface still presented some noise and linear artifacts, which were removed by kriging.
DEM error: Error propagation analysis of DEM-based drainage basin delineation http://www.informaworld.com/smpp/content~db=all~content=a723689154#
Managing Errors: description of source data, transformation documentation, input/output specification, application-dependent information.
GIS analysis-based drainage basin delineation has become an attractive alternative to traditional manual delineation methods since the availability and accuracy of Digital Elevation Models (DEMs) and topographic databases have improved. To investigate the uncertainty in the automatic delineation process, the present study presents a process-convolution-based Monte Carlo simulation tool that offers a powerful framework for investigating DEM error propagation with thousands of GIS-analysis repetitions.
Monte Carlo-based probable drainage basin delineations and manual delineations performed by five experts in hydrology or physical geography were also compared. The results showed that automatic drainage basin delineation is very sensitive to DEM uncertainty. The model of this uncertainty can be used to find the lower bound for the size of drainage basins that can be delineated with sufficient accuracy.
Error detection, management accountability, external accountability, quality reporting. Source data error and effects of generalization http://www.isprs.org/commission4/proceedings02/pdfpapers/465.pdf geology, population, accessibility, conservation; jack-knifing, Monte Carlo simulation, choice of criteria, bootstrapping or 'leave one out' analysis; analytical errors, conceptual errors.
Generalization, Uncertainty, and error modelling http://www.geog.ucsb.edu/~good/papers/257.pdf Terminology, types and sources; importance; handling error and uncertainty.
GIGO: garbage in, garbage out. Just because it's in the computer doesn't mean it's right. Accept that there will always be errors in GIS. GIS is a great tool for spatial data analysis and display; the question is: what about error? Data quality, error and uncertainty, error propagation, confidence in GIS outputs: be careful, be aware, be upfront. Various (often confused) terms are in use: error, uncertainty, accuracy, precision, data quality.
• Error: being wrong or mistaken; the degree of inaccuracy in a calculation, e.g. 2% error.
• Uncertainty: lack of knowledge about the level of error; the data may be unreliable.
• Accuracy: the extent of system-wide bias in the measurement process.
• Precision: the level of exactness associated with a measurement.
• Data quality: the degree of excellence in data; a general term for how good the data is that takes all the other definitions (error, uncertainty, precision, accuracy) into account. It is based on the following elements: positional accuracy, attribute accuracy, logical consistency, and data completeness.
• Positional accuracy: spatial deviance from the true position (horizontal or vertical). General rule: be within the best possible data resolution, e.g. at a scale of 1:50,000 the error can be no more than 25 m. It can be measured as root mean square error (RMS), a measure of the average distance between true and estimated locations (see the sketch below). Temporal accuracy: the difference from the actual time and/or date.
• Attribute accuracy: classification and measurement accuracy; a feature is what the GIS thinks it to be, e.g. a railroad is a railroad and not a road, or a soil sample agrees with the type mapped; rated in terms of percent correct. In a database, forest types are grouped and placed within a boundary; in reality there is no solid boundary where only pine trees grow on one side and spruce on the other.
• Logical consistency: the presence of contradictory relationships in the database. Non-spatial examples: some crimes recorded at the place of occurrence and others at the place where the report was taken; data for one country referring to 2000 and for another to 2001; data using different sources or estimation techniques for different years. A reliability concept.
• Data completeness: are all instances of a feature the GIS claims to include in fact there? Partially a function of the criteria for including features (when does a road become a track?). Simply put, how much data is missing?
Sources of error: data collection and input, human processing, actual changes, data manipulation, data output, and the inherent instability of the phenomena themselves (random variation of most phenomena, e.g. leaf size; edges may not be sharp boundaries, e.g. forest edges).
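As a quick illustration of the RMS positional error measure mentioned above, the sketch below compares surveyed ("true") coordinates with the coordinates stored in a hypothetical GIS layer; all coordinate values are invented for the example.

```python
import math

# Hypothetical (true_x, true_y, stored_x, stored_y) pairs in metres
checks = [
    (1000.0, 2000.0, 1003.1, 1998.4),
    (1500.0, 2100.0, 1498.2, 2102.7),
    (1250.0, 2400.0, 1254.0, 2396.9),
    (1725.0, 2250.0, 1722.6, 2253.3),
]

# Root mean square error of the horizontal positions
squared_distances = [(tx - sx) ** 2 + (ty - sy) ** 2 for tx, ty, sx, sy in checks]
rmse = math.sqrt(sum(squared_distances) / len(checks))
print(f"horizontal RMSE: {rmse:.2f} m")
```

Against the rule of thumb quoted above, an RMSE of about 4 m would be acceptable for a 1:50,000 map (tolerance roughly 25 m) but not necessarily for much larger scales.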
Description of source data: data source name, date of collection, method of collection, date of last modification, producer, reference, scale, projection; inclusion of metadata. Instrument inaccuracies: satellite / air photo / GPS / spatial surveying, e.g. the resolution and/or accuracy of digitizing equipment; the thinnest visible line is 0.1-0.2 mm, which at a scale of 1:20,000 corresponds to roughly 6.5-12.8 feet on the ground, so anything smaller cannot be captured. Attribute measuring instruments; the model used to represent data, e.g. the choice of datum or classification system; data encoding and entry, e.g. keying or digitizing errors.
Attribute uncertainty: uncertainty regarding the characteristics (descriptors, attributes, etc.) of geographical entities. Types: imprecise or vague, mixed up, plain wrong. Sources: source document, misinterpretation (e.g. of photos), database error; spatial and attribute effects of classification (nominal/ordinal/interval); effects of scale change and generalization. Generalization is the simplification of reality by the cartographer to meet the restrictions of map scale and physical size, effective communication, and the message. Source data error; effects of generalization.
GIS DATA QUALITY AND ERROR
Error detection through consistency checking http://www.cnr.berkeley.edu/~gong/PDFpapers/GongMulanError.pdf Abstract: Following a brief discussion on various aspects of data quality, possible methods are examined for the detection of errors in a spatial database. Using examples, we introduce the consistency checking method based on spatial relationships among neighboring objects and attribute relationships among map layers from different sources. Using logical relationships among spatial neighborhoods and among attribute data from different sources, it is desirable to build an error detection mechanism in a spatial database. This mechanism can be automated and has the potential to be one of the powerful tools for error detection and correction suggestion in a spatial database. Introduction: Data quality can be assessed through data accuracy (or error), precision, uncertainty, compatibility, consistency, completeness, accessibility, and timeliness as recorded in the lineage data (Chen and Gong, 1998). Attribute error refers to the difference between the true value and the recorded value of non-spatial and non-temporal data in a database. Attribute error is more complicated than other types of spatial errors. It is related to the scale of measurement: at one scale of measurement a difference may be regarded as an error, while at another scale it is not. For example, an elevation of 497 m recorded in the database with its true value being 492 m will be considered erroneous at the ratio and interval scales but accurate in a general category, such as an elevation class between 450 and 500 m, which is at the nominal scale. However, when the true value is not known, error cannot be evaluated. Under such circumstances, uncertainty is used. Statistically, we use the average of multiple measurements to estimate the true value and the standard deviation of the multiple values as an indicator of the level of uncertainty (see the sketch below). Therefore, in order to know the uncertainty of a value, multiple measurements are necessary. For example, a coastline (the boundary between ocean and land) is uncertain as it changes constantly with time due to such factors as tides and ocean waves. There are more causes of data uncertainty than the truth being unmeasurable or there being no truth at all.
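A minimal numeric illustration of the point above about estimating a value and its uncertainty from repeated measurements (the elevation values below are invented; the standard deviation plays the role of the uncertainty indicator, and the comparison with a known true value contrasts accuracy with precision):

```python
import statistics

true_elevation = 492.0                                 # hypothetical ground-truth value (m)
measurements = [497.2, 496.8, 497.5, 497.0, 496.6]     # repeated measurements (m)

estimate = statistics.mean(measurements)               # best estimate of the value
uncertainty = statistics.stdev(measurements)           # spread = precision / uncertainty indicator
bias = estimate - true_elevation                       # systematic offset = (in)accuracy

print(f"estimate    : {estimate:.2f} m")
print(f"uncertainty : +/- {uncertainty:.2f} m (precise)")
print(f"bias        : {bias:.2f} m (not accurate)")
```

This mirrors the 497 m versus 492 m elevation example in the excerpt: the measurements agree closely with one another (high precision) yet differ systematically from the true value (low accuracy).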
The conceptual fuzziness of an attribute or a category, which represents the level of data generalization, could also cause data uncertainty. For example, one cannot tell what the true density of a polygon in a database is when its category is "high density residential." Similarly, one cannot tell exactly which tree species are contained in a class of "evergreen broadleaf forest" due to its high level of abstraction. Attribute error has been studied for many years. Particularly in remote sensing image classification, a relatively complete procedure exists for classification error analysis (Jensen, 1996). Chen and Gong (1998) divided classification error analysis into 5 steps (illustrated in the sketch below):
(1) determine the sampling method for ground truth data collection; methods include systematic sampling, random sampling, stratified sampling and systematic unaligned sampling, etc.;
(2) determine the sample size;
(3) determine the attribute of each sample location; this is usually done by field survey or the use of more accurate data sources such as aerial photo interpretation;
(4) compare the sample data with the classification data and establish the contingency matrix;
(5) calculate various types of errors from the contingency matrix.
This can be applied to the determination of any type of attribute error at the nominal measurement scale. Precision refers to the closeness of measurements obtained from the same object using the same measurement method. It is related to the level of detail contained in the measurement. It can be assessed by the standard deviation of a number of measurements made from the same object (Gong et al., 1995). Compatibility refers to how easily data collected for other purposes can be used in a particular application, and how easily data from different sources or collected at different locations can be used together in the same application. Generally, more specific data have better compatibility than more general data, because specific data can be generalized to general data but not vice versa. For example, there may be two forest maps for two neighboring regions prepared with different methods or from different data sources. If the content of one map can be made comparable with the other, then the two maps are compatible. If only the classification system needs to be adjusted to make one map compatible with the other, we say that one map can be "cross-walked" to the other. Consistency refers to the level of agreement when a certain phenomenon is represented in the database. For example, if the same river looks different on two types of maps, the level of consistency between the two maps is poor. If the same terrain feature on two map layers is represented by a different number of contour lines and/or different levels of smoothness of the contours, the consistency between the two maps is poor. If one map is made from data collected at one time and a second map for the neighboring region is made from data collected at a different time, then the two maps may be temporally inconsistent. There are primarily four types of errors in a GIS database: positional, temporal, attribute, and logical. Logical error refers to inconsistency in the relationships among different features represented in a database. It is usually manifested through other types of errors. Thus, the logical relationships of mapped features can be checked for error detection.
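Steps (4) and (5) above can be sketched directly. The example below builds a contingency (confusion) matrix from hypothetical ground-truth and classified labels, then derives overall accuracy and per-class omission and commission errors; the class names and sample labels are invented.

```python
from collections import Counter

# Hypothetical (ground_truth, classified) label pairs from sample sites
samples = [
    ("forest", "forest"), ("forest", "forest"), ("forest", "grass"),
    ("grass", "grass"), ("grass", "grass"), ("grass", "water"),
    ("water", "water"), ("water", "water"), ("water", "water"), ("forest", "forest"),
]

classes = sorted({c for pair in samples for c in pair})
matrix = Counter(samples)                       # (truth, classified) -> count

# Contingency matrix: rows = ground truth, columns = classified
print("truth \\ class " + " ".join(f"{c:>8}" for c in classes))
for t in classes:
    print(f"{t:<14}" + " ".join(f"{matrix[(t, c)]:>8}" for c in classes))

total = len(samples)
correct = sum(matrix[(c, c)] for c in classes)
print(f"overall accuracy: {correct / total:.0%}")

for c in classes:
    truth_total = sum(matrix[(c, k)] for k in classes)       # row total
    class_total = sum(matrix[(k, c)] for k in classes)       # column total
    omission = 1 - matrix[(c, c)] / truth_total if truth_total else 0.0
    commission = 1 - matrix[(c, c)] / class_total if class_total else 0.0
    print(f"{c}: omission error {omission:.0%}, commission error {commission:.0%}")
```

With real data the sample sites would come from one of the sampling designs listed in step (1), and the same matrix can feed further statistics (e.g. the kappa coefficient) if needed.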
Positional error has been widely investigated, both its determination (Gong et al., 1995; Stanislawski et al., 1996; Kiiveri, 1997; Veregin, 2000) and its modeling (Zheng and Gong, 1997; Shi and Liu, 2000). Essentially, positional error is the error contained in the coordinate values of points, lines and volumes; it is thus one type of numeric error, and numeric error is a relatively simple type of spatial data error. Currently, few GIS truly incorporate the temporal axis as an index that supports explicit query in time. When time is not explicitly used as an index like geographical coordinates, it is treated as an attribute, just as elevation is treated in a 2D GIS. Thus, in most existing GIS, time and elevation are treated as attributes. Error propagation and uncertainty detection have attracted research attention for the past decade. The following table lists some of the research papers in this field. Most of the papers deal with a single variable, and many of them model spatial autocorrelation to estimate data uncertainty. Consistency checking between variables from different sources has been introduced (Scott, 1994) and is the emphasis of this paper.
Paper | Error type | Problem and solution | Major approach
Ehlschlaeger, 1996 | Positional inconsistency | Single variable (elevation): visualization approach to view the elevation surface change by applying a nonlinear interpolation model to develop animations. | Visualization
Griffith et al., 1994 | Logical inconsistency | Single variable: the standard error difference between the area mean and the population mean caused bias in estimating the population mean; using census tract data for Syracuse, New York, added the underlying spatial autocorrelation when estimating the standard error to obtain the population mean. | Modelling
Heuvelink, 1995 | Logical inconsistency | Single variable: error propagation from different spatial variation model fittings; compared the Discrete Model of Spatial Variation (DMSV), Continuous (CMSV), and Mixed (MMSV) models with Netherlands high groundwater level data, and suggested adopting MMSV when undetermined. | Modelling
Heuvelink, 1998 | | Single variable: discussed the errors of many models used in soil science, coming not only from the input but also from the model itself; also discussed the error propagation process in data interpolation and aggregation. | Simulation
Hunter and Goodchild, 1997 | Logical inconsistency | Single variable: slope and aspect uncertainties from realized models; added a spatial autoregressive random field as a disturbance to elevation, and proposed a worst-case scenario by choosing a rho value within the domain of 0 and 0.25 ("uncertainty" includes "error"). | Modelling
Kiiveri, 1995 | Positional inconsistency (polygon) | Single variable: examined the inconsistency through polygon boundary change, length, perimeter and area calculations after the overlay operation. | Calculation, Simulation
Mowrer, 1996 | Positional inconsistency | Single variable: applied the Monte Carlo technique of sequential Gaussian simulation to estimate old-growth subalpine forests; suggests using the technique together with GPS and GIS technology to improve decision making. | Simulation
Phillips, 1995 | Attribute inconsistency | Multiple variables: use of simulation modelling to measure uncertainties; models potential evapotranspiration as a function of temperature, humidity, and wind. | Simulation
Phillips, 1999 | | Theoretical discussion of a major challenge in physical geography: the detection of the signals of complex deterministic dynamics in real landscapes and data.
It introduced nonlinear dynamical systems (NDS) theory, reviewed the recent literature, and compared approaches relevant to deterministic uncertainty.
Scott, 1994 | Logical inconsistency | Multiple variables: introduced an Exploratory Data Analysis (EDA) tool to help quality assessment and data integrity in GIS using statistical techniques; it suggested four components (p. 384): (1) distribution checks of both categorical and ratio data; (2) logical consistency checks of the relationships between attribute data values and between attribute classes; (3) proximity checks of the spatial distribution of data attributes; and (4) plot and map reviews of the spatial distribution of geographical features and their associated attributes. | Modeling, Calculation
Shi, 1999 | Positional inconsistency (line) | Developed the G-band model to handle the positional error of line segments; assuming normally distributed end points, the model applies a stochastic process to discuss the uncertainties of the end points as well as points along the segments. | Modelling
Stanislawski, 1996 | Positional inconsistency | Single variable (digitized points): estimated positional accuracy by dividing errors into absolute error and relative error, where the absolute error represents horizontal cartographic data accuracy and the relative error represents variability in spatial relationships. | Modelling
In the rest of the paper, we use examples to discuss how errors can be detected through consistency checking in spatial databases. Discussions will be made with a suggestion on the development of a spatial data inconsistency-checking mechanism in spatial databases.
Error detection through consistency checking. We can divide inconsistency into spatial inconsistency, temporal inconsistency, attribute inconsistency, and inconsistency among any combination of space, time and attribute. Spatial inconsistency is something cartographers must deal with on an operational basis. Map generalization is a major task of cartographers. It includes spatial displacement (a process of spatial error introduction), spatial simplification through selection, aggregation and smoothing, and attribute abstraction through classification. This process itself introduces a large amount of error, particularly on small-scale maps. Traditional spatial analysis based on maps is restricted by map scale, as maps of different scales cannot be overlaid with each other for multilayer (multi-variable) analysis. In a GIS, maps of the same spatial location can be enlarged or reduced to match each other regardless of their original scales. Spatial inconsistency can occur under these circumstances.
1. An example of spatial inconsistency. Figure 1 shows a reservoir and a highway overlapping each other as the result of overlaying a drainage map with a transportation map. The highway runs along the same side of the reservoir and the stream, with no reason for it to run over the reservoir. Provided that the maps have the same scale and their projections and other factors that control the geometrical properties of the two maps are consistent, it is most likely that the error is caused by some displacement of the reservoir, as highways are usually surveyed with high precision. The situation changes if we overlay a large-scale drainage map with a small-scale road map; under such circumstances, the error is most likely due to the generalization of the roads on the road map.
Therefore, the ways of correcting the error, or at least removing the inconsistency if we believe the error is not correctable, vary with the actual situation. This requires knowledge of the scale and the accuracy report of each map, if any exists. The relevance of metadata for spatial databases is obvious here.
[Figure 1: spatial inconsistency detected between a reservoir and a highway (HWY), and one way of correcting the error.]
2. Logical error detection of individual objects. Certain objects in a map database have logical relationships with other objects. For example, a parking lot should have an exit to a road; if a parking lot sits by itself without any entrance or exit, then there is a logical error. A bridge should cross a stream, a river or another road, and its two ends should be connected to roads. This knowledge can be coded to automatically check whether there are any logical errors associated with each bridge. Such logical error detection associated with bridges is particularly useful in detecting errors in other attributes that are connected to bridges. Figure 2 illustrates the situation for a parking lot and a bridge. This method can be applied to any type of object whose relationships with other objects can be logically expressed.
3. Attribute error identification through logical consistency checking among different map layers. In a spatial database, data are often organized in different map layers, and each map layer may be obtained from a different source. Attribute error on one map layer may not be detected without being compared with attribute data from other map layers. For example, a forest fire history map contains the distribution of burnt areas with an attribute giving the time of fire occurrence (e.g., Figure 3a). Are there any mistakes in the fire history records? Some such errors may be detected when the fire history map is overlaid onto an up-to-date forest cover map (e.g., Figure 3b): fire history records can be checked against the current stage of forest restoration. In the example as shown in Figure 3, the
The nature of maps http://www.fes.uwaterloo.ca/crs/geog165/maps.htm NCGIA http://www.ncgia.ucsb.edu/Publications/Closing_Reports/CR-1.pdf Database development: garbage in, garbage out http://www.geoplace.com/gw/2000/1000/1000gar.asp DTM error http://www.geocomputation.org/1998/80/gc_80.htm GANTT chart http://www.ganttchart.com/ PCRaster http://pcraster.geog.uu.nl/documentation/pcrman/c1194.htm cartographic modelling lab http://cml.upenn.edu/ Smart draw http://www.smartdraw.com/specials/projectchart.asp?id=15063 SWOT analysis http://www.quickmba.com/strategy/swot/ Error http://www.edc.uri.edu/nrs/classes/NRS409/Lectures/7Error/error.htm
Positional error:
• lines are in the wrong place
Vertical error:
• elevations are wrong
• elevations are correct, but contour lines are in the wrong locations
Attribute error:
• features are missing (error of omission)
• features are in the data but not in reality (error of commission)
• features are mislabeled
Inherent error:
• natural variation
• age and dynamic data
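The bridge example in section 2 above lends itself to an automated check. The sketch below is a minimal, hypothetical illustration (not code from the paper) using the shapely geometry library: for each bridge it verifies that the bridge crosses a watercourse and that each end touches the road network; all geometries are invented test data.

```python
from shapely.geometry import LineString, Point

# Hypothetical layers (coordinates in map units)
rivers = [LineString([(0, 5), (10, 5)])]
roads = [LineString([(2, 0), (2, 4.5)]), LineString([(2, 5.5), (2, 10)])]
bridges = {
    "bridge_ok": LineString([(2, 4.5), (2, 5.5)]),        # spans the river, joins both roads
    "bridge_floating": LineString([(7, 1), (8, 1)]),       # touches nothing: logical error
}

def check_bridge(name, bridge):
    crosses_water = any(bridge.crosses(r) for r in rivers)
    start, end = Point(bridge.coords[0]), Point(bridge.coords[-1])
    connected = all(any(p.distance(road) < 1e-6 for road in roads) for p in (start, end))
    status = "consistent" if crosses_water and connected else "logical error"
    print(f"{name}: crosses water={crosses_water}, connected to roads={connected} -> {status}")

for name, geom in bridges.items():
    check_bridge(name, geom)
```

In a real database the same rule-based pass would run over every bridge, parking lot, or other object whose relationships can be logically expressed, flagging candidates for manual review rather than correcting them automatically.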