Geographical Data Sets

Geographic Data -- Linkages and Matching
Linkages
A GIS typically links different data sets. Suppose you want to know the mortality rate from cancer among children under 10 years of age in each country. If you have one file that contains the number of children in this age group, and another that contains the number of deaths from cancer in the same age group, you must first combine or link the two data files. Once this is done, you can divide one figure by the other to obtain the desired answer.
Exact Matching
Exact matching occurs when you have information in one computer file about many geographic features (e.g., towns) and additional information in another file about the same set of features. The operation to bring them together is easily achieved by using a key common to both files -- in this case, the town name. Thus, the record in each file with the same town name is extracted, and the two are joined and stored in another file.
Name   Population
A      4038
B      7030
C      10777
D      5798
E      5606

Name   Avg. Housing Cost
A      30,500
B      22,000
C      100,000
D      24,000
E      24,000

Name   Population   Avg. Housing Cost
A      4038         30,500
B      7030         22,000
C      10777        100,000
D      5798         24,000
E      5606         24,000
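
The join described above can be sketched in a few lines of Python. This is only an illustration, assuming the two files have already been read into dictionaries keyed by town name; the names population, housing_cost and exact_match are hypothetical and not part of any GIS package.

```python
# Hypothetical in-memory versions of the two files, keyed by town name.
population = {"A": 4038, "B": 7030, "C": 10777, "D": 5798, "E": 5606}
housing_cost = {"A": 30500, "B": 22000, "C": 100000, "D": 24000, "E": 24000}


def exact_match(left, right):
    """Join two tables on their shared key, keeping only keys found in both."""
    return {key: (left[key], right[key]) for key in left if key in right}


# Each joined record holds (population, average housing cost) for one town.
joined = exact_match(population, housing_cost)
for name, (pop, cost) in sorted(joined.items()):
    print(name, pop, cost)
```

A town that appears in only one file is dropped by this sketch; whether to drop or keep such records is a design choice that depends on the application.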

Hierarchical Matching
Some types of information, however, are collected in more detail and less frequently than other types of information. For example, financial and unemployment data covering a large area are collected quite frequently. On the other hand, population data are collected for small areas but at less frequent intervals. If the smaller areas nest (i.e., fit exactly) within the larger ones, then the way to make the data match for the same area is to use hierarchical matching -- add the data for the small areas together until the grouped areas match the bigger ones and then match them exactly.

The hierarchical structure illustrated in the chart shows that this city is composed of several tracts. To obtain meaningful values for the city, the tract values must be added together.
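
A minimal sketch of hierarchical matching follows. The tract names, city names and all figures are invented for illustration: tract populations are summed up to city level and then matched exactly against figures already reported per city.

```python
from collections import defaultdict

# Hypothetical tract populations and the city each tract nests within.
tract_population = {"T1": 1200, "T2": 950, "T3": 1800, "T4": 700}
tract_to_city = {"T1": "City X", "T2": "City X", "T3": "City Y", "T4": "City Y"}

# Hypothetical figures that are already reported at city level.
city_unemployed = {"City X": 215, "City Y": 125}


def aggregate(values, parent_of):
    """Add the values of the small areas together for each larger area."""
    totals = defaultdict(int)
    for child, value in values.items():
        totals[parent_of[child]] += value
    return dict(totals)


# Step 1: group the tracts until they match the cities.
city_population = aggregate(tract_population, tract_to_city)

# Step 2: exact matching on the city name.
unemployed_share = {
    city: city_unemployed[city] / city_population[city]
    for city in city_population
}
print(city_population)
print(unemployed_share)
```

The sketch only works because every tract belongs to exactly one city, which is the nesting condition described above.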
Fuzzy Matching
On many occasions, the boundaries of the smaller areas do not match those of the larger ones. This occurs often while dealing with environmental data. For example, crop boundaries, usually defined by field edges, rarely match the boundaries between the soil types. If you want to determine the most productive soil for a particular crop, you need to overlay the two sets and compute crop productivity for each and every soil type. In principle, this is like laying one map over another and noting the combinations of soil and productivity.
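
The overlay idea can be illustrated with a simplified sketch. It assumes that both maps have already been converted to grids covering the same area cell for cell, which sidesteps the geometric intersection a real polygon overlay would need. The grids, the yield figures and the function name mean_yield_by_soil are all hypothetical.

```python
from collections import defaultdict

# Hypothetical 3x4 soil grid; each cell holds the soil type at that location.
soil = [
    ["clay", "clay", "loam", "loam"],
    ["clay", "loam", "loam", "sand"],
    ["sand", "sand", "loam", "sand"],
]
# Hypothetical crop yield per cell (tonnes per hectare); None = crop not grown.
crop_yield = [
    [2.0, 3.0, 5.0, 6.0],
    [None, 4.0, 5.0, 1.0],
    [2.0, None, 4.0, 3.0],
]


def mean_yield_by_soil(soil_grid, yield_grid):
    """Lay one grid over the other and average the yield for each soil type."""
    cells = defaultdict(list)
    for soil_row, yield_row in zip(soil_grid, yield_grid):
        for soil_type, value in zip(soil_row, yield_row):
            if value is not None:
                cells[soil_type].append(value)
    return {soil_type: sum(v) / len(v) for soil_type, v in cells.items()}


productivity = mean_yield_by_soil(soil, crop_yield)
most_productive = max(productivity, key=productivity.get)
print(productivity, most_productive)
```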

A GIS can carry out all these operations because it uses geography as a common key between the data sets. Information is linked only if it relates to the same geographical area.

Why is data linkage so important? Consider a situation where you have two data sets for a given area, such as yearly income by county and average cost of housing for the same area. Each data set might be analysed and/or mapped individually. Alternatively, the two may be combined. With two data sets, only one valid combination exists. Even if each data set on its own is meaningful for only a single query, combining them lets you answer many more questions than if the data sets were kept separate. By bringing them together, you add value to the database. To do this, you need a GIS.

Principal Functions of GIS

Data Capture

Data used in GIS are often of many types and are stored in different ways. A GIS provides tools and methods for integrating different data into a format in which they can be compared and analysed. The main data sources are manual digitization and scanning of aerial photographs and paper maps, and existing digital data sets. Remote-sensing satellite imagery and GPS are promising data input sources for GIS.

Database Management and Update

After data are collected and integrated, the GIS must provide facilities that can store and maintain the data. Effective data management has many definitions but should include all of the following aspects: data security, data integrity, data storage and retrieval, and data maintenance abilities.

Geographic Analysis

Data integration and conversion are only a part of the input phase of GIS. What is required next is the ability to interpret and analyze the collected information quantitatively and qualitatively. For example, a satellite image can assist an agricultural scientist in projecting crop yield per hectare for a particular region. For the same region, the scientist also has rainfall data for the past six months, collected through weather station observations. These point data can be interpolated to produce a thematic map showing isohyets, or contour lines of rainfall. The scientist also has a map of the soils for the region, which shows fertility and suitability for agriculture.
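
The text does not say which interpolation method is meant. As one possible illustration, the sketch below uses inverse distance weighting (IDW), a common and simple choice, to estimate rainfall between stations. The station coordinates, rainfall values and the function name idw are hypothetical.

```python
import math

# Hypothetical weather stations: (x, y, rainfall in mm).
stations = [(0.0, 0.0, 100.0), (10.0, 0.0, 200.0), (0.0, 10.0, 300.0)]


def idw(x, y, points, power=2.0):
    """Estimate a value at (x, y) as a distance-weighted mean of known points."""
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            # Exactly on a station: return the observed value.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Estimate rainfall on a coarse grid; isohyets would be contoured from this.
rain_grid = [[idw(x, y, stations) for x in (0, 5, 10)] for y in (0, 5, 10)]
for row in rain_grid:
    print([round(v, 1) for v in row])
```

Nearby stations dominate each estimate because the weights fall off with the square of the distance; the power parameter controls how quickly that happens.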

Presenting Results

One of the most exciting aspects of GIS technology is the variety of ways in which information can be presented once it has been processed by the GIS. Traditional methods of tabulating and graphing data can be supplemented by maps and three-dimensional images. Visual communication is one of the most fascinating aspects of GIS technology and is available in a diverse range of output options.

Data Capture -- an Introduction

The functionality of GIS relies on the quality of data available, which, in most developing countries, is either redundant or inaccurate. Although GIS are being used widely, effective and efficient means of data collection have yet to be systematically established. The true value of GIS can only be realized if the proper tools to collect spatial data and integrate them with attribute data are available.

Manual Digitization

Manual digitizing is still the most common method for entering maps into GIS. The map to be digitized is affixed to a digitizing table, and a pointing device (called the digitizing cursor or mouse) is used to trace the features of the map. These features can be boundary lines between mapping units, other linear features (rivers, roads, etc.) or point features (sampling points, rainfall stations, etc.). The digitizing table electronically encodes the position of the cursor with a precision of a fraction of a millimeter. The most common digitizing table uses a fine grid of wires embedded in the table. The vertical wires record the X-coordinates, and the horizontal ones the Y-coordinates.

The range of digitized coordinates depends upon the density of the wires (called digitizing resolution) and the settings of the digitizing software. The active area of a digitizing table is normally a rectangle in the middle of the table, separated from the outer boundary by a small rim. Outside this active area, no coordinates are recorded. The lower left corner of the active area has the coordinates x = 0 and y = 0. Therefore, make sure that the map, or the part of it that you want to digitize, is always fixed within the active area.

Scanning System

The second method of obtaining vector data is the use of scanners. Scanning (or scan digitizing) provides a quicker means of data entry than manual digitizing. In scanning, a digital image of the map is produced by moving an electronic detector across the map surface. The output of a scanner is a digital raster image, consisting of a large number of individual cells ordered in rows and columns. For the conversion to vector format, two types of raster image can be used.

In the case of choropleth maps or thematic maps, such as geological maps, the individual mapping units can be separated by the scanner according to their different colours or grey tones. The resulting images will be colour or grey-tone images.

In the case of scanned line maps, such as topographic maps, the result is a black-and-white image. Black lines are converted to a value of 1, and the white areas in between lines will obtain a value of 0 in the scanned image. These images, with only two possibilities (1 or 0) are also called binary images.
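
How a scan becomes a binary image can be sketched as a simple threshold. The grey values and the threshold of 128 are assumptions for illustration; real scanning software chooses or lets the operator tune the threshold.

```python
# Hypothetical 8-bit grey-tone scan: 0 = black ink, 255 = white paper.
grey = [
    [250, 248, 30, 251],
    [245, 25, 20, 249],
    [40, 35, 247, 252],
]


def to_binary(image, threshold=128):
    """Map dark cells (lines) to 1 and light cells (background) to 0."""
    return [[1 if cell < threshold else 0 for cell in row] for row in image]


binary = to_binary(grey)
for row in binary:
    print(row)
```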

The raster image is processed by a computer to improve the image quality and is then edited and checked by an operator. It is then converted into vector format by special computer programmes, which are different for colour/grey tone images and binary images.

Scanning works best with maps that are very clean and simple, relate to one feature only, and do not contain extraneous information, such as text or graphic symbols. For example, a contour map should contain only the contour lines, without height indication, drainage network, or infrastructure. In most cases, such maps will not be available and have to be drawn especially for the purpose of scanning. Scanning and conversion to vector format are therefore beneficial only in large organizations, where a large number of complex maps are entered. In most cases, however, manual digitizing will be the only useful method for entering spatial data in vector format.

Data Conversion

While manipulating and analyzing data, the same format should be used for all data. This implies that, when different layers are to be used simultaneously, they should all be in vector format or all in raster format. Usually the conversion is from vector to raster, because the biggest part of the analysis is done in the raster domain. Vector data are transformed to raster data by overlaying a grid with a user-defined cell size.
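
A minimal sketch of vector-to-raster conversion follows. It assumes one common rule, namely that a cell belongs to a polygon when the cell centre falls inside it, and uses a standard ray-casting test for that. The triangle, the grid size and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height, cell_size):
    """Return rows (top row first); 1 where the cell centre is in the polygon."""
    grid = []
    for row in range(height):
        y = (height - row - 0.5) * cell_size
        grid.append([
            1 if point_in_polygon((col + 0.5) * cell_size, y, polygon) else 0
            for col in range(width)
        ])
    return grid


# Hypothetical triangular mapping unit, in map units.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
raster = rasterize(triangle, width=4, height=4, cell_size=1.0)
for row in raster:
    print(row)
```

A smaller cell size follows the polygon boundary more closely but produces more cells, which is the storage trade-off between the two formats.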

Sometimes the data in the raster format are converted into vector format. This is the case especially if one wants to achieve data reduction because the data storage needed for raster data is much larger than for vector data.

A digital data file with spatial and attribute data might already exist in some way or another. There might be a national database or specific databases from ministries, projects, or companies. In some cases a conversion is necessary before these data can be downloaded into the desired database.

The commonly used attribute databases are dBase and Oracle. Sometimes spreadsheet programmes like Lotus, Quattro, or Excel are used, although these cannot be regarded as real database software.

Remote-sensing images are digital datasets recorded by satellite operating agencies and stored in their own image database. They usually have to be converted into the format of the spatial (raster) database before they can be downloaded.

Spatial Data Management

Geo-Relational Data Model
All spatial data files will be geo-referenced. Geo-referencing refers to the location of a layer or coverage in space, as defined by the coordinate referencing system. The geo-relational approach involves abstracting geographic information into a series of independent layers or coverages, each representing a selected set of closely associated geographic features (e.g., roads, land use, rivers, settlements, etc.). Each layer has the theme of a geographic feature, and the database is organized in thematic layers.

With this approach users can combine simple feature sets representing complex relationships in the real world. This approach borrows heavily on the concepts of relational DBMS, and it is typically closely integrated with such systems. This is fundamental to database organization in GIS.

Topological Data Structure
Topology is the spatial relationship between connecting and adjacent coverage features (e.g., arcs, nodes, polygons, and points). For instance, the topology of an arc includes its from and to nodes (the beginning and the end of the arc, which together give its direction) and its left and right polygons. Topological relationships are built from simple elements into complex elements: points (simplest elements), arcs (sets of connected points), and areas (sets of connected arcs). Topological data structure, in fact, adds intelligence to the GIS database.
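
The arc topology described above can be sketched as a small record type. The field names, the three-arc coverage and the helper functions are hypothetical and do not reproduce the file layout of any particular GIS; the point is that adjacency and connectivity can be read from the stored labels without any coordinate geometry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    arc_id: int
    from_node: int       # node where the arc begins
    to_node: int         # node where the arc ends
    left_polygon: str    # polygon on the left when travelling from -> to
    right_polygon: str   # polygon on the right when travelling from -> to


# Hypothetical coverage: polygons A and B share arc 2.
# "outside" stands for the area beyond the coverage.
arcs = [
    Arc(1, from_node=1, to_node=2, left_polygon="outside", right_polygon="A"),
    Arc(2, from_node=2, to_node=1, left_polygon="B", right_polygon="A"),
    Arc(3, from_node=1, to_node=2, left_polygon="B", right_polygon="outside"),
]


def adjacent_polygons(arc_list):
    """Pairs of polygons that share an arc, read from the left/right labels."""
    return {frozenset((a.left_polygon, a.right_polygon)) for a in arc_list}


def arcs_at_node(arc_list, node):
    """Arcs that connect at a node (connectivity)."""
    return [a.arc_id for a in arc_list if node in (a.from_node, a.to_node)]


def boundary_of(arc_list, polygon):
    """Arcs that make up the boundary of a polygon (area definition)."""
    return [a.arc_id for a in arc_list
            if polygon in (a.left_polygon, a.right_polygon)]


print(boundary_of(arcs, "A"), arcs_at_node(arcs, 1))
```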

Attribute Data Management
All data within a GIS (spatial data as well as attribute data) are stored within databases. A database is a collection of information about things and their relationships to each other. For example, you can have an engineering geological database, containing information about soil and rock types, field observations and measurements, and laboratory results. These are interesting data, but not very useful if the laboratory data, for example, cannot be related to soil and rock types.

The objective of collecting and maintaining information in a database is to relate facts and situations that were previously separate.

The principal characteristics of a database management system (DBMS) are:

Centralized control over the database is possible, allowing for better quality management and operator-defined access to parts of the database;

Data can be shared effectively by different applications;

Access to the data is much easier, due to the use of a user interface and user views (specially designed forms for entering and consulting the database);

Data redundancy (storage of the same data in more than one place in the database) can be avoided as much as possible; redundancy, or unnecessary duplication of data, is an annoyance because it makes updating the database much more difficult, and one can easily overlook a redundant copy when changing information; and

The creation of new applications is much easier with DBMS.

The disadvantages relate to the higher cost of purchasing the software, the increased complexity of management, and the higher risk, as data are centrally managed.

Relational Database -- Concepts & Model
The relational data model is conceived as a series of tables, with no hierarchy or predefined relations. The relation between the various tables must be made by the user. This is done by identifying a common field in two tables, which is assigned as the key. The relational model offers more flexibility than the other two data models. However, accessing the database is slower than with the other two models. Due to its greater flexibility, the relational data model is used by nearly all GIS.
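
Relating two tables through a common field can be sketched with Python's built-in sqlite3 module. The table names, column names and values are invented for illustration, loosely following the soil and laboratory example given earlier; unit_id is the common field chosen by the user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE soil_unit (unit_id INTEGER PRIMARY KEY, soil_type TEXT);
    CREATE TABLE lab_result (
        sample_id INTEGER PRIMARY KEY,
        unit_id INTEGER,        -- common field shared with soil_unit
        clay_percent REAL
    );
    INSERT INTO soil_unit VALUES (1, 'clay loam'), (2, 'sandy loam');
    INSERT INTO lab_result VALUES (10, 1, 34.0), (11, 1, 36.0), (12, 2, 12.0);
""")

# The relation is not predefined: it is stated in the query, via unit_id.
rows = conn.execute("""
    SELECT s.soil_type, AVG(l.clay_percent)
    FROM soil_unit AS s
    JOIN lab_result AS l ON l.unit_id = s.unit_id
    GROUP BY s.soil_type
    ORDER BY s.soil_type
""").fetchall()
conn.close()

for soil_type, mean_clay in rows:
    print(soil_type, mean_clay)
```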

Choosing geographic data
The main purpose of purchasing a geographic information system (GIS) is to produce results for your organization. Choosing the right GIS/mapping data will help you produce those results effectively. The points to consider are:
The role of base-map data in your GIS,
The common characteristics of geographic data,
The commonly available data sources, and
Guidelines for evaluating the suitability of any data set for your project.
The world of GIS data is complex. By choosing the right data set, you can save significant amounts of money and, even more importantly, quickly begin your GIS project.

Data: The Core of Your Mapping / GIS Project

When most people begin a GIS project, their immediate concern is with purchasing computer hardware and software. They enter into lengthy discussions with vendors about the merits of various components and carefully budget for acquisitions. Yet they often give little thought to the core of the system, the data that goes insid
