Data structure

a set of related records constitutes a data file (Figure 2c)
related records are records that represent different occurrences of the same type or class of people, things, events and phenomena
a data file made up of a single record type with single-valued data items is called a flat file (Figure 3a)
a data file made up of a single record type with nested repeating groups of items forming a multi-level organization is called a hierarchical file (Figure 3b)
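the difference between a flat file and a hierarchical file can be sketched with in-memory records; the parcel and building fields below are hypothetical and serve only as an illustration, not as a prescribed file layout

```python
# Hypothetical records used only for illustration.

# Flat file: one record type, every data item holds a single value.
flat_file = [
    {"parcel_id": 101, "owner": "Smith", "area_ha": 2.5},
    {"parcel_id": 102, "owner": "Jones", "area_ha": 1.2},
]

# Hierarchical file: one record type, but an item may hold a nested
# repeating group (here, the buildings standing on each parcel).
hierarchical_file = [
    {
        "parcel_id": 101,
        "owner": "Smith",
        "buildings": [
            {"building_id": "A", "floors": 2},
            {"building_id": "B", "floors": 1},
        ],
    },
    {"parcel_id": 102, "owner": "Jones", "buildings": []},
]


def is_flat(records):
    """Return True if no data item in any record is a list or dict."""
    return all(
        not isinstance(value, (list, dict))
        for record in records
        for value in record.values()
    )
```

here is_flat(flat_file) holds, while is_flat(hierarchical_file) does not because of the nested "buildings" group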
a data file is individually identified by a filename
a data file may contain records having different types of data values or having a single type of data value
a data file containing records made up of character strings is called a text file or ASCII file
a data file containing records made up of numerical values in binary format is called a binary file
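as a minimal sketch of the distinction between text files and binary files, the same values can be written both ways; the file names, the sample values and the choice of 8-byte little-endian doubles are assumptions made for this example

```python
import os
import struct
import tempfile

values = [1.5, 2.25, 1024.0]  # hypothetical data values

with tempfile.TemporaryDirectory() as folder:
    text_path = os.path.join(folder, "values.txt")
    binary_path = os.path.join(folder, "values.bin")

    # Text file: each record is a line of characters.
    with open(text_path, "w", encoding="ascii") as handle:
        for value in values:
            handle.write(f"{value}\n")

    # Binary file: each value is stored as an 8-byte little-endian double.
    with open(binary_path, "wb") as handle:
        handle.write(struct.pack(f"<{len(values)}d", *values))

    # Read both files back into lists of numbers.
    with open(text_path, encoding="ascii") as handle:
        from_text = [float(line) for line in handle]

    with open(binary_path, "rb") as handle:
        raw = handle.read()
    from_binary = list(struct.unpack(f"<{len(raw) // 8}d", raw))
```

the text file can be read in any editor, whereas the binary file can only be interpreted by a program that knows the size and type of each stored value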
in data processing literature, collections of data items or records are sometimes referred to by terms other than "data file", according to their characteristics and functions
an array is a collection of data items of the same size and type (although they may have different values)
a one-dimensional array is called a vector
a two-dimensional array is called a matrix
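a small sketch of a vector and a matrix, using Python's standard array module so that every item has the same size and type; the values are hypothetical, and the list-of-rows layout of the matrix is just one possible representation

```python
from array import array

# Vector: a one-dimensional array of 8-byte floating-point items ("d").
vector = array("d", [0.5, 1.5, 2.5])

# Matrix: a two-dimensional array, represented here as a list of rows.
matrix = [
    array("d", [1.0, 2.0]),
    array("d", [3.0, 4.0]),
]

# An item of a different type is rejected, since all items must share one type.
try:
    vector.append("text")
    accepted = True
except TypeError:
    accepted = False
```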
a table is a data file with data items arranged in rows and columns
data files in relational databases are organized as tables
such tables are also called relations in relational database terminology
a list is a finite, ordered sequence of data items (known as elements)
here "ordered" means that each element has a position in the list
an ordered list has elements positioned in ascending order of values, while an unordered list has no permanent relation between element values and position
each element has a data type
in the simple list implementation, all elements must have the same data type but there is no conceptual objection to lists whose elements have different data types
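the list concepts above can be sketched with Python's built-in list type; the values are hypothetical, and bisect.insort is used here as one convenient way to keep a list in ascending order

```python
import bisect

# Unordered list: an element's position reflects only where it was added.
unordered = [42, 7, 19]
unordered.append(3)

# Ordered list: elements are kept in ascending order of values.
ordered = sorted(unordered)
bisect.insort(ordered, 11)  # insert 11 at the position that keeps the order

# Elements of different data types in one list are conceptually allowed.
mixed = [1, "two", 3.0]
element_types = [type(element).__name__ for element in mixed]
```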
a tree is a data file in which each data item is attached to one or more data items directly beneath it (Figure 4)
the connections between data items are called branches
trees are often called inverted trees because they are normally drawn with the root at the top
the data items at the very bottom of an inverted tree are called leaves; other data items are called nodes
a binary tree is a special type of inverted tree in which each element has at most two branches below it
a heap is a special type of binary tree in which the value of each node is greater than or equal to the values of the nodes directly beneath it
heap files are created for sorting data in computer processing --- the heap sort algorithm works by first organizing a list of data into a heap
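a minimal sketch of the heap sort idea follows; it assumes the heap is held in a list, with the two items beneath position i stored at positions 2i+1 and 2i+2 (a common layout, not something these notes specify), and the function names are hypothetical

```python
def sift_down(items, root, end):
    """Move items[root] down until neither item beneath it (index < end) is larger."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        # Pick the larger of the two items beneath the root.
        if child + 1 < end and items[child + 1] > items[child]:
            child += 1
        if items[root] >= items[child]:
            return
        items[root], items[child] = items[child], items[root]
        root = child


def heap_sort(values):
    """Return a new list with the values in ascending order."""
    items = list(values)
    n = len(items)
    # Step 1: organize the list into a heap (largest value at the top).
    for root in range(n // 2 - 1, -1, -1):
        sift_down(items, root, n)
    # Step 2: repeatedly move the top value to the end and repair the heap.
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sift_down(items, 0, end)
    return items
```

for example, heap_sort([5, 3, 8, 1, 9, 2]) returns [1, 2, 3, 5, 8, 9]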
a stack is a collection of cards in Apple Computer's HyperCard software system
the concept of database is the approach to information organization in computer-based data processing today
a database is defined as an automated, formally defined and centrally controlled collection of persistent data used and shared by different users in an enterprise (Date, 1995 and Everest, 1986)
the above definition excludes informal, private and manual collections of data
"centrally controlled" does not mean "physically centralized" --- databases today tend to be physically distributed in different computer systems, at the same or different locations
a database is set up to serve the information needs of an organization
data sharing is key to the concept of database
data in a database are described as "permanent" in the sense that they are different from "transient" data such as input to and output from an information system
the data usually remain in the database for a considerable length of time, although the actual content of the data can change very frequently
the use of databases does not mean the demise of data files
data in a database are still organized and stored as data files
the use of databases represents a change in the perception of data, the mode of data processing and the purposes of using the data (Table 1), rather than a change in the physical storage of the data
databases can be organized in different ways known as database models
the three conventional database models are: relational, network and hierarchical
relational --- data are organized by records in relations, which resemble tables (Figure 5a) (See Section 3.2.1 for further explanation)
network --- data are organized by records which are classified into record types, with 1:n pointers linking associated records (Figure 5b)
hierarchical --- data are organized by records in parent-child, one-to-many relations (Figure 5c)
the emerging database model is object-oriented
data are uniquely identified as individual objects that are classified into object types or classes according to the characteristics (attributes and operations) of the object (Figure 5d) (See Section 3.2.2 for further explanation)
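as a rough sketch, the same hypothetical owner and parcel records can be arranged in the relational way (tables associated by matching values) and in the hierarchical way (parent records holding their children); the table and column names are assumptions, and an in-memory SQLite database stands in for a relational system

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT)")
connection.execute(
    "CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY, owner_id INTEGER, area_ha REAL)"
)
connection.executemany("INSERT INTO owner VALUES (?, ?)", [(1, "Smith"), (2, "Jones")])
connection.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(101, 1, 2.5), (102, 1, 0.8), (103, 2, 1.2)],
)

# Relational: records are associated by matching values in two tables.
rows = connection.execute(
    "SELECT owner.name, parcel.parcel_id FROM owner "
    "JOIN parcel ON parcel.owner_id = owner.owner_id "
    "ORDER BY parcel.parcel_id"
).fetchall()

# Hierarchical: each parent record holds its child records (one-to-many).
hierarchy = {}
for name, parcel_id in rows:
    hierarchy.setdefault(name, []).append(parcel_id)

connection.close()
```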
2.1.2. Information organization of graphical data
for graphical data, the most basic element of information organization is called a basic graphical element
there are three basic graphical elements (Figure 6):
point
line, also referred to as arc
polygon, also referred to as area
these basic graphical elements can be individually used to represent geographic features or entities
for example: a point for a well, a line for a road segment and a polygon for a lake
they can also be used to construct complex features
for example: the geographic entity "Hawaii" on a map is represented by a group of polygons of different sizes and shapes
the method of representing geographic features by the basic graphical elements of points, lines and polygons is said to be the vector method or vector data model, and the data are called vector data
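one possible way to sketch the vector data model in code is shown below; the class names, the coordinates and the island group are hypothetical, and the area calculation (the shoelace formula on planar coordinates) is added only to show that each feature is a discrete, individually usable object

```python
from dataclasses import dataclass
from typing import List, Tuple

Coordinate = Tuple[float, float]


@dataclass
class Point:
    position: Coordinate


@dataclass
class Line:
    vertices: List[Coordinate]


@dataclass
class Polygon:
    ring: List[Coordinate]  # closed boundary: first vertex equals last vertex

    def area(self) -> float:
        """Planar area by the shoelace formula."""
        total = 0.0
        for (x1, y1), (x2, y2) in zip(self.ring, self.ring[1:]):
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


@dataclass
class Feature:
    name: str
    parts: list  # one or more basic graphical elements


# Simple features: one basic graphical element each.
well = Feature("well", [Point((2.0, 3.0))])
road = Feature("road segment", [Line([(0.0, 0.0), (5.0, 1.0)])])

# Complex feature: a group of polygons treated as one geographic entity.
islands = Feature(
    "island group",
    [
        Polygon([(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]),
        Polygon([(6, 0), (8, 0), (8, 1), (6, 1), (6, 0)]),
    ],
)
total_area = sum(part.area() for part in islands.parts)
```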
related vector data are always organized by themes, which are also referred to as layers or coverages
examples of themes: geodetic control, base map, soil, vegetation cover, land use, transportation, drainage and hydrology, political boundaries, land parcel and others
for themes covering a very large geographic area, the data are always divided into tiles so that they can be managed more easily
a tile is the digital equivalent of an individual map in a map series
a tile is uniquely identified by a file name
a collection of themes of vector data covering the same geographic area and serving the common needs of a multitude of users constitutes the spatial component of a geographical database
the vector method of representing geographic features is based on the concept that these features can be identified as discrete entities or objects
this method is therefore based on the object view of the real world (Goodchild, 1992)
the object view is the method of information organization in conventional mapping and cartography
graphical data captured by imaging devices in remote sensing and digital cartography (such as multi-spectral scanners, digital cameras and image scanners) are made up of a matrix of picture elements (pixels) of very fine resolution
geographic features in such form of data can be visually recognized but not individually identified in the same way that geographic features are identified in the vector method
they are recognizable by differentiating their spectral or radiometric characteristics from pixels of adjacent features
for example, a lake can be visually recognized on a satellite image because the pixels forming it are darker than those of the surrounding features; but the pixels forming the lake are not identified as a single discrete geographic entity, i.e. they remain individual pixels
similarly, a highway can be visually recognized on the same satellite image because of its particular shape; but the pixels forming the highway do not constitute a single discrete geographic entity as in the case of vector data
the method of representing geographic features by pixels is called the raster method or raster data model, and the data are described as raster data
the raster method is also called the tessellation method
a raster pixel is usually a square grid cell, but there are several variants such as triangles and hexagons (Peuquet, 1991)
a raster pixel represents the generalized characteristics of an area of specific size on or near the surface of the Earth
the actual ground size depicted by a pixel is dependent on the resolution of the data, which may range from smaller than a square meter to several square kilometers
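a minimal sketch of raster data as a matrix of pixels follows; the land-cover codes and the 30 m cell size are assumptions chosen for the example

```python
# Hypothetical 4 x 5 land-cover raster: 0 = land, 1 = water.
raster = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cell_size_m = 30.0  # assumed ground resolution: each pixel covers 30 m x 30 m

# The "lake" exists only as individual pixels sharing the same value.
water_cells = sum(row.count(1) for row in raster)
water_area_m2 = water_cells * cell_size_m ** 2
```

with a coarser resolution the same lake would be described by fewer, larger pixels, and its outline would be more generalized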
raster data are organized by themes, which are also referred to as layers
for example, a raster geographic database may contain the following themes: bed rock geology, vegetation cover, land use, topography, hydrology, rainfall, temperature
raster data covering a large geographic area are organized by scenes (for remote sensing images) or by raster data files (for images obtained by map scanning)
the raster method is based on the concept that geographic features are represented as surfaces, regions or segments
this method is therefore based on the field view of the real world (Goodchild, 1992)
the field view is the method of information organization in image analysis systems in remote sensing and geographic information systems for resource- and environmental-oriented applications
in the past, the vector and raster methods represented two distinct approaches to information systems
they were based on different concepts of information organization and data structure
they used different technologies for data input and output
recent advances in computer technologies allow these two types of data to be used in the same applications
computers are now capable of converting data from the vector format to the raster format (rasterization) and vice versa (vectorization)
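rasterization can be sketched as follows, assuming one simple rule among several possible ones: a cell is marked when its center falls inside the polygon; the function names, the grid origin at the lower-left corner and the sample lake are hypothetical, and the reverse conversion (vectorization) is not shown

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a closed list of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, rows, cols, cell_size):
    """Return a grid of 0/1 values; 1 marks cells whose center is inside the ring."""
    grid = []
    for row in range(rows):
        y = (rows - row - 0.5) * cell_size  # row 0 is the top of the grid
        grid.append([
            1 if point_in_polygon((col + 0.5) * cell_size, y, ring) else 0
            for col in range(cols)
        ])
    return grid


# Hypothetical square lake in a 4 x 4 grid of unit cells.
lake = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0), (1.0, 1.0)]
grid = rasterize(lake, rows=4, cols=4, cell_size=1.0)
```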
computers are now able to display vector and raster simultaneously
the old debate on the usefulness of these two approaches to information organization does not seem to be relevant any more
vector and raster data are largely seen as complementary to, rather than competing against, one another in geographic data processing
2.2. The relationship perspective of information organization
relationships represent an important concept in information organization --- they describe the logical association between entities
relationships can be categorical or spatial, depending on whether they describe non-locational characteristics or location
2.2.1. Categorical relationships
categorical relationships describe the association among individual features in a classification system
the classification of data is based on the concept of scale of measurement
there are four scales of measurement:
nominal --- a qualitative, non-numerical and non-ranking scale that classifies features on intrinsic characteristics
for example, in a land use classification scheme, polygons can be classified as industrial, commercial, residential, agricultural, public and institutional
ordinal --- a nominal scale with ranking which differentiates features according to a particular order
for example, in a land use classification scheme, residential land can be denoted as low density, medium density and high density
interval --- an ordinal scale with ranking based on numerical values that are recorded with reference to an arbitrary datum
for example, temperature readings in degrees centigrade are measured with reference to an arbitrary zero (i.e. zero degree temperature does not mean no temperature)
ratio --- an interval scale with ranking based on numerical values that are measured with reference to an absolute datum
for example, rainfall data are recorded in mm with reference to an absolute zero (i.e. zero mm rainfall means no rainfall)
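the four scales of measurement differ in which comparisons are meaningful, which the following sketch tries to show; the class names and sample readings are hypothetical, and the conversion to kelvins is added only to show why ratios of interval-scale values are misleading

```python
from enum import Enum, IntEnum


class LandUse(Enum):
    """Nominal: classes can only be tested for equality."""
    INDUSTRIAL = "industrial"
    RESIDENTIAL = "residential"


class Density(IntEnum):
    """Ordinal: classes can also be ranked."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


same_class = LandUse.INDUSTRIAL == LandUse.RESIDENTIAL
denser = Density.HIGH > Density.LOW

# Interval: differences are meaningful, ratios are not (arbitrary zero).
celsius_a, celsius_b = 10.0, 20.0
difference = celsius_b - celsius_a
naive_ratio = celsius_b / celsius_a  # 2.0, but 20 degrees is not "twice as hot"
kelvin_ratio = (celsius_b + 273.15) / (celsius_a + 273.15)

# Ratio: absolute zero, so 20 mm really is twice as much rainfall as 10 mm.
rain_ratio = 20.0 / 10.0
```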
categorical relationships based on ranking are hierarchical or taxonomic in nature
this means that data are classified into progressively different levels of detail
data in the top level are represented by a limited number of broad basic categories
data in each basic category are then classified into different sub-categories, which can be further classified into another level if necessary
the classification of descriptive data is typically based on categorical relationships (Figure 7)
2.2.2. Spatial relationships
spatial relationships describe the association among different features in space
spatial relationships are visually obvious when data are presented in the graphical form
however, it is difficult to build spatial relationships into the information organization and data structure of a database
there are numerous types of spatial relationships possible among features (Table 2)
recording spatial relationships explicitly demands considerable storage space
computing spatial relationships on-the-fly slows down data processing particularly if relationship information is required frequently
spatial relationships fall into two broad types (Figure 8):
topological --- describes the property of adjacency, connectivity and containment of contiguous features
proximal --- describes the property of closeness of non-contiguous features
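a small sketch of computing both kinds of spatial relationship on-the-fly is given below; it assumes that adjacent polygons store identical vertices along their shared boundary, and the function names and parcels are hypothetical

```python
import math


def edges(ring):
    """Return the set of undirected edges of a closed ring of vertices."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}


def are_adjacent(ring_a, ring_b):
    """Topological adjacency: the two polygons share at least one edge."""
    return bool(edges(ring_a) & edges(ring_b))


def distance(point_a, point_b):
    """Proximity: straight-line distance between two points."""
    return math.hypot(point_b[0] - point_a[0], point_b[1] - point_a[1])


parcel_a = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
parcel_b = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]  # shares an edge with parcel_a
parcel_c = [(5, 5), (6, 5), (6, 6), (5, 6), (5, 5)]  # not contiguous with parcel_a
```

comparing every pair of features in this way becomes slow for large data sets, which is why the trade-off between storing relationships and computing them matters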
spatial relationships are very important in geographical data processing and modeling
the objective of information organization and data structure is to find a way that will handle spatial relationships with the minimum storage and computation requirements
2.3. The operating system (OS) perspective of information organization
from the operating system perspective, information is organized in the form of directories
directories are a special type of computer files used to organize other files into a hierarchical structure (Figure 9)
directories are also referred to as folders, particularly in systems using graphical user interfaces
a directory may also contain one or more directories
the topmost directory in a computer is called the root directory
a directory that is below another directory is referred to as a sub-directory
a directory that is above another directory is referred to as a parent directory
directories are designed for bookkeeping purposes in computer systems
a directory is identified by a unique directory name
computer files of the same nature are usually put under the same directory
a data file can be accessed in a computer system by specifying a path that is made up of the device name, one or more directory names and its own file name
for example: c:\project101\mapdata\basemap\nw2367.dat
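the parts of such a path can be separated with Python's standard pathlib module, as sketched below; the path used here is hypothetical and follows the Windows convention of a device name followed by backslash-separated directory names

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only for illustration.
path = PureWindowsPath(r"c:\gis\project1\roads.dat")

device = path.drive              # device name
directories = path.parts[1:-1]   # directory names between the root and the file
file_name = path.name            # the data file's own name
parent_directory = path.parent   # the directory that holds the file
```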
the concept of workspace used by many geographic information system software packages is based on the directory structure of the host computer
a workspace is a directory under which all data files relating to a particular project are stored (Figure 10)
2.4. The application architecture perspective of information organization
computer applications nowadays tend to be constructed on the client/server systems architecture
client/server is primarily a relationship between processes r