2.4. The application architecture perspective of information organization
computer applications nowadays tend to be constructed on the client/server systems architecture
client/server is primarily a relationship between processes running in the same computer or, more commonly, in separate computers across a telecommunication network (Figure 11)
the client is a process that requests services
the dialog between the client and the server is always initiated by the client
a client can request services from many servers at the same time
the server is a process that provides the service
a server is primarily a passive service provider
a server can service many clients at the same time
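the request/reply relationship described above can be sketched in a few lines; this is only an illustration, assuming both processes are stood in for by two threads joined by a local socket pair, and the names `serve` and `request_service` are hypothetical

```python
import socket
import threading


def serve(conn):
    """Server side: wait passively for requests and answer each one."""
    with conn:
        while True:
            request = conn.recv(1024)
            if not request:  # the client closed the connection
                break
            conn.sendall(b"echo:" + request)


def request_service(conn, message):
    """Client side: initiate the dialog, then wait for the reply."""
    conn.sendall(message)
    return conn.recv(1024)


# A connected pair of sockets stands in for the network link (assumption).
client_end, server_end = socket.socketpair()
worker = threading.Thread(target=serve, args=(server_end,))
worker.start()

reply = request_service(client_end, b"hello")
client_end.close()  # ends the dialog, which lets the server loop finish
worker.join()
print(reply)
```

the server never sends anything until the client has asked for it, which is the sense in which it is a passive service provider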
there are many ways of implementing a client/server architecture but from the perspective of information organization, the following five are most important
file servers --- the client requests specific records from a file; and the server returns these records to the client by transmitting them across the network
database servers --- the client sends structured query language (SQL) requests to the server; the server finds the required information by processing these requests and then passes the results back to the client
transaction servers --- the client invokes a remote procedure that executes a transaction at the server side; the server returns the result back to the client via the network
Web servers --- communicating interactively through the Hypertext Transfer Protocol (HTTP) over the Internet, the Web server returns documents when clients ask for them by name
groupware servers --- this type of server provides a set of applications that allow clients (and their users) to communicate with one another using text, images, bulletin boards, video and other forms of media
from the application architecture perspective, the objective of information organization and data structure is to develop a data design strategy that will optimize system operation by
balancing the distribution of data resources between the client and the server
databases are typically located on the server to enable data sharing by multiple users
static data that are used for reference are usually allocated to the client
ensuring the logical allocation of data resources among different servers
data that are commonly used together should be placed in the same server
data that have common security requirements should be placed in the same server
data intended for a particular purpose (file service, database query, transaction processing, Web browsing or groupware applications) should be placed in the appropriate server
standardizing and maintaining metadata (i.e. data about data) to facilitate the search for the availability and characteristics of existing data

--------------------------------------------------------------------------------

3. Data Structure
3.1. Levels of data abstraction
as noted in Section 1.3, information organization is concerned with the internal organization of data
it represents the user's view of data, i.e. conceptualization of the real world
it is the lowest level of data abstraction, which can be done with or without any intent for computer implementation
it is expressed in terms of data models (Peuquet, 1991) (Figure 12)
note the differences between "data models" and "database models"
the vector and raster methods of representing the real world as explained in Section 2.1.2 above are "data models"
the relational, network, hierarchical and object-oriented databases are "database models" --- they are the software implementation of data models
data structure represents a higher level of data abstraction than information organization in the sense that it is concerned with the design and implementation of information organization
it represents the human implementation-oriented view of data
it is expressed in terms of database models
this implies that data structure is software-dependent but hardware is not yet a consideration
data structure forms the basis for the next level of data abstraction in an information system: file structure or file format
file structure is the hardware implementation-oriented view of data
it reflects the physical storage of the data on some specific computer media such as magnetic tape or hard disks
this implies that file structure is hardware-dependent
3.2. Descriptive data structures
descriptive data structures describe the design and implementation of the information organization of non-spatial data
as most commercial implementations of information systems today are based on the relational and object-oriented database models, we explain the data structures of these models in the following two sections
3.2.1. Relational data structure
the relational data structure is the table which is formally called a relation (Figure 13)
a relation is a collection of tuples that correspond to the rows of the table
the number of tuples in a relation is called the cardinality
a tuple is made up of attributes that correspond to the columns of the table
the number of attributes in a tuple is called the degree
each relation has a unique identifier called the primary key
the primary key is a column or combination of columns that at any given time has no identical values in any two rows
this means that no two rows ever share the same primary key value
this allows primary keys to be used to relate data in different tables in data processing (Figure 13)
a column in one table that holds the primary key values of another table is called a foreign key
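the terms above can be tried out with Python's built-in `sqlite3` module; this is a minimal sketch, and the `owner` and `parcel` tables and their contents are invented for illustration

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT)")
con.execute(
    "CREATE TABLE parcel ("
    " parcel_id INTEGER PRIMARY KEY,"
    " land_use TEXT,"
    " owner_id INTEGER REFERENCES owner(owner_id))"  # foreign key column
)
con.executemany("INSERT INTO owner VALUES (?, ?)", [(1, "Lee"), (2, "Chan")])
con.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(101, "residential", 1), (102, "commercial", 2), (103, "residential", 1)],
)

# Cardinality = number of tuples (rows); degree = number of attributes (columns).
cardinality = con.execute("SELECT COUNT(*) FROM parcel").fetchone()[0]
degree = len(con.execute("SELECT * FROM parcel").description)

# A second row with an existing primary key value is rejected.
try:
    con.execute("INSERT INTO owner VALUES (1, 'Wong')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign key relates each parcel to its owner in the other table.
joined = con.execute(
    "SELECT p.parcel_id, o.name FROM parcel p"
    " JOIN owner o ON p.owner_id = o.owner_id ORDER BY p.parcel_id"
).fetchall()
print(cardinality, degree, duplicate_rejected, joined)
```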
in order to enforce database integrity, relations are always normalized
normalization is built on the concept of normal form
a relation is said to be in a certain normal form if it satisfies a prescribed set of conditions (Date, 1995)
as a minimum, a relation in the relational database has to satisfy the conditions of the first, second and third normal forms
first normal form (1NF) --- a relation is said to be in 1NF if and only if its tuples contain no repeating attributes (i.e. no attribute may hold multiple values for a single entity, as might arise from repeated sampling at a particular location)
second normal form (2NF) --- a relation is said to be in 2NF if it satisfies the condition for 1NF and if every non-key attribute is irreducibly dependent on the primary key
third normal form (3NF) --- a relation is said to be in 3NF if it satisfies the condition for 2NF and the non-key attributes are mutually independent
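the three normal forms can be followed step by step on a small example; the sketch below uses plain Python structures rather than a DBMS, and the sampling-station data and all field names are made up for illustration

```python
# Unnormalized records: "ph" holds repeated samples taken at one station.
raw = [
    {"station": "S1", "district": "North", "office": "Office A", "ph": [6.1, 6.4]},
    {"station": "S2", "district": "North", "office": "Office A", "ph": [5.8]},
    {"station": "S3", "district": "South", "office": "Office B", "ph": [7.0, 7.2]},
]

# 1NF: one value per attribute; the primary key becomes (station, sample_no).
samples_1nf = [
    {"station": r["station"], "sample_no": n, "ph": ph,
     "district": r["district"], "office": r["office"]}
    for r in raw
    for n, ph in enumerate(r["ph"], start=1)
]

# 2NF: district and office depend on station alone (only part of the key),
# so they are moved to a relation keyed by station.
samples = [{k: row[k] for k in ("station", "sample_no", "ph")} for row in samples_1nf]
stations_2nf = {
    row["station"]: {"district": row["district"], "office": row["office"]}
    for row in samples_1nf
}

# 3NF: office depends on district, another non-key attribute,
# so it is moved to a relation keyed by district.
stations = {station: rec["district"] for station, rec in stations_2nf.items()}
districts = {rec["district"]: rec["office"] for rec in stations_2nf.values()}

print(len(samples_1nf), stations, districts)
```

after the last step each fact is stored once: changing the office of a district touches one entry in `districts` instead of every sample row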
3.2.2. Object-oriented data structure
unlike the relational data structure, there is not a formalized object-oriented data structure
this means that different object-oriented implementations have different data structures
however, object-oriented data structure can be explained in generic terms using the concepts of object identity, object structure and type constructors (Elmasri and Navathe, 1994)
the concept of object identity
each object in an object-oriented database is provided a unique system-generated object identifier (OID)
the OID is for internal reference by the system and is therefore transparent to the user
the OID is immutable, i.e. its value remains unchanged
even when a particular object is removed from the database, its OID will never be assigned to any new object
the concept of object structure
the concept of object structure allows complex objects to be constructed from simple objects
each object is viewed as a triple (i, c, v) where
i = the object's unique identifier (OID)
c = a constructor (which indicates how the object value is constructed)
v = object value
different object-oriented systems use different constructors, including: atom, tuple, set, list and array
an object value v is interpreted on the basis of the value of the constructor c in the triple (i, c, v) that represents the object
if c = atom, then v is an atomic value (i.e. it is an indivisible value)
if c = tuple, then v is a tuple containing one or more attributes with their respective OIDs
if c = set, then v is a set of object identifiers (OIDs) for a set of objects of the same type
if c = list, then v is an ordered list of OIDs of the same type
if c = array, then v is an array of OIDs of the same type
the concept of type constructors
a type constructor is used by an object-oriented data definition language (OODDL) to define the data structure for an object-oriented database schema (Figure 14)
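the triple (i, c, v) can be mimicked in a few lines of Python; this is a generic sketch and not the storage scheme of any particular object-oriented DBMS, and the names `DBObject`, `create` and `resolve` are hypothetical

```python
from dataclasses import dataclass
from itertools import count
from typing import Any

_next_oid = count(1)  # system-generated OIDs, never reused
database = {}


@dataclass(frozen=True)  # frozen: the OID cannot be reassigned after creation
class DBObject:
    i: int  # object identifier (OID)
    c: str  # constructor: atom, tuple, set, list or array
    v: Any  # object value, interpreted according to c


def create(c, v):
    obj = DBObject(next(_next_oid), c, v)
    database[obj.i] = obj
    return obj.i


def resolve(oid):
    """Follow OIDs until only atomic values remain."""
    obj = database[oid]
    if obj.c == "atom":
        return obj.v
    if obj.c == "tuple":
        return {attr: resolve(ref) for attr, ref in obj.v.items()}
    if obj.c == "set":
        return [resolve(ref) for ref in sorted(obj.v)]
    if obj.c in ("list", "array"):
        return [resolve(ref) for ref in obj.v]
    raise ValueError(f"unknown constructor: {obj.c}")


# A complex object (a road) built from simple ones (its name and lane count).
name = create("atom", "Main Street")
lanes = create("atom", 4)
road = create("tuple", {"name": name, "lanes": lanes})
network = create("set", frozenset({road}))

print(resolve(network))
```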
3.3. Graphical data structures
3.3.1. Raster data structure
in the raster data structure, space is subdivided into a regular grid of square cells, or other forms of polygonal mesh, known as picture elements (pixels) (Figure 15)
the location of each cell is defined by its row and column numbers
the area that each cell represents defines the spatial resolution of the data
the position of a geographic feature is only recorded to the nearest pixel
the value stored for each cell indicates the type of object, phenomenon or condition that is found in that particular location
different types of values can be coded: integers, real numbers and alphabetic characters
integer values often act as code numbers, which are referenced to names in an associated table (called the look-up table) or legend
different attributes at the same cell location are stored as separate themes or layers
for example, raster data pertaining to the soil type, forest cover and slope covering the same area are stored separately in a soil type theme, a forest cover theme and a slope theme
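a minimal sketch of these ideas, assuming a 3 x 3 grid with 30 m cells whose upper-left corner is at x = 0, y = 90; the layer values, the look-up table and the helper `cell_at` are invented for illustration

```python
# Two layers covering the same area, stored separately.
soil = [  # integer codes, explained by the look-up table below
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]
slope = [  # real numbers (degrees)
    [2.5, 3.0, 8.1],
    [2.0, 7.5, 9.0],
    [0.5, 0.8, 6.2],
]
soil_lut = {1: "clay", 2: "loam", 3: "sand"}  # look-up table
cell_size = 30.0  # metres; this sets the spatial resolution


def cell_at(x, y, x_min, y_max, size):
    """Return the (row, column) of the cell containing the point (x, y)."""
    col = int((x - x_min) // size)
    row = int((y_max - y) // size)  # rows are counted from the top edge
    return row, col


# Any point inside a cell maps to the same row and column, so a position
# is only known to the nearest pixel.
row, col = cell_at(70.0, 50.0, x_min=0.0, y_max=90.0, size=cell_size)
print(row, col, soil_lut[soil[row][col]], slope[row][col])
```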
there are several variants to the regular grid raster data structure, including: irregular tessellation (e.g. triangulated irregular network (TIN)), hierarchical tessellation (e.g. quad tree) and scan-line (Peuquet, 1991)
3.3.2. Vector data structure
there are many implementations of vector data structures, including:
spaghetti --- a direct line-for-line unstructured translation of the paper map (Figure 16)
this structure has very limited practical use
it is usually an interim data structure for map digitizing
hierarchical --- a vector data structure developed to facilitate data retrieval by separately storing points, lines and areas in a logically hierarchical manner (Figure 17)
topological --- a vector data structure that aims at retaining spatial relationship by explicitly storing adjacency information (Figure 18)
the basic logical feature for line and area coverage is a straight line segment
each individual line segment is defined by the coordinates of its end points called nodes
topological information is stored by recording
the from-node and to-node of each line segment
the left-polygon and right-polygon (in the direction of the from-node to the to-node) of each line segment
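the sketch below records this information for two adjacent unit squares, A and B, and shows how adjacency can be read from the table without any coordinate geometry; the node numbering, the "outside" polygon label and the helper names are assumptions made for the example

```python
from collections import namedtuple

Arc = namedtuple("Arc", "arc_id from_node to_node left_poly right_poly")

# Node coordinates: square A spans x = 0..1 and square B spans x = 1..2.
nodes = {1: (0, 0), 2: (1, 0), 3: (2, 0), 4: (2, 1), 5: (1, 1), 6: (0, 1)}

# Left and right are taken looking from the from-node towards the to-node.
arcs = [
    Arc(1, 1, 2, "A", "outside"),
    Arc(2, 2, 3, "B", "outside"),
    Arc(3, 3, 4, "B", "outside"),
    Arc(4, 4, 5, "B", "outside"),
    Arc(5, 5, 6, "A", "outside"),
    Arc(6, 6, 1, "A", "outside"),
    Arc(7, 2, 5, "A", "B"),  # the edge shared by A and B
]


def neighbours(poly):
    """Polygons that lie on the other side of any arc bounding poly."""
    result = set()
    for arc in arcs:
        if arc.left_poly == poly:
            result.add(arc.right_poly)
        elif arc.right_poly == poly:
            result.add(arc.left_poly)
    return result


def boundary(poly):
    """Identifiers of the arcs that bound poly."""
    return [arc.arc_id for arc in arcs if poly in (arc.left_poly, arc.right_poly)]


print(neighbours("A"), boundary("A"))
```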
3.4. The georelational data structure
the georelational data structure was developed to handle geographic data
it allows the association between spatial (graphical) and non-spatial (descriptive) data
it is the data structure used by many vector-based GIS software packages
both spatial and non-spatial data are stored in relational tables
point, line and polygon data are stored in separate feature attribute tables (FAT) (Figure 19)
in the FAT, each entity is assigned a unique feature identifier (FID)
topological information is explicitly stored by employing a method similar to the topological data structure described above
non-spatial data are stored in relational tables
entities in the spatial and non-spatial relational tables are linked by the common FIDs of entities (Figure 20)
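the link through a common FID can be illustrated with two small tables in `sqlite3`; this is a sketch only, and the table names, columns and values are invented rather than taken from any GIS package

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Spatial side: a polygon feature attribute table keyed by FID.
con.execute(
    "CREATE TABLE polygon_fat (fid INTEGER PRIMARY KEY, area REAL, perimeter REAL)"
)
# Non-spatial side: descriptive attributes keyed by the same FID.
con.execute(
    "CREATE TABLE land_use (fid INTEGER PRIMARY KEY, category TEXT, owner TEXT)"
)
con.executemany(
    "INSERT INTO polygon_fat VALUES (?, ?, ?)",
    [(1, 1200.0, 140.0), (2, 800.0, 120.0)],
)
con.executemany(
    "INSERT INTO land_use VALUES (?, ?, ?)",
    [(1, "forest", "state"), (2, "farm", "private")],
)

# A descriptive query (category) returns spatial properties (area) via the FID.
rows = con.execute(
    "SELECT f.fid, f.area, u.category"
    " FROM polygon_fat f JOIN land_use u ON f.fid = u.fid"
    " WHERE u.category = 'forest'"
).fetchall()
print(rows)
```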

--------------------------------------------------------------------------------

4. Data Modeling
data modeling is the process of defining real world phenomena or geographic features of interest in terms of their characteristics and their relationships with one another
it is concerned with different phases of work carried out to implement information organization and data structure
there are three steps in the data modeling process, resulting in a series of progressively formalized data models as the form of the database becomes more and more rigorously defined
conceptual data modeling --- defining in broad and generic terms the scope and requirements of a database
logical data modeling --- specifying the user's view of the database with a clear definition of attributes and relationships
physical data modeling --- specifying internal storage structure and file organization of the database
data modeling is obviously closely related to the three levels of data abstraction in database design as noted in Section 3.1 above:
conceptual data modeling --> data model
logical data modeling --> data structure
physical data modeling --> file structure
4.1. Conceptual data modeling
entity-relationship (E-R) modeling is probably the most popular method of conceptual data modeling
it is sometimes referred to as a method of semantic data modeling because it uses a human language-like vocabulary to describe information organization
it involves four aspects of work:
identifying entities
an entity is defined as a person, a place, an event, a thing, etc.
identifying attributes
determining relationships
drawing an entity-relationship diagram (E-R diagram) (Figure 21)
4.2. Logical data modeling
logical data modeling is a comprehensive process by which the conceptual data model is consolidated and refined
the proposed database is reviewed in its entirety in order to identify potential problems such as
irrelevant data that will not be used
omitted or missing data
inappropriate representation of entities
lack of integration between various parts of the database
unsupported applications
potential additional cost to revise the database
the end product of logical data modeling is a logical schema
the logical schema is developed by mapping the conceptual data model (such as the E-R diagram) to a software-dependent design document (Figure 22)
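one common way of carrying out this mapping for a relational system is sketched below: each entity becomes a table, and each one-to-many relationship becomes a foreign key column on the "many" side; the entities, the relationship and the function `to_relational` are hypothetical, and other mapping rules exist

```python
# Conceptual model: entities with a key and attributes (illustrative only).
entities = {
    "owner": {"key": "owner_id", "attributes": ["name"]},
    "parcel": {"key": "parcel_id", "attributes": ["land_use"]},
}
# One-to-many relationships written as (one side, many side).
relationships = [("owner", "parcel")]


def to_relational(entities, relationships):
    """Map an E-R description to table definitions for a relational DBMS."""
    tables = {
        name: [spec["key"]] + list(spec["attributes"])
        for name, spec in entities.items()
    }
    for one, many in relationships:
        # The key of the "one" side is added to the "many" side as a foreign key.
        tables[many].append(entities[one]["key"])
    return tables


schema = to_relational(entities, relationships)
for table, columns in schema.items():
    print(table, columns)
```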
4.3. Physical data modeling
physical data modeling is the database design process by which the actual tables that will be used to store the data are defined in terms of
data format --- the format of the data that is specific to a database management system (DBMS)
storage requirements --- the volume of the database
physical location of data --- optimizing system performance by minimizing the need to transmit data between different storage devices or data servers
the end product of physical data modeling is a physical schema (Figure 23)
a physical schema is also variously known as a data dictionary, item definition table, data specific table or physical database definition
it is both software- and hardware-specific
this means the physical schemas for different systems look different from one another
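as an illustration of software dependence, the sketch below defines one table in SQLite and reads its item definitions back through SQLite's own `PRAGMA table_info` statement; the `parcel` table is invented, and another DBMS would use different type names and a different catalogue query

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE parcel ("
    " parcel_id INTEGER PRIMARY KEY,"
    " land_use TEXT NOT NULL,"
    " area_m2 REAL)"
)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
dictionary = [
    {"item": name, "type": col_type, "required": bool(notnull), "key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in con.execute(
        "PRAGMA table_info(parcel)"
    )
]
for entry in dictionary:
    print(entry)
```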