ArcGIS using a computer (photos)
http://educationally.narod.ru/gisarcphotoalbum.html

Analytical modelling in GIS (examples and application of Terrain and Hydrogeological Modelling)
http://educationally.narod.ru/gis4links.html

The Research on Spatial Process Modeling in GIS
http://www.aars-acrs.org/acrs/proceeding/ACRS2002/Papers/GIS02-6.pdf
The functionality of spatial process analysis is the foundation of GIS's comprehensive application. To realize this goal, it is desirable not only that the data model of a GIS can represent the datasets needed by spatial process models, but also that the GIS is able to perform multitemporal spatial analysis. Most current GISs, however, are static in essence: they cannot represent temporal information and temporal topology, and they offer little multitemporal analysis functionality. From this point of view, a GIS spatial process modeling system (GSPMS) is proposed, which extends current GIS, supports the whole life cycle of GIS spatial process modeling, and supports the integration of models, GIS and other computer systems. The paper first describes the concepts of spatial process and spatial process modeling. It then investigates a conceptual framework for a GIS spatial process modeling system and introduces a graph-based method for representing the framework architecture. Finally, the functions of the GSPMS and its supporting technologies are discussed.

GIS and Modeling Overview
http://www.geog.ucsb.edu/~good/papers/414.pdf
Modeling can be defined in the context of geographic information systems (GIS) as occurring whenever operations of the GIS attempt to emulate processes in the real world, at one point in time or over an extended period. Models are useful and used in a vast array of GIS applications, from simple evaluation to the prediction of future landscapes. In the past it has often been necessary to couple GIS with special software designed for high performance in dynamic modeling, but with the increasing power of GIS hardware and software it is now possible to reconsider this relationship. Modeling in GIS raises a number of important issues, including the question of validation, the roles of scale and accuracy, and the design of infrastructure to facilitate sharing of models.

Natural and scaled models in "data model"
http://www.geokemp.net/papers/kemp.htm
The term "data model" was coined in computer science. The original definition may be attributed to Date, who defined it as "a set of defined entities and the relationships between them" (Date 1975). There is little ambiguity in this definition, and those working in the field of database management clearly use the term with confidence that the meaning is broadly understood. However, within this single field it has been necessary to identify several classes of data models as one moves from reality to the digital world (Figure 1). Conceptual data models refer to entities in an enterprise and their relationships. Relational, network and hierarchical data models belong to the logical data model class. Physical data models refer to the digital structures used to organize and store the data within the computer. Although GIS textbooks frequently make reference to these standard DBMS terms, the term data model in GIS is used in a number of different ways and, as a result, confusion arises.
For example, data model is commonly used in GIS in the following contexts:
- vector and raster data models
- field and object data models
- representations for fields: pointgrid, contour and TIN data models
- even data structures such as GBF/DIME and chaincodes have been called data models (Peuquet 1990)

While all of these definitions are clearly related and similar, they are by no means synonymous. Each one addresses a similar issue, but from a different perspective. Without common agreement on what we mean, in GIS, when we say "data model", we cannot truly understand the fundamental importance of the concept.

Why do we need a better understanding of what we mean by "data model"? Evolution in the understanding of these important issues is reflected in the hot topics arising from the three Environmental Modeling with GIS conferences hosted by the NCGIA in 1991 (Goodchild et al 1993), 1993 (Goodchild et al 1996) and 1996 (these proceedings). Attention at the first conference was focused on the sophisticated uses being made of GIS in various environmental modeling disciplines. The second conference highlighted papers dealing with the integration of data, individual GI systems and computer models of the environment. The third conference stressed interoperability; thus a concern with the integration of specific computing environments has been overtaken by consideration of the development of overarching theory and implementations through which everyone and every system is able to communicate with each other. The need for a common language is clear.

But why are the issues raised in discussions of data models important? While, as suggested above, there are many perspectives on the issue, all of the papers addressing the topic of data models seek to provide some further understanding of how we represent the world in a computer and/or database. Indeed, much progress is already being made in this direction. In (Kemp 1993), we outlined the need for a formalization of the relationship between the concept of a real world "field" and its representation in one of six different spatial data models (here referring to grids, polygons, TINs, contour lines, pointgrids and irregular points). Each of these representations imposes different interpretations of the continuous nature of the field and implies different techniques for interpolating values at points between those few for which data is stored in the digital database. We argued that it is necessary to retain certain information about the relationship between the reality being represented and the model used to store it. Some of this information is implicit in the data model chosen; some of it must be explicitly stated (e.g. as encapsulated operations).

In a guest editorial in the International Journal of Geographical Information Systems, Burrough and Frank have mused upon the importance of understanding "the philosophical and experiential foundations of human perception of geographical phenomena and their abstraction and coding in geographical information systems" (Burrough and Frank 1995, p. 101). They consider how "geographic data models" reflect how people view the world. While they do not specifically define what they mean by their data model term, the discussion incorporates a consideration of spatial data paradigms and the variable aspects of representations affected by differences arising from different user communities and cultures.
They conclude that: "the question arises of how one can sensibly integrate different kinds of spatial data if each has been observed, recorded, modelled and stored according to its own particular set of paradigms.... The main conclusion must be that methods of handling spatial information must be linked to the paradigms of the users' disciplines and that inter-disciplinary research to determine more accommodating paradigms than the object-field models is essential." (Burrough and Frank 1995, p. 114)

What does a data model do? If we are to uncover what we are trying to get at by using the term data model, it is useful to express what a data model is intended to do. Writing in the database management literature, Brodie suggests that "a clear goal for a data model is that it be expressive. Using the data model, one should be able to represent any static or dynamic property of interest to the desired degree of precision in order to capture the intended meaning" (Brodie et al. 1984, p. 41). Similarly, Goodchild and others have recently written that "in essence, a data model captures the choices made by scientists and others in creating digital representations of phenomena, and thus constrains later analysis, modeling and interpretation" (Goodchild et al. 1995, p. 10).

Data modeling is the process by which entities in the real world are discretized. While sampling the real world requires abstraction so that the natural complexity can be reduced to simple data, data models allow the addition of information to raw data. For example, the TIN model allows a network of points to be joined in such a way that sloped surfaces are represented. Thus, complexity is returned to simple data. Therefore we suggest that the term data model be understood within the GIS context to exist across the full spectrum between the real world and its binary representation. Thus data models may include reference to any of the following:
- data structures: points, polylines and polygons, raster and vector
- representations of fields as pointgrids, polygons, contours, TINs, cellgrids, irregular points
- a link and node network representing a hydrological system
- a transportation network used as an addressing system
- abstract discrete models identified simply as fields or networks
- and, importantly, analog data models such as those used during field sampling and data collection.

Moving from the real world through various data models to model output requires transformations in both information structure and information content. These transformations from the real world to binary representations of it include:
- abstraction, generalization and selection of relevant concepts, processes and relationships in the real world
- conceptual modeling of the relationships between abstract entities
- mathematical modeling of the relationships between defined entities
- physical sampling of the real world
- storage of data in computers, which may or may not include the necessity to model space
- transformation of data between different representations (models).

Transformations take place as real world data is collected, recorded, manipulated and eventually stored in digital databases. Along this transformational path, the people who manipulate the data transition from environmental scientist to computer analyst to "naive" user. How much real world knowledge can be passed from each of these individuals to the next through the data itself? As the data model and, possibly, associated metadata and functionality or procedures are the media, it is critical that these incorporate all the relevant information.
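The point above, that each field representation implies its own interpolation technique, can be made concrete with a short sketch. The following is an invented miniature, not drawn from any of the papers cited here; it assumes numpy and scipy are available, and the sample coordinates and values are hypothetical. The same five stored measurements yield different values at an unsampled point depending on which field model is assumed.

import numpy as np
from scipy.interpolate import griddata

# Five hypothetical field samples (x, y) with measured values, e.g. elevation.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.4, 0.6]])
values = np.array([10.0, 12.0, 14.0, 11.0, 13.0])

# A location between the samples, for which no value is stored.
query = np.array([[0.5, 0.5]])

# TIN-like model: Delaunay triangulation with linear interpolation inside
# each triangle, so the surface is a set of sloped facets between samples.
tin_value = griddata(points, values, query, method="linear")

# Nearest-neighbour model: the field is treated as homogeneous patches.
patch_value = griddata(points, values, query, method="nearest")

print("TIN-like (linear):", tin_value[0])
print("nearest patch:    ", patch_value[0])

Recording which model was intended is exactly the kind of information Kemp argues must travel with the data.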
As described earlier, the transformation from the real world through data models to binary representations involves some loss and, sometimes, recovery of information. There are a number of dimensions of the information which may be captured or lost in the data modeling process (after Burrough and Frank 1995):
- dimensionality: fields or objects, 2D, 3D
- scale: single- or multi-scaled
- representation: generalized to detailed
- exactness: can phenomena be exactly described?
- logic: single value (Boolean) or multivalued (fuzzy)
- static or dynamic
- measurement scale: nominal, ordinal, interval or ratio
- enumeration: complete or sampled
- deterministic or stochastic

A natural analogue model uses actual events of reality as a basis for model construction. There are also scale analogue models, such as topographic maps and aerial photographs, which are generalizations of reality. Beyond these there are conceptual models and mathematical models.

Analog data models versus digital data models
http://www.geokemp.net/papers/kemp.htm
Analog data models define fundamental primitives which conceptually discretize the infinite complexity of reality. However, unlike digital data models, analog data models can be continuous, and they may or may not include the same primitives used in the data models that represent the same phenomena digitally. Data collected in analog field data models include:
- points on lines: geology - transects recording significant rock type transitions; hydrology - stream cross-sections; oceanography - transects taken from ships
- areas: ecology - sample plots; soils - homogeneous areas defined from air photos
- points: geology - boreholes; hydrology - well logs (which provide a continuous measurement of a changing surface); soils - soil pits

The point is that while the data may be discrete, the analog model relates these discrete values to continuous mental models of the real world.

Interoperability through data models
Data models provide entities and relationships at various levels of definition and discreteness. Interoperability requires the identification of what information is provided by the specific data models used in each computing component, a method for transferring that information with as little noise as possible, plus, ideally, some measure of the amount of information lost in any processing stage. Interoperability through generic data models is described by papers presented in this session. Vckovski and Bucher lay out a specification through which information about data and data models can be included in the Open Geodata Interoperability Specification (OGIS). Albrecht supports interoperability by identifying a generic set of functions which operate on standardized data models. In another major project addressing related issues, Smith and others are attempting to implement a computational modeling system (CMS) "which is intended to provide scientific investigators with a unified computational environment and easy access to a broad range of modeling tools" (Smith et al. 1995, p. 127). It is an impressive effort and sets in place many fundamental programming concepts necessary for interoperating modeling environments.
In particular, it outlines a process by which symbolic representations of phenomena or concepts can be constructed from fundamental primitives, and provides a means for relating these different concepts through transformations. The system allows the existence of both abstract representations, which may simply identify instances of a particular concept by name, and multiple concrete representations of these, which are their various digital representations. Existing in an object oriented programming environment, concrete representations are built up inductively from primitives and previously defined super- or parent-representations.

While this CMS promises to provide an extremely powerful and flexible tool for environmental scientists conducting a modeling project, it does not adequately address the conceptual end of the data modeling process. No support is provided to assist the environmental scientist in formulating the abstract representations of the phenomena and concepts being studied. The system requires representations to be built up from primitives, but how does one go about identifying the concepts and their transformations and relating them to the appropriate primitives? What is still missing is a mechanism for ensuring the appropriate information is passed from the real world to the data models.

Data models and GIS research
Given all of the above, it should be clear that data models provide fundamental areas of investigation in many GIS research environments. For example, within the NCGIA's research agenda, we can find the following very different considerations of the data model problem:
- Initiative 15: Multiple roles for GIS in US Global Change Research. In order for digital data and geographic information tools to be generally useful for the global change community, there is a need to understand the spectrum of different data models used by these scientists.
- Initiative 17: Collaborative Spatial Decision Making. Data models of different decision makers must be integrated in order to compare different positions.
- Initiative 20: Naive Geography. Naive Geography is defined as the field of study that is concerned with formal models of the common-sense world.

More specifically, the research agenda on data models from the first I-15 Specialist Meeting includes:
- characterization of existing different models
- reconciliation of different models
- methods to measure the representational efficiency of a data model
- metrics for measuring user satisfaction with a data model
- a common underlying idea of time and space on which data models can be based
- support for a representation of change in a data model
- development of a language for data model description
- documentation of models from each domain
- abstraction and identification of shared elements of data models
- construction of translators between models at the conceptual level

Conclusion
Much progress has been made in recognizing and structuring the information content of digital data models used in GIS. However, much more effort is needed in understanding analog spatial data models (i.e. the models used by environmental scientists) and their relationship to existing and future digital spatial data models.

Acknowledgements
Research at the NCGIA is supported by a grant from the National Science Foundation (SBR 88-10917).

References
Brodie, M. L., J. Mylopoulos and J. W. Schmidt (1984). On Conceptual Modelling: Perspectives from Artificial Intelligence, Databases, and Programming Languages. New York: Springer-Verlag.
Burrough, P. A. and A. U. Frank (1995). Concepts and paradigms in spatial information: are current geographical information systems truly generic? International Journal of Geographical Information Systems 9(2): 101-116.
Couclelis, H. (1992). People manipulate objects (but cultivate fields): beyond the raster-vector debate in GIS. In A. U. Frank, I. Campari and U. Formentini (eds), Theories and Methods of Spatio-Temporal Reasoning in Geographic Space. Springer-Verlag, 639: 65-77.

Conceptual Model for Disaster Management in India using GIS
http://www.gisdevelopment.net/proceedings/mapindia/2006/disaster%20management/mi06disa_245abs.htm
Conceptual process models are expressed in verbal or graphical form, and attempt to describe in words the interactions between real-world features.

INTEGRATING MATHEMATICAL MODEL WITH GIS
http://libraries.maine.edu/Spatial/gisweb/spatdb/gis-lis/gi94095.html
Mathematical process models use a range of techniques including deterministic, stochastic and optimization modelling. Implementing the boll weevil eradication program in Oklahoma would initially increase contamination risks of insecticide leaching and runoff from cotton fields. The mathematical model EPIC-PST and geographic information system (GIS) techniques were integrated to evaluate the potential of insecticide losses. The GIS database was developed by digitizing cotton areas obtained from aerial photographs and soil mapping unit data from Soil Conservation Service Soil Surveys. The cotton and soil coverages were intersected by use of GIS to delineate the soil types within the cotton fields. The model simulation was performed for each soil type to identify the insecticide leaching and runoff potential. The model output was graphically presented for visualizing the spatial pattern of the potential. Combining the capability of GIS with the transport model enhances the model application and provides a tool for agricultural nonpoint source pollution control.

Mathematical model
http://en.wikipedia.org/wiki/Mathematical_model
The term model has a different meaning in model theory, a branch of mathematical logic. An artifact which is used to illustrate a mathematical idea is also called a mathematical model, and this usage is the reverse of the sense explained below. A mathematical model uses mathematical language to describe a system. Mathematical models are used not only in the natural sciences and engineering disciplines (such as physics, biology, earth science, meteorology, and engineering) but also in the social sciences (such as economics, psychology, sociology and political science); physicists, engineers, computer scientists, and economists use mathematical models most extensively. The process of developing a mathematical model is termed 'mathematical modelling' (also modeling). Eykhoff (1974) defined a mathematical model as 'a representation of the essential aspects of an existing system (or a system to be constructed) which presents knowledge of that system in usable form'. Mathematical models can take many forms, including but not limited to dynamical systems, statistical models, differential equations, or game theoretic models. These and other types of models can overlap, with a given model involving a variety of abstract structures.

deterministic model
Definition: a mathematical model in which outcomes are precisely determined through known relationships among states and events, without any room for random variation.
In such models, a given input will always produce the same output, as in a known chemical reaction. In comparison, stochastic models use ranges of values for variables in the form of probability distributions.

Mathematical programming is the application of mathematical and computer programming techniques to the construction of deterministic models, principally for business and economics. For models that require only linear algebraic equations, the techniques are called linear programming; for models that require more complex equations, it is called nonlinear programming. In either case, models frequently involve hundreds or thousands of equations. The discipline emerged during World War II to solve large-scale military logistics problems. Mathematical programming is also used in planning civilian production and transportation schedules and in calculating economic growth.

GIS Applications of Deterministic Solute Transport Models
http://www.ars.usda.gov/SP2UserFiles/Place/53102000/pdf_pubs/P1406.pdf
In recent years, worldwide attention has shifted from point source to non-point source (NPS) pollutants, particularly with regard to the pollution of surface and subsurface sources of drinking water. This is due to the widespread occurrence and potential chronic health effects of NPS pollutants. The ubiquitous nature of NPS pollutants poses a complex technical problem. The areal extent of their contamination increases the complexity and sheer volume of data required for assessment far beyond that of typical point source pollutants. The spatial nature of the NPS pollution problem necessitates the use of a geographic information system (GIS) to manipulate, retrieve, and display the large volumes of spatial data. This chapter provides an overview of the components (i.e., spatial variability, scale dependency, parameter-data estimation and measurement, uncertainty analysis, and others) required to successfully model NPS pollutants with GIS, and a review of recent applications of GIS to the modeling of non-point source pollutants in the vadose zone with deterministic solute transport models. The compatibility, strengths, and weaknesses of coupling a GIS to deterministic one-dimensional transport models are discussed.

Stochastic modelling
http://en.wikipedia.org/wiki/Stochastic_modelling_(insurance)
This page is concerned with stochastic modelling as applied to the insurance industry. For other stochastic modelling applications, please see Monte Carlo method. For a mathematical definition, please see Stochastic process. "Stochastic" means being or having a random variable. A stochastic model is a tool for estimating probability distributions of potential outcomes by allowing for random variation in one or more inputs over time. The random variation is usually based on fluctuations observed in historical data for a selected period, using standard time-series techniques. Distributions of potential outcomes are derived from a large number of simulations (stochastic projections) which reflect the random variation in the input(s). Its application initially started in physics. It is now being applied in engineering, life sciences, social sciences, and finance. See also Economic capital.

Valuation
Like any other company, an insurer has to show that its assets exceed its liabilities to be solvent. In the insurance industry, however, assets and liabilities are not known entities. They depend on how many policies result in claims, inflation from now until the claim, investment returns during that period, and so on. So the valuation of an insurer involves a set of projections, looking at what is expected to happen, and thus coming up with the best estimate for assets and liabilities, and therefore for the company's level of solvency.
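The deterministic/stochastic distinction is easy to see in a few lines of code. The sketch below is illustrative only and assumes numpy; the growth model, its parameters and the noise level are invented for the example. The deterministic run returns the same number every time, while repeated stochastic runs yield a distribution of outcomes, echoing the definitions above.

import numpy as np

def simulate(steps=50, r=0.2, K=100.0, x0=5.0, noise=0.0, rng=None):
    # Hypothetical logistic growth model; 'noise' adds a random component.
    x = x0
    for _ in range(steps):
        x = x + r * x * (1.0 - x / K)
        if noise > 0.0:
            x += rng.normal(0.0, noise)   # the stochastic ingredient
    return x

# Deterministic: identical output for identical input, run after run.
print(simulate(), simulate())

# Stochastic: each run differs; many runs estimate the outcome distribution.
rng = np.random.default_rng(42)
outcomes = [simulate(noise=2.0, rng=rng) for _ in range(1000)]
print(np.mean(outcomes), np.percentile(outcomes, [5, 95]))

The thousand runs here are the "stochastic projections" described above: the distribution of outcomes, not any single run, is the result of interest.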
Unit 130 - Process Modeling and Simulations
http://www.ncgia.ucsb.edu/giscc/units/u130/u130.html

1. Introduction
Definition of process modeling and simulation: theoretical concepts and computational methods that describe, represent and simulate the functioning of real-world processes. Computer simulations are becoming a 'third way' of performing research, expanding the traditional experimental and theoretical approaches:
- a simulation can be regarded as a numerical experiment, but it often requires advancements in theory
- simulations can provide information which is impossible or too expensive to measure, as well as insights which are not amenable, or too complicated, for analytical theory methods
- models are simplified abstractions of reality representing or describing its most important/driving elements and their interactions
- simulations can be regarded as model runs for certain initial conditions (real or designed)

Purpose of modeling and simulations:
- analysis and understanding of observed phenomena
- testing of hypotheses and theories
- prediction of the behavior of spatio-temporal systems under various conditions and scenarios (both existing and simulated, often performed to support decision making)
- new discoveries about the functioning of geospatial phenomena, enabled by the unique capabilities of computer experiments

Role of GIS:
- storing and managing input data and results
- pre-processing of input data (editing, transformation, interpolation, derivation of parameters, etc.)
- analysis and visualization of results (Mitas et al 1997)
- providing a computational environment and tools for simulations

--------------------------------------------------------------------------------

2. Types of process models
Based on area of application, models represent:
- natural processes: atmospheric (global and regional circulation, air pollution); hydrological (water cycle and related processes, see unit 179); geological, geomorphological (solid earth processes); biological and ecosystem (see unit 171); interactions between hydrosphere, atmosphere, lithosphere and biosphere
- socio-economic/anthropogenic processes: transportation; urban, population; production (manufacturing, farming, see unit 181); distribution and services (see unit 174); interactions between socio-economic processes
- interaction of natural and anthropogenic phenomena (e.g., environmental models, food production, forestry, mining)

Based on the type of spatial distribution, process models describe the behavior of phenomena represented by:
- homogeneous or spatially averaged units, e.g. subwatersheds, counties, polygons (sometimes referred to as lumped models), with processes described by ordinary differential equations
- fields/multivariate functions discretized as rasters, grid cells or meshes (distributed models), with processes described by partial differential equations or cellular automata (see unit 054)

Table 1. Representation of phenomena as multivariate fields.
- networks (systems of nodes and links, see unit 064)
- points representing individuals and agents
- combinations of fields, networks and points

Based on the nature of spatial interactions (see unit 021 and unit 123), models involve:
- no spatial interaction, only location dependent behavior
- short-range, close neighborhood interaction
- long-range/expanding interaction

Based on the type of underlying physical or social process, models simulate fluxes (over a surface, through a network, in 3D space), including:
- diffusion, dispersion, advection, convection, reaction, radiation and heat transfer
- proliferation and decay (chemical processes, radioactive decay)
- population dynamics (birth/death, competition, predator/prey, epidemics)
- intelligent agents (systems of independent entities which interact among themselves and with the environment, with a certain degree of decision-making capability)

Based on the spatial extent of the modeled phenomena, models are:
- local
- regional
- global
- multiscale or nested (Steyaert 1993 in Goodchild et al. 1993), with high resolution models used to calibrate the large scale, low resolution models, and the output of large scale models used as input for small scale models

--------------------------------------------------------------------------------

3. Approaches to modeling and simulations
Real processes are complex and often include non-linear behavior, stochastic components and feedback loops over spatial and temporal scales; therefore models can represent the processes only at a certain level of simplification.
- empirical models are based on statistical analysis of observed data, and they are usually applicable only to the same conditions under which the observations were made (for example, the Universal Soil Loss Equation for modeling annual soil loss based on terrain, soil, rainfall and land cover factors, Renard et al. 1991)
- process based models are based on an understanding of physical, chemical, geological and biological processes and their mathematical description (for example, the hydrologic and erosion models SIMWE: Mitas and Mitasova 1998, CASC2D: Saghafian 1996 in Goodchild et al 1996)
- models of complex systems often use a combination of empirical and process based approaches

3.1 Deterministic models
These model processes which are often described by differential equations, with a unique input leading to a unique output for well-defined linear models, and with multiple outputs possible for non-linear models. The equations can be solved by different numerical methods, after discretization (modification to run on a grid or a mesh) and parametrization (setting parameters to account for subgrid processes):
- finite difference principle (Press et al 1992); example: CASC2d (Saghafian 1996 in Goodchild et al 1996); a minimal sketch of the principle follows below
- finite element principle, meshes (Burnett 1987); example: r.water.fea (Vieux 1996 in Goodchild et al 1996)
- path simulation principle, based on a random walker representation; note: not to be confused with stochastic simulations

Figure 1: path simulation solution of the sediment flow continuity equation and the resulting spatial distribution of erosion (red) and deposition (blue) (SIMWE model, animation, Mitas and Mitasova 1998).
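To make the finite difference principle concrete, here is a minimal sketch, not drawn from CASC2d or any of the cited models: an explicit scheme for the one-dimensional diffusion equation du/dt = D d2u/dx2, with invented grid and coefficient values. numpy is assumed; the explicit scheme is stable only while D*dt/dx**2 <= 0.5.

import numpy as np

# Illustrative grid and coefficients; D*dt/dx**2 = 0.4 satisfies stability.
nx, dx, dt, D = 101, 1.0, 0.4, 1.0
u = np.zeros(nx)
u[nx // 2] = 100.0                      # initial pulse in the middle cell

for _ in range(500):                    # march forward in discrete time steps
    # Discretized second derivative: neighbors minus twice the center cell.
    lap = u[:-2] - 2.0 * u[1:-1] + u[2:]
    u[1:-1] += (D * dt / dx**2) * lap   # boundary cells held at zero

print(round(u.max(), 4))                # the pulse spreads and flattens

Finite element and path simulation methods solve the same kind of equation, but on irregular meshes or with ensembles of random walkers, respectively.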
Models describe processes at various levels of temporal variation:
- steady state, with no temporal variations, often used for diagnostic applications
- time series of steady state events, computed by running a steady state model with a time series of input parameters; this approach is commonly used for estimating long term average spatial distributions of modeled phenomena
- dynamic, describing the spatio-temporal variations during a modeled event, used for prognostic applications and forecasting

3.2 Stochastic models
These model the spatio-temporal behavior of phenomena with random components:
- a unique input leads to a different output for each model run, due to the random component of the modeled process; a single simulation gives only one possible result
- multiple runs are used to estimate probability distributions
- conditional simulations combine stochastic modeling and geostatistics to improve the characterization of geospatial phenomena
- the behavior of dynamic stochastic systems can be described by different types of stochastic processes, such as Poisson and renewal processes, discrete-time and continuous-time Markov processes, matrices of transition probabilities, Brownian processes and diffusion (Nelson 1995, Molchanov and Woyczynski 1997)

3.3 Rule based models
These model processes governed by local rules using cellular automata: non-linear dynamic mathematical systems based on discrete time and space (Wolfram 1984). Principles (see the sketch after section 3.4):
- a cellular automaton evolves in discrete time-steps by updating its state according to a transition rule which is applied universally and synchronously to each cell at each time step
- the value of each cell is determined by a geometric configuration of neighbor cells which is specified as part of the transition rule
- complex global behavior may emerge from the application of simple local rules, which makes cellular automata useful for simulating systems that are not fully understood but whose local processes are well known

Examples: urban growth simulation models, such as diffusion-limited aggregation (Batty and Longley 1994) and the innovation diffusion model (Clarke 1996, Park and Wagner 1997); spatially explicit ecological models; forest fire simulation (Clarke et al 1994).

3.4 Multi-agent simulation of complex systems
These model the movement and development of groups of many interacting agents:
- an agent is any actor in a system that can generate events which affect itself and other agents; a typical agent is modeled as a set of rules, responses to stimuli
- individual-based models represent the movement/development of individual entities over space and time based on local rules
- hierarchical models can be built by nesting multiple collections of agents, each with its own schedule of activity
- example: SWARM, multi-agent simulation of complex systems (Minar et al 1996, Booth 1997)
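As an illustration of the cellular automaton principles above, here is a minimal one-dimensional automaton in the style of Wolfram's elementary rules; the rule number, grid size and seed are arbitrary choices for the sketch, and numpy is assumed.

import numpy as np

def step(cells, rule=90):
    # Each cell reads (left, self, right); the 3-bit neighborhood pattern
    # indexes into the 8-bit rule table, applied synchronously to all cells.
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    pattern = 4 * left + 2 * cells + right
    return (rule >> pattern) & 1

cells = np.zeros(64, dtype=int)
cells[32] = 1                              # a single seed cell
for _ in range(16):                        # discrete time steps
    print("".join(".#"[c] for c in cells))
    cells = step(cells)

Complex global structure (here a fractal triangle pattern) emerges from a purely local rule, which is the property the fire spread and urban growth models cited above exploit in two dimensions.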
--------------------------------------------------------------------------------

4. Models and reality
Calibration: the roles of parameters and their limits are evaluated by parameter scans (Clarke 1996 in Goodchild et al 1997 CDROM, Mitas et al. 1997).

Figure 2: parameter scan for sediment flow and erosion/deposition, with the detachment capacity coefficient changing from 0.001 to 10 (animation).

Model results are compared with experiments, and parameters are set to values which ensure the best reproduction of the experimental data. Sensitivity analysis, error propagation and uncertainty analysis are performed to estimate the impact of errors in input data on the model results (see unit 096).

Causes of inconsistency between models and reality (Steyaert 1993 in Goodchild et al 1993):
- only a limited number of interacting processes can be treated
- a process may not be well understood or is treated inadequately
- resolution and/or scale may be inadequate
- the numerical solution can be too sensitive to initial conditions
- the model can be incorrectly applied to conditions under which its assumptions are not valid
- errors in input data

5. GIS implementation
Simple modeling is supported by most commercial GIS, especially within the raster subsystems (ARCGRID, ArcView Spatial Analyst, Intergraph ERMA, IDRISI, GRASS, ERDAS). Full integration of complex models may require extensions of standard GIS functions, such as support for temporal and 3D/4D data and meshes for finite element methods. The opening of data formats and the incorporation of customization and application development tools stimulate the coupling of commercial GIS and modeling. The use of object oriented technology facilitates more efficient GIS implementation and merges the different levels of coupling.

5.1 Full integration - embedded coupling
- the model is developed and implemented within a GIS using the programming and development tools of the given GIS (Application Programming Interface (API), scripting tools, map algebra operations)
- the model is run as a GIS command; inputs and outputs are in a GIS database and no data transfer is needed
- computation is efficient for adequately coded models; models written with scripting tools may be slower
- portability is restricted because of dependence on the GIS within which the model was developed and implemented

Examples of embedded coupling: r.hydro.CASC2d, r.water.fea in GRASS (Saghafian 1996, Vieux 1996 in Goodchild et al 1996); Darcyflow, Particletrack in ARCGRID (ESRI 1994). Map algebra implementations: water flow (example in GRASS r.mapcalc, Shapiro and Westervelt 1992); dispersion, as in a simple fire spread model (example in ARCGRID, ESRI 1994; see the sketch below). Model development is supported by customization and application development tools and extensions to map algebra (for example, Wesseling et al. 1996: DYNAMITE for PCRaster; Park and Wagner 1997: Cellular-IDRISI; ESRI 1996: Avenue).
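A map-algebra-style sketch of such a fire spread model, with numpy arrays standing in for raster layers: this is an invented miniature, not the ESRI example itself. Each statement operates on whole layers, in the spirit of r.mapcalc expressions.

import numpy as np

rng = np.random.default_rng(0)
fuel = rng.random((40, 40)) < 0.7           # boolean raster: burnable cells
fire = np.zeros_like(fuel)                  # boolean raster: burning cells
fuel[20, 20] = True
fire[20, 20] = True                         # ignition point

def neighbor_of(layer):
    # Focal operator: true where any 4-neighbor is true.
    # (np.roll wraps at the edges; a real raster engine would pad instead.)
    shifted = [np.roll(layer, s, axis=a) for a in (0, 1) for s in (1, -1)]
    return np.logical_or.reduce(shifted)

for _ in range(30):
    # Map algebra statement: fire = fuel AND (fire OR neighborhood(fire))
    fire = fuel & (fire | neighbor_of(fire))

print("burned cells:", int(fire.sum()), "of", int(fuel.sum()), "burnable")

The same pattern, a state raster updated from its own focal neighborhood, also answers the review question below about writing a simple diffusion model in map algebra.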
5.2 Integration under a common interface - tight coupling
- the model is developed outside the GIS and has its own data structures, with the exchange of data between model and GIS hidden from the user, although in some cases the data files can be shared
- GIS and model are linked through a common interface
- the interface often supports the integration of the GIS and several different models for the simulation of complex systems with interrelated processes
- portability is restricted
- examples (see Web references): SWAT, AGNPS, ANSWERS coupled with GRASS; SWAT, IDOR3D, BASIN-2 coupled with ArcView

5.3 Loose coupling
- the model is developed and run independently of the GIS; input data are exported from the GIS and results are imported to the GIS for analysis and visualization
- portability: the model can be used with different GIS
- examples: PAYSAGE, forest and habitat change (Hansen et al. 1996), coupled with ArcView/ArcInfo; SIMWE, erosion and deposition (Mitas and Mitasova 1998), coupled with GRASS but able to run with any GIS which supports raster data

5.4 Modeling environments linked to GIS
- aimed at modular, reusable model development; the modeling environment is linked to the GIS through an interface or data import/export
- examples: SME: Spatial Modeling Environment (Maxwell and Costanza 1997); SWARM: multi-agent simulation of complex systems (Minar et al. 1996); MMS: Modular Modeling System (Leavesley 1993 in Goodchild et al. 1993)

--------------------------------------------------------------------------------

6. Application examples and future trends
6.1 Natural resources
- water, sediment and contaminants (see unit 179): SWAT: Soil and Water Assessment Tool; IDOR2D,3D: hydrodynamic pollutant transport simulation; hydrology models in GRASS
- solid earth processes: SAND: conditional simulations for mining; landscape evolution
- atmospheric modeling
- spatially explicit ecological models (see unit 171): dynamic ecological simulations; forest dynamics; SWARM: multi-agent simulation of complex systems; mallard production models

6.2 Socio-economic
- transportation: transportation management with GIS
- population growth and migration
- urban growth (see unit 163): Clarke Urban Growth Model
- food production: interfacing crop growth models with GIS
- military

6.3 Integrated models of complex systems
- atmospheric + hydrologic + plant growth + erosion/sedimentation: Modular Modeling System
- economic-ecological systems: Integrated Ecological Economic Modeling and Valuation of Watersheds: Patuxent watershed model

6.4 Future trends
- real-time simulations
- distributed on-line modeling
- complex systems: integrated models of interacting processes
- dynamic systems in 3D space
- object oriented reusable model development environments

--------------------------------------------------------------------------------

7. Summary
- process modeling is aimed at improving our understanding, and predicting the impact, of natural and socio-economic processes and their interactions
- GIS provides supporting tools for modeling, especially spatial data management, analysis and visualization
- process models describe the behavior of phenomena represented by fields, networks and individual agents, with various types of spatial interactions at local, regional or global scale
- models can be rule based, deterministic, stochastic, or multi-agent
- issues of calibration, error propagation and scale are important for realistic simulations
- GIS and models can be fully integrated or linked through data and interfaces
- well developed applications exist in hydrology, sediment and contaminant transport, ecosystem modeling and urban growth

--------------------------------------------------------------------------------

8. Review and study questions
- find examples of process models using phenomena represented by fields, networks and points
- name the types of processes and disciplines where deterministic models are often used
- give examples of models for which GIS implementation as a script would be the most effective
- write a simple diffusion model using map algebra

--------------------------------------------------------------------------------

9. Reference materials
9.1 Print References
Batty M and Longley P 1994. Fractal Cities. London: Academic Press.
Batty, M. and Xie, Y. 1994. Modelling inside GIS: Part 2. Selecting and calibrating urban models using Arc/Info. International Journal of Geographical Information Systems, vol. 8, no. 5, pp. 429-450. (Urban growth modeling by cellular automata.)
Booth G 1997. Gecko: a continuous 2D world for ecological modeling. Artificial Life Journal 3:3, 147-163. (Individual based simulation system for multiple species at multiple trophic levels.)
Burnett D. S. 1987. Finite Element Analysis: From Concepts to Applications. Reading, MA: Addison-Wesley.
Clarke K, Brass J, and Riggan P 1994. A cellular automaton model of wild fire propagation and extinction. Photogrammetric Engineering and Remote Sensing, 60, 1355-67.

The NCGIA Core Curriculum in GIS Science: types of process models
http://www.ncgia.ucsb.edu/giscc/units/u130/
(See the "Types of process models" section in the Unit 130 excerpt above.)

--------------------------------------------------------------------------------

The NCGIA Core Curriculum in GIScience
http://www.ncgia.ucsb.edu/giscc/
What is GIS? (002), Michael Goodchild
1. Fundamental Geographic Concepts for GIScience (004)
1.1. The World in Spatial Terms (005), ed. Reg Golledge
1.1.1. Human Cognition of the Spatial World (006), Dan Montello
1.1.2. Asking Geographic Questions (007), Tim Nyerges and Reg Golledge
1.2. Representing the earth digitally (008)
  features, pictures, variables; points, lines, areas, fields, 3D; processes and time
1.3. Position on the earth (012), ed. Ken Foote
1.3.1. Coordinate Systems Overview (013), Peter Dana
1.3.2. Latitude and Longitude (014), Anthony Kirvan
1.3.3. The Shape of the Earth (015), Peter Dana
1.3.4. Discrete Georeferencing (016), David Cowen
1.3.5. Global Positioning Systems Overview (017), Peter Dana
1.4. Mapping the earth (018)
1.4.1. Projections and transformations (019), **from the old CC, see also GC notes
1.4.2. Maps as Representations of the World (020), Judy Olson
1.5. Spatial relationships (021)
  connections and topology; networks; distance and direction; flow and diffusion; spatial hierarchies; boundaries; spatial patterns; attributes of relationships
1.6. Abstraction and incompleteness (030)
1.6.1. Sampling the World (031), **from the old CC
1.6.2. Line Generalization (034), **from the old CC
  scale and geographic detail; uncertainty; generalization
2. Implementing Geographic Concepts in GISystems (035)
2.1. Defining characteristics of computing technology (036)
2.1.1. Fundamentals of Data Storage (037), Carol Jacobson
2.1.2. Algorithms (040)
2.1.2.1. Simple Algorithms for GIS I: Intersection of Lines (184), **from the old CC
2.1.2.2. Simple Algorithms for GIS II: Operations on Polygons (185), **from the old CC
2.1.2.3. The Polygon Overlay Operation (186), **from the old CC
  data versus processes; history; object orientation
2.2. Fundamentals of computing systems (042)
  operating systems; programming languages and software engineering; developing algorithms; user interfaces; computer networks; hardware for GISystems
2.3. Fundamentals of information science (050)
2.3.1. Information Organization and Data Structure (051), Albert Yeung
2.3.2. Non-spatial Database Models (045), Thomas Meyer
  data modeling
2.4. Representing fields (054), Michael Goodchild
2.4.1. Rasters (055), Michael Goodchild
2.4.2. TINs (056), **from the old CC
2.4.3. Quadtrees and Scan Orders (057), Michael Goodchild
  polygon coverages
2.5. Representing discrete objects (059)
  storing relationships; computing relationships; topology for geodata; object hierarchies
2.6. Representing networks (064), Benjamin Zhan
2.7. Representing time and storing temporal data (065)
2.8. Populating the GISystem (066) - see the GC notes and the CCTP
  creating digital data - sampling the world; remote sensing; GPS as a data source; digitizing and scanning; editing
  accessing existing data - data exchange; open GIS; finding data; data conversion; transfer standards; distributed networked databases; generating data from existing data
  metadata
2.9. Kinds of geospatial data (082)
2.9.1. Transportation Networks (183), Val Noronha
2.9.2. Natural Resources Data (090), Peter Schut
2.9.2.1. Soil Data for GIS (091), Peter Schut
  hydrography; land cover and vegetation; geology; climate; terrain
2.9.3. Land Records - see Unit 164
  administrative boundary data; demographic and health data; global data
2.10. Handling uncertainty (096), ed. Gary Hunter (see also GC notes)
2.10.1. Managing Uncertainty in GIS (187), Gary Hunter
2.10.2. Uncertainty Propagation in GIS (098), Gerard Heuvelink
2.10.3. Detecting and Evaluating Errors by Graphical Methods (099), Kate Beard
2.10.4. Data Quality Measurement and Assessment (100), Howard Veregin
  storing uncertainty information
2.11. Visualization and cartography (101)
2.11.1. Cartographic fundamentals (102) - GC notes
  principles of graphic design; digital output options; scientific visualization; animation and virtual worlds; cognitive basis of visualization
2.12. User interaction (107)
  user interfaces; forms of user interaction with GIS
2.13. Spatial analysis (110)
  combining data; map algebra; terrain modeling; finding and quantifying relationships; generalization; spatial statistics; geostatistics; spatial econometrics; spatial interpolation; spatial search; location/allocation; districting; spatial interaction modeling; cellular automata; distance modeling; neighborhood filtering; pattern recognition; genetic algorithms
2.14. Implementation paradigms (126)
2.14.1. Spatial Decision Support Systems (127), Jacek Malczewski - GC notes
2.14.2. Exploratory Spatial Data Analysis (128), Robert Haining and Stephen Wise
2.14.3. Process Modeling and Simulation (130), Lubos Mitas and Helena Mitasova
2.14.4. Multimedia and Virtual Reality (131), George Taylor
2.14.5. WebGIS (133), Kenneth Foote and Anthony Kirvan
2.14.6. Artificial Neural Networks for Spatial Data Analysis (188), Suchi Gopal
  interoperability; object oriented GIS; knowledge based and expert systems; collaborative spatial decision making
3. Geographic Information Technology in Society (135), Robert Maher
3.1. Making it work (136), Hugh Calkins and others
  needs assessment; conceptual design of the GIS; survey of available data; evaluating hardware and software; database planning and design; database construction; pilot studies and benchmark tests; acquisition of GIS hardware and software; GIS system integration; GIS application development; GIS use and maintenance
3.2. Supplying the data (143)
3.2.1. Public access to geographic information (190), Albert Yeung
3.2.2. WWW Basics (148), Albert Yeung
3.2.3. Digital Libraries (191), Albert Yeung
3.2.4. Legal Issues (147) - GC notes and old CC
  transfer standards; national and international data infrastructures; marketing data
3.3. The social context (149)
  digital democracy; geographic information in decision making; human resources and education; ethics of GIS use
3.4. The industry (154)
  history and trends; current products and services; careers in GIS
3.5. Teaching GIS (158), David Unwin
3.5.1. Curriculum Design for GIS (159), David Unwin
3.5.2. Teaching and Learning GIS in Laboratories (160), David Unwin
4. Application areas and case studies (161)
4.1. Land Information Systems and Cadastral Applications (164), Steve Ventura
4.2. Precision Agriculture (194), links to material by PrecisionAg.org
  also: facilities management; network applications; emergency response and E911; recreation; resource management (agriculture, forestry); urban planning and management; environmental health; environmental modeling; emergency management; studying and learning geography; business and marketing (real estate)

GIS "hydromonitoring" and optimization model of
http://iahs.info/redbooks/a231/iahs_231_0263.pdf

Optimization Models For Decision Making
http://ioe.engin.umich.edu/people/fac/books/murty/opti_model/

Regression analysis basics
http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Regression_analysis_basics
Regression analysis allows you to model, examine, and explore spatial relationships, and can help explain the factors behind observed spatial patterns. Regression analysis is also used for prediction. You may want to understand why people are persistently dying young in certain regions, for example, or may want to predict rainfall where there are no rain gauges.
OLS is the best known of all regression techniques. It is also the proper starting point for all spatial regression analyses. It provides a global model of the variable or process you are trying to understand or predict (early death/rainfall); it creates a single regression equation to represent that process. Geographically Weighted Regression (GWR) is one of several spatial regression techniques, increasingly used in geography and other disciplines. GWR provides a local model of the variable or process you are trying to understand/predict by fitting a regression equation to every feature in the dataset. When used properly, these methods are powerful and reliable statistics for examining/estimating linear relationships.

Linear relationships are either positive or negative. If you find that the number of search and rescue events increases when daytime temperatures rise, the relationship is said to be positive; there is a positive correlation. Another way to express this positive relationship is to say that search and rescue events decrease as daytime temperatures decrease. Conversely, if you find that the number of crimes goes down as the number of police officers patrolling an area goes up, the relationship is said to be negative. You can also express this negative relationship by stating that the number of crimes increases as the number of patrolling officers decreases.

[Figure: positive and negative relationships, and the case where there is no relationship between two variables.]
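A minimal OLS sketch in code, with invented data echoing the search-and-rescue example above (numpy assumed; this is an illustration of the technique, not the Esri tool):

import numpy as np

rng = np.random.default_rng(1)
temperature = rng.uniform(10, 35, size=50)                      # explanatory variable
rescues = 2.0 + 0.8 * temperature + rng.normal(0, 3, size=50)   # invented response

# Fit rescues = b0 + b1 * temperature by ordinary least squares.
X = np.column_stack([np.ones_like(temperature), temperature])   # intercept column
(b0, b1), *_ = np.linalg.lstsq(X, rescues, rcond=None)

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")  # slope > 0: positive relationship

A single global equation like this is what OLS produces; GWR would instead fit such an equation locally around every feature.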
Regression analysis
http://en.wikipedia.org/wiki/Regression_analysis
In statistics, regression analysis refers to techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

Regression analysis is widely used for prediction (including forecasting of time-series data). Use of regression analysis for prediction has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. A large body of techniques for carrying out regression analysis has been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.

The performance of regression analysis methods in practice depends on the form of the data-generating process and how it relates to the regression approach being used. Since the true form of the data-generating process is not known, regression analysis depends to some extent on making assumptions about this process. These assumptions are sometimes (but not always) testable if a large amount of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, when carrying out inference using regression models, especially involving small effects or questions of causality based on observational data, regression methods must be used cautiously, as they can easily give misleading results.

REGRESSION MODELS
http://www.psychstat.missouristate.edu/introbook/sbk16.htm
Regression models are used to predict one variable from one or more other variables. Regression models provide the scientist with a powerful tool, allowing predictions about past, present, or future events to be made with information about past or present events. The scientist employs these models either because it is less expensive in terms of time and/or money to collect the information needed to make the predictions than to collect the information about the event itself, or, more likely, because the event to be predicted will occur at some future time. Before describing the details of the modeling process, however, some examples of the use of regression models will be presented.

LINEAR TRANSFORMATIONS
http://www.psychstat.missouristate.edu/introbook/sbk15.htm
A linear transformation is a transformation of the form X' = a + bX. If a measurement system approximated an interval scale before the linear transformation, it will approximate it to the same degree after the linear transformation. Other properties of the distribution are similarly unaffected. For example, if a distribution was positively skewed before the transformation, it will be positively skewed after. The symbols in the transformation equation, X'i = a + bXi, have the following meaning: the raw score is denoted by Xi; the score after the transformation is denoted by X'i, read "X prime" or "X transformed"; "b" is the multiplicative component of the linear transformation, sometimes called the slope; and "a" is the additive component, sometimes referred to as the intercept. The "a" and "b" of the transformation are set to real values to specify a transformation. The transformation is performed by first multiplying every score value by the multiplicative component "b" and then adding the additive component "a" to it. For example, a set of data might be linearly transformed with the transformation X'i = 20 + 3*Xi, where a = 20 and b = 3.
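The transformation is easy to verify in a few lines of plain Python. The raw scores below are illustrative stand-ins (borrowed from the homework scores example in the summation section further down); any interval-scale scores would do:

a, b = 20, 3
raw_scores = [5, 7, 7, 6, 8]                      # illustrative X_i values
transformed = [a + b * x for x in raw_scores]     # X'_i = 20 + 3*X_i
print(transformed)                                # [35, 41, 41, 38, 44]
# The order and relative spacing of the scores are preserved; only the scale
# (b) and origin (a) change, which is why skewness is unaffected.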
The purpose of this chapter is to describe procedures to transform raw scores into transformed scores. Why Do We Need to Transform Scores? Transforming scores from raw scores into transformed scores has two purposes: 1) It gives meaning to the scores and allows some kind of interpretation of the scores, 2) It allows direct comparison of two scores. For example, a score of 33 on the first test might not mean the same thing as a score of 33 on the second test. The transformations discussed in this section belong to two general types: percentile ranks and linear transformations. Percentile ranks are advantageous in that the average person has an easier time understanding and interpreting their meaning. However, percentile ranks also have a rather unfortunate statistical property which makes their use generally unacceptable among the statistically sophisticated. Each will now be discussed in turn. STATISTICS http://www.psychstat.missouristate.edu/introbook/sbk13.htm A statistic is an algebraic expression combining scores into a single number. Statistics serve two functions: they estimate parameters in population models and they describe the data. The statistics discussed in this chapter will be used both as estimates of population parameters and as measures of central tendency and variability. There are a large number of possible statistics, but some are more useful than others. MEASURES OF CENTRAL TENDENCY Central tendency is a typical or representative score. If the mayor is asked to provide a single value which best describes the income level of the city, he or she would answer with a measure of central tendency. The three measures of central tendency that will be discussed this semester are the mode, median, and mean. THE SUMMATION SIGN http://www.psychstat.missouristate.edu/introbook/sbk12.htm It is necessary to enhance the language of algebra with an additional notational system in order to efficiently write some of the expressions which will be encountered in the next chapter on statistics. The notational scheme provides a means of representing both a large number of variables and the summation of an algebraic expression. SUBSCRIPTED VARIABLES Suppose the following were scores made on the first homework assignment for five students in the class: 5, 7, 7, 6, and 8. These scores could be represented in the language of algebra by the symbols: V, W, X, Y, and Z. This method of representing a set of scores becomes unwieldy when the number of scores is greater than 26, so some other method of representation is necessary. The method of choice is called subscripted variables, written as Xi, where the X is the variable name and the i is the subscript. The subscript (i) is a "dummy" or counter variable in that it may take on values from 1 to N, where N is the number of scores, to represent which score is being described. In the case of the example scores, then, X1=5, X2=7, X3=7, X4=6, and X5=8. If one wished to represent the scores made on the second homework by these same students, the symbol Yi could be used. The variable Y1 would be the score made by the first student, Y2 the second student, etc. THE SUMMATION SIGN Very often in statistics an algebraic expression of the form X1+X2+X3+...+XN is used in a formula to compute a statistic. The three dots in the preceding expression mean that some terms have been left out of the sequence and must be filled in when the expression is interpreted.
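A minimal Python sketch of the subscripted-variable example above, storing X1 through X5 in a list and forming the sum X1 + X2 + ... + X5 (Python indexes from 0, so X[i-1] holds Xi):

X = [5, 7, 7, 6, 8]      # X1=5, X2=7, X3=7, X4=6, X5=8
total = sum(X)           # the sum X1 + X2 + ... + XN written out
N = len(X)
print(total, total / N)  # 33 and the mean, 6.6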
It is tedious to write an expression like this very often, so mathematicians have developed a shorthand notation to represent a sum of scores, called the summation notation. THE NORMAL CURVE http://www.psychstat.missouristate.edu/introbook/sbk11.htm As discussed in the previous chapter, the normal curve is one of a number of possible models of probability distributions. Because it is widely used and an important theoretical tool, it is given special status as a separate chapter. MODELS OF DISTRIBUTIONS http://www.psychstat.missouristate.edu/introbook/sbk10.htm GROUPED FREQUENCY DISTRIBUTIONS An investigator interested in finger-tapping behavior conducts the following study: Students are asked to tap as fast as they can with their ring finger. The hand is cupped and all fingers except the one being tapped are placed on the surface. Either the right or the left hand is used, at the preference of the student. At the end of 15 seconds, the number of taps for each student is recorded. Example data using 18 subjects are presented below: 53 35 67 48 63 42 48 55 33 50 46 45 59 40 47 51 66 53 COMPARING FREQUENCY DISTRIBUTIONS When one variable is assumed to be measured on an interval scale, and another is dichotomous, that is, has only two levels, it is possible to illustrate the relationship between the variables by drawing overlapping frequency distributions. In the data presented in the preceding chapter, shoe size could be treated as an interval measure and sex was a dichotomous variable with two levels, male and female. The relationship between sex and shoe size is thus an appropriate candidate for overlapping frequency distributions. Overlapping frequency distributions would be useful for two reasons: males wear different styles of shoes than females, and male and female shoe sizes are measured using different scales. FREQUENCY DISTRIBUTIONS As discussed earlier, there are two major means of summarizing a set of numbers: pictures and summary numbers. Each method has advantages and disadvantages and use of one method need not exclude the use of the other. This chapter describes drawing pictures of data, which are called frequency distributions. Measurement http://www.psychstat.missouristate.edu/introbook/sbk06.htm Measurement consists of rules for assigning numbers to attributes of objects. The language of algebra has no meaning in and of itself. The theoretical mathematician deals entirely within the realm of the formal language and is concerned with the structure and relationships within the language. The applied mathematician or statistician, on the other hand, is concerned not only with the language, but the relationship of the symbols in the language to real world objects and events. The concern about the meaning of mathematical symbols (numbers) is a concern about measurement. By definition any set of rules for assigning numbers to attributes of objects is measurement. Not all measurement techniques are equally useful in dealing with the world, however, and it is the function of the scientist to select those that are more useful. The physical and biological scientists generally have well-established, standardized, systems of measurement. A scientist knows, for example, what is meant when a "ghundefelder fish" is described as 10.23 centimeters long and weighing 34.23 grams. The social scientist does not, as a general rule, have such established and recognized systems.
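Returning to the finger-tapping data above, the following minimal Python sketch builds a grouped frequency distribution; the interval width of 5 is an assumption chosen for illustration, not necessarily the grouping used in the source.

from collections import Counter

taps = [53, 35, 67, 48, 63, 42, 48, 55, 33, 50,
        46, 45, 59, 40, 47, 51, 66, 53]

groups = Counter((t // 5) * 5 for t in taps)   # lower bound of each 5-wide interval
for lower in sorted(groups):
    # print each interval with a crude tally, e.g. "45-49: *****"
    print(f"{lower}-{lower + 4}: {'*' * groups[lower]}")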
A description of an individual as having 23 "units" of need for achievement does not evoke a great deal of recognition from most scientists. For this reason the social scientist, more than the physical or biological scientist, has been concerned about the nature and meaning of measurement systems. PROPERTIES OF MEASUREMENT SYSTEMS S. S. Stevens (1951) described properties of measurement systems that allowed decisions about the quality or goodness of a measurement technique. A property of a measurement system deals with the extent to which the relationships which exist between the attributes of objects in the "real world" are preserved in the numbers which are assigned these objects. For an example of relationships existing in the "real world", if the attribute in question is height, then objects (people) in the "real world" have more or less of the attribute (height) than other objects (people). In a similar manner, numbers have relationships to other numbers. For example, 59 is less than 62, 48 equals 48, and 73 is greater than 68. One property of a measurement system that measures height, then, is whether the relationships between heights in the "real world" are preserved in the numbers which are assigned to heights; that is, whether taller individuals are given bigger numbers. Before describing in detail the properties of measurement systems, a means of symbolizing the preceding situation will be presented. The student need not comprehend the following formalism to understand the issues involved in measurement, but mathematical formalism has a certain "beauty" which some students appreciate. Objects in the real world may be represented by Oi, where "O" is a shorthand notation for "object" and "i" is a subscript referring to which object is being described and may take on any integer value. For example O1 is the first object, O2 the second, O3 the third and so on. The symbol M(Oi) will be used to symbolize the number, or measure (M), of any particular object which is assigned to that object by the system of rules; M(O1) being the number assigned to the first object, M(O2) the second, and so on. The expression O1 > O2 means that the first object has more of something in the "real world" than does the second. The expression M(O1) > M(O2) means that the number assigned to the first object is greater than that assigned to the second. THE LANGUAGE OF ALGEBRA This section is intended as a review of the algebra necessary to understand the rest of this book, allowing the student to gauge his or her mathematical sophistication relative to what is needed for the course. The individual without adequate mathematical training will need to spend more time with this chapter. The review of algebra is presented in a slightly different manner than has probably been experienced by most students, and may prove valuable even to the mathematically sophisticated reader. Algebra is a formal symbolic language, composed of strings of symbols. Some strings of symbols form sentences within the language (X + Y = Z), while others do not (X += Y Z). The set of rules that determines which strings belong to the language and which do not, is called the syntax of the language. Transformational rules change a given sentence in the language into another sentence without changing the meaning of the sentence. This chapter will first examine the symbol set of algebra, which is followed by a discussion of syntax and transformational rules.
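A minimal Python sketch of the order property just described, checking that whenever O1 > O2 in the "real world" the assigned measures satisfy M(O1) > M(O2); the objects and numbers below are hypothetical.

from itertools import combinations

true_heights = {"O1": 1.83, "O2": 1.70, "O3": 1.75}  # the attribute in the "real world"
assigned     = {"O1": 73,   "O2": 68,   "O3": 70}    # M(Oi), numbers assigned by the rules

def preserves_order(attribute, measure):
    # check every pair of objects: taller objects must receive bigger numbers
    return all(
        (attribute[p] > attribute[q]) == (measure[p] > measure[q])
        for p, q in combinations(attribute, 2)
    )

print(preserves_order(true_heights, assigned))  # True: order is preserved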
MODELS The knowledge and understanding that the scientist has about the world is often represented in the form of models. The scientific method is basically one of creating, verifying, and modifying models of the world. The goal of the scientific method is to simplify and explain the complexity and confusion of the world. The applied scientist and technologist then use the models of science to predict and control the world. This book is about a particular set of models, called statistics, which social and behavioral scientists have found extremely useful. In fact, most of what social scientists know about the world rests on the foundations of statistical models. It is important, therefore, that social science students understand both the reasoning behind the models, and their application in the world. DEFINITION OF A MODEL A model is a representation containing the essential structure of some object or event in the real world. The representation may take two major forms: 1) Physical, as in a model airplane or an architect's model of a building, or 2) Symbolic, as in a natural language, a computer program, or a set of mathematical equations. In either form, certain characteristics are present by the nature of the definition of a model. CHARACTERISTICS OF MODELS 1. Models are necessarily incomplete. Because it is a representation, no model includes every aspect of the real world. If it did, it would no longer be a model. In order to create a model, a scientist must first make some assumptions about the essential structure and relationships of objects and/or events in the real world. These assumptions are about what is necessary or important to explain the phenomena. For example, a behavioral scientist might wish to model the time it takes a rat to run a maze. In creating the model the scientist might include such factors as how hungry the rat was, how often the rat had previously run the maze, and the activity level of the rat during the previous day. The model-builder would also have to decide how these factors interacted when constructing the model. The scientist does not assume that only factors included in the model affect the behavior. Other factors might be the time-of-day, the experimenter who ran the rat, and the intelligence of the rat. The scientist might assume that these are not part of the "essential structure" of the time it takes a rat to run a maze. All the factors that are not included in the model will contribute to error in the predictions of the model. 2. The model may be changed or manipulated with relative ease. To be useful it must be easier to manipulate the model than the real world. The scientist or technician changes the model and observes the result, rather than doing a similar operation in the real world. He or she does this because it is simpler, more convenient, and/or the results might be catastrophic. A race car designer, for example, might build a small model of a new design and test the model in a wind tunnel. Depending upon the results, the designer can then modify the model and retest the design. This process is much easier than building a complete car for every new design. The usefulness of this technique, however, depends on whether the essential structure of the wind resistance of the design was captured by the wind tunnel model. Changing symbolic models is generally much easier than changing physical models. All that is required is rewriting the model using different symbols. Determining the effects of such changes is not always so easily accomplished.
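As a minimal sketch of how a symbolic model is changed simply by substituting different symbols, the following Python function stands in for a race-car drag model; the formula and parameter names are made up for illustration and are not real aerodynamics.

def drag(length_m, curvature, speed_ms, k=0.4):
    # a hypothetical symbolic model: "changing the model" is just
    # substituting different numbers for the parameters
    return k * (length_m / (1 + curvature)) * speed_ms ** 2

# two candidate designs evaluated without building either car
print(drag(length_m=4.5, curvature=0.2, speed_ms=60))
print(drag(length_m=4.2, curvature=0.5, speed_ms=60))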
In fact, much of the discipline of mathematics is concerned with the effects of symbolic manipulation. If the race car designer was able to capture the essential structure of the wind resistance of the design with a mathematical model or computer program, he or she would not have to build a physical model every time a new design was to be tested. All that would be required would be the substitution of different numbers or symbols into the mathematical model or computer program. As before, to be useful the model must capture the essential structure of the wind resistance. The values, which may be changed in a model to create different models, are called parameters. In physical models, parameters are physical things. In the race car example, the designer might vary the length, degree of curvature, or weight distribution of the model. In symbolic models parameters are represented by symbols. For example, in mathematical models parameters are most often represented by variables. Changes in the numbers assigned to the variables change the model. THE LANGUAGE OF MODELS Of the two types of models, physical and symbolic, the latter is used much more often in science. Symbolic models are constructed using either a natural or formal language (Kain, 1972). Examples of natural languages include English, German, and Spanish. Examples of formal languages include mathematics, logic, and computer languages. Statistics as a model is constructed in a branch of the formal language of mathematics, algebra. Natural and formal languages share a number of commonalities. First, they are both composed of a set of symbols, called the vocabulary of the language. English symbols take the form of words, such as those that appear on this page. Algebraic symbols include the following as a partial list: 1, -3.42, X, +, =, >. The language consists of strings of symbols from the symbol set. Not all possible strings of symbols belong to the language. For instance, the following string of words is not recognized as a sentence, "Of is probability a model uncertainty," while the string of words "Probability is a model of uncertainty." is recognized almost immediately as being a sentence in the language. The set of rules to determine which strings of symbols form sentences and which do not is called the syntax of the language. The syntax of natural languages is generally defined by common usage. That is, people who speak the language ordinarily agree on what is, and what is not, a sentence in the language. The rules of syntax are most often stated informally and imperfectly, for example, "noun phrase, verb, noun phrase." The syntax of a formal language, on the other hand, may be stated with formal rules. Thus it is possible to determine whether or not a string of symbols forms a sentence in the language without resorting to users of the language. For example, the string "x + / y =" does not form a sentence in the language of algebra. It violates two rules in algebra: sentences cannot end in "=" and the symbols "+" and "/" cannot follow one another. The rules of syntax of algebra may be stated much more succinctly as will be seen in the next chapter. Both natural and formal languages are characterized by the ability to transform a sentence in the language into a different sentence without changing the meaning of the string. For example, the active voice sentence "The dog chased the cat," may be transformed to the sentence "The cat was chased by the dog," by using the passive voice transformation.
This transformation does not change the meaning of the sentence. In an analogous manner, the sentence "ax + ay" in algebra may be transformed to the sentence "a(x+y)" without changing the meaning of the sentence. Much of what has been taught as algebra consists of learning appropriate transformations, and the order in which to apply them. The transformation process exists entirely within the realm of the language. The word proof will be reserved for this process. That is, it will be possible to prove that one algebraic sentence equals another. It will not be possible, however, to prove that a model is correct, because a model is never complete. MODEL-BUILDING IN SCIENCE The scientific method is a procedure for the construction and verification of models. After a problem is formulated, the process consists of four stages. 1. Simplification/Idealization. As mentioned previously, a model contains the essential structure of objects or events. The first stage identifies the relevant features of the real world. 2. Representation/Measurement. The symbols in a formal language are given meaning as objects, events, or relationships in the real world. This is the process used in translating "word problems" to algebraic expressions in high school algebra. This process is called representation of the world. In statistics, the symbols of algebra (numbers) are given meaning in a process called measurement. 3. Manipulation/Transformation. Sentences in the language are transformed into other statements in the language. In this manner implications of the model are derived. 4. Verification. Selected implications derived in the previous stage are compared with experiments or observations in the real world. Because of the idealization and simplification of the model-building process, no model can ever be in perfect agreement with the real world. In all cases, the important question is not whether the model is true, but whether the model was adequate for the purpose at hand. Model-building in science is a continuing process. New and more powerful models replace less powerful models, with "truth" being a closer approximation to the real world. ADEQUACY AND GOODNESS OF MODELS In general, the greater the number of simplifying assumptions made about the essential structure of the real world, the simpler the model. The goal of the scientist is to create simple models that have a great deal of explanatory power. Such models are called parsimonious models. In most cases, however, simple yet powerful models are not available to the social scientist. A trade-off occurs between the power of the model and the number of simplifying assumptions made about the world. A social or behavioral scientist must decide at what point the gain in the explanatory power of the model no longer warrants the additional complexity of the model. MATHEMATICAL MODELS The power of the mathematical model is derived from a number of sources. First, the language has been used extensively in the past and many models exist as examples. Some very general models exist which may describe a large number of real world situations. In statistics, for example, the normal curve and the general linear model often serve the social scientist in many different situations. Second, many transformations are available in the language of mathematics. Third, mathematics permits thoughts which are not easily expressed in other languages. For example, "What if I could travel approaching the speed of light?"
or "What if I could flip this coin an infinite number of times?" In statistics these "what if" questions often take the form of questions like "What would happen if I took an infinite number of infinitely precise measurements?" or "What would happen if I repeated this experiment an infinite number of times?" Finally, it is often possible to maximize or minimize the form of the model. Given that the essence of the real world has been captured by the model, what values of the parameters optimize (minimize or maximize) the model. For example, if the design of a race car can be accurately modeled using mathematics, what changes in design will result in the least possible wind resistance? Mathematical procedures are available which make these kinds of transformations possible. Building a Better Boat - Example of Model-Building Suppose for a minute that you had lots of money, time, and sailing experience. Your goal in life is to build and race a 12-meter yacht that would win the America''s Cup competition. How would you go about doing it? Twelve-meter racing yachts do not have to be identical to compete in the America''s Cup race. There are certain restrictions on the length, weight, sail area, and other areas of boat design. Within these restrictions, there are variations that will change the handling and speed of the yacht through the water. The following two figures (Lethcer, Marshall, Oliver, and Salvesen, 1987) illustrate different approaches to keel design. The designer has the option of whether to install a wing on the keel. If a wing is chosen, the decision of where it will be placed must be made. You could hire a designer, have him or her draw up the plans, build the yacht, train a crew to sail it, and then compete in yachting races. All this would be fine, except it is a very time-consuming and expensive process. What happens if you don''t have a very good boat? Do you start the whole process over again? The scientific method suggests a different approach. If a physical model was constructed, and a string connected to weights was connected to the model through a pulley system, the time to drag the model from point A to point B could be measured. The hull shape could be changed using a knife and various weights. In this manner, many more different shapes could be attempted than if a whole new yacht had to be built to test every shape. One of the problems with this physical model approach is that the designer never knows when to stop. That is, the designer never knows that if a slightly different shape was used, it might be faster than any of the shapes attempted up to that point. In any case the designer has to stop testing models and build the boat at some point in time. Suppose the fastest hull shape was selected and the full-scale yacht was built. Suppose also that it didn''t win. Does that make the model-building method wrong? Not necessarily. Perhaps the model did not represent enough of the essential structure of the real world to be useful. In examining the real world, it is noticed that racing yachts do not sail standing straight up in the water, but at some angle, depending upon the strength of the wind. In addition, the ocean has waves which necessarily change the dynamics of the movement of a hull through water. If the physical model was pulled through a pool of wavy water at an angle, then the simulation would more closely mirror the real world. Assume that this is done, and the full-scale yacht built. It still doesn''t win. What next? 
One possible solution is the use of symbolic or mathematical models in the design of the hull and keel. Letcher et al. (1987) describe how various mathematical models were employed in the design of Stars and Stripes. The mathematical model uses parameters which allow one to change the shape of the simulated hull and keel by setting the values of the parameters to particular numbers. That is, a mathematical model of a hull and keel shape does not describe a particular shape, but a large number of possible shapes. When the parameters of the mathematical model of the hull shape are set to particular numbers, one of the possible hull shapes is specified. By sailing the simulated hull shape through simulated water, and measuring the simulated time it takes, the potential speed of a hull shape may be evaluated. The advantage of creating a symbolic model over a physical model is that many more shapes may be assessed. By turning a computer on and letting it run all night, hundreds of shapes may be tested. It is sometimes possible to use mathematical techniques to find an optimal model, one that guarantees that within the modeling framework, no other hull shape will be faster. However, if the model does not include the possibility of a winged keel, it will never be discovered. Suppose that these techniques are employed, and the yacht is built, but it still does not win. It may be that not enough of the real world was represented in the symbolic model. Perhaps the simulated hull must travel at an angle to the water and sail through waves. Capturing these conditions makes the model more complex, but they are necessary if the model is going to be useful. In building Stars and Stripes, all the above modeling techniques were employed (Letcher et al., 1987). After initial computer simulation, a one-third scale model was constructed to work out the details of the design. The result of the model-building design process is history. In conclusion, the scientific method of model-building is a very powerful tool in knowing and dealing with the world. The main advantage of the process is that the model may be manipulated where it is often difficult or impossible to manipulate the real world. Because manipulation is the key to the process, symbolic models have advantages over physical models. CORRELATION http://www.psychstat.missouristate.edu/introbook/sbk17.htm The Pearson Product-Moment Correlation Coefficient (r), or correlation coefficient for short, is a measure of the degree of linear relationship between two variables, usually labeled X and Y.
While in regression the emphasis is on predicting one variable from the other, in correlation the emphasis is on the degree to which a linear model may describe the relationship between two variables. In regression the interest is directional: one variable is predicted and the other is the predictor; in correlation the interest is non-directional: the relationship is the critical aspect. The computation of the correlation coefficient is most easily accomplished with the aid of a statistical calculator. The value of r was found on a statistical calculator during the estimation of regression parameters in the last chapter. Although definitional formulas will be given later in this chapter, the reader is encouraged to review the procedure to obtain the correlation coefficient on the calculator at this time. The correlation coefficient may take on any value between plus and minus one. The sign of the correlation coefficient (+, -) defines the direction of the relationship, either positive or negative. A positive correlation coefficient means that as the value of one variable increases, the value of the other variable increases; as one decreases the other decreases. A negative correlation coefficient indicates that as one variable increases, the other decreases, and vice-versa. Taking the absolute value of the correlation coefficient measures the strength of the relationship. A correlation coefficient of r=.50 indicates a stronger degree of linear relationship than one of r=.40. Likewise, a correlation coefficient of r=-.50 shows a greater degree of relationship than one of r=.40. A correlation coefficient of zero (r=0.0) indicates the absence of a linear relationship and correlation coefficients of r=+1.0 and r=-1.0 indicate a perfect linear relationship. UNDERSTANDING AND INTERPRETING THE CORRELATION COEFFICIENT The correlation coefficient may be understood by various means, each of which will now be examined in turn. HYPOTHESIS TESTING http://www.psychstat.missouristate.edu/introbook/sbk18.htm Hypothesis tests are procedures for making rational decisions about the reality of effects. Rational Decisions Most decisions require that an individual select a single alternative from a number of possible alternatives. The decision is made without knowing whether or not it is correct; that is, it is based on incomplete information. For example, a person either takes or does not take an umbrella to school based upon both the weather report and observation of outside conditions. If it is not currently raining, this decision must be made with incomplete information. A rational decision is characterized by the use of a procedure which ensures that the likelihood or probability of success is incorporated into the decision-making process. The procedure must be stated in such a fashion that another individual, using the same information, would make the same decision. One is reminded of a STAR TREK episode. Captain Kirk, for one reason or another, is stranded on a planet without his communicator and is unable to get back to the Enterprise. Spock has assumed command and is being attacked by Klingons (who else). Spock asks for and receives information about the location of the enemy, but is unable to act because he does not have complete information. Captain Kirk arrives at the last moment and saves the day because he can act on incomplete information. This story goes against the concept of rational man. Spock, being the ultimate rational man, would not be immobilized by indecision.
Instead, he would have selected the alternative which realized the greatest expected benefit given the information available. If complete information were required to make decisions, few decisions would be made by rational men and women. This is obviously not the case. The script writer misunderstood Spock and rational man. Effects When a change in one thing is associated with a change in another, we have an effect. The changes may be either quantitative or qualitative, with the hypothesis testing procedure selected based upon the type of change observed. For example, if changes in salt intake in a diet are associated with activity level in children, we say an effect occurred. In another case, if the distribution of political party preference (Republicans, Democrats, or Independents) differs for sex (Male or Female), then an effect is present. Much of behavioral science is directed toward discovering and understanding effects. The effects discussed in the remainder of this text appear as various statistics, including differences between means, contingency tables, and correlation coefficients. THE SAMPLING DISTRIBUTION The sampling distribution is a distribution of a sample statistic. While the concept of a distribution of a set of numbers is intuitive for most students, the concept of a distribution of a set of statistics is not. Therefore distributions will be reviewed before the sampling distribution is discussed. TESTING HYPOTHESES ABOUT SINGLE MEANS http://www.psychstat.missouristate.edu/introbook/sbk20.htm THE HEAD-START EXPERIMENT Suppose an educator had a theory which argued that a great deal of learning occurs before children enter grade school or kindergarten. This theory explained that socially disadvantaged children start school intellectually behind other children and are never able to catch up. In order to remedy this situation, he proposes a head-start program, which starts children in a school situation at ages three and four. A politician reads this theory and feels that it might be true. However, before he is willing to invest the billions of dollars necessary to begin and maintain a head-start program, he demands that the scientist demonstrate that the program really does work. At this point the educator calls for the services of a researcher and statistician. Because this is a fantasy, the following research design would probably never be used in practice. This design will be used to illustrate the procedure and the logic underlying the hypothesis test. At a later time, we will discuss a more appropriate design. A random sample of 64 four-year-old children is taken from the population of all four-year-old children. The children in the sample are all enrolled in the head-start program for a year, at the end of which time they are given a standardized intelligence test. The mean I.Q. of the sample is found to be 103.27. On the basis of this information, the educator wishes to begin a nationwide head-start program. He argues that the average I.Q. in the population is 100 (μ = 100) and that 103.27 is greater than that. Therefore, the head-start program had an effect of about 103.27-100 or 3.27 I.Q. points. As a result, the billions of dollars necessary for the program would be well invested. The statistician, being in this case the devil's advocate, is not ready to act so hastily. He wants to know whether chance could have caused the large mean. In other words, perhaps head start doesn't make a bit of difference.
The mean of 103.27 was obtained because the sixty-four students selected for the sample were slightly brighter than average. He argues that this possibility must be ruled out before any action is taken. If not ruled out completely, he argues that although possible, the likelihood must be small enough that the risk of making a wrong decision outweighs possible benefits of making a correct decision. To determine if chance could have caused the difference, the hypothesis test proceeds as a thought experiment. First, the statistician assumes that there were no effects; in this case, the head-start program didn't work. He then creates a model of what the world would look like if the experiment were performed an infinite number of times under the assumption of no effects. The sampling distribution of the mean is used as this model. EXPERIMENTAL DESIGNS Before an experiment is performed, the question of experimental design must be addressed. Experimental design refers to the manner in which the experiment will be set up, specifically the way the treatments are administered to subjects. Treatments will be defined as quantitatively or qualitatively different levels of experience. For example, in an experiment on the effects of caffeine, the treatment levels might be exposure to different amounts of caffeine, from 0 to .0375 mg. In a very simple experiment there are two levels of treatment: none, called the control condition, and some, called the experimental condition. The type of analysis or hypothesis test used is dependent upon the type of experimental design employed. The two basic types of experimental designs are crossed and nested. CROSSED DESIGNS In a crossed design each subject sees each level of the treatment conditions. In a very simple experiment, such as one that studies the effects of caffeine on alertness, each subject would be exposed to both a caffeine condition and a no caffeine condition. For example, using the members of a statistics class as subjects, the experiment might be conducted as follows. On the first day of the experiment, the class is divided in half with one half of the class getting coffee with caffeine and the other half getting coffee without caffeine. A measure of alertness is taken for each individual, such as the number of yawns during the class period. On the second day the conditions are reversed; that is, the individuals who received coffee with caffeine are now given coffee without and vice-versa. The size of the effect will be the difference of alertness on the days with and without caffeine. The distinguishing feature of crossed designs is that each individual will have more than one score. The effect occurs within each subject, thus these designs are sometimes referred to as WITHIN SUBJECTS designs. Crossed designs have two advantages. One, they generally require fewer subjects, because each subject is used a number of times in the experiment. Two, they are more likely to result in a significant effect, given the effects are real. Crossed designs also have disadvantages. One, the experimenter must be concerned about carry-over effects. For example, individuals not used to caffeine may still feel the effects of caffeine on the second day, even though they did not receive the drug. Two, the first measurements taken may influence the second. For example, if the measurement of interest was score on a statistics test, taking the test once may influence performance the second time the test is taken.
Three, the assumptions necessary when more than two treatment levels are employed in a crossed design may be restrictive. NESTED DESIGNS In a nested design, each subject receives one, and only one, treatment condition. The critical difference in the simple experiment described above would be that the experiment would be performed on a single day, with half the individuals receiving coffee with caffeine and half without receiving caffeine. The size of effect in this case is determined by comparing average alertness between the two groups. The major distinguishing feature of nested designs is that each subject has a single score. The effect, if any, occurs between groups of subjects and thus the name BETWEEN SUBJECTS is given to these designs. The relative advantages and disadvantages of nested designs are opposite those of crossed designs. First, carry-over effects are not a problem, as individuals are measured only once. Second, the number of subjects needed to discover effects is greater than with crossed designs. Some treatments by their nature are nested. The effect of sex, for example, is necessarily nested. One is either a male or a female, but not both. Current religious preference is another example. Treatment conditions which rely on a pre-existing condition are sometimes called demographic or blocking factors. CROSSED t-TESTS http://www.psychstat.missouristate.edu/introbook/sbk22.htm NESTED t-TESTS http://www.psychstat.missouristate.edu/introbook/sbk23.htm ANALYSIS OF CROSSED DESIGNS THE t DISTRIBUTION The t distribution is a theoretical probability distribution. It is symmetrical, bell-shaped, and similar to the standard normal curve. It differs from the standard normal curve, however, in that it has an additional parameter, called degrees of freedom, which changes its shape. ONE AND TWO-TAILED t-TESTS - http://www.psychstat.missouristate.edu/introbook/sbk25.htm ERRORS IN HYPOTHESIS TESTING http://www.psychstat.missouristate.edu/introbook/sbk26.htm A superintendent in a medium-size school has a problem. The mathematical scores on nationally standardized achievement tests such as the SAT and ACT of the students attending her school are lower than the national average. The school board members, who don't care whether the football or basketball teams win or not, are greatly concerned about this deficiency. The superintendent fears that if it is not corrected, she will lose her job before long. As the superintendent was sitting in her office wondering what to do, a salesperson approached with a briefcase and a sales pitch. The salesperson had heard about the problem of the mathematics scores and was prepared to offer the superintendent a "deal she couldn't refuse." The deal was teaching machines to teach mathematics, guaranteed to increase the mathematics scores of the students. In addition, the machines never take breaks or demand a pay increase. The superintendent agreed that the machines might work, but was concerned about the cost. The salesperson finally wrote some figures. Since there were about 1000 students in the school and one machine was needed for every ten students, the school would need about one hundred machines. At a cost of $10,000 per machine, the total cost to the school would be about $1,000,000. As the superintendent picked herself up off the floor, she said she would consider the offer, but didn't think the school board would go for such a big expenditure without prior evidence that the machines actually worked.
Besides, how did she know that the company that manufactures the machines might not go bankrupt in the next year, meaning the school would be stuck with a million dollars' worth of useless electronic junk? The salesperson was prepared, offering to lease the school ten machines for one year at a cost of $500 each for testing purposes. At the end of a year the superintendent would make a decision about the effectiveness of the machines. If they worked, she would pitch them to the school board; if not, then she would return the machines with no further obligation. An experimental design was agreed upon. One hundred students would be randomly selected from the student population and taught using the machines for one year. At the end of the year, the mean mathematics scores of those students would be compared to the mean scores of the students who did not use the machine. If the means were different enough, the machines would be purchased. The astute student will recognize this as a nested t-test. In order to help decide how different the two means would have to be in order to buy the machines, the superintendent did a theoretical analysis of the decision process. This analysis is presented in the following decision box. http://www.psychstat.missouristate.edu/introbook/sbk27.htm CHI-SQUARE AND TESTS OF CONTINGENCY TABLES Hypothesis tests may be performed on contingency tables in order to decide whether or not effects are present. Effects in a contingency table are defined as relationships between the row and column variables; that is, are the levels of the row variable differentially distributed over the levels of the column variable. Significance in this hypothesis test means that interpretation of the cell frequencies is warranted. Non-significance means that any differences in cell frequencies could be explained by chance. Hypothesis tests on contingency tables are based on a statistic called Chi-square. In this chapter contingency tables will first be reviewed, followed by a discussion of the Chi-squared statistic. The sampling distribution of the Chi-squared statistic will then be presented, followed by a discussion of the hypothesis test. A complete computational example will conclude the chapter. REVIEW OF CONTINGENCY TABLES Frequency tables of two variables presented simultaneously are called contingency tables. Contingency tables are constructed by listing all the levels of one variable as rows in a table and the levels of the other variable as columns, then finding the joint or cell frequency for each cell. The cell frequencies are then summed across both rows and columns. The sums are placed in the margins, the values of which are called marginal frequencies. The lower right-hand corner value contains the sum of either the row or column marginal frequencies, both of which must equal N. For example, suppose that a researcher studied the relationship between having the AIDS Syndrome and sexual preference of individuals.
The study resulted in the following data for thirty male subjects:
AIDS (Y/N): N Y Y N N N Y N N N Y N N N Y N N N N N N N Y N Y Y N Y N Y N
Sexual preference (M/B/F): M B F F B F F F M F F F F B F F B F M F F M F B M F M F M F
TESTING A SINGLE CORRELATION COEFFICIENT - http://www.psychstat.missouristate.edu/introbook/sbk29.htm Interoperable GIS and Spatial Process Modelling http://www.geocomputation.org/1997/papers/marr.pdf GEM 402 Geospatial Technologies For Environmental Mapping with GIS • Understand fundamental geospatial concepts and their relationships • Use topographic maps alone or as a base for thematic mapping in their environmental studies • Have the ability to recognize and interpret objects on aerial photographs • Design and create artistically aesthetic and technically sound thematic maps, using fundamental cartographic concepts and map design principles • Apply different sampling techniques depending on goals of study • Understand principles of GPS and mobile mapping and its applicability for environmental studies • Have the ability to discuss complexity of issues related to integration of geospatial data, techniques and methods for applied environmental studies http://learn.environment.utoronto.ca/distance-education/courses/gem-402-geospatial-technologies-for-environmental-mapping-with-gis.aspx GEM 404 GIS Modeling for Environmental Applications http://learn.environment.utoronto.ca/distance-education/courses/gem-404-gis-modeling-for-environmental-applications.aspx Spatial Modeling in GIS: GIS Modeling Theory, Classifying Models, Modeling Process Modeling of Spatial Distribution, Pattern and Density: Distance statistics and metric, Nearest neighbour methods, Autocorrelation, Moran coefficients, Significance test, Kernel density Cost Movement Analysis: Distance operators, Cost distance, Weighted distance, Path distance, Distance and cost allocation Correlation and Regression Spatial Analysis: Aspatial correlation, Spatial correlation of two surfaces, Regression and Factor analysis Transportation Network Analysis in GIS: Optimal Routing, Finding Closest Facilities, Resource Allocation, Location/Allocation Utility Network Modeling with GIS: Geometrical network, utility analysis for engineering network (water, sewerage) Hydrological Analysis in GIS: DTMs for Hydrology, Watershed modeling, Raster and Vector Hydrological Analysis, Integrated Hydrological Systems Animal Movement Analysis with GIS: Compositional analysis of habitat use, movement path modeling, techniques such as spider diagrams, convex probability polygons, etc. Air Pollution Modeling with GIS: Modeling source factors, atmosphere factors and environmental factors, statistical and dynamic models Modeling of Environment Spatial Databases: Spatial database modeling process, sample geodatabase model schemas Environmental Modelling and Monitoring http://www.geos.ed.ac.uk/geography/research/EMMGIS/ Environmental Monitoring This interdisciplinary strand involves two main themes: neotropical biogeography and the remote sensing of aquatic systems and vegetation. Biogeography research is concentrated in the Yucatán (Central America) and the forests and savannas of Brazil, Guyana and Southern America. Collaborative work is investigating how variations in soil properties can explain the spatial distribution of vegetation and form a basis for land development planning and conservation.
Published highlights have covered the dynamics of forest-savanna boundaries, environmental controls on mangrove distribution and coastal wetlands, the impact of forest clearance in the Amazon, bio-indicators of human disturbance over tropical coastlands and the impact of historic and prehistoric vegetation change on plant diversity. Lacustrine cores from Belize are bringing new understanding to the balance between human occupation and vegetation. In terms of the significance for global change, for example, there is growing evidence that the biodiversity of Belizean forests owes much to Mayan agricultural occupation. The use of remote sensing in measuring biological activities is principally focused on high-spectral resolution and radar remote sensing techniques with particular attention to understanding reflectance/backscatter signals. Current projects involve the recognition of algal types through spectral differences, better understanding of in-lake processes through integrating remote sensing with 3-dimensional models of lake dynamics, reflectance modelling using Monte Carlo techniques and the remote sensing of freshwater and marine macrophytic vegetation involving both field-based and modelling approaches. Environmental Modelling Modelling of environmental systems currently focuses on the impacts of climate change on glacial ice sheets, the interaction of climate change, glacial and tectonic forces on geomorphological development, acid deposition modelling, modelling of snow transport in real time for use in forecasting and the modelling of water quality and quantity. Applications include the use of geostatistical modelling techniques to understand the spatial scales of dynamics in lakes as determined using remote sensing techniques. Acid deposition modelling focuses on the UK within the broader UN ECE. Much of our work has a clear policy orientation (DETR, Environment Agency), and also considers the scientific bases of modelling using improved understanding of the processes of emission, transformation and deposition of acidifying pollutants (NERC). Future developments will include better representation of O3 chemistry to explore the likely effects of combined NOx/VOC control strategies. Systems Technology Research is investigating the application of parallel processing computer architectures and techniques for meeting the increasing performance demands placed on GIS. Applications include dedicated parallel algorithms, alternative approaches to performance optimisation on parallel architectures. This work is being undertaken in conjunction with the Edinburgh Parallel Computing Centre. Staff are examining other methods for extending GIS theory and the capabilities of GIS technologies. These include database design and management, the specific design of user interfaces, integration of real-time GIS with the World Wide Web, spatial analysis techniques and improved visualisation and generalisation methods. Remote Sensing New developments in remote sensing are particularly taking place in the fields of radar remote sensing and the acquisition of optical data at higher spectral and spatial resolutions. Similarly, the development and availability of airborne imaging spectrometry and field spectroscopy has established these as valuable research areas. Aspects of these developments are being explored by staff. Integrated remote sensing and GIS data are also widely used to support the activities of other Department research groups. 
Current research projects include applications of high spectral resolution remote sensing techniques in freshwater and marine environments, GIS for conservation planning and forest management in Belize and in Brazil. Other areas of interest include techniques for improved classification and visualisation of remotely sensed imagery; the integration of remotely sensed data within GIS; applications of airborne imaging spectrometry and radar polarimetry; and improved algorithms for inverse modelling. Application of an Emission Inventory GIS-Based Tool http://www.epa.gov/ttn/chief/conference/ei16/session9/altena.pdf GIS applications in air pollution modeling http://www.gisdevelopment.net/application/environment/air/mi03220.htm Modelling urban transportation emissions: role of GIS http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V9K-482YTP2-3&_user=5719616&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_acct=C000068207&_version=1&_urlVersion=0&_userid=5719616&md5=6a7fb864133850d89da65805002a546d Global Inventories http://www.geiacenter.org UK National Inventory http://www.naei.org.uk Manchester Air Quality Management http://www.mapac.org.uk/mapac_frame_airquality.htm ADMS-Urban Model http://www.cerc.co.uk Forest Fire Management Using Geospatial Information System http://www.gisdevelopment.net/application/natural_hazards/fire/me05_032.htm Modelling the Spatial Dynamics and Social Interaction of Human Recreators Using GIS http://www.srnr.arizona.edu/~gimblett/modsim97.html programming the gis-user interface to operationalize model http://www.state.in.us/indot/files/section_3.pdf Modelling examples http://educationally.narod.ru/gismodellingphotoalbum.html Incorporating GIS and MCE for Suitability Assessment Modelling of Coral Reef Resources http://www.ingentaconnect.com/content/klu/emas/2006/00000114/F0030001/00004628 Abstract: Assessment, planning and management for coral reef ecosystems are particularly challenging tasks, especially in developing countries. In this study, a methodological approach which integrates Geographic Information Systems (GIS) and Multicriteria Evaluation (MCE) for the development of suitability assessment models for a variety of uses of coral reef resources is discussed. Such an approach is sustained by an extensive use of local expert knowledge coded in the form of automated decision trees (DTs). The usefulness of the approach and the models developed is demonstrated by their application to the participatory assessment and resources planning at “Alacranes Reef National Park” (ARNP), Yucatán, México. Overlaying of resulting suitability maps was also applied for identifying potential conflicting areas. Keywords: coral reefs management; decision trees; expert knowledge; GIS; multicriteria evaluation; suitability assessment Classical Boiotia through the GIS http://www.uam.es/proyectosinv/sterea/beocia/boiotia_gis.htm Multi-criteria analysis in GIS environment for natural resource development http://www.gisdevelopment.net/application/geology/mineral/geom0002.htm GIS-based analysis of spatial data has become a new specialized process, capable of analyzing the complex problem of evaluating and allocating natural resources to target potential areas for mineral exploration. This paper explains the development of a data-driven decision-tree approach with multi-criteria evaluations in mineral potential mapping at the Hutti-Maski schist belt.
An inference network based spatial data integration has been attempted which allows for the incorporation of uncertainties into a predictive model. The procedure has produced a posterior probability map identifying favorable areas for gold exploration. GIS in Natural Resources Development Land resource evaluation and allocation is one of the most fundamental activities of resource development (FAO, 1976). With the advent of GIS, there is ample opportunity for a more explicitly reasoned land evaluation. Prediction of suitable areas for mineral exploration in a virgin area of a specific type is a problem that requires the use of various procedures and tools for the development of decision rules and the predictive modeling of expected outcomes. GIS has emerged as a tool to address the needs of decision makers and to cope with problems of uncertainty. A decision rule typically contains procedures for combining criteria into a single composite index and a statement of how alternatives are to be compared using this index. It may be as simple as a threshold applied to a single criterion. It is structured in the context of a specific objective. An objective is thus a perspective that serves to guide the structuring of decision rules. To meet a specific objective, it is frequently the case that several criteria will need to be integrated and evaluated, called multi-criteria evaluations. Weighted linear combinations and concordance-discordance analysis (Voogd, 1983 and Carver, 1991) are the two most common procedures in GIS-based multi-criteria evaluations. In the former, each factor is multiplied by a weight and then summed to arrive at a final suitability index, while in the latter, each pair of alternatives is analyzed for the degree to which it outranks the other on the specified criteria. The former is straightforward in a raster GIS, while the latter is computationally impractical when a large number of alternatives are present. Information vital to the process of decision support analysis is rarely perfect in earth sciences. This leads to uncertainties, which arise from the manner in which criteria are combined and evaluated to reach a decision. When uncertainty is present, the decision rule needs to incorporate modifications to the choice function or heuristic to accommodate the propagation of uncertainty through the rule and to replace hard decision procedures based on certain data with soft procedures based on uncertain data. Bayesian probability theory (Bonham-Carter et al., 1988; 1990; 1995), Dempster-Shafer theory (Cambell et al., 1982) and fuzzy set theory (Duda et al., 1977) have been used extensively in mineral targeting. Theory of multi-criteria evaluation Multi-criteria evaluation is primarily concerned with how to combine the information from several criteria to form a single index of evaluation. In the case of Boolean criteria (constraints), the solution usually lies in the union (logical OR) or intersection (logical AND) of conditions. However, for continuous factors, a weighted linear combination (Voogd, 1983) is the usual technique. As the criteria are measured at different scales, they are standardised and transformed such that all factor maps are positively correlated with suitability. Establishing factor weights is the most complicated aspect, for which the most commonly used technique is the pair-wise comparison matrix. The relationship between evidence (criteria) and belief is evaluated with a forward-chaining expert system.
Evaluation of the relationship between evidence (criteria) and belief is carried out by a forward-chaining expert system. In this system the propagation of a favourability measure through the inference network may include Bayesian updating and fuzzy logic for the computation of posterior favourability values given the evidence. In the real world, both the evidence and the hypotheses are uncertain; we cope with this problem by assigning probability values to the evidence (Duda et al., 1977). Evidence propagates unidirectionally through a hierarchical network towards an ultimate hypothesis. Integrating spatial multi-criteria evaluation and expert knowledge for GIS-based habitat suitability modelling http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V91-43C5F1N-1&_user=5719616&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_acct=C000068207&_version=1&_urlVersion=0&_userid=5719616&md5=7ffd6183f666fae46eff2a2958015856 GIS data processing and spatial analysis, together with modern decision analysis techniques, were used in this study to improve habitat suitability evaluation over large areas. Both empirical evaluation models and models based on expert knowledge can be applied in this approach. The habitat requirements of species were described as map layers within GIS so that each map layer represented one criterion. GIS was used as the platform for managing, combining and displaying the criterion data and also as a tool for producing new data, especially by utilising spatial analysis functions. Criterion standardisation, weighting and combining were accomplished by means of multi-criteria evaluation (MCE) methods, the theoretical background being based on multi-attribute utility theory (MAUT). By using continuous priority and sub-priority functions in the evaluation, no classification of continuous attributes was needed, and non-linear relationships between habitat suitability and the attributes could also be considered. Sensitivity analysis was applied to consider the temporal factor in the analysis and to find out the effect of different criteria weights on the spatial pattern of the suitability index. Changing the weights of permanent and time-changeable habitat factors shifted the location of optimal habitats for the species. In the long run, permanent factors such as soil properties define the habitat potential, which is important to take into consideration, e.g. in forest management planning and species conservation. The method is illustrated by a case study in which habitat suitability maps were produced for an old-forest polypore, Skeletocutis odora. Author Keywords: Habitat suitability modelling; Multi-criteria evaluation; Geographical information systems; Expert knowledge; Sensitivity analysis; Skeletocutis odora Article Outline: 1. Introduction 2. Methods 2.1. The general outline 2.2. Assessment of the suitability structure 2.3. Producing map layers 2.4. Cartographic modelling 2.5. Sensitivity analysis 3. Case study 3.1. Study area and general course of case study 3.2. Estimation of habitat suitability indices 3.2.1. Data procurement 3.2.2. Overlay analysis 3.3. Sensitivity analysis 4. Discussion. References. Vitae.
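The Bayesian updating that the mineral-targeting passage above relies on reduces, for binary evidence layers, to multiplying prior odds by one likelihood ratio per layer. The following toy sketch uses invented probabilities; real weights-of-evidence applications in the Bonham-Carter tradition estimate the conditional probabilities from training deposits rather than asserting them.

```python
# Toy illustration of Bayesian updating of a favourability hypothesis
# (e.g. "gold mineralization present in this cell") as binary evidence
# layers arrive. Every number here is invented for illustration.
prior = 0.01                       # prior P(hypothesis) for an arbitrary cell

# (layer name, P(evidence | hypothesis), P(evidence | not hypothesis))
evidence_layers = [
    ("magnetic anomaly present", 0.70, 0.20),
    ("favourable host rock",     0.90, 0.40),
    ("geochemical halo",         0.60, 0.10),
]

odds = prior / (1.0 - prior)
for name, p_e_h, p_e_not_h in evidence_layers:
    odds *= p_e_h / p_e_not_h      # Bayes' rule in odds (likelihood-ratio) form
    posterior = odds / (1.0 + odds)
    print(f"after '{name}': posterior favourability = {posterior:.3f}")
```

Running each cell of a raster through this loop, layer by layer, is what produces the posterior probability map mentioned above.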
Multi-Criteria and Multi-Objective Decision http://books.google.com/books?hl=en&lr=&id=_dDLF6Q_zUcC&oi=fnd&pg=PA227&dq=multi-criteria+evaluation+gis&ots=nN0g_q4qTK&sig=cOLguTvurG1erDWfcU_02JKpmwg#v=onepage&q=multi-criteria%20evaluation%20gis&f=false Multi-criteria modelling http://books.google.com/books?hl=en&lr=&id=tm3ZJfDgU_gC&oi=fnd&pg=PA127&dq=multi-criteria+evaluation+gis&ots=IPCwzAXDKh&sig=LhG8o2U7Wc0UEOTXwhOjVybgDS4#v=onepage&q=multi-criteria%20evaluation%20gis&f=false Implementing MCE: Guidelines for applying multi-criteria analysis http://books.google.com/books?hl=en&lr=&id=zSwnRW3UnrcC&oi=fnd&pg=PA5&dq=implementing+multi-criteria++gis&ots=J_i8nOh0Y7&sig=oH-Y9VkIl0pXlM8_Dg09uBs73ms#v=onepage&q=&f=false Integrating geographical information systems and multiple criteria decision-making methods http://books.google.com/books?hl=en&lr=&id=qR0vfnwVuU0C&oi=fnd&pg=PA265&dq=implementing+multi-criteria++gis&ots=0v-M2tkM7O&sig=uN5nAsr6MF-vzhgtHxV3DAW36yM#v=onepage&q=implementing%20multi-criteria%20%20gis&f=false
Technique for MCE:
1. Selection of criteria (defining the problem)
2. Standardization of criterion scores from mixed data sources
3. Allocation of weights
4. Applying the MCE algorithm (cf. the weighted linear combination sketch above)
MCE techniques: related fields, conflicting fields.
Public Participation GIS http://en.wikipedia.org/wiki/Public_Participation_GIS Public Participation Geographic Information Systems (PPGIS) was born, as a term, in 1996 at the meetings of the National Center for Geographic Information and Analysis (NCGIA).[1] [2] PPGIS is meant to bring the academic practices of GIS and mapping to the local level in order to promote knowledge production. The idea behind PPGIS is the empowerment and inclusion of marginalized populations, who have little voice in the public arena, through geographic technology education and participation. PPGIS uses and produces digital maps, satellite imagery, sketch maps, and many other spatial and visual tools, to change geographic involvement and awareness on a local level. Public Participation GIS http://www.crssa.rutgers.edu/ppgis/ What is PPGIScience? Public participatory geographic information science is (1) the study of the uses and applications of geographic information and/or geographic information systems technology (2) used by members of the public, both as individuals and grassroots groups, (3) for participation in the public processes (data collection, mapping, analysis and/or decision-making) affecting their lives.
Public Participation GIS Guiding principles http://www.iapad.org/ppgis_principles.htm PPGIS:
- Is an interdisciplinary research, community development and environmental stewardship tool grounded in value and ethical frameworks that promote social justice, ecological sustainability, improvement of quality of life, redistributive justice, and the nurturing of civil society;
- Is validly practiced in streams relating to place (urban, rural), organizational context (community-based organization, grassroots group, non-governmental organization, local government, regional government, state/provincial government), or sector (transportation, watershed restoration, food security, housing, public health, etc.);
- Endeavors to involve youth, elders, women, First Nations and other segments of society that are traditionally marginalized from decision-making processes;
- Is both functionally and holistically based; that is, it can be applied to help solve problems in specific sectors of society, and/or to provide broader integrated assessments of place-based or bioregional identity;
- Is best applied via partnerships developed between individuals, communities, non-governmental organizations, academic institutions, religious or faith-based institutions, governments and the private sector;
- Endeavors to always include a strong capacity-building dimension in its application;
- Is linked to social theories and methods originating in planning, anthropology, geography, social work and other social sciences;
- Is linked to applied qualitative research tools including participatory action research, grounded research, participatory rural appraisal, etc.;
- Is a tool that is best applied in a wide variety of manual, digital, 2- and 3-dimensional formats and data types (digital, oral, image);
- Enables public access to cultural, economic and biophysical data generated by governments, private sector organizations and academic institutions;
- Supports a range of interactive approaches from face-to-face contact to web-based applications;
- Promotes the development of software that is accessible for broad acquisition and ease of use;
- Supports lifelong learning of its practitioners in a manner that helps to bridge the divides that exist between cultures, academic disciplines, gender and class;
- Is about sharing the challenges and opportunities of place and situation in a transparent and celebratory manner.
Empowerment, Marginalization And Public Participation GIS http://www.ncgia.ucsb.edu/varenius/ppgis/ GIS is alternatively seen as a powerful tool for empowering communities or as an invasive technology that advantages some people and organizations while marginalizing others. This is a critical issue which divides both academicians and thoughtful critics of society. "GIS and Society" is therefore one of the top GIS research issues facing this country, as determined by UCGIS, the University Consortium for Geographic Information Science. The NCGIA addressed the issue in Initiative-19 (March 1996) as well. In neither of those cases was there a successful merger of the two positions, merely an acknowledgment of the validity of the stance of the other. This initiative will examine the two-edged nature of the GIS sword by defining and executing research projects that involve researchers looking critically at the use of GIS by community groups or by others using the technology in ways that impact individuals and communities.
Another of the major themes which arose from Initiative 19, "GIS and Society," was the nature of alternative GIS designs which might better reflect community interests and empower their members. This topic was explored at the Minnesota specialist meeting in the context of public participation, then more deeply in a subsequent workshop to define the characteristics of this alternative GIS, sometimes called GIS2, in Orono, Maine in July 1996. This topic continues to be salient as we try to understand the nature of the distortion enforced by the map metaphor of reality and to enhance the technology to incorporate a broader spectrum of ways of knowing. Finally, COST-UCE C4 (Urban Civil Engineering, Information Systems), a European project for Cooperation in the field of Scientific and Technical Research, held a February 1998 workshop on Groupware for Urban Planning which included a component on public participation. NCGIA was a cosponsor of this workshop. Collectively, we use the term Public Participation GIS (PPGIS) to cover the range of topics raised by the intersection of community interests and GIS technology. The research agendas generated by the various meetings listed above provide the foundation to explore, in context, the relationships between GIS and communities. This initiative is concerned with the social, political, historical, and technological conditions in which GIS both empowers and marginalizes individuals and communities. Included in this list of research agendas are the following potential topics:
- "successful" implementations of a public participatory GIS
- changes in local politics and power relationships resulting from the use of GIS in spatial decision making
- what community groups need in the way of information, and the role GIS plays or could play in meeting this need
- current attempts to use GIS to "empower" communities for spatial decision making
- the impacts on communities of differential access to hardware, software, data, and expertise in GIS production and use
- the educational, social, political, and economic reasons for lack of access, and exemplary ways communities have overcome these barriers
- the implications of map-based representations of information for community groups
- the nature of GIS knowledge distortion from "grassroots" perspectives
- the value of sophisticated analyses for understanding key issues, as opposed to the negative impact of such analyses in confusing and marginalizing individuals and groups
- implications of conflicting knowledge and multiple realities for spatial decision making
- the ways in which socially differentiated communities and their local knowledge are or might be represented within GIS
- GIS as local surveillance
- identifying which public data policies have positive influences on small neighborhood businesses and which are negative
- developing prospective models (economic, organizational, legal, and technological) that might result in increased and more equitable opportunities among the diverse segments of society in accessing geographic information and tools
A specialist meeting will be held to sort through the various research agendas, prioritize them, and write one or more proposals to fund a limited number of studies that will illuminate the most important issues. Other tracks at the meeting may attempt to identify new issues, such as collaborative decision-making involving the public. The specialist meeting will involve representatives of the academic community, as well as representatives from government and the nonprofit community.
The core planning group met in Boston on March 28, 1998 in conjunction with the annual conference of the Association of American Geographers. The group decided its primary concern was to learn from those using GIS and information technology to support the community in the decision-making process. To this end, we envisage three activities as part of this initiative. Identification of Major Community IT/GIS Activities Around the Country: This effort will build on an inventory completed by the Urban Institute a few years ago in its National Neighborhood Indicators Project. It will go beyond efforts supported by local government to identify significant efforts by academic and non-governmental organizations (NGOs). This inventory will be completed by a graduate student, working under the direction of the core planning group, during the summer of 1998. Specialist Meeting: Through an open, widely distributed solicitation we will attempt to attract professionals who have been deeply involved in a rich array of experiences. Acceptance into participation in this specialist meeting includes a requirement to write a paper reflecting on these experiences; we hope to publish a collection of these papers as a book on PPGIS. A second major activity of this specialist meeting is planning for the conference described below. This meeting will be held in Santa Barbara in October 1998. A Large Conference of Participants in Major PPGIS Activities: A major conference will be held in summer or fall of 1999 featuring speakers who have been involved in a wide range of community IT/GIS activities. Tentatively, we intend to feature activities from the following types of communities: urban neighborhoods, indigenous people, third world, and environmental. Speakers will be selected who represent a rich array of experiences in their communities. We hope to raise funds to cover expenses for multiple individuals from each of the selected sites, representing a range of experiences from technical to policy to citizen. Public Participation GIS Research and Agricultural Farmworkers http://proceedings.esri.com/library/userconf/proc08/papers/papers/pap_1703.pdf Tools for web-based GIS mapping of a "fuzzy" vernacular geography http://www.geocomputation.org/2003/Papers/Waters_Paper.pdf The vast majority of people do not use a scientific geographical vocabulary; nevertheless, most use a wide variety of geographical terms on a day-to-day basis. Identifiers like "Downtown" or "the grim area around the docks" are part of a vernacular geographical terminology which is vastly more used than the coordinate systems and scientifically defined variables so beloved of professional geographers. These terms not only identify areas, but give members of our sociolinguistic group information about them, building up a jointly defined cultural world-view within which we all act on a daily basis. Despite its importance for policy making and quality of life, attention is rarely paid to this vernacular geography because it is so hard to capture and use. This paper presents a new set of tools for capturing these "fuzzy" psychogeographical areas and their associated attributes, through a web-based mapping system. The system contains a spraycan tool that allows users to tag information onto diffuse areas of varying density. An example of their use to define areas people consider "high crime" within a UK city is also presented, along with users' responses to the system. Such a system aims to pull together professional and popular geographical understanding, to the advantage of both.
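The papers do not publish the spraycan algorithm itself, but the idea they describe (tagging diffuse areas at varying density) can be sketched as kernel-density accumulation onto a raster. Everything below, from the grid size to the simulated clicks, is an assumption made for illustration, not the authors' implementation.

```python
import numpy as np

# Minimal "spraycan" sketch for vernacular geography: each click deposits
# a Gaussian blob of opinion density onto a raster; the accumulated,
# rescaled surface acts as a fuzzy membership map for a vague place
# ("high crime area", "Downtown", ...).
GRID = 100
surface = np.zeros((GRID, GRID))

def spray(surface, row, col, radius=5.0, strength=1.0):
    """Add a Gaussian density blob centred on (row, col)."""
    rr, cc = np.mgrid[0:surface.shape[0], 0:surface.shape[1]]
    d2 = (rr - row) ** 2 + (cc - col) ** 2
    surface += strength * np.exp(-d2 / (2.0 * radius ** 2))
    return surface

# Simulated user clicks while "spraying" over a perceived area.
for r, c in [(40, 40), (42, 44), (45, 41), (60, 70)]:
    surface = spray(surface, r, c)

membership = surface / surface.max()   # rescale to fuzzy membership 0..1
print("cells with membership > 0.5:", int((membership > 0.5).sum()))
```

Aggregating sprayed surfaces from many users, then thresholding or contouring them, would give the jointly defined "fuzzy" areas the abstract describes.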
Mapping vernacular geography: web-based GIS tools for capturing "fuzzy" or "vague" entities http://inderscience.metapress.com/app/home/contribution.asp?referrer=parent&backto=issue,3,7;journal,9,31;linkingpublicationresults,1:110893,1 A.J. Evans (Centre for Computational Geography, School of Geography, University of Leeds, Leeds LS2 9JT, UK) and T. Waters (Policy and Research Unit, City of Bradford Metropolitan District Council, Jacobs Well, Bradford, West Yorkshire, BD1 5RW, UK) Abstract: Most people do not use a formal geographical vocabulary; however, they do use a wide variety of geographical terms on a daily basis. Identifiers such as "Downtown" are components of a vernacular geography which is vastly more used than the coordinates and scientifically defined variables beloved of most professional analysts. Terms like these build into the jointly defined world-views within which we all act. Despite its importance for policymaking and quality of life, attention is rarely paid to this vernacular geography because it is hard to capture and use. This paper presents tools for capturing this geography, an example of the tools' use to define "high crime" areas, and an initial discussion of the issues surrounding vernacular data. While the problems involved in analysing such data are not to be underestimated, such a system aims to pull together professional and popular geographical understanding, to the advantage of both. Keywords: vague entities, fuzzy entities, geographical information systems, web-based GIS, vernacular geography, high crime areas, internet. Fuzzy GIS http://www.springerlink.com/content/x27176085572g456/ Cartographic Symbols & Map Symbols Library http://www.map-symbol.com/sym_lib.htm "In cartography, symbols are everything. The very nature of a map as an abstract representation of the Earth requires symbols to perform the abstraction. To not have symbols is to not have maps." The Digital Wisdom cartographic symbol library provides graphic artists and mapmakers with a complete, well-designed contemporary set of map symbols packaged into an Illustrator vector library covering most of the features that need symbolizing on a map. Each library contains point, line and area symbols (including colors and patterns) which are fully editable and scalable. The design of each and every symbol has been carefully thought through to allow intuitive recognition of the main features of a map, with a minimal footprint to cut down on map clutter. These beautifully designed point, line and area symbols and color sets for Illustrator, FreeHand and Corel Draw are compatible with earlier versions wherever possible, while recognizing that the higher level of cartographic presentation, quality and efficiency possible with Illustrator 10 should be a key feature. This allows the definition of points and symbols to a higher level, depicting the purpose and function of a wide range of activities, specifically geared towards tourism, travel, orientation, navigation and entertainment.
Symbol types: point, line, area. Visual variables: size, density, colour, shape, texture, orientation. Arc/Info & ArcView Symbol Sets http://www.mapsymbols.com/symbols2.html Cartographic Symbol Construction http://mapserver.org/mapfile/symbology/construction.html Cartographic representations http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?id=180&pid=174&topicname=Cartographic_representations Using map layers in ArcMap allows you to assign symbols and labels to the underlying feature geometry. Yet sometimes you'll need additional control over how features are portrayed in the map, and often the freedom to use a depiction that differs from your GIS feature geometries. One mechanism for portraying features using map layers in ArcMap is cartographic representations. A cartographic representation is a set of rules, overrides, and graphical edits that allows you to represent features cartographically without having to modify the underlying feature geometry. In the example in ESRI's help, the geographic features for roads are portrayed by cartographic representations in the map that differ from the GIS feature geometry. Choropleth map http://en.wikipedia.org/wiki/Choropleth_map A choropleth map (Greek χώρα "area/region" + πληθαίνω "multiply") is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. The choropleth map provides an easy way to visualize how a measurement varies across a geographic area, or it shows the level of variability within a region. Design choices: choice of shading patterns; choice of classification system; choice of spatial unit. Thematic maps: their design and production http://books.google.com/books?id=9ho-AAAAIAAJ&pg=PA50&lpg=PA50&dq=chouces+in+choropleth+mapping&source=bl&ots=8wmam_-OqA&sig=9eMtiwvjmOvIywDmdBtIPAOg_Zo&hl=en&ei=9-DTStqoEIbAsQO9rZ3PCg&sa=X&oi=book_result&ct=result&resnum=1&ved=0CAwQ6AEwAA#v=onepage&q=&f=false Choropleth maps http://www.esds.ac.uk/international/support/user_guides/gisoverview.asp#choropleth This cartographic visualisation technique involves displaying numeric attribute values as colours or shades. Using this method it is easy to compare and view the relationship of neighbouring objects. CommonGIS supports three types of choropleth map: i) unclassified, ii) classified and iii) multiple. Unclassified choropleth map: the numeric attribute values are converted directly into proportional degrees of darkness, i.e. the higher the object's attribute value, the darker the shade representing it. The example shows fertility rate attribute values for countries in Europe: the maximum fertility rate of 2.27 children born per woman, in Albania, is represented by the darkest shade of orange, and the minimum of 1.13, in Bulgaria, by the lightest shade of orange. Classified choropleth map: the value range of a numeric attribute is divided into differently coloured intervals, and the country objects are displayed on the map according to the interval their associated attribute values fall into. Multiple choropleth maps: this visualisation method allows you to view a number of attributes at the same time.
After clicking on Display Wizard > Display and choosing the attributes you wish to view, click the Multiple choropleth maps check box and click OK. The multiple choropleth map in the example shows Birth and Death rates in Europe: the darkest blue shade in the top map, representing the Birth rate attribute, shows Albania with the highest birth rate of 18.59, with Turkey close behind at 17.95; the lower map represents the Death rate attribute and shows Turkey, with the darkest orange shade, at the lowest death rate of 5.95. ESDS International http://www.esds.ac.uk/international/support/user_guides/gisoverview.asp Appearance of interface: toolbar buttons; toolbar menu options; displaying attribute values; altering the default colour scheme. Visualisation map tools: choropleth maps; standalone bars; graduated circles; pies; triangles; cross classification. Visualisation chart tools: scatter plot; scatter plot matrix. Query tools: dynamic queries; measuring distances. Calculation tools: sum of columns; percentages and ratios; change, difference. Geography 281 Map Making with GIS http://geography.fullerton.edu/281/proj4.pdf This course covers techniques of Geographic Information Systems (GIS) or Geographic Information Science. Generically, a GIS can be defined as any computer-based, software-hardware platform capable of capturing, storing, displaying, manipulating, and analyzing any set of geo-referenced data. GIS represents the technological synthesis of traditional cartographic principles, advances in computer-assisted and analytical cartography, spatial statistics, relational database design, and digital image processing and analysis. It is assumed that you have a familiarity with desktop computers (PCs), the Windows NT/2000 operating system, and web browsing, as well as software such as MS Excel and MS Word. The course has two components: learning the theories of GIS and learning to apply these theories in GIS software (ArcGIS). I therefore teach GIS in two ways: through lectures and through assignments. While mastering GIS software is an important part of being a GIS user, it is impossible to correctly perform any GIS operation or analysis in software without a proper understanding of GIS theories. I will therefore concentrate strongly on teaching GIS theories, including data structures, database systems, GIS operations, spatial analysis, mapping with GIS and other selected current issues. I expect you to become familiar with the GIS vocabulary. Such concepts are often abstract in nature, and perceived to be difficult. You are thus strongly encouraged to read (and re-read!) the assigned readings and my lecture notes. Exam questions will heavily test your knowledge and understanding of GIS theories and vocabulary, and lack of reading will adversely affect your performance. You will also learn extensive operations and use of ArcGIS software through lab assignments. Your T.A. will be in charge of conducting the lab assignments. She will assign tutorial assignments as well as approximately 10 laboratory assignments that are designed to further explore GIS concepts and ArcGIS. These tutorials and assignments will be done during the scheduled lab time and at other times at the lab on your own. While you will be given sufficient guidance, you are also expected to solve these project assignments on your own, since the best pedagogical approach for learning GIS software is through trial and error. Many of these projects require considerable time and patience, so be prepared to spend some hours in the laboratory.
Overall, this course will provide you with a good introduction to GIS and will prepare you for further GIS courses and an exciting career in this field. However, you will not be able to learn everything about GIS (or all software operations) from this course alone. If you want to gain expertise in this field, be prepared to take several courses in GIS and other relevant courses. Choropleth mapping (class interval systems): equal intervals (split the data into classes of equal width); percentiles or quartiles (data divided so that an equal number of observations fall into each class); nested means (divide and subdivide the data on the basis of mean values to give 2, 4, 8 or 16 classes); natural breaks (split the data into classes based on natural breaks represented in the data histogram); box-and-whisker (split the data using the mean, upper and lower quartiles, and outlier and extreme values derived from a box-and-whisker plot). Equal intervals http://books.google.com/books?id=DUhHKUU-3OkC&pg=PA50&lpg=PA50&dq=equal+interval+splits+data++gis&source=bl&ots=6S4-Bo2kgm&sig=7D3Ya9EaXKbVCMv42mRgV7SOmOU&hl=en&ei=NubTSs69EJPSsgPuhOXYCg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CBUQ6AEwAg# Univariate classification schemes (examples of class interval systems): Geospatial Analysis - a comprehensive guide. 3rd edition © 2006-2009 de Smith, Goodchild, Longley http://www.spatialanalysisonline.com/output/html/Univariateclassificationschemes.html Within GIS software, univariate classification facilities are found as tools to: aid in the production of choropleth or thematic maps; explore untransformed or transformed datasets; analyze (classify and re-classify) image data (see further, Section 4.2.12.3); and display continuous field data. In the majority of cases these procedures perform classification based purely on the input dataset, without reference to separate external evaluation criteria. In almost all instances the objects to be classified are regarded as discrete, distinct items that can only reside in one class at a time (sometimes referred to as hard classification). Separate schemes exist for classifying objects that have uncertain class membership (soft or fuzzy classification) and/or unclear boundaries (as discussed briefly in Section 4.2.13.4), or which require classification on the basis of multiple attributes (see Section 4.2.12.2). Typically the attributes used in classification have numerical values that are real or integer type. In most instances these numeric values represent interval or ratio-scaled variables. Purely nominal data are effectively already classified (see, for example, Figure 4-21, in which each zone has been assigned a unique color from a pre-selected palette; and see Section 2.2.2 for a brief discussion of commonly used scales). Of course, there is always the option to apply no predefined classification procedure, but instead, when mapping such data, to apply a continuous gradation of grey tones or colors, using a linear or non-linear (e.g. ogive) scaling of the data values (see further Tobler, 1973 and Cromley and Ye, 2006). Table 4-5 provides details of a number of univariate classification schemes together with comments on their use. Most of the main GIS packages provide classification options of the types listed, although some (such as the box and percentile methods) are only available in a limited number of software tools (e.g. GeoDa).
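The class-interval systems named above are all short computations on a sorted value array. The sketch below, in Python with NumPy, implements equal-interval and quantile breaks directly and gives a compact dynamic-programming rendering of the Fisher/Jenks variance-minimization idea described in the table that follows. The sample values are invented; a production system would then map the resulting classes onto a colour ramp.

```python
import numpy as np

def equal_interval(values, k):
    """Breaks of equal width: Range / k (the 'slice' operation on rasters)."""
    lo, hi = float(np.min(values)), float(np.max(values))
    return list(np.linspace(lo, hi, k + 1))

def quantile_breaks(values, k):
    """Equal-count breaks: roughly n/k observations per class."""
    return list(np.quantile(values, np.linspace(0.0, 1.0, k + 1)))

def jenks_breaks(values, k):
    """Variance-minimization (Fisher/Jenks) breaks by dynamic programming.

    Returns k+1 class edges including the data minimum and maximum.
    O(k * n^2) time: fine for illustration; GIS packages use the same
    idea with heavier optimization.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    c1 = np.concatenate(([0.0], np.cumsum(x)))       # prefix sums
    c2 = np.concatenate(([0.0], np.cumsum(x * x)))   # prefix sums of squares

    def ssd(i, j):
        """Sum of squared deviations from the mean for x[i:j]."""
        s, m = c1[j] - c1[i], j - i
        return (c2[j] - c2[i]) - s * s / m

    dp = np.full((k + 1, n + 1), np.inf)             # dp[c, j]: best cost of
    cut = np.zeros((k + 1, n + 1), dtype=int)        # c classes over x[:j]
    dp[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cost = dp[c - 1, i] + ssd(i, j)
                if cost < dp[c, j]:
                    dp[c, j], cut[c, j] = cost, i

    edges, j = [x[-1]], n                            # walk the cuts backwards
    for c in range(k, 0, -1):
        j = cut[c, j]
        edges.append(x[j])                           # lower edge of class c
    return sorted(edges)

# Illustrative attribute values (e.g. fertility rates by country).
vals = np.array([1.13, 1.20, 1.30, 1.35, 1.40, 1.70, 1.80, 2.00, 2.10, 2.27])
for name, fn in [("equal interval", equal_interval),
                 ("quantile", quantile_breaks),
                 ("Jenks", jenks_breaks)]:
    print(f"{name:15s}", np.round(fn(vals, 4), 2))
```

Note how the three schemes place their breaks differently on the same data, which is exactly why the choice of classification system is listed above as a core design decision in choropleth mapping.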
A useful variant of the box method, known as hybrid equal interval, in which the inter-quartile range is itself divided into equal intervals, does not appear to be implemented in mainstream GIS packages. Schemes that take into account spatial contiguity, such as the so-called minimum boundary error method described by Cromley (1996), also do not appear to be readily available as a standard means of classification.
Table 4-5 Selected univariate classification schemes (scheme: description/application):
- Unique values: each value is treated separately, for example mapped as a distinct color.
- Manual classification: the analyst specifies the boundaries between classes required as a list, or specifies a lower bound and interval, or lower and upper bounds plus the number of intervals required.
- Equal interval, Slice: the attribute values are divided into n classes with each interval having the same width = Range/n. For raster maps this operation is often called slice.
- Defined interval: a variant of manual and equal interval, in which the user defines each of the intervals required.
- Exponential interval: intervals are selected so that the number of observations in each successive interval increases (or decreases) exponentially.
- Equal count or quantile: intervals are selected so that the number of observations in each interval is the same. If each interval contains 25% of the observations the result is known as a quartile classification. Ideally the procedure should indicate the exact numbers assigned to each class, since they will rarely be exactly equal.
- Percentile: percentile plots are a variant of equal count or quantile plots. In the standard version equal percentages (percentiles) are included in each class. In GeoDa’s implementation of percentile plots (specifically designed for exploratory spatial data analysis, ESDA), unequal numbers are assigned to provide 6 intervals: <=1%, 1% to <10%, 10% to <50%, 50% to <90%, 90% to <99% and >=99%.
- Natural breaks/Jenks: widely used within GIS packages, these are forms of variance-minimization classification. Breaks are typically uneven, and are selected to separate values where large changes in value occur. The result may be significantly affected by the number of classes selected and tends to have unusual class boundaries. Typically the method applied is due to Jenks, as described in Jenks and Caspall (1971), which in turn follows Fisher (1958). See Figure 4-22 for more details. Unsuitable for map comparisons.
- Standard deviation: the mean and standard deviation of the attribute values are calculated, and values are classified according to their deviation from the mean (z-transform). The transformed values are then mapped, usually at intervals of 1.0 or 0.5 standard deviations. Note that this often results in no central class, only classes either side of the mean, and the number of classes is then even. SD classifications in which there is a central class (defined as the mean value +/- 0.5SD) with additional classes at +/- 1SD intervals beyond this central class are also used.
- Box: a variant of quartile classification designed to highlight outliers, due to Tukey (1977, Section 2C). Typically six classes are defined, these being the 4 quartiles plus two further classes based on outliers, defined as data items (if any) that are more than 1.5 times the inter-quartile range (IQR) from the median. An even more restrictive set is defined by 3.0 times the IQR.
A slightly different formulation is sometimes used to determine these box ends or hinge values. Box plots (see Section 5.2.2.2) are implemented in GeoDa and STARS, but are not generally found in mainstream GIS software; they are commonly implemented in statistics packages, for example the MATLAB Statistics Toolbox. Brewer and Pickle (2002) examined many of these methods, with particular attention paid to ease of interpretation and comparison of map series, e.g. mapping time series data such as disease incidence by health service area over a number of years. They concluded that simple quantile methods were amongst the most effective, and had the great advantage of consistent legends for each map in the series, despite the fact that the class interval values often vary widely. They also found that schemes resulting in a very large central class (e.g. 40%+) were more difficult to interpret. In addition to selection of the scheme to use, the number of breaks or intervals and the positioning of these breaks are fundamental decisions. The number of breaks is often selected as an odd value: 5, 7 or 9. With an even number of classes there is no central class, and with fewer than 4 or 5 classes the level of detail obtained may be too limited. With more than 9 classes, gradations may be too fine to distinguish key differences between zones, but this will depend a great deal on the data and the purpose of the classification being selected. In a number of cases the GIS package provides linked frequency diagrams with breaks identified and, in some cases, interactively adjustable. In other cases, frequency diagrams should be generated in advance to help determine the ideal number of classes and the type of classification to be used. For data with a very large range of values, varying smoothly across that range, a graduated set of classes and coloring may be entirely appropriate. Such classification schemes are common with field-like data and raster files. Continuous graded shading of thematic maps is also possible, with some packages, such as Manifold's Thematic Formatting facilities, providing a wide range of options that reduce or remove the dependence on formal class boundaries. The Jenks method, as implemented in a number of GIS packages such as ArcGIS, is not always well documented, so a brief description of the algorithm is provided in Figure 4-22. This description also provides an initial model for some of the multivariate methods described in subsection 4.2.12.2. Fuzzy boundaries: Geospatial Analysis - a comprehensive guide. 3rd edition © 2006-2009 de Smith, Goodchild, Longley http://www.spatialanalysisonline.com/output/html/Fuzzyboundaries.html A number of GIS packages include special facilities for identifying, selecting and analyzing boundaries. These may be the borders of distinct zones or areas, within which values on one or more attributes are relatively homogeneous and distinct from those in adjacent areas, or they may be zones of rapid spatial change in more continuously varying attribute data. Amongst the tools available for boundary determination are raster-based GIS products, like Idrisi, and the more specialized Terraseer family of spatial analysis tools. Indeed, Terraseer has a specific product, Boundaryseer, for boundary detection and analysis, using techniques such as overlap and sub-boundary statistical analysis (the pattern analysis software PASSaGE provides similar facilities).
The problem these products seek to address is that of identifying where and how boundaries between different zones should be drawn and interpreted when the underlying data vary continuously, often without sharp breaks. For example, soil types may be classified by the proportion of sand, clay, organic matter and mineral content. Each of these variables will occur in a mix across a study region and may gradually merge from one recognized type to another. In order to define boundaries between areas with, for example, different soil types, clear polygon-like boundaries are inappropriate. In such cases a widely used procedure is to allocate all grid cells to a set of k zones or classes based on fuzzy membership rather than binary 0/1 membership. The idea here is to assign a number between 0 and 1 to indicate how much membership of the zone or class the cell is to be allocated, where 0 means not a member and 0.5 means the cell's grade of membership is 50% (but this is not a probabilistic measure). If a zone is mapped with a fuzzy boundary, the 50% or 0.5 set of cells is sometimes regarded as the equivalent of the boundary or delimiter (cross-over point) in a crisp, discrete model situation. Using this notion, crisp polygon boundaries or identified zone edges in raster-modeled data may be replaced with a band based on the degree of zone membership. Alternatively, grid datasets may be subjected to fuzzy classification procedures whereby each grid cell acquires a membership value for one of k fuzzy classes, and a series of membership maps is generated, one for each class (see further the discussion on soft classifiers in Section 4.2.12.4). Subsequent processing on these maps may then be used to locate boundaries. Several fuzzy membership functions (MFs) have been developed to enable the assignment process to be automated and to reflect expert opinion on the nature and extent of transitional zones. Burrough and McDonnell (1998, Ch. 11) provide an excellent discussion of fuzzy sets and their applications in spatial classification and boundary determination problems. The GIS product Idrisi, which utilizes some of Burrough's work, provides four principal fuzzy MFs:
- Sigmoidal or (double) s-shaped functions, produced by a combination of linear and cos²() functions in Idrisi's case, or as an algebraic expression of the form m = 1/(1 + a(z - c)²), where a is a parameter that varies the width of the function, z is the property being modeled (e.g. proportion of clay) and c is the value of this property at the function midpoint (see Figure 4-31; here membership values of > 0.5 are regarded as being definitely members of the set A);
- J-shaped functions, which are rather like the sigmoidal MFs but with the rounded top sliced off flat over some distance, x (if x = 0 then the two sides of the J meet at a point). The equation used in this case is of the form m = 1/(1 + ((z - a)/(a - c))²);
- Linear functions, which are like the J-shaped function but with linear sides, like the slope of a pitched roof, and are thus simple to calculate and have a fixed and well-defined extent; and
- User-defined functions, which are self-explanatory.
In most applications MFs are symmetric, although monotonic increasing or decreasing options are provided in Idrisi. An alternative to using MFs as spreading functions is to apply a two-stage process: first, use a (fuzzy) classification procedure to assign a membership value (m_ik = 0.0-1.0) to each grid cell, i, for each of k classes.
This value is based on how similar the cell attributes are to each of k (pre-selected) attribute types or clusters. These may then be separately mapped, as discussed earlier. Having generated the membership assignments, boundaries are then generated by a second algorithm. Boundaryseer provides three alternative procedures:
- Wombling (a family of procedures named after W. H. Womble, 1951);
- Confusion Index (CI) measures, which are almost self-explanatory; and
- Classification Entropy (CE), based on information-theoretic ideas.
Wombling involves examining the gradient of the surface or surfaces under consideration in the immediate neighborhood of each cell or point. Typically, with raster datasets, this process examines the four cells in Rook's or Bishop's move position relative to the target cell or point; boundaries are identified based on the rate of change of a linear surface fitted through these four points, with high or sudden rates of change being the most significant. Wombling methods can be applied to vector and image datasets as well as raster data. Wombling edge-detection is provided in a number of other software products, including the SAM and PASSaGE macroecology packages. The Confusion Index (CI) works on the presumption that if we compute the ratio of the second-highest membership value of a cell i, m_i2, to the highest, m_i1, then any value close to 1 indicates that the cell could realistically be assigned to either class, and hence is likely to lie on the boundary between the two classes. Finally, Classification Entropy, devised by Brown (1998), creates a similar value to the CI measure, again in the range [0,1], but this time using the normalized entropy measure CE_i = -(1/ln k) Σ_k m_ik ln(m_ik), where the summation extends over all the k classes and the measure applies to each of the cells or locations i. This latter kind of measure is used in a number of spatial analysis applications (see further, Section 4.6). Fuzzy procedures are by no means the only methods for detecting and mapping boundaries; many other techniques and options exist. For example, the allocation of cells to one of k classes can be undertaken using purely deterministic (spatially constrained) clustering procedures or probabilistic methods, including Bayesian and belief-based systems. In the latter methods cells are assigned probabilities of membership of the various classes. A fuller description of such methods is provided in the documentation for the Boundaryseer and Idrisi packages. As is apparent from this discussion, boundary detection is not only a complex and subtle process, but is also closely related to general-purpose classification and clustering methods, spatially constrained clustering methods and techniques applied in image processing. As noted earlier, boundaries may also be subject to analysis, for example to test hypotheses such as: "is the overlap observed between these two boundaries significant, and if so in what way, or could it have occurred at random?"; "is the form of a boundary unusual or related to those of neighboring areas?" Tools to answer such questions are provided in Boundaryseer but are otherwise not generally available.
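To make the membership and boundary measures concrete, here is a small sketch under assumed parameter values: it applies the sigmoidal MF quoted above, then scores candidate boundary cells with the Confusion Index and the normalized classification entropy. The grid, the parameters a and c, and the two-class setup are all invented for illustration and are not taken from Idrisi or Boundaryseer.

```python
import numpy as np

# Fuzzy zone membership on a small grid, assuming one measured property z
# (say, % clay) per cell. The sigmoidal form m = 1 / (1 + a * (z - c)^2)
# is the algebraic expression quoted above.
a, c = 0.05, 30.0                        # width parameter, cross-over value
z = np.array([[10.0, 20.0, 28.0],
              [15.0, 29.0, 35.0],
              [22.0, 33.0, 45.0]])
m_zone = 1.0 / (1.0 + a * (z - c) ** 2)  # membership of the "clayey" zone

# Two toy classes: "clayey" and its complement. The Confusion Index is the
# ratio of the second-highest membership to the highest; values near 1
# mark cells sitting between classes, i.e. likely boundary cells.
m = np.stack([m_zone, 1.0 - m_zone])     # shape (k, rows, cols)
srt = np.sort(m, axis=0)                 # memberships sorted per cell
ci = srt[-2] / srt[-1]

# Normalized classification entropy, CE_i = -(1/ln k) * sum_k m_ik ln(m_ik),
# computed on memberships rescaled to sum to 1 per cell.
eps = 1e-12                              # guard against log(0)
p = m / m.sum(axis=0)
ce = -(p * np.log(p + eps)).sum(axis=0) / np.log(m.shape[0])

print("membership:\n", np.round(m_zone, 2))
print("confusion index:\n", np.round(ci, 2))
print("classification entropy:\n", np.round(ce, 2))
```

Cells where z is near the cross-over value c score close to 1 on both measures, which is the behaviour the CI and CE descriptions above predict for boundary cells.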
Type of Output: Maps and Alternative http://educationally.narod.ru/gisoutphotoalbum.html Retail Geovisualization http://www.accessmylibrary.com/article-1G1-166777471/geovisualization-retail-structural-change.html Geovisualization refers to the visual exploration, analysis, synthesis and presentation of geospatial data. This paper presents findings from research that has focused on developing and applying geovisualization techniques and technologies for use within retail location decision support. Retailers represent a major user group of Geographic Information System-based (GIS) decision support technologies, with applications ranging from trade area mapping to store portfolio planning. However, the ability to handle spatial-temporal data, visualize change, and explore the temporal dimension of spatial data is limited within conventional GIS. The paper details the development of a prototype geovisualization system that has been designed to enable visualization of spatial-temporal change in retail-related data. From this explicitly visual paradigm, a number of examples of potential analysis are examined at four different scales: national, regional, market and micro-level. The paper highlights both the challenges and the potential of enhancing retail decision support by integrating geovisualization techniques and technology within decision support activities. 3-D geovisualization http://www.accessmylibrary.com/coms2/summary_0286-2741061_ITM Professional Tutorials http://www.spatialhydrology.com/tutorial.html Retail Geovisualization: 3D and animation for retail decision support http://educationally.narod.ru/gisretailphotoalbum.html (data animation, dynamic legends) Serving GIS Data Through the World Wide Web http://www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/sf_papers/engel_bernard/engel.html Web-based GIS http://gislounge.com/web-based-gis/ Web Map Service http://en.wikipedia.org/wiki/Web_Map_Service