6.3 Statistical Modeling

Site-specific management, often referred to as Precision Farming or Precision Agriculture, is about doing the right thing, in the right way, at the right place and time. It involves assessing and reacting to field variability and tailoring management actions, such as fertilization levels, seeding rates and variety selection, to match changing field conditions. It assumes that managing field variability leads to cost savings, production increases and better stewardship of the land. Site-specific management isn't just a bunch of pretty maps, but a set of new procedures that link mapped variables to appropriate management actions.



Several of the procedures used in precision agriculture, such as map similarity, level-slicing, clustering and regression, are discussed thoroughly in section 4.0, Spatial Statistics Techniques. The following discussion outlines how these analytical techniques can be used to relate crop yield to driving variables, such as soil nutrient levels, in support of management action.



6.3.1 Elements of Precision Agriculture



To date, much of the analysis of yield maps has been visual interpretation. By viewing a map, all sorts of potential relationships between yield variability and field conditions spring to mind. These visceral visions and explanations can be drawn from the farmer's knowledge of the field: "this area of low yield seems to align with that slight depression," or "maybe that's where all those weeds were," or "wasn't that where the seeder broke down last spring?"



Data visualization can be extended by GIS analysis that directly links yield maps to field conditions. This processing involves three levels: cognitive, analysis and synthesis. At the cognitive level (termed desktop mapping), computer maps of variables, such as crop yield and soil nutrients, are generated. These graphical descriptions form the foundation of site-specific management. The analysis level uses the GIS's analytical toolbox to discover relationships among the mapped variables. This step is analogous to a farmer's visceral visions of relationships, but uses the computer to establish mathematical and statistical connections. To many farmers this step is an uncomfortable leap of scientific faith from pretty maps to pure, dense technical gibberish.



However, map analysis greatly extends data visualization and can more precisely identify areas of statistically high yield and correlate them to a complex array of mapped field conditions. The synthesis level of processing uses GIS modeling to translate the newly discovered relationships into management actions (prescriptions). The result is the prescription map needed by intelligent implements to guide variable-rate control of field inputs. Admittedly, the juvenile science of site-specific management is a bit imprecise and raises several technical issues. In less than two decades, however, the approach has placed millions of acres worldwide under site-specific management and transformed farm equipment with GPS and variable-rate hardware.



The precision agriculture process can be viewed as four steps: Data Logging, Point Sampling, Data Analysis and Prescription Modeling, as depicted in figure 6.3.1-1. Data logging continuously monitors measurements, such as crop yield, as a tractor moves through a field. Point sampling, on the other hand, uses a set of dispersed samples to characterize field conditions, such as phosphorous, potassium and nitrogen levels. The nature of the data derived by the two approaches is radically different: a "direct census" of yield consisting of thousands of on-the-fly samples versus a "statistical estimate" of the geographic distribution of soil nutrients based on a handful of soil samples.

In data logging, issues of accurate measurement, such as GPS positioning and material flow adjustments, are major concerns. Most systems query the GPS and yield monitor every second, which at 4 mph translates into about 6 feet of travel. With differential positioning the coordinates are accurate to about a meter. However, the paired yield measurement is for a location well behind the harvester, as it takes several seconds for material to pass from the point of harvest to the yield monitor. To complicate matters, the mass flow and speed of the harvester are constantly changing as different terrain and crop conditions are encountered. The precise placement of GPS/yield records therefore depends less on the accuracy of the GPS receiver than on "smart" yield mapping software that corrects for these lags.
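
The arithmetic behind these adjustments is straightforward. The following Python sketch, with a hypothetical record format and a fixed transport lag, illustrates the sample-spacing calculation and a simple lag correction that pairs each yield reading with the position where the material was actually cut; production yield mapping software also adjusts for changing speed and mass flow.

```python
# Minimal sketch (hypothetical record format and lag value) of the logging
# arithmetic described above: sample spacing at harvester speed, and a
# time-lag shift pairing each yield reading with where the crop was cut.

MPH_TO_FT_PER_S = 5280 / 3600   # 1 mph = ~1.467 ft/s

def sample_spacing_ft(speed_mph, interval_s=1.0):
    """Distance traveled between successive GPS/yield queries."""
    return speed_mph * MPH_TO_FT_PER_S * interval_s

def lag_corrected_records(records, lag_s):
    """Pair each yield reading with the GPS fix taken lag_s seconds earlier.

    records: list of (time_s, x_ft, y_ft, yield_value) in time order.
    Assumes a fixed transport lag from header to yield monitor.
    """
    lag_steps = round(lag_s)        # with 1-second sampling, seconds == steps
    corrected = []
    for i in range(lag_steps, len(records)):
        t = records[i][0]                       # time of the yield reading
        _, x, y, _ = records[i - lag_steps]     # position lag_s seconds back
        corrected.append((t, x, y, records[i][3]))
    return corrected

print(sample_spacing_ft(4.0))   # ~5.9 ft between one-second samples at 4 mph
```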



In point sampling, issues of surface modeling (estimating values between sample points) are the concern, such as the sampling frequency/pattern and the interpolation technique to use. The cost of soil lab analysis dictates that "smart sampling" techniques based on terrain and previous data be used to balance spatial variability with a farmer's budget. In addition, techniques for evaluating alternative interpolation approaches and selecting the "best" map through residual analysis are available in some soil mapping systems.
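
As one illustration, and not the method of any particular soil mapping package, the sketch below uses inverse-distance weighting (IDW) to estimate values between sample points and leave-one-out residuals as one way to compare alternative interpolation settings; the sample coordinates and values are hypothetical.

```python
# A minimal sketch of surface modeling with residual analysis: IDW
# interpolation between soil samples, compared across weighting powers
# by leave-one-out root-mean-square error.

import math

def idw(samples, x, y, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) soil samples."""
    num = den = 0.0
    for sx, sy, v in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return v                    # exactly on a sample point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def loo_rmse(samples, power=2.0):
    """Leave-one-out RMS residual for a given IDW power."""
    sq_err = 0.0
    for i, (sx, sy, v) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        sq_err += (idw(others, sx, sy, power) - v) ** 2
    return (sq_err / len(samples)) ** 0.5

soil = [(0, 0, 22.0), (100, 0, 31.0), (0, 100, 27.0), (100, 100, 35.0)]
best = min((1.0, 2.0, 3.0), key=lambda p: loo_rmse(soil, p))
print("best IDW power by residual analysis:", best)
```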



In both data logging and point sampling, the resolution of the analysis grid used to geographically summarize the data is a critical concern. Like a stockbroker's analysis of financial markets, the fluctuations of individual trades must be "smoothed" to produce useful trends. If the analysis grid is too coarse, information is lost in the aggregation over large grid spaces; if too fine, spurious measurement and positioning errors dominate the information.
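
A minimal sketch of this trade-off, assuming hypothetical field coordinates in feet: point measurements are averaged into square grid cells, with the cell size serving as the smoothing knob.

```python
# Summarize on-the-fly yield points into an analysis grid by averaging.
# Larger cells smooth away noise but blur real variability; smaller cells
# preserve detail but let measurement and positioning errors dominate.

from collections import defaultdict

def grid_average(points, cell_ft):
    """Average point values (x, y, value) into square cells of side cell_ft."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, v in points:
        cell = (int(x // cell_ft), int(y // cell_ft))
        sums[cell] += v
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

points = [(3, 4, 150.0), (8, 2, 140.0), (52, 49, 95.0), (55, 51, 105.0)]
print(grid_average(points, cell_ft=50))   # one averaged value per cell
```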



The technical issues surrounding mapped data analysis involve the validity of applying traditional statistical techniques to spatial data. For example, regression analysis of field plots has been used for years to derive crop production functions, such as corn yield (dependent variable) versus potassium levels (independent variable). In a GIS, regression can be used to derive a production function relating mapped variables, such as the links between a map of corn yield and maps of soil nutrients, in effect analyzing thousands of sample plots. However, technical concerns, such as variable independence and spatial autocorrelation, have yet to be thoroughly investigated. Statistical measures for assessing the results of such analysis, such as a spatially responsive correlation coefficient, await discovery and acceptance by the statistical community, let alone the farm community.
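
To make the "thousands of sample plots" idea concrete, here is a minimal sketch, with hypothetical cell values, of an ordinary least-squares fit of corn yield on potassium level in which every grid cell contributes one observation; the autocorrelation caveat above applies to exactly this kind of fit, since neighboring cells are not independent plots.

```python
# Treat every grid cell as a "sample plot": ordinary least-squares fit of
# corn yield (dependent) on potassium level (independent). Values are
# hypothetical, one per co-registered cell of the two maps.

def fit_production_function(k_levels, yields):
    """Return (intercept, slope) for yield = a + b * potassium."""
    n = len(k_levels)
    mean_k = sum(k_levels) / n
    mean_y = sum(yields) / n
    cov = sum((k - mean_k) * (y - mean_y) for k, y in zip(k_levels, yields))
    var = sum((k - mean_k) ** 2 for k in k_levels)
    b = cov / var
    return mean_y - b * mean_k, b

k_map = [180, 220, 260, 300, 340]     # ppm potassium per cell
y_map = [110, 128, 141, 152, 160]     # bu/ac corn yield per cell
a, b = fit_production_function(k_map, y_map)
print(f"yield ~ {a:.1f} + {b:.3f} * K")   # the derived production function
```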



In theory, prescription modeling moves the derived relationships in space or time to determine the optimal actions, such as the blend of phosphorous, potassium and nitrogen to be applied at each location in the field. In current practice, these translations are based on existing science and experience without a direct link to analysis of on-farm data. For example, a prescription map for fertilization is constructed by noting the existing nutrient levels (condition) and then assigning a blend of additional nutrients (action) tailored to each location, forming an if-(condition)-then-(action) set of rules. The issues surrounding spatial modeling are similar to those of data analysis and involve the validity of using traditional "goal seeking" techniques, such as linear programming or genetic modeling, to calculate maps of the optimal actions.
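
A sketch of such an if-(condition)-then-(action) rule set follows. The breakpoints and rates are hypothetical placeholders, standing in for the agronomic tables a real prescription would be built from.

```python
# Map existing soil-test levels (condition) to an added-fertilizer rate
# (action), one rule application per grid cell of the prescription map.

def phosphorus_rx(p_ppm):
    """Additional P2O5 (lb/ac) to apply, given a cell's soil-test P level."""
    if p_ppm < 10:
        return 60.0      # low-testing soil: heavy supplement
    elif p_ppm < 20:
        return 30.0      # medium-testing soil: moderate supplement
    else:
        return 0.0       # adequate: no added phosphorus

p_map = {(0, 0): 8.0, (0, 1): 14.0, (1, 0): 25.0, (1, 1): 18.5}
rx_map = {cell: phosphorus_rx(p) for cell, p in p_map.items()}
print(rx_map)   # prescription map: one application rate per grid cell
```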



6.3.2 The Big Picture



The precision agriculture process is a special case of the spatial data mining described in section 2.1. It involves deriving statistical models that spatially relate map variables to generate predictions that guide management actions. In the earlier case discussed, the analysis involved a prediction model relating product sales to demographic data. In the precision agriculture example, corn yield is substituted for sales and nutrient levels are substituted for demographics.



The big picture, however, is the same: relating a dependent map variable (sales or yield) to independent map variables (demographics or nutrients) that are easily obtained and thought to drive the relationship. What separates the two applications is their practical implementation. Sales mapping utilizes existing databases to derive the maps and GIS to form the solution.


Precision agriculture, on the other hand, is a much more comprehensive solution. It requires the seamless integration of three technologies: global positioning systems (GPS), geographic information systems (GIS) and intelligent devices and implements (IDI) for on-the-fly data collection (monitors) and variable-rate application (controls), as depicted in the left side of figure 6.3.2-1.



Modern GPS receivers are able to establish positions within a field to within a few feet. When a receiver is connected to a data collection device, such as a yield/moisture monitor, each measurement can be assigned geographic coordinates. The GIS is used to extend map visualization of yield to analysis of the relationships between yield variability and field conditions.



Once established, these relationships are used to derive a prescription map of the management actions required at each location in a field. The final element, the variable-rate implement, notes a tractor's position through GPS, continuously locates it on the prescription map, and then varies the application rate of field inputs, such as fertilizer blend or seed spacing, in accordance with the instructions on the prescription map.
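
The on-the-fly lookup a variable-rate controller performs amounts to converting a GPS fix into a prescription-grid cell and reading the rate stored there. The sketch below illustrates the idea; the grid origin, cell size and rx_map layout are hypothetical, matching the prescription-map sketch earlier.

```python
# Convert the tractor's GPS fix (field coordinates in feet) to a
# prescription-grid cell and return the application rate for that cell.

def rate_at(x_ft, y_ft, rx_map, origin=(0.0, 0.0), cell_ft=50.0,
            default=0.0):
    """Application rate for the cell containing (x_ft, y_ft)."""
    col = int((x_ft - origin[0]) // cell_ft)
    row = int((y_ft - origin[1]) // cell_ft)
    return rx_map.get((row, col), default)   # default outside the map

rx_map = {(0, 0): 60.0, (0, 1): 30.0, (1, 0): 0.0, (1, 1): 30.0}
print(rate_at(72.0, 20.0, rx_map))   # falls in cell (0, 1) -> 30.0 lb/ac
```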



Site-specific management through statistical modeling extends our traditional understanding of farm fields from "where is what" to analytical renderings of "so what" by relating variations in crop yield to field conditions, such as soil nutrient levels, available moisture and other driving variables. Once these relationships are established, they can be used to ensure the right thing is done, in the right way, at the right place and time. Common sense leads us to believe the efficiencies in managing field variability outweigh the costs of the new technology. However, the enthusiasm for site-specific management must be tempered by a reality consisting of at least two parts: empirical verification and personal comfort.



To date, there have not been conclusive studies that economically justify site-specific management in all cases. In addition, the technological capabilities appear to be somewhat ahead of scientific understanding, and a great deal of spatial research lies ahead. In the information age, a farmer's ability to react to the inherent variability within a field might determine the survival and growth of tomorrow's farms.



From the big-picture perspective, however, precision agriculture is pushing the envelope of GIS modeling and analysis as well as linking it to robotics. This is quite a feat for a discipline that had minimal use of mapping just a decade ago.





7.0 Conclusions



Research, decision-making and policy development in the management of land have always required information as their cornerstone. Early information systems relied on physical storage of data and manual processing. With the advent of the computer, most of these data and procedures have been automated during the past three decades. As a result, land-based information processing has become increasingly quantitative. Systems analysis techniques developed links between descriptive data of the landscape and the mix of management actions that maximizes a set of objectives. This mathematical approach to land management has been both stimulated and facilitated by modern information systems technology. The digital nature of mapped data in these systems provides a wealth of new analysis operations and an unprecedented ability to model complex spatial issues. The full impact of the new data form and analytical capabilities is yet to be determined.



Effective map analysis applications have little to do with data and everything to do with understanding, creativity and perspective. It is a common observation of the Information Age that the amount of knowledge doubles every fourteen months or so. It is believed that, with the advent of the information superhighway, this pace will likely accelerate. But does more information directly translate into better decisions? Does the Internet enhance information exchange or overwhelm it? Does the quality of information correlate with the quantity of information? Does the rapid boil of information improve or scorch the broth of decisions?



Geotechnology is a major contributor to the tsunami of information, as terabytes of mapped data are feverishly released on an unsuspecting (and seemingly ungrateful) public. From a GIS-centric perspective, the delivery of accurate base data is enough. However, the full impact of the technology lies in the translation of "where is what" to "why" and "so what." The effects of this information rapid transit on our changing perceptions of the world around us involve a new expression of the philosopher's stages of enlightenment: data, information, knowledge and wisdom. The terms are often used interchangeably, but they are distinct from one another in some subtle and not-so-subtle ways.



The first is data, the "factoids" of our Information Age. Data are bits of information, typically but not exclusively in numeric form, such as cardinal numbers, percentages and statistics. It is exceedingly obvious that data are increasing at an incredible rate. Coupled with the barrage of data is a requirement for the literate citizen of the future to have a firm understanding of averages, percentages and, to a certain extent, statistics. More and more, these types of data dominate the media and are the primary means used to characterize public opinion, report trends and persuade specific actions.



The second term, information, is closely related to data. The difference is that we tend to view information as more word-based and/or graphic than numeric. Information is data with explanation. Most of what is taught in school is information. Because it includes all that is chronicled, the amount of information available to the average citizen substantially increases each day. The power of technology to link us to information is phenomenal. As proof, simply "surf" the exploding number of "home pages" on the Internet.



The philosophers' third category is knowledge, which can be viewed as information within a context. Data and information that are used to explain a phenomenon become knowledge. Knowledge probably does not double at such fast rates, but that really has more to do with the learner and processing techniques than with what is available. In other words, data and information become knowledge once they are processed and applied.



The last category, wisdom, certainly does not double at a rapid rate. It is the application of knowledge, tempered by experience and judgment.
