Spatial Data Integration

The growth of geospatial data on the web and the adoption of interoperability protocols have made it possible to access a wide variety of geospatial content; however, challenges remain. Once a user has access to this abundance of data, how can it all be combined in a meaningful way? How can one create valuable information products from multiple sources of information? These challenges fall under on-demand geospatial data integration, an essential requirement of many spatial data architectures.

Much of the work on information integration has focused on the dynamic integration of structured data sources (such as text, databases, non-spatial imagery or XML streams), but far less progress has been made on the integration of geospatial content. Integrating geospatial data is more complex than integrating structured data sources because geospatial data obtained from various sources carry significant complexities and inconsistencies related to how the data were obtained, the expectations for their use, and their level of granularity and coverage. The resulting reliance on metadata explanations underscores the complexity of the problem; even a basic step such as understanding the geo-coordinates of a served map may prevent the integration of two or more sources of spatial data.

Formally, we can define Geospatial Data Integration as geometrically combining two or more different sources of geospatial content to facilitate visualization, analysis, fusion and simulation products. In this context we are not speaking of semantic integration, which may presuppose a common attribute data model. We are also particularly interested in differentiating traditional approaches to geospatial data integration from techniques that allow on-demand integration. On-demand integration means that spatial content can be combined from disparate sources as necessary, without complex requirements for manual conflation, pre-compiling or re-processing of the existing datasets. On-demand geospatial integration assumes that content creators have no a priori knowledge of their content's eventual use. Such solutions give the content integrator greater flexibility and control over the data application, leading to user-pull models and products such as on-demand mapping and automosaicking.

 Why do we need this INTEGRATION layer?
 Conceptual Architecture for Circumpolar Biodiversity Monitoring Program
 See an example of the challenge within the Architecture for the Circumpolar
 Biodiversity Monitoring Program
 -> [1], pages 12 and 13.
 Another example was put out by the International Polar Year organization.
 I love this chart, and not just because of the cell shapes and the nice way it
 defines the need for an integration filter.

Further, data integration includes functions such as data conversion among different spatial formats or scales; sampling among different data values, formats and projections; handling many types of data products (vectors, raster imagery, various sensor data); accessing and combining distributed data sources; and co-registering inconsistent data files. We consider that there are different levels of spatial data integration: loose coupling (simple overlays could be loosely coupled), tight coupling, and full integration (embedded coupling).

Here it may be important to differentiate between digital map making and digital map utilization. Integration tends to be a cartographic exercise in map making. Geographic Information Systems, although most provide functions for data integration and thus map making, are focused on the usability of the map. A full integration product provides a general framework to enable GIS applications: advanced 3D data visualization, complex spatial analysis, data fusion, and simulation or modeling.

=Traditional Approach to Integrating Geospatial Data=
Wooden Blocks.gif

Current geospatial integration solutions rely on an inefficient process in which skilled professionals use sophisticated software to continuously rectify and re-project geospatial data against other datasets. These techniques involve aligning and matching spatial content to the particular data source that exhibits the ideal characteristics of a basemap. The coordinate system is determined by the intent of the new map; almost exclusively, however, lattice coordinates on a spherical or rectilinear system are used. The continuum of a lattice on such coordinates provides infinite opportunity to transform the data (translate, scale, rotate, filter, resample) through the conflation process. This flexibility also contributes to an often complex, lengthy and expensive process that requires highly skilled decisions and sophisticated software. This is the business of cartographers and geodesists. As an analogy, it is the job of creating a beautiful structure from blocks of variable size and shape, with infinite opportunity and challenge to align one object with another.
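The translate, scale and rotate operations mentioned above can be illustrated with a small sketch. This is only an illustration of a planar similarity transform on (x, y) points; the function name `affine_transform` and its parameters are hypothetical and do not come from any particular GIS package, and real conflation also involves projection, filtering and resampling that are not shown here.

```python
import math

def affine_transform(points, scale=1.0, rotation_deg=0.0, translate=(0.0, 0.0)):
    """Scale, then rotate about the origin, then translate each (x, y) point."""
    theta = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    tx, ty = translate
    result = []
    for x, y in points:
        xs, ys = x * scale, y * scale              # scale
        xr = xs * cos_t - ys * sin_t               # rotate
        yr = xs * sin_t + ys * cos_t
        result.append((xr + tx, yr + ty))          # translate
    return result

# Any real-valued parameters are allowed, which is the "infinite opportunity"
# of a continuous lattice: an operator must choose them for every dataset.
aligned = affine_transform([(1.0, 0.0)], scale=2.0, rotation_deg=90.0,
                           translate=(10.0, 5.0))
```

Because every parameter is a free real number, two operators aligning the same dataset can arrive at slightly different results, which is part of why the process depends on skilled judgment.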

The result is an accurate, scale-dependent representation of physical characteristics, usually suitable for extraction of geography, mass, and quantity. These results are well understood and firmly established. Many conventional data products and services are created with this method.

=Why Consider Alternative Integration Techniques?=

The Web is changing the way that global data can be accessed and visualized. More than ever before, information can be transformed quickly into actionable knowledge. Such abilities are realized when we empower individuals at the edge of an organization, such as those involved directly in field-based conservation and science.

The requirement for manual processing in these traditional methods contributes to a static representation of the Earth’s surface and tends to encourage a data-preprocessing, “push to the user” model. The results are often complex and underutilized, with statistically inappropriate analysis and display.

Merging disparate spatial content representing any location, at any level of granularity, on demand, is beyond the capability of traditional approaches. The results are overlaid, not fused, products. Data referenced to lattice points is not fixed; it is free to reside and align anywhere, and as such it is a continuous analog solution rather than a discrete, and more effective, digital solution. Even adding a new dataset to a group means re-processing it with the old datasets, so it is back to the conflation drawing board. Any change to the old datasets normally results in re-processing the whole project.

There is a trend toward open, on-demand, distributed sharing of spatial information, a vision which is at the core of the conservation community's needs. The advent of digital mapping and the growing requirement for it provide both the interest and the opportunity to consider a new approach that will provide on-demand information capability. Reaching this goal is a critical requirement for the open and efficient sharing of scientific information, knowledge, and best practice needed to reduce the loss of the planet's biodiversity.

=What’s the Difference?=

It is possible to visually overlay one image on another, so that two or more datasets appear integrated on a computer monitor but remain separated into layers. Loosely coupled data has no capability beyond knowledge of its own layer; it presents a visual integration only. As an example, 3D visualization engines like Google Earth use mosaicked imagery of various resolutions to serve as a global-to-local basemap. Developers can add points and overlay imagery “on demand” through KML and other "mashup" techniques. Are these valuable? Yes, they are a great step forward in access to visually referenced data. However, users cannot integrate new information into the mosaicked imagery that has been preprocessed; they cannot map on demand. Full functionality between the layers of information is only possible if they reside on a common grid. This limitation significantly affects the value of Google Earth for those who expect it to serve as a base for GIS services.

A notable exception, and one that exemplifies the need to define a tesseral model: points are easily integrated on such a basemap on demand because, by definition, a point can be placed anywhere on the continuum of a lattice without change. Anyone can do “DOTS ON A MAP”: assuming it is positioned correctly, a point has no scale or rotation. However, for the same reason a point carries no level of granularity of its own; the information it holds can stand for any scale unless manual assumptions for feature generalization are included. In other words, a point does not really exist physically, so a point that represents a place such as a city (the word Kingston, for example) becomes nonsensical on a map of the whole city.

How does the capability of geospatial products change and grow in value when spatial content can be integrated on demand? A new era of spatial capability will emerge. Consider a 3D visualization engine like Google Earth: content providers could go beyond “dots on a map” and serve single-source, value-added data to users, who combine on demand the data and applications of their choosing. Further, advanced applications such as improved visualization, analysis and modeling could be incorporated where only visualization is provided now.

Consider a user requirement. Zoom to a location of interest on the Earth that represents protected parkland. Combine several datasets: terrain, satellite imagery, vectors. Based on the data available, delineate the watershed, estimate the moisture content of the soil, determine the vegetation type, and complete a slope analysis. Search for all the known and recorded species that have been observed within the park. Determine whether any location within 100 km exhibits similar geomorphology and potential for biodiversity.

=How Can a Digital Earth Reference Model Help?=

The capacity to share, discover, combine, analyze, model and report through a simplified application is desirable. How is this possible?

Tessellations, as opposed to lattices, are the core component of “digital” technologies. Within digital technologies, continuous analog systems are parsed into discrete pieces, and the analog values are quantized into single integer values, one assigned to each piece. In digital imagery, square cells are arranged in ordered columns and rows, and values from an analog image or sensor are quantized into each cell. Often a digital model also has a sense of spatial proximity to neighbouring cells and a hierarchical index that provides for aggregation and decomposition through levels of cells, as in a quad-tree. Ideally, a full algebra including advanced transforms can be applied to the digital structure, as in the case of wavelet compression.
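The two ideas in this paragraph, quantizing an analog value into an integer and indexing cells hierarchically as in a quad-tree, can be sketched as follows. This is a minimal illustration on a flat unit square, not a global grid; the function names and the base-4 digit-string key are assumptions made for the example.

```python
def quantize(value, vmin, vmax, levels=256):
    """Map a continuous value in [vmin, vmax] to an integer in [0, levels - 1]."""
    if not vmin <= value <= vmax:
        raise ValueError("value outside the quantization range")
    step = (vmax - vmin) / levels
    return min(int((value - vmin) / step), levels - 1)

def quadtree_key(x, y, depth):
    """Return a base-4 digit string locating (x, y) in the unit square [0, 1)."""
    key = ""
    x0, y0, size = 0.0, 0.0, 1.0
    for _ in range(depth):
        size /= 2                              # each level halves the cell size
        qx = 1 if x >= x0 + size else 0        # right half?
        qy = 1 if y >= y0 + size else 0        # upper half?
        key += str(qy * 2 + qx)
        x0 += qx * size
        y0 += qy * size
    return key

def parent(key):
    """Aggregation: the coarser cell that contains this one."""
    return key[:-1]

def children(key):
    """Decomposition: the four finer cells inside this one."""
    return [key + digit for digit in "0123"]
```

Each additional digit in the key refines the location by one level, so the key itself records the resolution at which a value was sampled.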

What if one were able to create an elegant tessellation, or digital model, of the Earth: a fixed, uniform partitioning of the Earth’s surface? Could analog geospatial data sources be automatically and suitably sampled into this digital model? If one data source can be sampled into this structure, would it not be true that multiple datasets could be sampled into the same grid? By definition, wouldn’t these multiple datasets then be integrated on demand, independent of the data source, but successfully with each other? What would this global grid look like?

Because such a grid represents an Earth reference that encompasses the components of a digital model (discrete equal-area cells, quantized values, a hierarchical index, and integer algebra), we term it a Digital Earth Reference Model.

This approach associates the quantized datasets with unique, mathematically defined cells. This transformation integrates datasets without re-projection through the unique index of a cell: in the same way that a spreadsheet cell can hold a value, this spreadsheet Earth can hold multiple values at multiple resolutions. In this way data can reside at its own level of granularity or scale but can be aggregated or decomposed along the hierarchy of cells. Further, the indexing is part of an algebraic numbering system, providing a mathematical platform for everything from simple arithmetic to advanced transformations.
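The spreadsheet analogy can be made concrete with a small sketch. It assumes cells are identified by base-4 digit strings in which each extra digit is one level finer (an illustrative scheme, not a specification from this page), and the class name `CellStore` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

class CellStore:
    """A 'spreadsheet Earth': each cell index holds values from many datasets."""

    def __init__(self):
        self._cells = defaultdict(dict)        # cell key -> {dataset name: value}

    def put(self, key, dataset, value):
        self._cells[key][dataset] = value

    def get(self, key):
        """All values stored in one cell, across datasets."""
        return dict(self._cells.get(key, {}))

    def aggregate(self, parent_key, dataset):
        """Mean of a dataset over the stored direct children of parent_key."""
        values = [
            cell[dataset]
            for key, cell in self._cells.items()
            if len(key) == len(parent_key) + 1
            and key.startswith(parent_key)
            and dataset in cell
        ]
        return mean(values) if values else None

store = CellStore()
for key, elevation in [("120", 100), ("121", 110), ("122", 120), ("123", 130)]:
    store.put(key, "elevation", elevation)
store.put("120", "landcover", 3)               # a second dataset in the same cell
```

Here `store.get("120")` returns both the elevation and the land-cover value for the same cell without any re-projection, and `store.aggregate("12", "elevation")` rolls the four child cells up to their parent.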

Lego Blocks.gif

To continue the analogy of the previous section, using a well-designed Digital Earth Reference Model to integrate geospatial content is like building a beautiful structure with blocks of various sizes and shapes, but with precise mortise and tenon joinery that facilitates a snap-together architecture. From this framework, advances that promote data discovery, use, reusability and usefulness can flourish and open up opportunities such as:

  • Improving representation by avoiding the geometric distortion and inter-comparability problems of multiple projections.
  • Enabling the encoding of location to express the resolution and scale of measurements, which standard coordinates can’t support; a hierarchical grid index carries its own positional metadata.
  • Facilitating data management by putting forth a common geometric schema that avoids the need to compromise among legacy formats, specific projections, scales and datums.
  • Providing an accurate and efficient framework for spatial analysis by detecting inconsistencies among datasets and merging datasets appropriately, and providing query capability across multiple disparate data sources using simplified set theory.
  • Improving data simulation and modeling by combining spatial data from multiple sources, including temporal and three-dimensional aspects of the datasets, on a generalized hexagonal grid.
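The set-theory querying mentioned in the list above can be sketched with ordinary set operations once every dataset is expressed as the set of cell indices it covers. The cell keys and dataset names below are invented for illustration.

```python
# Each dataset is reduced to the set of cell indices where its condition holds.
wetland_cells = {"1201", "1202", "1203", "1210"}
protected_cells = {"1202", "1203", "1230"}
species_sighting_cells = {"1203", "1210", "1231"}

def cells_matching_all(*datasets):
    """Cells present in every dataset (set intersection)."""
    if not datasets:
        return set()
    return set.intersection(*datasets)

def cells_matching_any(*datasets):
    """Cells present in at least one dataset (set union)."""
    return set().union(*datasets)

# Protected wetland cells where the species has been observed.
hotspots = cells_matching_all(wetland_cells, protected_cells, species_sighting_cells)

# Wetland cells that are not yet protected (set difference).
unprotected_wetland = wetland_cells - protected_cells
```

No geometry is intersected here: because the sources share one cell index, a cross-dataset query reduces to comparing keys.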

Within a global grid system, an open format for sharing, or a format specification, is simplified to an index identifying a particular cell and a value in that cell. By associating all objects with specific cells, the transformations between original formats, resolutions, projections and coordinates are solved automatically. Conflicts between inconsistent maps can easily be detected during re-sampling and co-registration.
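As a rough sketch of how conflicts might surface during re-sampling, the example below samples two sources into the same cells and reports the cells where they disagree. It assumes points in a flat unit square, a base-4 quadtree key, simple averaging within a cell, and a user-chosen tolerance; all names are hypothetical.

```python
from collections import defaultdict

def cell_key(x, y, depth):
    """Hypothetical base-4 quadtree key for a point in the unit square [0, 1)."""
    key = ""
    for _ in range(depth):
        x, y = x * 2, y * 2
        qx, qy = int(x), int(y)                # 0 or 1: which half along each axis
        key += str(qy * 2 + qx)
        x, y = x - qx, y - qy
    return key

def sample(records, depth):
    """Average (x, y, value) records into cells: the result is {index: value}."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, value in records:
        acc = sums[cell_key(x, y, depth)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def conflicts(a, b, tolerance):
    """Cells covered by both sources whose values differ by more than tolerance."""
    return {key: (a[key], b[key])
            for key in a.keys() & b.keys()
            if abs(a[key] - b[key]) > tolerance}

survey_a = sample([(0.1, 0.1, 10.0), (0.2, 0.2, 12.0), (0.6, 0.3, 50.0)], depth=1)
survey_b = sample([(0.15, 0.12, 11.5), (0.7, 0.4, 80.0)], depth=1)
disagreements = conflicts(survey_a, survey_b, tolerance=5.0)
```

Once both sources are reduced to index and value pairs, the comparison is a dictionary lookup rather than a geometric overlay.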

As a final argument, this is a data structure problem, not a GIS problem. The IPv6 architecture provides a hierarchical address space of 128 bits (2^128 addresses). Spatial coding with the Internet Protocol cannot be accomplished on a continuous space; however, hierarchical cell indexing is ideally suited to IPv6. Imagine the advances that could be made in the spatial data world if fixed locations could be the core means of accessing the attributes of that space, as network addresses currently function. This solves the distributed spatial data problem. We will then immediately recognize the true face of on-demand spatial data access and integration, and it will be called Digital Earth.
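To illustrate why a hierarchical cell index fits a 128-bit address, the sketch below packs a cell key into the low 64 bits of an IPv6 address using Python's standard `ipaddress` module. The bit layout (56 bits of left-aligned base-4 digits plus an 8-bit depth) and the use of the 2001:db8::/64 documentation prefix are assumptions made purely for this example; they are not a proposal from this page or any standard.

```python
import ipaddress

# Documentation-only prefix, used here as a stand-in for a real allocation.
PREFIX = ipaddress.IPv6Network("2001:db8::/64")
MAX_DEPTH = 28                                  # 28 digits * 2 bits = 56 bits

def cell_to_ipv6(key):
    """Pack a base-4 cell key into the low 64 bits of an address under PREFIX."""
    depth = len(key)
    if depth > MAX_DEPTH:
        raise ValueError("cell key too deep for this layout")
    digits = int(key, 4) if key else 0          # raises ValueError on bad digits
    # Left-align the digits so a parent's digit bits prefix those of its children.
    aligned = digits << (2 * (MAX_DEPTH - depth))
    host = (aligned << 8) | depth               # low 8 bits store the depth
    return ipaddress.IPv6Address(int(PREFIX.network_address) | host)

def ipv6_to_cell(address):
    """Recover the cell key from an address produced by cell_to_ipv6."""
    host = int(address) & ((1 << 64) - 1)
    depth = host & 0xFF
    digits = (host >> 8) >> (2 * (MAX_DEPTH - depth))
    key = ""
    for _ in range(depth):
        key = str(digits & 3) + key
        digits >>= 2
    return key

address = cell_to_ipv6("1203")
```

A continuous coordinate pair has no natural fixed-width, prefix-structured encoding of this kind, whereas a discrete hierarchical index maps onto address bits directly.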
