[Historical Document]
University of California, Berkeley NASA End to End in EOSDIS
Our New World Order...
A Prototype Implementation.
This is a ROUGH DRAFT under heavy construction.
Please email feedback to the coordinator noted in each section. Please
send general suggestions to rtroy@postgres.berkeley.edu.
Last update: March 24, 1995
Introduction to our Prototype
As part of the UCB portion of the National Aeronautics & Space
Administration (NASA) Earth Observing System (EOS), Distributed Information
System (DIS), "End to End" project, we have built a prototype system based on
a database and a small set of application tools. (Our prototype will grow in
scope and power as time moves on; watch this page for future developments.)
We call our prototype The New World Order Project because of its ability
to bring order to a world of datasets. One of the key goals of this project
is to give researchers working in diverse fields the ability to "join"
disparate datasets in ways that were not previously possible. We are bringing
together into one database such diverse datasets as satellite imagery, wetlands
data, aerial photographs, 35mm terrestrial photographs, newspaper articles,
census data, climatology, river and stream hydrology, and a host of others.
Aside from academic research, such a system would have uses in society
at large. One example question which cannot be easily answered without
such a system might have to do with public policy: "Given our drought,
just how important are water-intensive crops like cotton and rice in preserving
the ecology necessary to sustain migrating waterfowl who have lost their
natural wetlands habitat in this region?" Or, "What sensitive resources
such as schools or endangered species habitat exist within the geographic
domain of the Environmental Impact Reports submitted by this corporation?"
Questions like these require sophisticated integration of diverse datasets,
and that is part of what our project is all about.
While The New World Order Project cannot answer these questions today,
it is our vision that our work will result in a system that can. For
more information on our whole project, please see our parent
page.
Our Database Engine
We are implementing our database in a very powerful Database Management
System (DBMS) which combines the "Object Oriented" and "Relational" paradigms,
and permits SQL access to all data. This database engine is Illustra,
a commercial version of the Postgres
DBMS, developed here at UCB as a
research project by Michael Stonebraker.
The combination of the "Object Oriented" paradigm with the Relational
Database Management System (RDBMS) model (thus creating an "ORDBMS") is
very important for our project for several reasons:
-
Unlike a traditional RDBMS, it offers ease of extension with "objects", such
as satellite raster images, or atypical datatypes, such as coordinates in
longitude/latitude pairs, while retaining standard SQL access. The system
also offers easy addition of functions and operators to go along with
new datatypes or objects.
-
And, unlike RDBMSs, it offers inheritance features which permit more logical
database design, improving consistency and ease of access while reducing
the number of tables and easing the difficulty of performing "joins." (A
brief sketch of these first two points follows this list.)
-
Unlike purely Object Oriented systems, the Illustra ORDBMS retains ease
of use for "standard" datatypes, such as text, just as if it were a pure
RDBMS.
-
Because standard SQL is so widely accepted, the system is immediately
useful to a wide range of individuals, without the need to train them in
a specialized toolset.
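To illustrate the first two points, here is a minimal sketch of the kind of
table definitions involved, written in Postgres-style SQL (Illustra's own
dialect differs in detail); the table, column, and type choices are purely
hypothetical, not our actual schema:

    -- Hypothetical sketch: a general table of geo-located items.
    create table geo_item (
        name     text,    -- human-readable name of the item
        location point    -- longitude/latitude pair in decimal degrees
    );

    -- A more specific dataset inherits the columns of geo_item, so a query
    -- against geo_item also "sees" aerial photographs.
    create table aerial_photo (
        flight_date date,     -- when the photograph was taken
        frame_id    integer   -- illustrative identifier for the image frame
    ) inherits (geo_item);

    -- Standard SQL still works across the hierarchy:
    select name, location from geo_item;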
And, the Illustra ORDBMS offers other important
features:
-
For performance, it has an "R-Tree"
index structure for "spatial" data retrieval. In more traditional systems,
indexes are typically built on "keys" such as Social Security Number, where
a sequential ordering makes quick retrieval possible. Such indexes can
be thought of as single-dimension structures because they operate on one
ordered element. In the same way, though much more sophisticated, the R-Tree
index provides a way of locating data in multiple dimensions without having
to examine an entire data set. (A sketch follows this list.)
-
Because it is a commercial system, its stability and usability are better
assured, as there is a staff of professionals maintaining it. Thus, it is
a better choice than the original Postgres DBMS.
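The following sketch shows the kind of statement that builds such a spatial
index, again in Postgres-style syntax with purely illustrative names
(Illustra's R-Tree DDL is similar in spirit):

    -- Hypothetical sketch: county boundaries with an R-Tree index on the
    -- polygon column, so spatial searches need not scan the whole table.
    create table county (
        name     text,
        boundary polygon
    );

    create index county_boundary_rtree
        on county using rtree (boundary);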
Our Database Schema
Our database is based on the Big Sur - Sequoia schema, which in turn is
based on the Federal Geographic Data Committee (FGDC) Standards for Digital
Geospatial Metadata, with a few extensions. As the Big Sur schema is well
described on its own page (follow the link above), here we focus on the
extensions added for our Prototype. It should be noted that our extensions
may not have been necessary; we created them out of expediency. Through an
ongoing evolutionary process, we are improving our implementation to avoid
such extensions, and will provide feedback to the appropriate Standards
committees on what we've learned.
We created a "Reference" schema to house some of our extensions to the
database design. (In Illustra, you can have multiple schemas in a single
database.) The Reference schema contains only
those items which can be thought of as providing "reference points" from
which to evaluate the location of other objects. Some examples are a world
map, and State and County/Parish boundaries.
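A minimal sketch of the kind of DDL behind such a schema follows; the names
are illustrative only, not our actual Reference schema:

    -- Hypothetical sketch: a separate schema holding reference geography.
    create schema reference;

    create table reference.state_boundary (
        state_name text,
        outline    polygon   -- closed boundary of the state
    );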
We also created a "GCM" schema for Global Circulation Modeling. The
GCMTest schema
houses that data which is unique to GCM work. In testing various approaches
to GCM visualization, we decided to try our ideas out in an isolated schema.
We haven't yet attempted to reconcile the GCMTest schema with the Big Sur
schema. (...We're not through playing with it yet...)
One GCM visualization issue for us was the actual drawing of representative
shapes indicating detail such as wind direction. This activity can be done
dynamically at run-time as a derivable attribute, but there is a significant
performance cost for doing so. So, we ended up creating the various shapes
"statically" instead, and had to store them somewhere - a private schema
seemed the right choice. Reconciling this work with the Big Sur schema
is an important area for further improvement.
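The idea amounts to a small lookup table of precomputed glyphs keyed by
direction, which the visualization reads instead of recomputing shapes at
query time. A hedged sketch, with purely illustrative names:

    -- Hypothetical sketch: precomputed arrow shapes for wind direction,
    -- stored once rather than derived for every query.
    create table wind_glyph (
        direction_deg integer,   -- direction bucket, e.g. every 10 degrees
        arrow         polygon    -- outline of the arrow drawn at a grid cell
    );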
Our Application Tools
Our application tools include a combination of SQL (Structured Query Language),
TCL/TK, and an Illustra tool known as the Object Knowledge Browser (OK
Browser). While SQL should be familiar to a wide range of individuals
with computer exposure, the other two may not be familiar. TCL/TK is an
interpreted scripting language that has GUI (Graphical User Interface)
capabilities. The OK Browser is a new type of application development tool
in which the programmer writes virtually no "code." Instead, the OK Project
Editor presents a palette of choices. The programmer chooses from the palette,
places these objects on a "canvas", and joins the chosen objects' input
and output ports to each other in meaningful ways.
The result of combining these tools is that we were able to create our
initial application in about 10 days! We used the OK Browser as the initial
viewing tool. When objects (items) are selected, TCL/TK may be brought
up and used to view the selected object in detail. If desired, the user
is given the opportunity to access SQL directly to further their inquiry
into the data available. The power handed to the user is considerable,
and the data is largely at their disposal.
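As a purely illustrative example of such a direct inquiry (the table, column,
and function names here are hypothetical, not our actual schema), a user might
ask which flood articles fall within a particular county:

    -- Hypothetical ad hoc spatial query a user might type at the SQL level.
    -- "contains" stands in for whatever spatial predicate the system provides.
    select a.headline, a.pub_date
      from flood_article a, county c
     where c.name = 'Sacramento'
       and contains(c.boundary, a.location);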
More importantly, however, once deployed, we want our users, whom we
do not insist be highly computer literate, to either create their own applications
or modify the ones we provide to their liking. It is very important that
our users be able to tailor applications to meet their specific needs,
and at the same time, it is important to offer them high-productivity tools
which are easy for them to learn, as they will typically not be computer
programmers.
We believe our prototype system illustrates how such a system might
be brought together. But we have just begun. We anticipate integration
of a host of more specialized tools so that investigators may quickly move
from one application environment to another, passing items of interest
between the applications without difficulty.
As an example, when viewing the New World Order application, please
note the "Grass rasters." These rasters (images) are generated by an
application which is presently wholly separate from this one. By integrating
these two applications, we intend to permit the user to select and display
Grass rasters via the New World Order application, and quickly move to the
other application for a more specialized interaction with the data.
Our Data
Our data sets include a diverse collection, all of which share a few key
attributes. Chief among these is a "spatial" element. The spatial element
is usually a "geo-location" on the Earth. While we realize the Earth is
indeed three dimensional, a more common representation is a two dimensional,
spherical-geometry-based approach with an agreed origin: Latitude and Longitude!
We use the latitude/longitude system for much of our data (much of it was
available to us in this form), though we are not restricted to it. Future
work will include the flexibility to support coordinates in a projected
form, such as Albers Equal Area, which uses Meters Easting & Northing.
But for now, we coerce such data into Lat/Long (decimal form, i.e. no minutes
and seconds).
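The conversion to decimal form is simple arithmetic: decimal degrees =
degrees + minutes/60 + seconds/3600, negated for west longitudes and south
latitudes. A tiny Postgres-style SQL example, shown only to make the
arithmetic concrete:

    -- 121 degrees 30 minutes 36 seconds West becomes -121.51 decimal degrees.
    select -(121 + 30/60.0 + 36/3600.0) as longitude_decimal;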
Spatial data can take three forms: point, path, and polygon (a sketch after this list makes all three concrete):
-
Point data is simply an X,Y pair, and is the foundation for the other two
forms. Generally it could be argued that virtually all geo-located items
have an area they cover, and so should not be located by a single
point. However, as a practical matter, in many cases geo-location data
is collected as a single X,Y (longitude, latitude) point. In these cases,
the area data may be lost, or is simply considered unimportant.
-
Path data is a series of points, and each successive pair defines a line
segment. This data may represent a river or a road. Again, data collected
in this way may lose area information, so it might not be possible to
answer the question "how much land area does this river cover" without
making width assumptions. It should be noted that occasionally data collectors
measuring such objects as rivers collect data for each side, so that one
ends up with two paths for the object. This is the case with the
Topologically Integrated Geographic Encoding and Referencing System ("Tiger")
based hydrological data (rivers and streams).
-
Polygons are simply paths whose last point is the same as the first, and
so are "closed." Polygons (often abbreviated as simply "poly") are uniquely
useful as they are the only form capable of conveying area information
directly. Polygons are most often used for things such as State boundaries.
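To make the three forms concrete, here is a hedged sketch using Postgres-style
geometric types (Illustra's 2D spatial types are analogous); the tables and
columns are illustrative only:

    -- Point: a single longitude/latitude location.
    create table place_of_interest (
        name     text,
        location point
    );

    -- Path: an ordered series of points, e.g. the course of a stream.
    create table stream_reach (
        name   text,
        course path
    );

    -- Polygon: a closed path, the only form that conveys area directly.
    create table wetland_area (
        name    text,
        outline polygon
    );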
Our data comes to us from several sources and we are constantly adding
more. Presently, our data sets include:
-
World Map
Includes major political boundaries, provided by the Central
Intelligence Agency (CIA).
-
US State and County/Parish Maps
Provided by the USGS and/or Census
Tiger data.
-
AVHRR images.
Advanced Very High Resolution Radiometer images are taken by satellites
and go through numerous processing stages.
-
Geographic Resources Analysis Support System (GRASS) Rasters.
Grass
is a geographic information system (GIS) used to analyze and display data.
Like many GIS software toolkits, GRASS provides tools for data conversion,
digitizing, management, analysis, overlay, and display. Its particular
strengths include geographic modeling - overlaying and combining different
geodata based on location and description.
-
Aerial photographs of the Sacramento (California) river delta.
-
DWR (California Department of Water Resources) 35mm slide library.
-
"Flood text"
San Francisco Chronicle articles related to the recent (Jan-Feb 1995)
flooding in California. These articles were "hand geo-located" by us, but
could possibly be geolocated by a tool known as GYPSY.
-
"Points of Interest"
This data is from the California place name registry.
-
GCM Test data
This data was created by us to simulate a real Global Circulation Model
visualization in our prototype. We expect to update our visualization
display soon, and to use real data.
Our Prototype
Our prototype is called the New World Order Project, and is based on the
items outlined above. One enters the environment by running the "OK Shell."
The OK Shell uses "X" (the X Window System), so the user, the program
running OK, and the database engine may all be on different systems! OK
starts by bringing up the first "Recipe", and from there the user may run
different recipes simply by clicking on them.
As you enter the application, the default view shows a large swath of the
Earth, including the whole eastern seaboard of the US.
Acknowledgements
P. Brown (UCB) and R. Troy (UCB)
Section Coordinator: Richard Troy, rtroy@postgres.berkeley.edu