[Historical Document]

University of California / NASA

End to End Problems & Solutions in EOSDIS

Last Update: October 21, 1998.

BigSur: An Alternative Architecture for The Mission To Planet Earth Information System.

This work provides an alternative for the National Aeronautics & Space Administration (NASA) Mission To Planet Earth (MTPE) Earth Observing System Data and Information System (EOSDIS) to the Hughes Aerospace EOSDIS Core System (ECS).

"End-to-End Problems & Solutions in EOSDIS" (we call it simply "BigSur") is a NASA-sponsored multi-year project investigating alternative data management strategies for NASA's Earth Observing System (EOS). The project includes research at the Berkeley, Los Angeles, San Diego, and Santa Barbara campuses of the University of California, and Lawrence Livermore National Laboratory.

Under Construction...

Let's face it: we don't have the manpower to keep these pages current. If these pages are stale, it's because we're busy writing code! Please consider them permanently under construction!

This page was written by Richard Troy, Project Lead, with introduction by Michael Stonebraker, Principal Investigator.

Introduction

Principal Investigator: Michael Stonebraker

The purpose of this grant is to explore an architecture for EOSDIS that is "Database-centric". The main tenets of our architecture are:

  1. Put all EOSDIS data in a next-generation Database Management System (DBMS).
  2. Control the workflow of converting raw satellite imagery into "cooked" data products through DBMS triggers (events).
  3. Effectively support ad-hoc inquiry as the major source of EOSDIS queries.
  4. Allow "eager" and "lazy" execution of "Processes" in the workflow.
  5. Run the same software system at SCFs and DAACs.
  6. Support seamless distribution of data through distributed DBMS technology.
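
Tenets 2 and 4 can be illustrated with a minimal sketch. It is not the BigSur implementation: it uses Python's built-in sqlite3 module in place of a next-generation DBMS, and every table, trigger, and function name in it (raw_scene, product, pending, cook, and so on) is a hypothetical one chosen for illustration. A trigger records that each newly ingested raw scene needs a derived product. An "eager" ingest runs the Process at once, while a "lazy" ingest leaves the work pending until someone asks for the product.

```python
import sqlite3


def make_db():
    """Create an in-memory database whose trigger schedules derivation work."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE raw_scene (id INTEGER PRIMARY KEY, radiance REAL);
        CREATE TABLE product (scene_id INTEGER PRIMARY KEY, value REAL);
        CREATE TABLE pending (scene_id INTEGER PRIMARY KEY);

        -- Trigger (event): every new raw scene is marked as needing a product.
        CREATE TRIGGER on_raw_insert AFTER INSERT ON raw_scene
        BEGIN
            INSERT INTO pending (scene_id) VALUES (NEW.id);
        END;
        """
    )
    return db


def cook(radiance):
    """Hypothetical stand-in for a real science Process."""
    return radiance * 2.0


def derive(db, scene_id):
    """Run the Process for one scene, store the product, clear the pending mark."""
    row = db.execute(
        "SELECT radiance FROM raw_scene WHERE id = ?", (scene_id,)
    ).fetchone()
    if row is None:
        raise KeyError(scene_id)
    value = cook(row[0])
    db.execute(
        "INSERT OR REPLACE INTO product (scene_id, value) VALUES (?, ?)",
        (scene_id, value),
    )
    db.execute("DELETE FROM pending WHERE scene_id = ?", (scene_id,))
    return value


def ingest(db, scene_id, radiance, eager):
    """Insert a raw scene; the trigger marks it pending. Eager mode derives now."""
    db.execute(
        "INSERT INTO raw_scene (id, radiance) VALUES (?, ?)", (scene_id, radiance)
    )
    if eager:
        derive(db, scene_id)


def get_product(db, scene_id):
    """Return the stored product, deriving it on demand if it is still pending."""
    row = db.execute(
        "SELECT value FROM product WHERE scene_id = ?", (scene_id,)
    ).fetchone()
    return row[0] if row is not None else derive(db, scene_id)


def count(db, table):
    """Row count for one of the fixed table names used in this sketch."""
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    db = make_db()
    ingest(db, 1, 10.0, eager=True)   # product computed at ingest time
    ingest(db, 2, 5.0, eager=False)   # product left pending
    print(count(db, "product"), count(db, "pending"))
    print(get_product(db, 2))         # derived only now, on first request
```

The design choice shown is only the split between scheduling and execution: the trigger decides what work exists, and the caller decides when it runs.
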
Some of the utility of our system: [added by R. Troy]
  1. Provide Scientists an electronic notebook to describe and track their objects and the processes performed against them.
  2. Provide a processing system which can perform desired "Processes" against scientific objects.
  3. Provide the ability to automate Processes.
  4. Provide "resource discovery" so that browsers need not know what exists in advance.
  5. Provide a framework for inter- and intra-disciplinary interoperability.
In addition, we are exploring several areas of research that are inherent in such a DBMS-centric architecture. These include:
  1. More effective wide-area distributed-DBMS technology. Our efforts focus on a prototype distributed DBMS, called Mariposa.
  2. A type library for optimally regridding satellite imagery (David Siegel).
  3. A more effective interface between a DBMS and a tertiary memory file system. This is a part of the High Performance Storage System Project, HPSS, headed by Dick Watson. Also see the National Storage Laboratory, NSL.
  4. End-to-end modelling of the "end-to-end" problem of going from data source to DBMS to visualization system. The "Gator" project has as its goal to identify the ultimate bottlenecks in the overall architecture and then focus on parallelization of these modules. Gator is headed by Jim Demmel.
  5. A more flexible wide-area networking protocol, headed by Joseph Pasquale.
  6. An advanced visualization system for specifying user interactions with the database, known as Tecate, headed by Peter Kochevar. This project is now over, but it was hosted by the San Diego Supercomputer Center Visualization Research group, formerly at: http://www.sdsc.edu/SDSC/Research/Visualization/Tecate/tecate.html.

Brief History

In the pre-dawn epoch before time was recorded, there was Sequoia 2000, an Earth Science project whose mission was to architect an "Alternative" to the Hughes "let's ship CDs to scientists for Mission to Planet Earth" model. During Sequoia, much almost-forgotten blood was spilled. Many tempers flared... Large beasts roamed the land and evolution was on the fast track.

The evolutionary process yielded many important lessons and bore fruit in the form of the seminal white paper, "EOSDIS Alternative Architecture, Sequoia 2000 Technical Report 95/61, April 95". Among the most easily forgotten lessons to be learned were these:

In September of 1994, when this particular Grant began, the BigSur project (as it has since become known) was the inheritor of the evolutionary progress espoused in Sequoia 2000 (S2K). The Sequoia project was officially over, but the name BigSur had not yet been chosen so "Son of Sequoia" was the moniker... A few of the "worker-bees" - notably Jean Anderson - remained with the new project and never really gave up the name "S2K", so you will find quite a few materials that properly belong to BigSur with the Sequoia name on them - please do not be confused by this!

So, BigSur started in '94 with a mission and some pre-existing evolutionary history. The mission was to implement the vision proffered in the previously mentioned white paper. At that time, Paul Brown (now at Informix) was the new "indian" and he officially had no chief. I came on board in January of '95 to fill that role but it didn't really matter; Paul and I worked side by side, and were largely the only full-time workers on the project.

Before I arrived, Paul had put together an initial database design which matched his interpretation of what the white paper said, and it was a very good start indeed. Together, we built a working prototype system in a matter of a few months! Paul primarily worked on stuffing the database full of as many diverse datatypes as possible, while I wrote applications against it. And together we improved the schema greatly. Some of our results are visible on the prototype web page (url above).

In the summer of '95, we hired Yuechen Chi to work directly with Roberto Mechoso at UCLA. He has essentially been an "applications" oriented person and has done some wonderful work... Paul left the group in December of '95 to pursue a career with Illustra, and in January of '96 we hired a replacement... The replacement didn't work out, and we were unable to find a suitable person that the University could actually afford.

As time moved onward, BigSur progressed. Yuechen wrote some clever code to wed parts of Roberto Mechoso's Global Circulation Model - he calls it 'esmdis' - with BigSur and put it on the web. We were still using Illustra (commercialized Postgres) as a database engine and used tools such as "Tool Command Language/Toolkit" ("Tcl/Tk", also developed at UCB) for API and GUI programming.

Illustra was bought by Informix not long after Paul left in 1995, and within a year it was abandoned by them in favor of a merged product called "IUS" (INFORMIX Universal Server), but in late '96 it was still unavailable. Also note that during this period "The World Wide Web" took off like a rocket. While our Tcl/Tk is still viable, it's limited to Unix, and isn't "web ready." This left us pinned in regarding our platforms and tools... and we were too short-staffed to think about scrapping the tools and re-implementing. With Paul gone, we could never accomplish that in the time remaining on our grant.

As our prototype system was a functional success - if only partially so - within the first year, eventually someone would want to make use of it. In the spring of '97, LaRC (NASA's Langley Research Center) got wind of BigSur and realized it could solve their TRMM (Tropical Rainfall Measuring Mission) data processing problems when the Hughes ECS (EOSDIS Core System) failed them. They wanted it to run on "modern tools" such as a commercially available database (Illustra was gone), and use Java as a GUI and possibly API language. Berkeley Earth Science Tools (BEST) performed these translations and commercialized our work for them. Thankfully, those who use UC research to make derived works are obliged to share them, so of course BEST has shared their system with us. We now use it in place of our earlier work for many of the same reasons as LaRC, and also because it includes many practical features we left out - among these are nifty things like trace features so you can see what scientific processes are doing, and have a clue when things go wrong. ...Not a bad idea...

In the meantime, we finally found some competent help in Mark Schimmelman, who has been helping out for a while now. Our focus with our remaining time is to work on areas which need further attention - there are many such areas as we have been so short-handed for so long...

Acknowledgements of Partners and Participants, Past and Present

Principal Investigator:

Michael Stonebraker
621 Soda Hall
University of California
Berkeley, Ca. 94720

mike@cs.berkeley.edu

Participants Present:

Tony Drumond (UCLA)
Roberto Mechoso (UCLA)
Mark Schimmelman (UCB)
Keith Sklower (UCB)
Mike Stonebraker (UCB)
Richard Troy (UCB)

Participants Past:

Jean Anderson (UCSB)
Paul Brown (UCB)
Yuechen Chi (UCLA)
Frank Davis (UCSB)
Debbie Donahue (UCSB)
Jeff Dozier (UCSB)
Dave Fisher (LLNL)
Jim Frew (UCSB)
Ken Gardels (UCB)
Steve Louis (LLNL)
Jim McGraw (LLNL)
Ed Mesrobian (UCLA)
Ron Musick (LLNL)
David Siegel (UCSB)
Joe Spahr (UCLA)
Dick Watson (LLNL)

Partners:

Lawrence Livermore National Laboratory

Section Coordinator: Richard Troy