[Historical Document]
University of California / NASA
End to End Problems & Solutions in EOSDIS
Last Update: October 21, 1998.
BigSur: An Alternative Architecture
for The Mission To Planet Earth Information System.
This work provides an alternative to the Hughes Aerospace EOSDIS Core System (ECS)
for the National Aeronautics and Space Administration (NASA) Mission to
Planet Earth (MTPE) Earth Observing System Data and Information System (EOSDIS).
"End-to-End Problems & Solutions in EOSDIS" (we call it simply "BigSur")
is a NASA-sponsored multi-year project investigating alternative data management
strategies for NASA's Earth Observing System (EOS). The project includes
research at the Berkeley, Los
Angeles, San
Diego, and Santa
Barbara campuses of the University of California, and Lawrence
Livermore National Laboratory.
Under Construction...
Let's face it: we don't have the manpower to keep these pages current.
If these pages are stale, it's because we're busy writing code! Please
consider them permanently under construction!
This page was written by Richard Troy, Project Lead, with introduction
by Michael Stonebraker, Principal Investigator.
Introduction
Principal Investigator: Michael Stonebraker
The purpose of this grant is to explore an architecture for EOSDIS that
is "database-centric". The main tenets of our architecture are:
- Put all EOSDIS data in a next-generation Database Management System (DBMS).
- Control the workflow of converting raw satellite imagery into "cooked"
data products through DBMS triggers (events).
- Effectively support ad-hoc inquiry as the major source of EOSDIS queries.
- Allow "eager" and "lazy" execution of "Processes" in the workflow.
- Run the same software system at Science Computing Facilities (SCFs) and
Distributed Active Archive Centers (DAACs).
- Support seamless distribution of data through distributed DBMS technology.
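The tenets on trigger-driven workflow and on "eager" versus "lazy" Processes can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not BigSur's actual implementation. `MiniStore`, `Process`, and the class names `raw`, `calibrated`, and `gridded` are hypothetical. A dictionary stands in for the DBMS, and a Python insert hook stands in for a real database trigger. We also assume that "eager" means a Process runs as soon as its input arrives and "lazy" means it runs only when its output is requested.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Process:
    """Hypothetical workflow step: derives one output-class object from one input-class object."""
    name: str
    input_class: str
    output_class: str
    func: Callable[[object], object]
    eager: bool  # assumed meaning: True = run when input arrives, False = defer until queried


@dataclass
class MiniStore:
    """Toy stand-in for a DBMS: named classes of objects plus emulated insert 'triggers'."""
    tables: Dict[str, List[object]] = field(default_factory=dict)
    processes: List[Process] = field(default_factory=list)
    # Deferred (lazy) work, keyed by the output class it would produce.
    pending: Dict[str, List[Tuple[Process, object]]] = field(default_factory=dict)

    def register(self, proc: Process) -> None:
        self.processes.append(proc)

    def insert(self, cls: str, obj: object) -> None:
        self.tables.setdefault(cls, []).append(obj)
        # Emulated trigger: every process consuming this class reacts to the insert.
        for proc in self.processes:
            if proc.input_class != cls:
                continue
            if proc.eager:
                # Eager: derive now; the nested insert cascades to downstream steps.
                self.insert(proc.output_class, proc.func(obj))
            else:
                # Lazy: remember the work, but do not run it yet.
                self.pending.setdefault(proc.output_class, []).append((proc, obj))

    def query(self, cls: str) -> List[object]:
        # Materialize upstream classes first so chains of lazy steps resolve.
        # Assumes the workflow graph is acyclic.
        for proc in self.processes:
            if proc.output_class == cls:
                self.query(proc.input_class)
        # Run any deferred work whose output is being asked for.
        for proc, obj in self.pending.pop(cls, []):
            self.insert(proc.output_class, proc.func(obj))
        return list(self.tables.get(cls, []))


if __name__ == "__main__":
    store = MiniStore()
    store.register(Process("calibrate", "raw", "calibrated", lambda x: x * 2, eager=True))
    store.register(Process("grid", "calibrated", "gridded", lambda x: x + 1, eager=False))
    store.insert("raw", 10)
    print(store.tables.get("gridded"))  # None: the lazy step has not run yet
    print(store.query("gridded"))       # [21]
```

In this sketch the eager calibration step fires inside the insert of the raw object. The lazy gridding step is only queued, and it runs when `gridded` is queried. A real DBMS-centric system would persist both the pending work and the derived objects instead of holding them in memory.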
Some of the capabilities our system provides [added by R. Troy]:
- Provide scientists an electronic notebook to describe and track their objects
and the processes performed against them.
- Provide a processing system that can perform desired "Processes" against
scientific objects.
- Provide the ability to automate Processes.
- Provide "resource discovery" so that browsers need not know in advance
what exists.
- Provide a framework for inter- and intra-disciplinary interoperability.
In addition, we are exploring several areas of research that are inherent
in such a DBMS-centric architecture. These include:
- More effective wide-area distributed-DBMS technology. Our efforts focus
on a prototype distributed DBMS called Mariposa.
- A type library for optimally regridding satellite imagery (David Siegel).
- A more effective interface between a DBMS and a tertiary-memory file system.
This is part of the High Performance Storage System (HPSS) project,
headed by Dick Watson. Also see the National Storage Laboratory (NSL).
- End-to-end modelling of the "end-to-end" problem of going from data source
to DBMS to visualization system. The "Gator" project aims to identify the
ultimate bottlenecks in the overall architecture and then focus on
parallelizing those modules. Gator is headed by Jim Demmel.
- A more flexible wide-area networking protocol, headed by Joseph Pasquale.
- An advanced visualization system, known as Tecate, for specifying user
interactions with the database, headed by Peter Kochevar. This project is
now over; it was hosted by the San Diego Supercomputer Center Visualization
Research group, formerly at: http://www.sdsc.edu/SDSC/Research/Visualization/Tecate/tecate.html.
Brief History
In the pre-dawn epoch before time was recorded, there was Sequoia 2000,
an Earth Science project whose mission was to architect an "alternative"
to the Hughes "let's ship CDs to scientists for Mission to Planet Earth"
model. During Sequoia, much almost-forgotten blood was spilled. Many tempers
flared... Large beasts roamed the land and evolution was on the fast track.
The evolutionary process yielded many important lessons and bore fruit
in the form of the seminal white paper "EOSDIS Alternative Architecture"
(Sequoia 2000 Technical Report 95/61, April 1995). Among the most easily
forgotten lessons were these:
- To decide is to be wrong; remain flexible and extensible. Focus instead
on providing scientists a framework within which to describe their work,
and toolsets that make such descriptions and manipulations easy.
- Vocabulary: many identical terms are used in similar but subtly different
ways across disciplines, so speakers and listeners "talk past" each other
because they think they understand when they do not. Therefore, abstract
ideas clearly, free them from the barriers of language and diction, and
translate these concepts into the native language of each scientific
discipline. Prominent examples: Process and Function, which are equivalent
concepts in our system; likewise Parameter and Argument.
In September 1994, when this particular grant began, the BigSur project
(as it has since become known) inherited the evolutionary progress made
in Sequoia 2000 (S2K). The Sequoia project was officially over, but the
name BigSur had not yet been chosen, so "Son of Sequoia" was the
moniker... A few of the "worker bees", notably Jean Anderson, remained
with the new project and never really gave up the name "S2K", so you will
find quite a few materials that properly belong to BigSur with the Sequoia
name on them. Please do not be confused by this!
So, BigSur started in '94 with a mission and some pre-existing evolutionary
history. The mission was to implement the vision proffered in the
previously mentioned white-paper. At that time, Paul
Brown (now at Informix) was the new "indian" and he officially had
no chief. I came on board in January of '95 to fill that role but it didn't
really matter; Paul and I worked together side by side, and were largely
the only full-time workers on the project.
Before I arrived, Paul had put together an initial database design which
matched his interpretation of what the white-paper said, and it was a very
good start indeed. Together, we built a working prototype
system in a matter of a few months! Paul primarily worked on
stuffing the database full of as many diverse datatypes as possible, while
I wrote applications against it. And together we improved the schema greatly.
Some of our results are visible on the prototype web page.
In the summer of '95, we hired Yuechen Chi to work directly with Roberto
Mechoso at UCLA. He has essentially been an "applications"-oriented person
and has done some wonderful work... Paul left the group in December of
'95 to pursue a career with Illustra, and in January of '96 we hired a
replacement... The replacement didn't work out, and we were unable to find
a suitable person that the University could actually afford.
As time moved onward, BigSur progressed. Yuechen wrote some clever code
to wed parts of Roberto Mechoso's Global Circulation Model (he calls it
'esmdis') with BigSur, and put it on the web. We were still using Illustra
(commercialized Postgres) as a database engine, and used tools such as
Tcl/Tk ("Tool Command Language"/Toolkit, also developed at UCB) for API
and GUI programming.
Illustra was bought by Informix not long after Paul left in 1995, and
within a year Informix abandoned it in favor of a merged product called
"IUS" (INFORMIX Universal Server), which in late '96 was still unavailable.
Also note that during this period the World Wide Web took off like a
rocket. While our Tcl/Tk code is still viable, it's limited to Unix and isn't
"web ready." This left us boxed in regarding our platforms and tools,
and we were too short-staffed to consider scrapping the tools and re-implementing.
With Paul gone, we could never have accomplished that in the time remaining
on our grant.
Because our prototype system was a functional success (if only partially
so) within the first year, eventually someone was bound to want to use
it. In the spring of '97, NASA's Langley Research Center (LaRC) got
wind of BigSur and realized it could solve their TRMM
(Tropical Rainfall Measuring Mission) data processing problems
when the Hughes ECS (EOSDIS Core System) failed them. They wanted it to
run on "modern tools" such as a commercially available database (Illustra
was gone), and to use Java as a GUI and possibly API language. Berkeley
Earth Science Tools (BEST) performed these translations and commercialized
our work for them. Thankfully, those who make derived works from UC research
are obliged to share them, so of course BEST has shared their system
with us. We now use it in place of our earlier work for many of the same
reasons as LaRC, and also because it includes many practical features we
left out, among them nifty things like trace features so you can see what
scientific processes are doing and have a clue when things go wrong. ...Not
a bad idea...
In the meantime, we finally found some competent help in Mark Schimmelman,
who has been helping out for a while now. Our focus for the remaining
time is to work on areas that need further attention; there are many
such areas, as we have been shorthanded for so long...
Acknowledgements of Partners and Participants, Past and Present
Principal Investigator:
Michael Stonebraker
621 Soda Hall
University of California
Berkeley, Ca. 94720
mike@cs.berkeley.edu
Participants Present:
Tony Drumond (UCLA)
Roberto Mechoso (UCLA)
Mark Schimmelman (UCB)
Keith Sklower (UCB)
Mike Stonebraker (UCB)
Richard Troy (UCB)
Participants Past:
Jean Anderson (UCSB)
Paul Brown (UCB)
Yuechen Chi (UCLA)
Frank Davis (UCSB)
Debbie Donahue (UCSB)
Jeff Dozier (UCSB)
Dave Fisher (LLNL)
Jim Frew (UCSB)
Ken Gardels (UCB)
Steve Louis (LLNL)
Jim McGraw (LLNL)
Ed Mesrobian (UCLA)
Ron Musick (LLNL)
David Siegel (UCSB)
Joe Spahr (UCLA)
Dick Watson (LLNL)
Partners:
Lawrence Livermore National Laboratory
Section Coordinator: Richard Troy