BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-01 ENTRY:: February 4, 1994 TITLE:: To Support Global Change Research DATE:: September 17, 1991 AUTHOR:: Stonebraker, Michael AUTHOR:: Dozier, Jeff PAGES:: 26 ABSTRACT:: Improved data management is crucial to the success of current scientific investigations of Global Change. New modes of research, especially the synergistic interactions between observations and model-based simulations, will require massive amounts of diverse data to be stored, organized, accessed, distributed, visualized, and analyzed. Achieving the goals of the U.S. Global Change Research Program will largely depend on more advanced data management systems that will allow scientists to manipulate large-scale data sets and climate system models. Refinements in computing--specifically involving storage, networking, distributed file systems, extensible distributed data base management, and visualization--can be applied to a range of Global Change applications through a series of specific investigation scenarios. Computer scientists and environmental researchers at several UC campuses will collaborate to address these challenges. This project complements both NASA's EOS project and UCAR's (University Corporation for Atmospheric Research) Climate System Modeling Program in addressing the gigantic data requirements of Earth System Science research before the turn of the century. Therefore, we have named it Sequoia 2000, after the giant trees of the Sierra Nevada, the largest organisms on the Earth's land surface. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-01 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-02 ENTRY:: March 4, 1994 TITLE:: High Performance Network and Channel-Based Storage DATE:: AUTHOR:: Katz, Randy H. PAGES:: 46 ABSTRACT:: In the traditional mainframe-centered view of a computer system, storage devices are coupled to the system through complex hardware subsystems called I/O channels. 
With the dramatic shift towards workstation-based computing, and its associated client/server model of computation, storage facilities are now found attached to file servers and distributed throughout the network. In this paper, we discuss the underlying technology trends that are leading to high performance network-based storage, namely advances in networks, storage devices, and I/O controller and server architectures. We review several commercial systems and research prototypes that are leading to a new approach to high performance computing based on network-attached storage. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-02 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-3 ENTRY:: March 4, 1994 TITLE:: Robo-line Storage: Low Latency, High Capacity Storage Systems Over Geographically Distributed Networks DATE:: AUTHOR:: Katz, Randy H. AUTHOR:: Anderson, Thomas E. AUTHOR:: Ousterhout, John K. AUTHOR:: Patterson, David A. PAGES:: 34 ABSTRACT:: Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these "grand challenge" applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are of the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data [ISTP91]! In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form-factor magnetic and optical tape systems. Second, we will reduce the latency of access to the storage system, including devices, controllers, servers, and communications links. 
Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed by Ousterhout for his Log-Structured File System. Finally, we will construct a prototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a cornerstone of any coherent program in high performance computing and communications. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-3 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-04 ENTRY:: February 4, 1994 TITLE:: Sequoia 2000 Technical Report 91/4 DATE:: April 1991 AUTHOR:: Chen, Jolly AUTHOR:: Larson, Ray AUTHOR:: Stonebraker, Michael PAGES:: 11 ABSTRACT:: In this paper we explain the paradigm that we are following for Sequoia 2000 object browsers. It is intended to be a keyboard-free interface, and is based on the "move and zoom" paradigm popularized for Navy ships by SDMS [HERO80]. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-04 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-05 ENTRY:: February 4, 1994 TITLE:: AN OVERVIEW OF THE SEQUOIA 2000 PROJECT DATE:: May 1991 AUTHOR:: Stonebraker, Michael PAGES:: 11 ABSTRACT:: Achieving the goals of the U.S. Global Change Research Program will depend not only on improved measurement systems, but also on improved data systems that will allow scientists to manipulate the resulting large-scale data sets and climate system models, as well as compare model results with observations. New modes of research, especially the synergistic interactions between observations and model-based simulations, will require massive amounts of diverse data to be stored, organized, accessed, distributed, visualized, and analyzed. Computer scientists and environmental researchers at several UC campuses are collaborating to address these challenges. 
Refinements in computing--specifically involving storage, networking, file systems, extensible data base management, and visualization--will be applied to specific Global Change applications. We have named this project Sequoia 2000, after the giant trees of the Sierra Nevada, the largest organisms on the Earth's land surface. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-05 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-06 ENTRY:: February 4, 1994 TITLE:: Network Issues for Sequoia 2000 DATE:: AUTHOR:: Pasquale, Joseph AUTHOR:: Polyzos, George C. PAGES:: 6 ABSTRACT:: The goals of the Sequoia 2000 Network are to provide high throughput for the massive observation input data and image output data characterizing Global Change applications, as well as real-time services for animations and collaboration tools such as video conferencing. The first phase of the network will be based on a T3 (45 Mb/s) backbone and FDDI for local distribution. The research issues we are focusing on include protocols that provide deterministic and statistical performance guarantees and take advantage of hierarchical coding of information, and the design of I/O system software that integrates process and device communication software with network protocol software. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-06 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-07 ENTRY:: February 4, 1994 TITLE:: Internet Throughput and Delay Measurements Between Sequoia 2000 Sites DATE:: December 1991 AUTHOR:: Pasquale, Joseph C. AUTHOR:: Polyzos, George C. AUTHOR:: Fall, Kevin R. AUTHOR:: Kompella, Vachaspathi P. PAGES:: 9 ABSTRACT:: We report performance measurements of Internet connections between five Sequoia 2000 sites. Throughput and delay statistics are presented for various message sizes and for both daytime and nighttime. The highest throughput observed was 85 KB/s between UCSD and UCLA at night and the lowest was 1 KB/s between UCSD and DWR during the day. 
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-07 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-08 ENTRY:: February 14, 1994 TITLE:: Early EOSDIS: A Data and Information System for the Study of Global Change DATE:: AUTHOR:: Dozier, Jeff PAGES:: 14 ABSTRACT:: The initial step in the development of the Earth Observing System Data and Information System (EOSDIS) is to deliver a working prototype system for use by the Earth science research community by July 1994. EOSDIS will be NASA's contribution to the confederation of national and international agency data systems to support global change research and other uses of environmental data [Dozier and Ramapriyan, 1991], including GEWEX, the Global Energy and Water Cycle Experiment. This prototype version of EOSDIS will not provide every feature that later systems will provide, so clear choices must be made as to priorities in its implementation. Technical readiness and affordability will determine how much capability the system will actually offer. EOSDIS Version 0 (V0) is a fusion of data holdings, data services, and research community infrastructure. All must advance to improve the use of data to support global change research. The science priorities will guide the choice of data sets to emphasize and provide to the community, and the levels of service to be encompassed within V0. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-08 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-10 ENTRY:: February 14, 1994 TITLE:: A Method for Refining Automatically-Discovered Lexical Relations: Combining Weak Techniques for Stronger Results DATE:: AUTHOR:: Grefenstette, Gregory AUTHOR:: Hearst, Marti A. PAGES:: 9 ABSTRACT:: Knowledge-poor corpus-based approaches to natural language processing are attractive in that they do not incur the difficulties associated with complex knowledge bases and real-world inferences. 
However, these kinds of language processing techniques in isolation often do not suffice for a particular task; for this reason we are interested in finding ways to combine various techniques and improve their results. Accordingly, we conducted experiments to refine the results of an automatic lexical discovery technique by making use of a statistically-based syntactic similarity measure. The discovery program uses lexico-syntactic patterns to find instances of the hyponymy relation in large text bases. Once relations of this sort are found, they should be inserted into an existing lexicon or thesaurus. However, the terms in the relation may have multiple senses, thus hampering automatic placement. In order to address this problem we used a term-similarity determination technique to choose where, in an existing thesaurus, to install a lexical relation. The union of these two corpus-based methods is promising, although only partially successful in the experiments run so far. Here we report some preliminary results, and make suggestions for how to improve the technique in the future. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-10 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-11 ENTRY:: March 4, 1994 TITLE:: Abstracts: A Latency-Hiding Technique for High-Capacity Mass-Storage Systems DATE:: AUTHOR:: Fine, Joel A. AUTHOR:: Anderson, Thomas E. AUTHOR:: Dahlin, Michael D. AUTHOR:: Frew, James AUTHOR:: Olson, Michael AUTHOR:: Patterson, David A. PAGES:: 24 ABSTRACT:: Extraordinary advances in digital storage technology are rapidly making possible cost-effective, multiple-terabyte information retrieval systems. The latency and bandwidth of these technologies are typically much worse than what users of computer systems are accustomed to. Unfortunately, traditional techniques of reducing latency and improving bandwidth, caching and compression, by themselves will not work well with the access patterns that we anticipate for these high-capacity systems. 
We introduce and define a new storage management technique, called abstracts. An abstract is an extraction of the "essential" part of the data set. It is created using some combination of averaging, subsetting, rounding, or some other method of condensing the data. An abstract's composition is heavily dependent on the context in which it is used. Each data set can have multiple abstracts associated with it, each of which can be used to answer a query. When a query is answered from an abstract, effective bandwidth increases, because we transfer much less data through the storage system. The counter-intuitive result is that abstracts on robot-based tape storage systems can have lower latency than full data sets on magnetic disks, because the inherent latency disadvantage of tertiary systems can be overcome by the reduction in transfer time due to the smaller transfer size. Moreover, because many abstracts can fit in faster storage in the space occupied by a single unabstracted data set, users can get the effect of magnetic disk latencies for very large objects. To evaluate the potential of abstracts, we examine four common queries as well as a detailed case study. We also study the statistical characteristics of several data sets in an effort to identify classes of abstracting functions. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-11 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-12 ENTRY:: February 18, 1994 TITLE:: The Sequoia 2000 Storage Benchmark DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Frew, James AUTHOR:: Gardels, Kenn AUTHOR:: Meredith, Jeff PAGES:: 14 ABSTRACT:: This paper presents a benchmark that concisely captures the data base requirements of a collection of Earth Scientists working in the SEQUOIA 2000 project on various aspects of global change research. 
This benchmark has the novel characteristic that it uses real data sets and real queries. Because these are representative of engineering and scientific DBMS users, we claim that this benchmark represents the needs of this more general community. Also included in the paper are benchmark results for four example DBMSs: ARC-INFO, GRASS, IPW, and POSTGRES. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-12 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-13 ENTRY:: February 18, 1994 TITLE:: Predicate Migration: Optimizing Queries with Expensive Predicates DATE:: December 3, 1992 AUTHOR:: Hellerstein, Joseph M. PAGES:: 22 ABSTRACT:: The traditional focus of relational query optimization schemes has been on the choice of join methods and join orders. Restrictions have typically been handled in query optimizers by "predicate pushdown" rules, which apply restrictions in an arbitrary order before as many joins as possible. These rules work under the assumption that a restriction is essentially a zero-time operation. However, today's extensible and object-oriented database systems allow users to define time-consuming functions, which may be used in a query's restriction and join predicates. Furthermore, SQL has long supported subquery predicates, which may be arbitrarily time-consuming to check. Thus restrictions should not be considered zero-time operations, and the model of query optimization must be enhanced. In this paper we develop a theory for moving expensive predicates in a query plan so that the total cost of the plan--including the costs of both joins and restrictions--is minimal. We present an algorithm to implement the theory, as well as results of our implementation in POSTGRES. In our experience, plans generated by the newly enhanced POSTGRES are orders of magnitude faster than plans generated by a traditional query optimizer. The additional complexity of considering expensive predicates during optimization is found to be manageably small. 
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-13 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-14 ENTRY:: February 18, 1994 TITLE:: How Sequoia 2000 Addresses Issues in Data and Information Systems for Global Change DATE:: August 1992 AUTHOR:: Dozier, Jeff PAGES:: 16 ABSTRACT:: Sequoia 2000 is a project to design a next-generation information system for accessing, archiving, distributing, managing, and visualizing data for global change research. Funded by the Digital Equipment Corporation, it has investigators from computer science and Earth science departments on five campuses of the University of California. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-14 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-15 ENTRY:: February 18, 1994 TITLE:: Sequoia 2000 Network (S2Knet) Handbook DATE:: June 6, 1992 AUTHOR:: Pasquale, Joseph AUTHOR:: Fall, Kevin R. AUTHOR:: Forrest, Jon PAGES:: 14 ABSTRACT:: The construction of the Sequoia 2000 network (S2Knet) is a joint effort involving people from the University of California campuses of Berkeley, Los Angeles, San Diego, and Santa Barbara, the UC Office of the President (UCOP), and the San Diego Supercomputer Center (SDSC). The purpose of this handbook is to serve as a reference, containing up-to-date information describing policy, topology, names, addresses, and routing. It also includes a list of contacts for network management. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-15 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-16 ENTRY:: February 18, 1994 TITLE:: Highlight: Using a Log-structured File System for Tertiary Storage Management DATE:: November 20, 1992 AUTHOR:: Kohl, John T. AUTHOR:: Staelin, Carl AUTHOR:: Stonebraker, Michael PAGES:: 15 ABSTRACT:: Robotic storage devices offer huge storage capacity at a low cost per byte, but with large access times. Integrating these devices into the storage hierarchy presents a challenge to file system designers. 
Log-structured file systems (LFSs) were developed to reduce latencies involved in accessing disk devices, but their sequential write patterns match well with tertiary storage characteristics. Unfortunately, existing versions only manage memory caches and disks, and do not support a broader storage hierarchy. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-16 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-17 ENTRY:: February 18, 1994 TITLE:: Exploiting In-Kernel Data Paths to Improve I/O Throughput and CPU Availability DATE:: November 1992 AUTHOR:: Pasquale, Joseph AUTHOR:: Fall, Kevin R. PAGES:: 11 ABSTRACT:: We present the motivation, design, implementation, and performance evaluation of a UNIX kernel mechanism capable of establishing fast in-kernel data pathways between I/O objects. A new system call, splice(), moves data asynchronously and without user-process intervention to and from I/O objects specified by file descriptors. Performance measurements indicate improved I/O throughput and increased CPU availability attributable to reduced context-switch and data-copying overhead. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-17 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-18 ENTRY:: February 18, 1994 TITLE:: A Performance Analysis of TCP/IP and UDP/IP Networking Software for the DECstation 5000 DATE:: AUTHOR:: Kay, Jonathan AUTHOR:: Pasquale, Joseph PAGES:: 21 ABSTRACT:: Modern workstations increasingly rely on distributed software such as NFS and NIS, yet the speed of networking software is not improving as rapidly as the workstation and networking hardware, leading to a network software bottleneck. We present detailed measurements of various components of the TCP/IP and UDP/IP protocol stack on a DECstation 5000/200 running Ultrix 4.2a. Measurements are by layer (i.e. socket, transport, IP, data-link) and by function (i.e. 
checksum computation, data copying, buffer management, protocol processing, and operating system interaction), with further breakdowns within each category. We show that checksum computation and data transfers dominate component times for a real LAN workload, that using large packet MTUs (maximum transmission units) is very important to achieving high throughput, and that given the distribution of component times for small messages, it will be difficult to improve latency. TCP and UDP time breakdowns are shown to be quite similar, suggesting that "lightweight" transport protocols are not likely to greatly decrease processing time. Finally, analytical models for network software processing times are presented. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-18 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-19 ENTRY:: February 18, 1994 TITLE:: A Static Analysis of I/O Characteristics of a Broad Class of Scientific Applications DATE:: November 1992 AUTHOR:: Pasquale, Barbara K. AUTHOR:: Polyzos, George C. PAGES:: 14 ABSTRACT:: Past research on high performance computers for scientific applications has concentrated on CPU speed and exploitation of parallelism, but has, until very recently, neglected I/O considerations. This paper presents a study of the production workload at the San Diego Supercomputer Center from an I/O requirements and characteristics perspective. Results of our analyses support our hypothesis that a significant proportion of scientific applications with intensive I/O demands have predictable I/O requirements. 
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-19 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-20 ENTRY:: February 25, 1994 TITLE:: Tioga: Providing Data Management Support for Scientific Visualization Applications DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Chen, Jolly AUTHOR:: Nathan, Nobuko AUTHOR:: Paxson, Caroline PAGES:: 20 ABSTRACT:: We present a user interface paradigm for database management systems motivated by scientific visualization applications. Our graphical user interface includes a "boxes and arrows" notation for database access and a flight simulator model of movement through information space. We also provide means to specify a hierarchy of abstracts of data of different types and resolutions. In addition, multiple portals on data may be related as master and slaves. The underlying DBMS support for this system includes the compilation of query plans into megaplans, new algorithms for data buffering, and provisions for a guaranteed rate of delivery. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-20 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-42 ENTRY:: April 27, 1994 TITLE:: Remote Sensing of Global Surface Shortwave Radiation and PAR Over the Ocean: a Sequoia Testbed DATE:: January 1994 AUTHOR:: Gautier, Catherine AUTHOR:: Byers, Michael PAGES:: 20 ABSTRACT:: During the past few years many methods have been proposed for estimating surface radiative fluxes (shortwave radiation, Photosynthetically Active Radiation - PAR) from satellite observations. We have developed algorithms for computing the shortwave radiative flux (shortwave irradiance) at the ocean surface from visible radiance observations and they have been found to be quite successful under most atmospheric and cloud conditions. For broken clouds, however, the simple plane parallel assumption for solving the radiative transfer equations may need to be corrected to account for cloud geometry. 
The estimation of PAR is simpler because the most commonly used satellite radiance measurements cover a similar region of the solar spectrum. We are in the process of producing global SW and PAR as a contribution to the Sequoia 2000 project (to implement a distributed processing system designed for the needs of global change researchers). Results from our algorithms developed for Sequoia and preliminary global surface solar irradiance and PAR fields will be presented and discussed. RETRIEVAL:: ocr (in all.ocr) RETRIEVAL:: tiff (in 001-020.tif) END:: UCB//S2K-92-42 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-09 ENTRY:: February 17, 1994 TITLE:: Automatic Acquisition of Hyponyms from Large Text Corpora DATE:: July 1992 AUTHOR:: Hearst, Marti A. PAGES:: 8 ABSTRACT:: We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidance of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also be acquirable in this way. A subset of the acquisition algorithm is implemented and the results are used to augment and critique the structure of a large hand-built thesaurus. Extensions and applications to areas such as information retrieval are suggested. 
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-09 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-21 ENTRY:: February 25, 1994 TITLE:: Measurement, Analysis, and Improvement of UDP/IP Throughput for the DECstation 5000 DATE:: AUTHOR:: Kay, Jonathan AUTHOR:: Pasquale, Joseph PAGES:: 11 ABSTRACT:: Networking software is a growing bottleneck in modern workstations, particularly for high throughput applications such as networked digital video. We measure various components of the UDP/IP protocol stack in a DECstation 5000/200 running Ultrix 4.2a, and quantify the way in which checksumming and copying dominate the processing time for high throughput applications. This paper describes network software measurements and substantial performance improvements which derive from a faster checksum implementation. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-21 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-22 ENTRY:: February 25, 1994 TITLE:: Cases as Structured Indexes for Full-Length Documents DATE:: AUTHOR:: Hearst, Marti A. PAGES:: 6 ABSTRACT:: Two long, full-length texts are not likely to discuss all, or almost all, of the same subtopics or subpoints. Even if the documents contain many of the same terms, the ways the terms are grouped to form subtopical discussions still might be quite different. A solution is to create a description of a document which lists all of its subtopical discussions as well as its main topics. An index that indicates this structure is an abstract representation of the document and we can think of this index as a case in the Case-Based Reasoning (CBR) sense. This paper proposes the use of cases to represent the high-level structure of full-length documents for the purpose of information retrieval. The cases are to be used both for assessing document similarity and for helping the user construct viable queries. 
The case can be transformed in various ways in order to make it more similar to the descriptions of other documents; these transformations include generalizing, substituting, and emphasizing subtopic descriptions. An advantage of this approach is that the cases that represent the document are automatically generable. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-22 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-23 ENTRY:: February 25, 1994 TITLE:: The Sequoia 2000 Architecture and Implementation Strategy DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Frew, James AUTHOR:: Dozier, Jeff PAGES:: 47 ABSTRACT:: This paper describes the Sequoia 2000 software architecture and its current implementations, including layers for Footprint, the file system, the DBMS, the application, and the network. Early prototype applications of this software include a Global Change data schema, GCM integration, remote sensing, a data system for climate studies, and operational uses by the DWR. Longer-range efforts include transfer protocols for moving elements of the database, controllers for secondary and tertiary storage, a distributed file system, and a distributed DBMS. The implementation plan ensures that the current architecture is stabilized and robust by the end of 1993. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-23 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-24 ENTRY:: February 25, 1994 TITLE:: TextTiling: A Quantitative Approach to Discourse Segmentation DATE:: AUTHOR:: Hearst, Marti A. PAGES:: 10 ABSTRACT:: This paper presents TextTiling, a method for partitioning full-length text documents into coherent multiparagraph units. The layout of text tiles is meant to reflect the pattern of subtopics contained in an expository text. The approach uses lexical analyses based on tf.idf, an information retrieval measurement, to determine the extent of the tiles, incorporating thesaural information via a statistical disambiguation algorithm. 
The tiles have been found to correspond well to human judgements of the major subtopic boundaries of science magazine articles. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-24 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-25 ENTRY:: February 28, 1994 TITLE:: DARWIN: On the Incremental Migration of Legacy Information Systems DATE:: AUTHOR:: Brodie, Michael L. AUTHOR:: Stonebraker, Michael PAGES:: 32 ABSTRACT:: As is painfully evident today, the deterioration of the transportation, education, and other national infrastructures negatively impacts many aspects of life, business, and our economy. This has resulted, in part, when responses to short-term crises discourage investing in infrastructure enhancement. A similar deterioration of the information system (IS) infrastructure has strong negative impacts on ISs, on the organizations they support, and, ultimately, on the economy. This paper addresses the problem of legacy IS migration with a spectrum of supporting methods for migrating legacy ISs into a target environment that includes rightsized hardware and modern technologies (i.e., infrastructure) such as client-server architecture, DBMSs, and CASE. We illustrate the methods with two migration case studies of multi-million dollar, mission critical legacy ISs. The contribution of this paper is a highly flexible set of migration methods that is tailorable to most legacy ISs and business contexts. The goal is to support continuous, iterative evolution. The critical success factor, and challenge in deployment, is to identify appropriate portions of the IS and the associated planning and management to achieve an incremental migration that is feasible with respect to the technical and business requirements. The paper concludes with a list of desirable migration tools for which basic research is required. 
The principles described in this paper can be used to design future ISs and an infrastructure that will support continuous IS evolution to avoid future legacy ISs. RETRIEVAL:: ocr (in all.ocr) RETRIEVAL:: tiff (in {001-032}.tif) END:: UCB//S2K-93-25 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-26 ENTRY:: February 25, 1994 TITLE:: Subtopic Structuring for Full-Length Document Access DATE:: AUTHOR:: Hearst, Marti A. AUTHOR:: Plaunt, Christian PAGES:: 10 ABSTRACT:: We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-26 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-27 ENTRY:: February 25, 1994 TITLE:: A Simple Visualization Management System: Bridging the Gap Between Visualization and Data Management DATE:: April 30, 1993 AUTHOR:: Kochevar, Peter AUTHOR:: Ahmed, Zahid AUTHOR:: Shade, Jonathan AUTHOR:: Sharp, Colin PAGES:: 16 ABSTRACT:: A prototype visualization management system is described which merges the capabilities of a database management system with any number of existing visualization packages such as AVS or IDL. 
The prototype uses the Postgres database management system to store and access Earth science data through a simple graphical browser. Data located in the database is visualized by automatically invoking a desired visualization package and downloading an appropriate script or program. The central idea underlying the system is that information on how to visualize a dataset is stored in the database with the dataset itself. As a result, scientists can concentrate on their science rather than on the process of doing it, since visualization programs do not have to be created or searched for each time a dataset is to be viewed. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-27 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-28 ENTRY:: February 28, 1994 TITLE:: The Design and Implementation of the Inversion File System DATE:: April 1993 AUTHOR:: Olson, Michael A. PAGES:: 34 ABSTRACT:: This paper describes the design, implementation, and performance of the Inversion file system. Inversion provides a rich set of services to file system users, and manages a large tertiary data store. Inversion is built on top of the POSTGRES database system, and takes advantage of low-level DBMS services to provide transaction protection, fine-grained time travel, and fast crash recovery for user files and file system metadata. Inversion gets between 30% and 80% of the throughput of ULTRIX NFS backed by a non-volatile RAM cache. In addition, Inversion allows users to provide code for execution directly in the file system manager, yielding performance as much as seven times better than that of ULTRIX NFS. 
RETRIEVAL:: ocr (in all.ocr) RETRIEVAL:: tiff (in {001-034}.tif) END:: UCB//S2K-93-28 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-29 ENTRY:: February 25, 1994 TITLE:: Tioga: Providing Data Management Support for Scientific Visualization Applications DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Chen, Jolly AUTHOR:: Nathan, Nobuko AUTHOR:: Paxson, Caroline AUTHOR:: Wu, Jiang PAGES:: 14 ABSTRACT:: We present a user interface paradigm for database management systems that is motivated by scientific visualization applications. Our graphical user interface includes a "boxes and arrows" notation for database access and a flight simulator model of movement through information space. We also provide means to specify a hierarchy of abstracts of data of different types and resolutions, so that a "zoom" capability can be supported. The underlying DBMS support for this system is described and includes the compilation of query plans into megaplans, new algorithms for data buffering, and provisions for a guaranteed rate of data delivery. The current state of the Tioga implementation is also described. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-29 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-30 ENTRY:: February 25, 1994 TITLE:: Large Object Support in POSTGRES DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Olson, Michael PAGES:: 8 ABSTRACT:: This paper presents four implementations for support of large objects in POSTGRES. The four implementations offer varying levels of support for large objects; their interaction with the user-defined storage managers available in POSTGRES is also detailed. The performance of all four large object implementations on two different storage devices is presented. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-30 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-31 ENTRY:: February 25, 1994 TITLE:: Mariposa: A New Architecture for Distributed Data DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Aoki, Paul M.
AUTHOR:: Devine, Robert PAGES:: 17 ABSTRACT:: We describe the design of Mariposa, an experimental distributed data management system that provides high performance in an environment of high data mobility and heterogeneous host capabilities. Mariposa provides a general, flexible platform for the development of new algorithms for distributed query optimization, storage management, and scalable data storage structures. This flexibility is primarily due to a unique rule-based design that permits autonomous, local-knowledge decisions to be made regarding data placement, query execution location, and storage management. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-31 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-32 ENTRY:: February 25, 1994 TITLE:: Efficient Organization of Large Multidimensional Arrays DATE:: AUTHOR:: Sarawagi, Sunita AUTHOR:: Stonebraker, Michael PAGES:: 17 ABSTRACT:: Large multidimensional arrays are widely used in scientific and engineering database applications. In this paper, we present methods of organizing arrays to make their access on secondary and tertiary memory devices fast and efficient. We have developed four techniques for doing this: (1) storing the array in multidimensional "chunks" to minimize the number of blocks fetched, (2) reordering the chunked array to minimize seek distance between accessed blocks, (3) maintaining redundant copies of the array, each organized for a different chunk size and ordering, and (4) partitioning the array onto platters of a tertiary memory device so as to minimize the number of platter switches. Our measurements on real data sets obtained from global change scientists demonstrate that accesses on arrays organized using the above techniques are often an order of magnitude faster than on the original unoptimized data.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-32 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-33 ENTRY:: February 25, 1994 TITLE:: Origins of Multi-Sector Scientific Collaboration: A Report on Research in Progress DATE:: February 1992 AUTHOR:: Weedman, Judith PAGES:: 17 ABSTRACT:: Sequoia 2000 is a research initiative funded by the Digital Equipment Corporation to develop large capacity object servers to support global change research. Existing hardware, software, network technology, and visualization techniques are inadequate to the task of handling the terabytes of data which global change researchers need to access and manipulate. The purpose of Sequoia 2000 is to develop the needed technology and to create an electronic repository in which researchers' data sets, programs, documents, and simulation outputs can be stored and made available to multiple users. Sequoia 2000 is a multidisciplinary, multi-campus, multi-agency project; researchers are from the fields of computer science, information, and global change, and are located in private industry, universities, and state and federal agencies. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-33 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-34 ENTRY:: February 25, 1994 TITLE:: A Simple Research Paradigm in the Context of the Sequoia 2000 Project and Its Application to an Ocean-Atmosphere Interaction Study DATE:: June 1993 AUTHOR:: Waliser, D. E. AUTHOR:: Mechoso, C. R. AUTHOR:: Gautier, Catherine AUTHOR:: Neelin, J. D. PAGES:: 9 ABSTRACT:: This paper presents an application of a common research paradigm that can help enhance and facilitate the conceptual interaction between the research goals of climate and global change researchers and the design and implementation goals of the computer scientists and engineers.
As presented, the paradigm fits neatly into the Sequoia 2000 architecture, and can be applied across scientific disciplines and to all levels of scientific research, from the program level to the detailed analysis level. Each of these aspects is discussed, and example applications of the paradigm to the area of ocean-atmosphere interactions are given, including a detailed application to the analysis of evaporative heat flux parameterizations for ocean general circulation models. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-34 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-35 ENTRY:: February 25, 1994 TITLE:: A Visualization Architecture for the Sequoia 2000 Project DATE:: AUTHOR:: Kochevar, Peter AUTHOR:: Ahmed, Zahid AUTHOR:: Wanger, Len AUTHOR:: Shade, Colin AUTHOR:: Sharp, Jonathan PAGES:: 20 ABSTRACT:: An architecture is described for the Tioga Visualization Management System which is under development as part of the Sequoia 2000 Project. This system brings together the capabilities of a database management system, a scientific visualization system, and a graphical user-interface builder. The paper concentrates on the front end of Tioga, whose goal is the interactive visualization of data that reside in a database management system. The Visualization Executive achieves this goal by mixing techniques from knowledge-based systems with those of scientific visualization and user-interface design. The intent is to free scientists as much as possible from having to deal with the process of doing science so that they can concentrate on the science itself.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-35 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-36 ENTRY:: February 25, 1994 TITLE:: DATE:: AUTHOR:: AUTHOR:: AUTHOR:: AUTHOR:: PAGES:: ABSTRACT:: RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-36 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-38 ENTRY:: March 10, 1994 TITLE:: Extending a Graphical Query Language to Support Updates, Foreign Systems, and Transactions DATE:: AUTHOR:: Chen, Jolly AUTHOR:: Aiken, Alexander AUTHOR:: Nathan, Nobuko AUTHOR:: Paxson, Caroline AUTHOR:: Stonebraker, Michael AUTHOR:: Wu, Jiang PAGES:: 14 ABSTRACT:: In [STON93] we proposed a new user interface paradigm called Tioga for interacting with database management systems. Tioga simplifies the task of building database applications and is geared especially towards the needs of scientific users. We borrow the "boxes and arrows" visual programming notation of scientific visualization systems and allow users to graphically construct applications by using database procedures as building blocks. This paper extends the Tioga paradigm to a general database programming environment. In particular, we address three shortcomings of graphical query languages. First, we define a mechanism for allowing general programs--not just database procedures--as building blocks. This extension allows better handling of general data entry and data visualization needs and provides an interface to foreign systems. Second, we permit database updates. Third, we define a transaction semantics for graphical query languages. Unlike traditional transactions, Tioga transactions contain a directed graph of queries instead of a linear sequence of queries. We explore concurrency control techniques to promote both intra-transaction and inter-transaction parallelism. Finally, we present query processing strategies for graphical queries with general building blocks, updates, and transactions.
We show how to efficiently execute a Tioga application by decomposing the application into components that are individually optimized. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-38 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-40 ENTRY:: February 25, 1994 TITLE:: Database Management for Data Visualization DATE:: AUTHOR:: Kochevar, Peter PAGES:: 12 ABSTRACT:: Visualization management systems which integrate database management, data visualization, and graphical user-interface generation into one package are becoming essential tools for conducting science. Unfortunately, the link between data management and data visualization is not very well understood. To make matters worse, most database management systems available today are not well-suited for handling large, time-sequenced data sets that are common to many scientific disciplines. The reason for this shortfall is that most database systems do not use an appropriate data model, are not geared toward real-time operation, and have inadequate user interfaces. Hints as to how database systems can fix these problems are given so that effective visualization management systems can be constructed. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-40 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-41 ENTRY:: March 4, 1994 TITLE:: GIPSY: Georeferenced Information Processing SYstem DATE:: March 25, 1994 AUTHOR:: Woodruff, Allison Gyle AUTHOR:: Plaunt, Christian PAGES:: 24 ABSTRACT:: In this paper we present an algorithm which automatically extracts geopositional coordinate index terms from text to support georeferenced document indexing and retrieval. Under this algorithm, words and phrases containing geographic place names or characteristics are extracted from a text document and used as input to database functions which use spatial reasoning to approximate statistically the geoposition being referenced in the text. We conclude with a discussion of preliminary results and future work.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-41 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-42 ENTRY:: March 4, 1994 TITLE:: Remote Sensing of Global Surface Shortwave Radiation and PAR Over the Ocean: a Sequoia Testbed DATE:: January 1994 AUTHOR:: Gautier, Catherine AUTHOR:: Byers, Michael PAGES:: 20 ABSTRACT:: During the past few years many methods have been proposed for estimating surface radiative fluxes (shortwave irradiance) at the ocean surface from visible radiance observations, and they have been found to be quite successful under most atmospheric and cloud conditions. For broken clouds, however, the simple plane parallel assumption for solving the radiative transfer equations may need to be corrected to account for cloud geometry. The estimation of PAR is simpler because the most commonly used satellite radiance measurements cover a similar region of the solar spectrum. RETRIEVAL:: tiff (in {001-020}.tif) RETRIEVAL:: ocr (in all.ocr) END:: UCB//S2K-94-42 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-44 ENTRY:: February 25, 1994 TITLE:: HERMES: A Prototype Distributed Application Management System DATE:: October 22, 1993 AUTHOR:: Hanyzewski, G. A. AUTHOR:: Spahr, J. AUTHOR:: Mechoso, C. R. AUTHOR:: Moore, R. W. PAGES:: 9 ABSTRACT:: This paper presents a prototype system for on-line distributed application management. The system allows researchers to determine the exact state of remotely executing applications by allowing interactive acquisition and visualization of output datasets as they are calculated. This paper discusses the general design as well as implementation details, case studies, and future directions for development.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-44 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-45 ENTRY:: March 10, 1994 TITLE:: Single Query Optimization for Tertiary Memory DATE:: December 1993 AUTHOR:: Sarawagi, Sunita AUTHOR:: Stonebraker, Michael PAGES:: 12 ABSTRACT:: We present query execution strategies that are optimized for the characteristics of tertiary memory devices. Traditional query execution methods are oriented to magnetic disk or main memory and perform poorly on tertiary memory. Our methods use ordering and batching techniques on the I/O requests to reduce the media switch cost and seek cost on these devices. Some of our methods are provably optimal and others are shown to be superior by simulation and cost formula analysis. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-45 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-46 ENTRY:: April 29, 1994 TITLE:: RP*: A Family of Order Preserving Scalable Distributed Data Structures DATE:: AUTHOR:: Litwin, W. AUTHOR:: Neimat, M-A. AUTHOR:: Schneider, D. PAGES:: 19 ABSTRACT:: Hash-based scalable distributed data structures (SDDSs), like LH* or DDH, for networks of interconnected computers (multicomputers) were shown to open new perspectives for file management. We propose a family of ordered SDDSs, called RP*, providing for ordered and dynamic files on multicomputers, and thus for more efficient processing of range queries and of ordered traversals of files. The basic algorithm, termed RP*N, builds the file with the same key space range partitioning as a B-tree, but avoids indexes through the use of multicast. The algorithms RP*c and RP*s enhance the throughput for faster networks, adding the indexes on clients, or on clients and servers, decreasing or avoiding the multicast. RP* files are shown to be highly efficient, with access performance exceeding traditional files by an order of magnitude or two, and, for non-range queries, very close to LH*.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-46 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-47 ENTRY:: April 29, 1994 TITLE:: A Hydrographic Database built on Montage and S-PLUS DATE:: March 1994 AUTHOR:: Farrell, W. E. AUTHOR:: Gaffney, J. AUTHOR:: Given, J. AUTHOR:: Jenkins, R. D. AUTHOR:: Hall, N. PAGES:: 19 ABSTRACT:: (none) RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-47 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-48 ENTRY:: April 29, 1994 TITLE:: Zooming and Tunneling in Tioga: Supporting Navigation in Multidimensional Space DATE:: March 1994 AUTHOR:: Woodruff, Allison AUTHOR:: Wisnovsky, Peter AUTHOR:: Taylor, Cimarron AUTHOR:: Stonebraker, Michael AUTHOR:: Paxson, Caroline AUTHOR:: Chen, Jolly AUTHOR:: Aiken, Alexander PAGES:: 8 ABSTRACT:: In [STON93] we proposed a visual programming system called Tioga. The Tioga system applies a boxes and arrows programming notation to allow nonexpert users to graphically construct database applications. Users connect database procedures using a dataflow model. Browsers are used to visualize the resulting data. This paper describes extensions to the Tioga browser protocol. These extensions allow sophisticated, flight-simulator navigation through a multidimensional data space. This design also incorporates wormholes to allow tunneling between different multidimensional spaces. Wormholes are shown to be substantial generalizations of hyperlinks in a hypertext system. These powerful mechanisms for relating data provide users with great flexibility. For example, users can create magnifying glasses that provide an enhanced view of the underlying data.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-48 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-49 ENTRY:: April 29, 1994 TITLE:: An Economic Paradigm for Query Processing and Data Migration in Mariposa DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Devine, Robert AUTHOR:: Kornacker, Marcel AUTHOR:: Litwin, Witold AUTHOR:: Pfeffer, Avi AUTHOR:: Sah, Adam AUTHOR:: Staelin, Carl PAGES:: 24 ABSTRACT:: In this paper we explore query execution and storage management issues for Mariposa, a distributed data base system under construction at Berkeley. Because of the extreme complexity of both issues, we have adopted an underlying economic paradigm for both. Hence, queries receive a budget which they spend to obtain their answers, and each processing site attempts to maximize income by buying and selling storage objects and processing queries for locally stored objects. This paper presents the protocols which underlie this economic system. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-49 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-50 ENTRY:: April 29, 1994 TITLE:: Design and Implementation of DDH: A Distributed Dynamic Hashing Algorithm DATE:: AUTHOR:: Devine, Robert PAGES:: 14 ABSTRACT:: DDH extends the idea of dynamic hashing algorithms to distributed systems. DDH spreads data across multiple servers in a network using a novel autonomous location discovery algorithm that learns the bucket locations instead of using a centralized directory. We describe the design and implementation of the basic DDH algorithm using networked computers. Performance results show that the prototype of DDH hashing is roughly equivalent to conventional single-node hashing implementations as measured by CPU time or elapsed time. Finally, possible improvements are suggested to the basic DDH algorithm for increased reliability and robustness.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-50 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-51 ENTRY:: April 29, 1994 TITLE:: The Sequoia 2000 Showcase: An S2K Technical Report DATE:: March 20, 1994 AUTHOR:: Norris, C. L. AUTHOR:: Chen, S.-C. AUTHOR:: Roads, J. O. PAGES:: 6 ABSTRACT:: Environmental investigators view space and time differently, mainly through emphasis on different environmental variables and data sets. In fact, because it has been so difficult to fully develop any data set, many investigators have spent their lifetimes emphasizing a single data set. Different data sets and variables are sometimes compared with each other in review articles or in modeling studies, but truly comprehensive comparisons await development of proposed S2K-like database systems. In these databases, we will be able to go to a generic environmental data base and extract all manner of relevant data sets and environmental variables which will then be merged and output with a set of graphics packages under the control of the person doing the merge. These integrated views will undoubtedly give us new insight into how our world works. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-51 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-52 ENTRY:: April 29, 1994 TITLE:: An Intelligent Assistant for Creating Data Flow Visualization Networks DATE:: March 20, 1994 AUTHOR:: Kochevar, Peter AUTHOR:: Ahmed, Zahid PAGES:: 12 ABSTRACT:: Non-visualization experts, including most scientists, find visualization systems like AVS too difficult to use. One approach to assisting these end-users in doing interactive visualization is to embed the knowledge of visualization experts into an intelligent system. A prototype, called Tecate, of such a system has been developed as part of the Sequoia 2000 Project. In this system, a Planner makes use of expert knowledge stored in a Knowledge Base to create data-flow visualization programs.
The Planner takes as input a description of the data to be visualized and an indication of the data analysis goals of an end-user. From this information, an AVS network script is produced that, when executed, builds an appropriate visualization of the indicated data set. The networks so produced make use of both a restricted set of standard AVS modules and a collection of custom ones which operate on data structured as fiber bundles. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-52 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//ERL-94-53 ENTRY:: May 16, 1994 TITLE:: Experiments with the Tenet Real-Time Protocol Suite on the Sequoia 2000 Wide Area Network AUTHOR:: Banerjea, Anindo AUTHOR:: Knightly, Edward W. AUTHOR:: Templin, Fred L. AUTHOR:: Zhang, Hui PAGES:: 17 ABSTRACT:: Emerging distributed multimedia applications have stringent performance requirements in terms of bandwidth, delay, delay-jitter, and loss rate. The Tenet real-time protocol suite provides the services and mechanisms for delivering such performance guarantees, even during periods of high network load and congestion. The protocols achieve this by using resource management, connection admission control, and appropriate packet service disciplines inside the network. The Sequoia 2000 network employs the Tenet Protocol Suite at each of its hosts and routers, making it one of the first wide-area packet-switched networks to provide end-to-end per-connection performance guarantees. This paper presents experiments with the Tenet protocols on the Sequoia 2000 network, including measurements of the performance of the protocols, the service received by real multimedia applications using the protocols, and comparisons with the service received by applications that use the Internet protocols (UDP/IP).
We conclude that the Tenet protocols successfully protect the real-time channels from other traffic in the network, including other real-time channels, and continue to meet the performance guarantees, even when the network is highly loaded. RETRIEVAL:: postscript (in all.ps) END:: UCB//ERL-94-53 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-56 ENTRY:: January 4, 1995 DATE:: December 1994 TITLE:: High-Concurrency Locking in R-Trees* AUTHOR:: Banks, Douglas AUTHOR:: Kornacker, Marcel AUTHOR:: Stonebraker, Michael PAGES:: 15 ABSTRACT:: In this paper we present a solution to the problem of concurrent operations in R-trees, a dynamic access structure capable of storing multidimensional and spatial data. We describe the R-link tree, a variant of the R-tree that adds sibling pointers to nodes, a technique first deployed in B-link trees, to compensate for concurrent structure modifications. The main obstacle to the use of sibling pointers is the lack of linear ordering among the keys in an R-tree; we overcome this by assigning sequence numbers to nodes that let us reconstruct the "lineage" of a node at any point in time. The search, insertion, and deletion algorithms for R-link trees are designed to lock at most two nodes at a time, and the locking can be shown to be deadlock-free. In addition, we describe how R-link trees can be made recoverable so that they are instantly available after a crash, and we further describe how to achieve degree 3 consistency with an inexpensive predicate locking mechanism.
RETRIEVAL:: postscript (in s2k-94-56.ps) END:: UCB//S2K-94-56 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-57 ENTRY:: December 10, 1994 DATE:: September 1994 TITLE:: Vision for Sequoia 2000 Phase II: The Sequoia Computational Infrastructure AUTHOR:: Pasquale, Joseph AUTHOR:: Katz, Randy AUTHOR:: Dozier, Jeff PAGES:: 4 ABSTRACT:: The Sequoia 2000 project, a collaboration of computer scientists and Earth scientists at the University of California, is beginning its second 3-year phase. The major goal for Phase II is to support power harnessing, the ability to dynamically concentrate as much of the cumulative resource power in a wide-area distributed system as possible to meet the demands of any single application. The project will focus on creating an underlying hardware and software infrastructure that supports power harnessing, and above which software systems targeted to support Earth science applications can be built. As in Phase I, the project will seek to develop partnerships, both financial and intellectual, with the university, industry, and the state and federal government. RETRIEVAL:: postscript (in s2k-94-57.ps) END:: UCB//S2K-94-57 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-58 ENTRY:: December 10, 1994 DATE:: September 1994 TITLE:: Sequoia 2000: A Reflection on the First Three Years AUTHOR:: Stonebraker, Michael PAGES:: 9 ABSTRACT:: This paper describes the SEQUOIA 2000 project and its implementation efforts during the first three years. Included are the objectives we had, how we chose to address them, and some of the lessons we learned from this endeavor. RETRIEVAL:: postscript (in s2k-94-58.ps) END:: UCB//S2K-94-58 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-59 ENTRY:: December 10, 1994 TITLE:: Sequoia 2000 Metadata Schema for Satellite Images AUTHOR:: Anderson, Jean T. AUTHOR:: Stonebraker, Michael PAGES:: 7 ABSTRACT:: Sequoia 2000 schema development is based on emerging geospatial standards to accelerate development and facilitate data exchange.
This paper focuses on the metadata schema for digital satellite images. We examine how satellite metadata are defined, used, and maintained. We discuss the geospatial standards we are using, and describe a SQL prototype that is based on the Spatial Archive and Interchange Format (SAIF) standard and implemented in the Illustra object-relational database. RETRIEVAL:: postscript (in s2k-94-59.ps) END:: UCB//S2K-94-59