BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-01 ENTRY:: February 4, 1994 TITLE:: To Support Global Change Research DATE:: September 17, 1991 AUTHOR:: Stonebraker, Michael AUTHOR:: Dozier, Jeff PAGES:: 26 ABSTRACT:: Improved data management is crucial to the success of current scientific investigations of Global Change. New modes of research, especially the synergistic interactions between observations and model-based simulations, will require massive amounts of diverse data to be stored, organized, accessed, distributed, visualized, and analyzed. Achieving the goals of the U.S. Global Change Research Program will largely depend on more advanced data management systems that will allow scientists to manipulate large-scale data sets and climate system models. Refinements in computing--specifically involving storage, networking, distributed file systems, extensible distributed data base management, and visualization--can be applied to a range of Global Change applications through a series of specific investigation scenarios. Computer scientists and environmental researchers at several UC campuses will collaborate to address these challenges. This project complements both NASA's EOS project and UCAR's (University Corporation for Atmospheric Research) Climate System Modeling Program in addressing the gigantic data requirements of Earth System Science research before the turn of the century. Therefore, we have named it Sequoia 2000, after the giant trees of the Sierra Nevada, the largest organisms on the Earth's land surface. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-01 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-02 ENTRY:: March 4, 1994 TITLE:: High Performance Network and Channel-Based Storage DATE:: AUTHOR:: Katz, Randy H. PAGES:: 46 ABSTRACT:: In the traditional mainframe-centered view of a computer system, storage devices are coupled to the system through complex hardware subsystems called I/O channels. 
With the dramatic shift towards workstation-based computing, and its associated client/server model of computation, storage facilities are now found attached to file servers and distributed throughout the network. In this paper, we discuss the underlying technology trends that are leading to high performance network-based storage, namely advances in networks, storage devices, and I/O controller and server architectures. We review several commercial systems and research prototypes that are leading to a new approach to high performance computing based on network-attached storage. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-02 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-3 ENTRY:: March 4, 1994 TITLE:: Robo-line Storage: Low Latency, High Capacity Storage Systems Over Geographically Distributed Networks DATE:: AUTHOR:: Katz, Randy H. AUTHOR:: Anderson, Thomas E. AUTHOR:: Ousterhout, John K. AUTHOR:: Patterson, David A. PAGES:: 34 ABSTRACT:: Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these "grand challenge" applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are of the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data [ISTP91]! In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form-factor magnetic and optical tape systems. Second, we will reduce the latency of access to the storage system, including devices, controllers, servers, and communications links. 
Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed by Ousterhout for his Log-Structured File System. Finally, we will construct a prototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a cornerstone of any coherent program in high performance computing and communications. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-3 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-04 ENTRY:: February 4, 1994 TITLE:: Sequoia 2000 Technical Report 91/4 DATE:: April 1991 AUTHOR:: Chen, Jolly AUTHOR:: Larson, Ray AUTHOR:: Stonebraker, Michael PAGES:: 11 ABSTRACT:: In this paper we explain the paradigm that we are following for Sequoia 2000 object browsers. It is intended to be a keyboard-free interface, and is based on the "move and zoom" paradigm popularized for Navy ships by SDMS [HERO80]. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-04 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-05 ENTRY:: February 4, 1994 TITLE:: AN OVERVIEW OF THE SEQUOIA 2000 PROJECT DATE:: May 1991 AUTHOR:: Stonebraker, Michael PAGES:: 11 ABSTRACT:: Achieving the goals of the U.S. Global Change Research Program will depend not only on improved measurement systems, but also on improved data systems that will allow scientists to manipulate the resulting large-scale data sets and climate system models, as well as compare model results with observations. New modes of research, especially the synergistic interactions between observations and model-based simulations, will require massive amounts of diverse data to be stored, organized, accessed, distributed, visualized, and analyzed. Computer scientists and environmental researchers at several UC campuses are collaborating to address these challenges. 
Refinements in computing--specifically involving storage, networking, file systems, extensible data base management, and visualization--will be applied to specific Global Change applications. We have named this project Sequoia 2000, after the giant trees of the Sierra Nevada, the largest organisms on the Earth's land surface. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-05 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-06 ENTRY:: February 4, 1994 TITLE:: Network Issues for Sequoia 2000 DATE:: AUTHOR:: Pasquale, Joseph AUTHOR:: Polyzos, George C. PAGES:: 6 ABSTRACT:: The goals of the Sequoia 2000 Network are to provide high throughput for the massive observation input data and image output data characterizing Global Change applications, as well as real-time services for animations and collaboration tools such as video conferencing. The first phase of the network will be based on a T3 (45 Mb/s) backbone and FDDI for local distribution. The research issues we are focusing on include protocols that provide deterministic and statistical performance guarantees and take advantage of hierarchical coding of information, and the design of I/O system software that integrates process and device communication software with network protocol software. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-06 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-07 ENTRY:: February 4, 1994 TITLE:: Internet Throughput and Delay Measurements Between Sequoia 2000 Sites DATE:: December 1991 AUTHOR:: Pasquale, Joseph C. AUTHOR:: Polyzos, George C. AUTHOR:: Fall, Kevin R. AUTHOR:: Kompella, Vachaspathi P. PAGES:: 9 ABSTRACT:: We report performance measurements of Internet connections between five Sequoia 2000 sites. Throughput and delay statistics are presented for various message sizes and for both daytime and nighttime. The highest throughput observed was 85 KB/s between UCSD and UCLA at night and the lowest was 1 KB/s between UCSD and DWR during the day. 
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-07 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-08 ENTRY:: February 14, 1994 TITLE:: Early EOSDIS: A Data and Information System for the Study of Global Change DATE:: AUTHOR:: Dozier, Jeff PAGES:: 14 ABSTRACT:: The initial step in the development of the Earth Observing System Data and Information System (EOSDIS) is to deliver a working prototype system for use by the Earth science research community by July 1994. EOSDIS will be NASA's contribution to the confederation of national and international agency data systems to support global change research and other uses of environmental data [Dozier and Ramapriyan, 1991], including GEWEX, the Global Energy and Water Cycle Experiment. This prototype version of EOSDIS will not provide every feature that later systems will provide, so clear choices must be made as to priorities in its implementation. Technical readiness and affordability will determine how much capability the system will actually offer. EOSDIS Version 0 (V0) is a fusion of data holdings, data services, and research community infrastructure. All must advance to improve the use of data to support global change research. The science priorities will guide the choice of data sets to emphasize and provide to the community, and the levels of service to be encompassed within V0. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-08 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-10 ENTRY:: February 14, 1994 TITLE:: A Method for Refining Automatically-Discovered Lexical Relations: Combining Weak Techniques for Stronger Results DATE:: AUTHOR:: Grefenstette, Gregory AUTHOR:: Hearst, Marti A. PAGES:: 9 ABSTRACT:: Knowledge-poor corpus-based approaches to natural language processing are attractive in that they do not incur the difficulties associated with complex knowledge bases and real-world inferences. 
However, these kinds of language processing techniques in isolation often do not suffice for a particular task; for this reason we are interested in finding ways to combine various techniques and improve their results. Accordingly, we conducted experiments to refine the results of an automatic lexical discovery technique by making use of a statistically-based syntactic similarity measure. The discovery program uses lexico-syntactic patterns to find instances of the hyponymy relation in large text bases. Once relations of this sort are found, they should be inserted into an existing lexicon or thesaurus. However, the terms in the relation may have multiple senses, thus hampering automatic placement. In order to address this problem we used a term-similarity determination technique to choose where, in an existing thesaurus, to install a lexical relation. The union of these two corpus-based methods is promising, although only partially successful in the experiments run so far. Here we report some preliminary results, and make suggestions for how to improve the technique in the future. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-10 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-11 ENTRY:: March 4, 1994 TITLE:: Abstracts: A Latency-Hiding Technique for High-Capacity Mass-Storage Systems DATE:: AUTHOR:: Fine, Joel A. AUTHOR:: Anderson, Thomas E. AUTHOR:: Dahlin, Michael D. AUTHOR:: Frew, James AUTHOR:: Olson, Michael AUTHOR:: Patterson, David A. PAGES:: 24 ABSTRACT:: Extraordinary advances in digital storage technology are rapidly making possible cost-effective, multiple-terabyte information retrieval systems. The latency and bandwidth of these technologies are typically much worse than what users of computer systems are accustomed to. Unfortunately, traditional techniques of reducing latency and improving bandwidth, caching and compression, by themselves will not work well with the access patterns that we anticipate for these high-capacity systems. 
We introduce and define a new storage management technique, called abstracts. An abstract is an extraction of the "essential" part of the data set. It is created using some combination of averaging, subsetting, rounding, or some other method of condensing the data. An abstract's composition is heavily dependent on the context in which it is used. Each data set can have multiple abstracts associated with it, each of which can be used to answer a query. When a query is answered from an abstract, effective bandwidth increases, because we transfer much less data through the storage system. The counter-intuitive result is that abstracts on robot-based tape storage systems can have lower latency than full data sets on magnetic disks, because the inherent latency disadvantage of tertiary systems can be overcome by the reduction in transfer time due to the smaller transfer size. Moreover, because many abstracts can fit in faster storage in the space occupied by a single unabstracted data set, users can get the effect of magnetic disk latencies for very large objects. To evaluate the potential of abstracts, we examine four common queries as well as a detailed case study. We also study the statistical characteristics of several data sets in an effort to identify classes of abstracting functions. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-11 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-12 ENTRY:: February 18, 1994 TITLE:: The Sequoia 2000 Storage Benchmark DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Frew, James AUTHOR:: Gardels, Kenn AUTHOR:: Meredith, Jeff PAGES:: 14 ABSTRACT:: This paper presents a benchmark that concisely captures the data base requirements of a collection of Earth Scientists working in the SEQUOIA 2000 project on various aspects of global change research. 
This benchmark has the novel characteristic that it uses real data sets and real queries. Because these are representative of engineering and scientific DBMS users, we claim that this benchmark represents the needs of this more general community. Also included in the paper are benchmark results for four example DBMSs: ARC-INFO, GRASS, IPW, and POSTGRES. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-12 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-13 ENTRY:: February 18, 1994 TITLE:: Predicate Migration: Optimizing Queries with Expensive Predicates DATE:: December 3, 1992 AUTHOR:: Hellerstein, Joseph M. PAGES:: 22 ABSTRACT:: The traditional focus of relational query optimization schemes has been on the choice of join methods and join orders. Restrictions have typically been handled in query optimizers by "predicate pushdown" rules, which apply restrictions in an arbitrary order before as many joins as possible. These rules work under the assumption that a restriction is essentially a zero-time operation. However, today's extensible and object-oriented database systems allow users to define time-consuming functions, which may be used in a query's restriction and join predicates. Furthermore, SQL has long supported subquery predicates, which may be arbitrarily time-consuming to check. Thus restrictions should not be considered zero-time operations, and the model of query optimization must be enhanced. In this paper we develop a theory for moving expensive predicates in a query plan so that the total cost of the plan--including the costs of both joins and restrictions--is minimal. We present an algorithm to implement the theory, as well as results of our implementation in POSTGRES. In our experience, plans generated by the newly enhanced POSTGRES are orders of magnitude faster than plans generated by a traditional query optimizer. The additional complexity of considering expensive predicates during optimization is found to be manageably small. 
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-13 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-14 ENTRY:: February 18, 1994 TITLE:: How Sequoia 2000 Addresses Issues in Data and Information Systems for Global Change DATE:: August 1992 AUTHOR:: Dozier, Jeff PAGES:: 16 ABSTRACT:: Sequoia 2000 is a project to design a next-generation information system for accessing, archiving, distributing, managing, and visualizing data for global change research. Funded by the Digital Equipment Corporation, it has investigators from computer science and Earth science departments on five campuses of the University of California. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-14 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-15 ENTRY:: February 18, 1994 TITLE:: Sequoia 2000 Network (S2Knet) Handbook DATE:: June 6, 1992 AUTHOR:: Pasquale, Joseph AUTHOR:: Fall, Kevin R. AUTHOR:: Forrest, Jon PAGES:: 14 ABSTRACT:: The construction of the Sequoia 2000 network (S2Knet) is a joint effort involving people from the University of California campuses of Berkeley, Los Angeles, San Diego, and Santa Barbara, the UC Office of the President (UCOP), and the San Diego Supercomputer Center (SDSC). The purpose of this handbook is to serve as a reference, containing up-to-date information describing policy, topology, names, addresses, and routing. It also includes a list of contacts for network management. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-15 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-16 ENTRY:: February 18, 1994 TITLE:: Highlight: Using a Log-structured File System for Tertiary Storage Management DATE:: November 20, 1992 AUTHOR:: Kohl, John T. AUTHOR:: Staelin, Carl AUTHOR:: Stonebraker, Michael PAGES:: 15 ABSTRACT:: Robotic storage devices offer huge storage capacity at a low cost per byte, but with large access times. Integrating these devices into the storage hierarchy presents a challenge to file system designers. 
Log-structured file systems (LFSs) were developed to reduce latencies involved in accessing disk devices, but their sequential write patterns match well with tertiary storage characteristics. Unfortunately, existing versions only manage memory caches and disks, and do not support a broader storage hierarchy. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-16 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-17 ENTRY:: February 18, 1994 TITLE:: Exploiting In-Kernel Data Paths to Improve I/O Throughput and CPU Availability DATE:: November 1992 AUTHOR:: Pasquale, Joseph AUTHOR:: Fall, Kevin R. PAGES:: 11 ABSTRACT:: We present the motivation, design, implementation, and performance evaluation of a UNIX kernel mechanism capable of establishing fast in-kernel data pathways between I/O objects. A new system call, splice(), moves data asynchronously and without user-process intervention to and from I/O objects specified by file descriptors. Performance measurements indicate improved I/O throughput and increased CPU availability attributable to reduced context-switch and data-copying overhead. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-17 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-18 ENTRY:: February 18, 1994 TITLE:: A Performance Analysis of TCP/IP and UDP/IP Networking Software for the DECstation 5000 DATE:: AUTHOR:: Kay, Jonathan AUTHOR:: Pasquale, Joseph PAGES:: 21 ABSTRACT:: Modern workstations increasingly rely on distributed software such as NFS and NIS, yet the speed of networking software is not improving as rapidly as the workstation and networking hardware, leading to a network software bottleneck. We present detailed measurements of various components of the TCP/IP and UDP/IP protocol stack on a DECstation 5000/200 running Ultrix 4.2a. Measurements are by layer (i.e. socket, transport, IP, data-link) and by function (i.e. 
checksum computation, data copying, buffer management, protocol processing, and operating system interaction), with further breakdowns within each category. We show that checksum computation and data transfers dominate component times for a real LAN workload, that using large packet MTUs (maximum transmission units) is very important to achieving high throughput, and that given the distribution of component times for small messages, it will be difficult to improve latency. TCP and UDP time breakdowns are shown to be quite similar, suggesting that "lightweight" transport protocols are not likely to greatly decrease processing time. Finally, analytical models for network software processing times are presented. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-18 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-19 ENTRY:: February 18, 1994 TITLE:: A Static Analysis of I/O Characteristics of a Broad Class of Scientific Applications DATE:: November 1992 AUTHOR:: Pasquale, Barbara K. AUTHOR:: Polyzos, George C. PAGES:: 14 ABSTRACT:: Past research on high performance computers for scientific applications has concentrated on CPU speed and exploitation of parallelism, but has, until very recently, neglected I/O considerations. This paper presents a study of the production workload at the San Diego Supercomputer Center from an I/O requirements and characteristics perspective. Results of our analyses support our hypothesis that a significant proportion of scientific applications with intensive I/O demands have predictable I/O requirements. 
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-19 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-20 ENTRY:: February 25, 1994 TITLE:: Tioga: Providing Data Management Support for Scientific Visualization Applications DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Chen, Jolly AUTHOR:: Nathan, Nobuko AUTHOR:: Paxson, Caroline PAGES:: 20 ABSTRACT:: We present a user interface paradigm for database management systems motivated by scientific visualization applications. Our graphical user interface includes a "boxes and arrows" notation for database access and a flight simulator model of movement through information space. We also provide means to specify a hierarchy of abstracts of data of different types and resolutions. In addition, multiple portals on data may be related as master and slaves. The underlying DBMS support for this system includes the compilation of query plans into megaplans, new algorithms for data buffering, and provisions for a guaranteed rate of delivery. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-92-20 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-92-42 ENTRY:: April 27, 1994 TITLE:: Remote Sensing of Global Surface Shortwave Radiation and PAR Over the Ocean: a Sequoia Testbed DATE:: January 1994 AUTHOR:: Gautier, Catherine AUTHOR:: Byers, Michael PAGES:: 20 ABSTRACT:: During the past few years many methods have been proposed for estimating surface radiative fluxes (shortwave radiation, Photosynthetically Active Radiation - PAR) from satellite observations. We have developed algorithms for computing the shortwave radiative flux (shortwave irradiance) at the ocean surface from visible radiance observations and they have been found to be quite successful under most atmospheric and cloud conditions. For broken clouds, however, the simple plane parallel assumption for solving the radiative transfer equations may need to be corrected to account for cloud geometry. 
The estimation of PAR is simpler because the most commonly used satellite radiance measurements cover a similar region of the solar spectrum. We are in the process of producing global SW and PAR as a contribution to the Sequoia 2000 project (to implement a distributed processing system designed for the needs of global change researchers). Results from our algorithms developed for Sequoia and preliminary global surface solar irradiance and PAR fields will be presented and discussed. RETRIEVAL:: ocr (in all.ocr) RETRIEVAL:: tiff (in 001-020.tif) END:: UCB//S2K-92-42 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-91-09 ENTRY:: February 17, 1994 TITLE:: Automatic Acquisition of Hyponyms from Large Text Corpora DATE:: July 1992 AUTHOR:: Hearst, Marti A. PAGES:: 8 ABSTRACT:: We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidance of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also be acquirable in this way. A subset of the acquisition algorithm is implemented and the results are used to augment and critique the structure of a large hand-built thesaurus. Extensions and applications to areas such as information retrieval are suggested. 
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-91-09 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-21 ENTRY:: February 25, 1994 TITLE:: Measurement, Analysis, and Improvement of UDP/IP Throughput for the DECstation 5000 DATE:: AUTHOR:: Kay, Jonathan AUTHOR:: Pasquale, Joseph PAGES:: 11 ABSTRACT:: Networking software is a growing bottleneck in modern workstations, particularly for high throughput applications such as networked digital video. We measure various components of the UDP/IP protocol stack in a DECstation 5000/200 running Ultrix 4.2a, and quantify the way in which checksumming and copying dominate the processing time for high throughput applications. This paper describes network software measurements and substantial performance improvements which derive from a faster checksum implementation. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-21 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-22 ENTRY:: February 25, 1994 TITLE:: Cases as Structured Indexes for Full-Length Documents DATE:: AUTHOR:: Hearst, Marti A. PAGES:: 6 ABSTRACT:: Two long, full-length texts are not likely to discuss all, or almost all, of the same subtopics or subpoints. Even if the documents contain many of the same terms, the ways the terms are grouped to form subtopical discussions still might be quite different. A solution is to create a description of a document which lists all of its subtopical discussions as well as its main topics. An index that indicates this structure is an abstract representation of the document and we can think of this index as a case in the Case-Based Reasoning (CBR) sense. This paper proposes the use of cases to represent the high-level structure of full-length documents for the purpose of information retrieval. The cases are to be used both for assessing document similarity and for helping the user construct viable queries. 
The case can be transformed in various ways in order to make it more similar to the descriptions of other documents; these transformations include generalizing, substituting, and emphasizing subtopic descriptions. An advantage of this approach is that the cases that represent the document are automatically generable. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-22 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-23 ENTRY:: February 25, 1994 TITLE:: The Sequoia 2000 Architecture and Implementation Strategy DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Frew, James AUTHOR:: Dozier, Jeff PAGES:: 47 ABSTRACT:: This paper describes the Sequoia 2000 software architecture and its current implementations, including layers for Footprint, the file system, the DBMS, the application, and the network. Early prototype applications of this software include a Global Change data schema, GCM integration, remote sensing, a data system for climate studies, and operational uses by the DWR. Longer-range efforts include transfer protocols for moving elements of the database, controllers for secondary and tertiary storage, a distributed file system, and a distributed DBMS. The implementation plan ensures that the current architecture is stabilized and robust by the end of 1993. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-23 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-24 ENTRY:: February 25, 1994 TITLE:: TextTiling: A Quantitative Approach to Discourse Segmentation DATE:: AUTHOR:: Hearst, Marti A. PAGES:: 10 ABSTRACT:: This paper presents TextTiling, a method for partitioning full-length text documents into coherent multiparagraph units. The layout of text tiles is meant to reflect the pattern of subtopics contained in an expository text. The approach uses lexical analyses based on tf.idf, an information retrieval measurement, to determine the extent of the tiles, incorporating thesaural information via a statistical disambiguation algorithm. 
The tiles have been found to correspond well to human judgements of the major subtopic boundaries of science magazine articles. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-24 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-25 ENTRY:: February 28, 1994 TITLE:: DARWIN: On the Incremental Migration of Legacy Information Systems DATE:: AUTHOR:: Brodie, Michael L. AUTHOR:: Stonebraker, Michael PAGES:: 32 ABSTRACT:: As is painfully evident today, the deterioration of the transportation, education, and other national infrastructures negatively impacts many aspects of life, business, and our economy. This has resulted, in part, when responses to short-term crises discourage investing in infrastructure enhancement. A similar deterioration of the information system (IS) infrastructure has strong negative impacts on ISs, on the organizations they support, and, ultimately, on the economy. This paper addresses the problem of legacy IS migration with a spectrum of supporting methods for migrating legacy ISs into a target environment that includes rightsized hardware and modern technologies (i.e., infrastructure) such as client-server architecture, DBMSs, and CASE. We illustrate the methods with two migration case studies of multi-million dollar, mission critical legacy ISs. The contribution of this paper is a highly flexible set of migration methods that is tailorable to most legacy ISs and business contexts. The goal is to support continuous, iterative evolution. The critical success factor, and challenge in deployment, is to identify appropriate portions of the IS and the associated planning and management to achieve an incremental migration that is feasible with respect to the technical and business requirements. The paper concludes with a list of desirable migration tools for which basic research is required. 
The principles described in this paper can be used to design future ISs and an infrastructure that will support continuous IS evolution to avoid future legacy ISs. RETRIEVAL:: ocr (in all.ocr) RETRIEVAL:: tiff (in {001-032}.tif) END:: UCB//S2K-93-25 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-26 ENTRY:: February 25, 1994 TITLE:: Subtopic Structuring for Full-Length Document Access DATE:: AUTHOR:: Hearst, Marti A. AUTHOR:: Plaunt, Christian PAGES:: 10 ABSTRACT:: We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-26 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-27 ENTRY:: February 25, 1994 TITLE:: A Simple Visualization Management System: Bridging the Gap Between Visualization and Data Management DATE:: April 30, 1993 AUTHOR:: Kochevar, Peter AUTHOR:: Ahmed, Zahid AUTHOR:: Shade, Jonathan AUTHOR:: Sharp, Colin PAGES:: 16 ABSTRACT:: A prototype visualization management system is described which merges the capabilities of a database management system with any number of existing visualization packages such as AVS or IDL. 
The prototype uses the Postgres database management system to store and access Earth science data through a simple graphical browser. Data located in the database is visualized by automatically invoking a desired visualization package and downloading an appropriate script or program. The central idea underlying the system is that information on how to visualize a dataset is stored in the database with the dataset itself. As a result, scientists can concentrate on their science rather than on the process of doing it, since visualization programs do not have to be created or searched for each time a dataset is to be viewed. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-27 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-28 ENTRY:: February 28, 1994 TITLE:: The Design and Implementation of the Inversion File System DATE:: April 1993 AUTHOR:: Olson, Michael A. PAGES:: 34 ABSTRACT:: This paper describes the design, implementation, and performance of the Inversion file system. Inversion provides a rich set of services to file system users, and manages a large tertiary data store. Inversion is built on top of the POSTGRES database system, and takes advantage of low-level DBMS services to provide transaction protection, fine-grained time travel, and fast crash recovery for user files and file system metadata. Inversion gets between 30% and 80% of the throughput of ULTRIX NFS backed by a non-volatile RAM cache. In addition, Inversion allows users to provide code for execution directly in the file system manager, yielding performance as much as seven times better than that of ULTRIX NFS. 
RETRIEVAL:: ocr (in all.ocr) RETRIEVAL:: tiff (in {001-034}.tif) END:: UCB//S2K-93-28 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-29 ENTRY:: February 25, 1994 TITLE:: Tioga: Providing Data Management Support for Scientific Visualization Applications DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Chen, Jolly AUTHOR:: Nathan, Nobuko AUTHOR:: Paxson, Caroline AUTHOR:: Wu, Jiang PAGES:: 14 ABSTRACT:: We present a user interface paradigm for database management systems that is motivated by scientific visualization applications. Our graphical user interface includes a "boxes and arrows" notation for database access and a flight simulator model of movement through information space. We also provide means to specify a hierarchy of abstracts of data of different types and resolutions, so that a "zoom" capability can be supported. The underlying DBMS support for this system is described and includes the compilation of query plans into megaplans, new algorithms for data buffering, and provisions for a guaranteed rate of data delivery. The current state of the Tioga implementation is also described. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-29 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-30 ENTRY:: February 25, 1994 TITLE:: Large Object Support in POSTGRES DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Olson, Michael PAGES:: 8 ABSTRACT:: This paper presents four implementations for support of large objects in POSTGRES. The four implementations offer varying levels of support for large objects; their interaction with the user-defined storage managers available in POSTGRES is also detailed. The performance of all four large object implementations on two different storage devices is presented. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-30 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-31 ENTRY:: February 25, 1994 TITLE:: Mariposa: A New Architecture for Distributed Data DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Aoki, Paul M.
AUTHOR:: Devine, Robert PAGES:: 17 ABSTRACT:: We describe the design of Mariposa, an experimental distributed data management system that provides high performance in an environment of high data mobility and heterogeneous host capabilities. Mariposa provides a general, flexible platform for the development of new algorithms for distributed query optimization, storage management, and scalable data storage structures. This flexibility is primarily due to a unique rule-based design that permits autonomous, local-knowledge decisions to be made regarding data placement, query execution location, and storage management. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-31 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-32 ENTRY:: February 25, 1994 TITLE:: Efficient Organization of Large Multidimensional Arrays DATE:: AUTHOR:: Sarawagi, Sunita AUTHOR:: Stonebraker, Michael PAGES:: 17 ABSTRACT:: Large multidimensional arrays are widely used in scientific and engineering database applications. In this paper, we present methods of organizing arrays to make their access on secondary and tertiary memory devices fast and efficient. We have developed four techniques for doing this: (1) storing the array in multidimensional "chunks" to minimize the number of blocks fetched, (2) reordering the chunked array to minimize seek distance between accessed blocks, (3) maintaining redundant copies of the array, each organized for a different chunk size and ordering, and (4) partitioning the array onto platters of a tertiary memory device so as to minimize the number of platter switches. Our measurements on real data sets obtained from global change scientists demonstrate that accesses on arrays organized using the above techniques are often an order of magnitude faster than on the original unoptimized data.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-32 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-33 ENTRY:: February 25, 1994 TITLE:: Origins of Multi-Sector Scientific Collaboration: A Report on Research in Progress DATE:: February 1992 AUTHOR:: Weedman, Judith PAGES:: 17 ABSTRACT:: Sequoia 2000 is a research initiative funded by the Digital Equipment Corporation to develop large capacity object servers to support global change research. Existing hardware, software, network technology, and visualization techniques are inadequate to the task of handling the terabytes of data which global change researchers need to access and manipulate. The purpose of Sequoia 2000 is to develop the needed technology and to create an electronic repository in which researchers' data sets, programs, documents, and simulation outputs can be stored and made available to multiple users. Sequoia 2000 is a multidisciplinary, multi-campus, multi-agency project; researchers are from the fields of computer science, information, and global change, and are located in private industry, universities, and state and federal agencies. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-33 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-34 ENTRY:: February 25, 1994 TITLE:: A Simple Research Paradigm in the Context of the Sequoia 2000 Project and Its Application to an Ocean-Atmosphere Interaction Study DATE:: June 1993 AUTHOR:: Waliser, D. E. AUTHOR:: Mechoso, C. R. AUTHOR:: Gautier, Catherine AUTHOR:: Neelin, J. D. PAGES:: 9 ABSTRACT:: This paper presents an application of a common research paradigm that can help enhance and facilitate the conceptual interaction between the research goals of climate and global change researchers and the design and implementation goals of the computer scientists and engineers.
As presented, the paradigm fits neatly into the Sequoia 2000 architecture, and can be applied across scientific disciplines and to all levels of scientific research, from the program level to the detailed analysis level. Each of these aspects is discussed, and example applications of the paradigm to the area of ocean-atmosphere interactions are given, including a detailed application to the analysis of evaporative heat flux parameterizations for ocean general circulation models. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-34 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-35 ENTRY:: February 25, 1994 TITLE:: A Visualization Architecture for the Sequoia 2000 Project DATE:: AUTHOR:: Kochevar, Peter AUTHOR:: Ahmed, Zahid AUTHOR:: Wanger, Len AUTHOR:: Shade, Colin AUTHOR:: Sharp, Jonathan PAGES:: 20 ABSTRACT:: An architecture is described for the Tioga Visualization Management System which is under development as part of the Sequoia 2000 Project. This system brings together the capabilities of a database management system, a scientific visualization system, and a graphical user-interface builder. The paper concentrates on the front end of Tioga, whose goal is the interactive visualization of data that reside in a database management system. The Visualization Executive achieves this goal by mixing techniques from knowledge-based systems with those of scientific visualization and user-interface design. The intent is to free scientists as much as possible from having to deal with the process of doing science so that they can concentrate on the science itself.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-35 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-36 ENTRY:: February 25, 1994 TITLE:: DATE:: AUTHOR:: AUTHOR:: AUTHOR:: AUTHOR:: PAGES:: ABSTRACT:: RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-36 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-38 ENTRY:: March 10, 1994 TITLE:: Extending a Graphical Query Language to Support Updates, Foreign Systems, and Transactions DATE:: AUTHOR:: Chen, Jolly AUTHOR:: Aiken, Alexander AUTHOR:: Nathan, Nobuko AUTHOR:: Paxson, Caroline AUTHOR:: Stonebraker, Michael AUTHOR:: Wu, Jiang PAGES:: 14 ABSTRACT:: In [STON93] we proposed a new user interface paradigm called Tioga for interacting with database management systems. Tioga simplifies the task of building database applications and is geared especially towards the needs of scientific users. We borrow the "boxes and arrows" visual programming notation of scientific visualization systems and allow users to graphically construct applications by using database procedures as building blocks. This paper extends the Tioga paradigm to a general database programming environment. In particular, we address three shortcomings of graphical query languages. First, we define a mechanism for allowing general programs--not just database procedures--as building blocks. This extension allows better handling of general data entry and data visualization needs and provides an interface to foreign systems. Second, we permit database updates. Third, we define a transaction semantics for graphical query languages. Unlike traditional transactions, Tioga transactions contain a directed graph of queries instead of a linear sequence of queries. We explore concurrency control techniques to promote both intra-transaction and inter-transaction parallelism. Finally, we present query processing strategies for graphical queries with general building blocks, updates, and transactions.
We show how to efficiently execute a Tioga application by decomposing the application into components that are individually optimized. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-38 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-40 ENTRY:: February 25, 1994 TITLE:: Database Management for Data Visualization DATE:: AUTHOR:: Kochevar, Peter PAGES:: 12 ABSTRACT:: Visualization management systems which integrate database management, data visualization, and graphical user-interface generation into one package are becoming essential tools for conducting science. Unfortunately, the link between data management and data visualization is not very well understood. To make matters worse, most database management systems available today are not well-suited for handling large, time-sequenced data sets that are common to many scientific disciplines. The reason for this shortfall is that most database systems do not use an appropriate data model, are not geared toward real-time operation, and have inadequate user interfaces. Hints as to how database systems can fix these problems are given so that effective visualization management systems can be constructed. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-40 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-41 ENTRY:: March 4, 1994 TITLE:: GIPSY: Georeferenced Information Processing SYstem DATE:: March 25, 1994 AUTHOR:: Woodruff, Allison Gyle AUTHOR:: Plaunt, Christian PAGES:: 24 ABSTRACT:: In this paper we present an algorithm which automatically extracts geopositional coordinate index terms from text to support georeferenced document indexing and retrieval. Under this algorithm, words and phrases containing geographic place names or characteristics are extracted from a text document and used as input to database functions which use spatial reasoning to approximate statistically the geoposition being referenced in the text. We conclude with a discussion of preliminary results and future work.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-41 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-42 ENTRY:: March 4, 1994 TITLE:: Remote Sensing of Global Surface Shortwave Radiation and PAR Over the Ocean: a Sequoia Testbed DATE:: January 1994 AUTHOR:: Gautier, Catherine AUTHOR:: Byers, Michael PAGES:: 20 ABSTRACT:: During the past few years many methods have been proposed for estimating surface radiative fluxes (shortwave irradiance) at the ocean surface from visible radiance observations, and they have been found to be quite successful under most atmospheric and cloud conditions. For broken clouds, however, the simple plane parallel assumption for solving the radiative transfer equations may need to be corrected to account for cloud geometry. The estimation of PAR is simpler because the most commonly used satellite radiance measurements cover a similar region of the solar spectrum. RETRIEVAL:: tiff (in {001-020}.tif) RETRIEVAL:: ocr (in all.ocr) END:: UCB//S2K-94-42 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-93-44 ENTRY:: February 25, 1994 TITLE:: HERMES: A Prototype Distributed Application Management System DATE:: October 22, 1993 AUTHOR:: Hanyzewski, G. A. AUTHOR:: Spahr, J. AUTHOR:: Mechoso, C. R. AUTHOR:: Moore, R. W. PAGES:: 9 ABSTRACT:: This paper presents a prototype system for on-line distributed application management. The system allows researchers to determine the exact state of remotely executing applications by allowing interactive acquisition and visualization of output datasets as they are calculated. This paper discusses the general design as well as implementation details, case studies, and future directions for development.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-93-44 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-45 ENTRY:: March 10, 1994 TITLE:: Single Query Optimization for Tertiary Memory DATE:: December 1993 AUTHOR:: Sarawagi, Sunita AUTHOR:: Stonebraker, Michael PAGES:: 12 ABSTRACT:: We present query execution strategies that are optimized for the characteristics of tertiary memory devices. Traditional query execution methods are oriented to magnetic disk or main memory and perform poorly on tertiary memory. Our methods use ordering and batching techniques on the I/O requests to reduce the media switch cost and seek cost on these devices. Some of our methods are provably optimal and others are shown to be superior by simulation and cost formula analysis. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-45 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-46 ENTRY:: April 29, 1994 TITLE:: RP*: A Family of Order Preserving Scalable Distributed Data Structures DATE:: AUTHOR:: Litwin, W. AUTHOR:: Neimat, M-A. AUTHOR:: Schneider, D. PAGES:: 19 ABSTRACT:: Hash-based scalable distributed data structures (SDDSs), like LH* or DDH, for networks of interconnected computers (multicomputers) were shown to open new perspectives for file management. We propose a family of ordered SDDSs, called RP*, providing for ordered and dynamic files on multicomputers, and thus for more efficient processing of range queries and of ordered traversals of files. The basic algorithm, termed RP*N, builds the file with the same key space range partitioning as a B-tree, but avoids indexes through the use of multicast. The algorithms RP*c and RP*s enhance the throughput for faster networks, adding the indexes on clients, or on clients and servers, decreasing or avoiding the multicast. RP* files are shown to be highly efficient, with access performance exceeding traditional files by an order of magnitude or two, and, for non-range queries, very close to LH*.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-46 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-47 ENTRY:: April 29, 1994 TITLE:: A Hydrographic Database built on Montage and S-PLUS DATE:: March 1994 AUTHOR:: Farrell, W. E. AUTHOR:: Gaffney, J. AUTHOR:: Given, J. AUTHOR:: Jenkins, R. D. AUTHOR:: Hall, N. PAGES:: 19 ABSTRACT:: (none) RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-47 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-48 ENTRY:: April 29, 1994 TITLE:: Zooming and Tunneling in Tioga: Supporting Navigation in Multidimensional Space DATE:: March 1994 AUTHOR:: Woodruff, Allison AUTHOR:: Wisnovsky, Peter AUTHOR:: Taylor, Cimarron AUTHOR:: Stonebraker, Michael AUTHOR:: Paxson, Caroline AUTHOR:: Chen, Jolly AUTHOR:: Aiken, Alexander PAGES:: 8 ABSTRACT:: In [STON93] we proposed a visual programming system called Tioga. The Tioga system applies a boxes and arrows programming notation to allow nonexpert users to graphically construct database applications. Users connect database procedures using a dataflow model. Browsers are used to visualize the resulting data. This paper describes extensions to the Tioga browser protocol. These extensions allow sophisticated, flight-simulator navigation through a multidimensional data space. This design also incorporates wormholes to allow tunneling between different multidimensional spaces. Wormholes are shown to be substantial generalizations of hyperlinks in a hypertext system. These powerful mechanisms for relating data provide users with great flexibility. For example, users can create magnifying glasses that provide an enhanced view of the underlying data.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-48 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-49 ENTRY:: April 29, 1994 TITLE:: An Economic Paradigm for Query Processing and Data Migration in Mariposa DATE:: AUTHOR:: Stonebraker, Michael AUTHOR:: Devine, Robert AUTHOR:: Kornacker, Marcel AUTHOR:: Litwin, Witold AUTHOR:: Pfeffer, Avi AUTHOR:: Sah, Adam AUTHOR:: Staelin, Carl PAGES:: 24 ABSTRACT:: In this paper we explore query execution and storage management issues for Mariposa, a distributed data base system under construction at Berkeley. Because of the extreme complexity of both issues, we have adopted an underlying economic paradigm for both. Hence, queries receive a budget which they spend to obtain their answers, and each processing site attempts to maximize income by buying and selling storage objects and processing queries for locally stored objects. This paper presents the protocols which underlie this economic system. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-49 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-50 ENTRY:: April 29, 1994 TITLE:: Design and Implementation of DDH: A Distributed Dynamic Hashing Algorithm DATE:: AUTHOR:: Devine, Robert PAGES:: 14 ABSTRACT:: DDH extends the idea of dynamic hashing algorithms to distributed systems. DDH spreads data across multiple servers in a network using a novel autonomous location discovery algorithm that learns the bucket locations instead of using a centralized directory. We describe the design and implementation of the basic DDH algorithm using networked computers. Performance results show that the prototype of DDH hashing is roughly equivalent to conventional single-node hashing implementations as measured by CPU time or elapsed time. Finally, possible improvements are suggested to the basic DDH algorithm for increased reliability and robustness.
RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-50 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-51 ENTRY:: April 29, 1994 TITLE:: The Sequoia 2000 Showcase: An S2K Technical Report DATE:: March 20, 1994 AUTHOR:: Norris, C. L. AUTHOR:: Chen, S.-C. AUTHOR:: Roads, J. O. PAGES:: 6 ABSTRACT:: Environmental investigators view space and time differently, mainly through emphasis on different environmental variables and data sets. In fact, because it has been so difficult to fully develop any data set, many investigators have spent their lifetimes emphasizing a single data set. Different data sets and variables are sometimes compared with each other in review articles or in modeling studies, but truly comprehensive comparisons await development of proposed S2K-like database systems. In these databases, we will be able to go to a generic environmental data base and extract all manner of relevant data sets and environmental variables which will then be merged and output with a set of graphics packages under the control of the person doing the merge. These integrated views will undoubtedly give us new insight into how our world works. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-51 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-52 ENTRY:: April 29, 1994 TITLE:: An Intelligent Assistant for Creating Data Flow Visualization Networks DATE:: March 20, 1994 AUTHOR:: Kochevar, Peter AUTHOR:: Ahmed, Zahid PAGES:: 12 ABSTRACT:: Non-visualization experts, including most scientists, find visualization systems like AVS too difficult to use. One approach to assisting these end-users in doing interactive visualization is to embed the knowledge of visualization experts into an intelligent system. A prototype, called Tecate, of such a system has been developed as part of the Sequoia 2000 Project. In this system, a Planner makes use of expert knowledge stored in a Knowledge Base to create data-flow visualization programs.
The Planner takes as input a description of the data to be visualized and an indication of the data analysis goals of an end-user. From this information, an AVS network script is produced that, when executed, builds an appropriate visualization of the indicated data set. The networks so produced make use of both a restricted set of standard AVS modules and a collection of custom ones which operate on data structured as fiber bundles. RETRIEVAL:: postscript (in all.ps) END:: UCB//S2K-94-52 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//ERL-94-53 ENTRY:: May 16, 1994 TITLE:: Experiments with the Tenet Real-Time Protocol Suite on the Sequoia 2000 Wide Area Network AUTHOR:: Banerjea, Anindo AUTHOR:: Knightly, Edward W. AUTHOR:: Templin, Fred L. AUTHOR:: Zhang, Hui PAGES:: 17 ABSTRACT:: Emerging distributed multimedia applications have stringent performance requirements in terms of bandwidth, delay, delay-jitter, and loss rate. The Tenet real-time protocol suite provides the services and mechanisms for delivering such performance guarantees, even during periods of high network load and congestion. The protocols achieve this by using resource management, connection admission control, and appropriate packet service disciplines inside the network. The Sequoia 2000 network employs the Tenet Protocol Suite at each of its hosts and routers, making it one of the first wide-area packet-switched networks to provide end-to-end per-connection performance guarantees. This paper presents experiments with the Tenet protocols on the Sequoia 2000 network, including measurements of the performance of the protocols, the service received by real multimedia applications using the protocols, and comparisons with the service received by applications that use the Internet protocols (UDP/IP).
We conclude that the Tenet protocols successfully protect the real-time channels from other traffic in the network, including other real-time channels, and continue to meet the performance guarantees, even when the network is highly loaded. RETRIEVAL:: postscript (in all.ps) END:: UCB//ERL-94-53 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-56 ENTRY:: January 4, 1995 DATE:: December 1994 TITLE:: High-Concurrency Locking in R-Trees* AUTHOR:: Banks, Douglas AUTHOR:: Kornacker, Marcel AUTHOR:: Stonebraker, Michael PAGES:: 15 ABSTRACT:: In this paper we present a solution to the problem of concurrent operations in R-trees, a dynamic access structure capable of storing multidimensional and spatial data. We describe the R-link tree, a variant of the R-tree that adds sibling pointers to nodes, a technique first deployed in B-link trees, to compensate for concurrent structure modifications. The main obstacle to the use of sibling pointers is the lack of linear ordering among the keys in an R-tree; we overcome this by assigning sequence numbers to nodes that let us reconstruct the "lineage" of a node at any point in time. The search, insertion, and deletion algorithms for R-link trees are designed to lock at most two nodes at a time, and the locking can be shown to be deadlock-free. In addition, we describe how R-link trees can be made recoverable so that they are instantly available after a crash, and we further describe how to achieve degree 3 consistency with an inexpensive predicate locking mechanism.
RETRIEVAL:: postscript (in s2k-94-56.ps) END:: UCB//S2K-94-56 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-57 ENTRY:: December 10, 1994 DATE:: September 1994 TITLE:: Vision for Sequoia 2000 Phase II: The Sequoia Computational Infrastructure AUTHOR:: Pasquale, Joseph AUTHOR:: Katz, Randy AUTHOR:: Dozier, Jeff PAGES:: 4 ABSTRACT:: The Sequoia 2000 project, a collaboration of computer scientists and Earth scientists at the University of California, is beginning its second 3-year phase. The major goal for Phase II is to support power harnessing, the ability to dynamically concentrate as much of the cumulative resource power in a wide-area distributed system as possible to meet the demands of any single application. The project will focus on creating an underlying hardware and software infrastructure that supports power harnessing, and above which software systems targeted to support Earth science applications can be built. As in Phase I, the project will seek to develop partnerships, both financial and intellectual, with the university, industry, and the state and federal government. RETRIEVAL:: postscript (in s2k-94-57.ps) END:: UCB//S2K-94-57 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-58 ENTRY:: December 10, 1994 DATE:: September 1994 TITLE:: Sequoia 2000: A Reflection on the First Three Years AUTHOR:: Stonebraker, Michael PAGES:: 9 ABSTRACT:: This paper describes the SEQUOIA 2000 project and its implementation efforts during the first three years. Included are the objectives we had, how we chose to address them, and some of the lessons we learned from this endeavor. RETRIEVAL:: postscript (in s2k-94-58.ps) END:: UCB//S2K-94-58 BIB-VERSION:: CS-TR-v2.0 ID:: UCB//S2K-94-59 ENTRY:: December 10, 1994 TITLE:: Sequoia 2000 Metadata Schema for Satellite Images AUTHOR:: Anderson, Jean T. AUTHOR:: Stonebraker, Michael PAGES:: 7 ABSTRACT:: Sequoia 2000 schema development is based on emerging geospatial standards to accelerate development and facilitate data exchange.
This paper focuses on the metadata schema for digital satellite images. We examine how satellite metadata are defined, used, and maintained. We discuss the geospatial standards we are using, and describe a SQL prototype that is based on the Spatial Archive and Interchange Format (SAIF) standard and implemented in the Illustra object-relational database. RETRIEVAL:: postscript (in s2k-94-59.ps) END:: UCB//S2K-94-59