Section 4: Hardware Architecture
This section proposes a feasible hardware design for the 2 superDAACs
and the N peerDAACs. We first explore the design requirements
that a superDAAC must meet. These requirements include storing
the raw data feed from the EDOS, constructing a collection of
standard products (eager evaluation), executing a set of ad
hoc queries (lazy evaluation), and supporting batch processing
by reading the entire store sequentially in a reasonable period
of time. These design requirements are discussed in detail in
Section 4.1.
In Section 4.2, we indicate a generic architecture composed of
4 components (tape silos, disk caches, computation engines,
and DBMS engines) that we believe can satisfy the design
requirements. This generic architecture is then specialized to
real-world COTS hardware components to produce 6 candidate systems
that can be built in 1994. Because covering the details of the
component selection, ensuring that there are no bottlenecks, and
satisfying the design requirements are all complicated tasks,
we place these discussions in Appendix 6 for the interested specialist.
In Section 4.3, we present 1994 cost estimates for 2 of these
systems, one based primarily on supercomputer technology and one
based on a network of workstations. The 2 superDAACs can be
built for between $259M and $407M in 1994 dollars.
(In Section 5, we use these numbers, discounted by expected technology
advances over time, to determine the hardware cost of a phased,
just-in-time deployment schedule that will meet EOSDIS objectives.)
Section 4.4 turns to the 1994 cost of peerDAAC hardware. Again,
these numbers will drive a just-in-time deployment schedule in
Section 5.
4.1 Design Criteria for the 2 SuperDAACs
In this section, we state the major assumptions that drive our
superDAAC hardware sizing:
- We assume only online and nearline storage, consistent with
our desire to minimize operation cost by automating data retrieval.
- Each superDAAC will store the entire EDOS raw data feed.
- Each superDAAC will eagerly evaluate ½ of the current
level 1 and 2 standard products. Between them, they thereby evaluate
all level 1 and 2 products. Although it is difficult to predict
the exact amount of eager evaluation, this is a significant and
plausible amount.
- We assume a batch processing system at each superDAAC. Each
must read an appropriate 50% of its archive once a month sequentially.
- At each superDAAC, we assume that 4% of the data read in a
batch processing run qualifies for some batched user query. Of
this 4%, we assume that ½ of the resulting computed output must
be stored at the superDAAC.
- Some level 1 and 2 standard products will require reprocessing
when their algorithm changes. We assume that reprocessing requests
form part of the batch query queue. Therefore, we assume part
of the 4% mentioned above is reprocessing requests. Specifically,
we assume 2% of the total raw data may be turned into new standard
products monthly. We assume that ½ of the new products are
stored at a superDAAC and ½ at a peerDAAC.
- We assume that lazy evaluation of queries accesses, in aggregate,
0.9% of the entire database each week at each superDAAC. This
value is chosen to provide a load comparable to the base push-processing
load and can be adjusted up or down as conditions warrant. Of
this 0.9%, 1/20 (5%) requires processing, with 1/3 computed locally
at the superDAAC and 2/3 at a peerDAAC.
- We assume that DBMS metadata and metadata indices are stored
permanently in a disk cache. Metadata is assumed to be 0.1% of
the size of the total database. Two copies of the metadata are
stored at each site for redundancy. Metadata indices are assumed
to be 10% of the size of the metadata.
- A database log of all database actions is maintained on tape.
The size of the log is assumed to be a fixed fraction of the
archive size. For this study, we assume that 4 bytes of log entry
are created for every 64 KB that are stored.
- We assume that disk cache must provide storage for 1% of the
data in the database or, equivalently, storage for 1 week of standard
product generation.
- We require the tape drives to sustain streaming of 50% of
the archive past DBMS data-subsetting engines each month and picking
of individual 100-MB data sets at the rate of 1 data set every
2 minutes per drive.
- The superDAAC is sized to perform the above functions for
data received through the year 1999. After that, the system must
be scaled by 2 PB per year in years 2000 through 2002.
Table 4-1 summarizes this set of design criteria, using the current
launch schedule and definition of level 1 and 2 standard products.
Table 4-1: SuperDAAC Sizing Assumptions
Item | Characteristic | Detail | Comments
Tape archive | Size | 2 PB | Size of raw feed plus level 1 and 2 products for 3 years, plus some spare capacity
Tape archive | Sustained data-streaming rate | Re-read 50% of archive each month | Design requirement
Tape archive | Access rate | 366 MB/s | Design requirement
Tape archive | Sustained data picking rate | 0.5 accesses per minute | Design requirement
Tape archive | Database log | 0.006% of archive | 4 bytes per 64 KB of data is 0.006%
Disk cache | Size: standard products | 1% of archive size | Design requirement
Disk cache | Size: metadata and copy | 0.1% of archive size | Design requirement
Computation: eager | % eager processing | 50% of raw data | Each superDAAC does 50% of eager evaluation
Raw feed | Peak rate | 3.3 MB/s | From HAIS design documents
Raw feed | Size after 3 years | 180 TB | From HAIS design documents
Level 1-2 products | Peak rate | 15 MB/s | From HAIS design documents, with 50% eager evaluation
Level 1-2 products | Size after 3 years | 1350 TB | From HAIS design documents
Level 1-2 products | Eager computation rate | 7.5 Gflops | Each superDAAC does 50% of the computation listed in the HAIS design document
Computation: lazy | % lazy access | 0.9% of archive per week | Design requirement
Computation: lazy | Peak access rate | 30 MB/s | Design requirement
Computation: lazy | % lazy processing | 5% of accessed data per week | Design requirement
Computation: lazy | Peak data subset rate | 1.5 MB/s | Design requirement
Computation: lazy | Lazy computation rate | 5 Gflops | The superDAAC does 1/3 of the computation; the peerDAAC does the other 2/3
Computation: reprocessing | % of raw data processed | 2% of total raw data each month | Design requirement
Computation: reprocessing | Peak data subset rate | 30 MB/s | Design requirement
Computation: reprocessing | Peak computation feed rate | 1.5 MB/s | Design requirement
Computation: reprocessing | Reprocessing computation rate | 7.5 Gflops | Design requirement
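Several of the rates in Table 4-1 follow directly from the sizing assumptions above. The short sketch below recomputes them from the 2-PB archive and the 180-TB accumulated raw feed; the unit conventions (decimal megabytes, a 30-day month) are our own, so the streaming figure lands near, but not exactly on, the 366 MB/s in the table.

```python
# Rough recomputation of the derived rates in Table 4-1.
# Assumptions: 2 PB archive, 180 TB accumulated raw feed, decimal units
# (1 MB = 1e6 bytes), 30-day month, 7-day week.

PB, TB, MB = 1e15, 1e12, 1e6
MONTH = 30 * 86400          # seconds
WEEK = 7 * 86400

archive = 2 * PB
raw_feed = 180 * TB

# Streaming: re-read 50% of the archive each month.
stream_rate = 0.5 * archive / MONTH / MB        # ~386 MB/s

# Lazy access: 0.9% of the archive each week.
lazy_access = 0.009 * archive / WEEK / MB       # ~30 MB/s

# Lazy subsetting: 5% of the accessed data needs processing.
lazy_subset = 0.05 * lazy_access                # ~1.5 MB/s

# Reprocessing: 2% of the total raw data each month feeds the compute engines.
reproc_feed = 0.02 * raw_feed / MONTH / MB      # ~1.4 MB/s

# Database log: 4 bytes per 64 KB stored.
log_fraction = 4 / (64 * 1024)                  # ~0.006% of the archive

print(f"streaming rate  {stream_rate:7.0f} MB/s (table: 366 MB/s)")
print(f"lazy access     {lazy_access:7.1f} MB/s (table: 30 MB/s)")
print(f"lazy subset     {lazy_subset:7.1f} MB/s (table: 1.5 MB/s)")
print(f"reprocess feed  {reproc_feed:7.1f} MB/s (table: 1.5 MB/s)")
print(f"database log    {log_fraction:.4%} of archive (table: 0.006%)")
```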
The resulting design for the superDAAC is as follows (a brief cross-check of these figures appears after the list):
- 2 PB of archive data stored in tape robots.
- 20 TB of disk cache to support interactive retrieval of standard
products.
- 4 TB of metadata disk storage (2 copies of the 2-TB
metadata).
- 280 GB of real memory (RAM cache) to provide metadata hot
index memory cache and data computation memory cache.
- 20 Gflops sustained execution rate.
- 100 MB/s local area network within the superDAAC.
- Two OC-3 links between the superDAACs.
- T3 or faster WAN links to a certain number of peerDAACs.
- 30-day time to read the entire data archive.
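Most of these figures can be cross-checked directly against the criteria of Section 4.1 and Table 4-1; the 280-GB RAM figure and the link speeds come from the detailed design and are not re-derived here. A minimal sketch of the arithmetic:

```python
# Capacity and compute cross-check for the superDAAC design point.
# Assumes a 2 PB archive; Gflops figures are taken from Table 4-1.

PB = 1e15
archive = 2 * PB

disk_cache = 0.01 * archive          # 1% of archive   -> 20 TB
metadata = 0.001 * archive           # 0.1% of archive -> 2 TB
metadata_stored = 2 * metadata       # two copies      -> 4 TB

# Sustained compute: eager + lazy + reprocessing rates from Table 4-1.
gflops = 7.5 + 5.0 + 7.5             # -> 20 Gflops sustained

print(f"disk cache     {disk_cache / 1e12:.0f} TB")
print(f"metadata (x2)  {metadata_stored / 1e12:.0f} TB")
print(f"compute        {gflops:.0f} Gflops")
```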
4.2 SuperDAAC Hardware Architecture
The basic hardware components of the superDAACs are illustrated
in Figure 4-1. They consist of a data multiplexing platform
to support the raw feed, a database platform, a request queueing
platform, and compute platforms. A high-speed network links the
platforms.
The superDAACs support 4 data flows (a rough summary sketch follows the list):
- Raw feed storage and standard product generation. Each superDAAC
calculates ½ of the standard products.
- Lazy evaluation of requests that access 0.9% of the archive
each week.
- Stream processing of the archive each month. Each superDAAC
reads ½ of the archive and is able to reprocess up to 2%
of the total raw data each month.
- Imaging of the data archived at the other superDAAC, so that
each superDAAC is a mirror of the other.
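As a compact summary of how these flows map onto the platforms of Figure 4-1, the sketch below lists each flow with the platforms it touches and a nominal rate from Table 4-1. The platform assignments are our reading of the figure and of the flow descriptions that follow, not a statement from the design documents.

```python
# Illustrative summary of the 4 superDAAC data flows and the platforms
# each one touches. Platform names follow Figure 4-1; rates are nominal
# figures from Table 4-1 (None where no single rate is given).

FLOWS = {
    "eager (raw feed -> standard products)": {
        "platforms": ["data multiplexing", "compute", "database/archive"],
        "nominal_rate_MBps": 15.0 + 3.3,   # peak product + raw-feed rates
    },
    "lazy (interactive queries)": {
        "platforms": ["database/archive", "compute", "peerDAAC WAN links"],
        "nominal_rate_MBps": 30.0,         # 0.9% of archive per week
    },
    "streaming (monthly batch queue)": {
        "platforms": ["database/archive", "request queueing", "compute"],
        "nominal_rate_MBps": 366.0,        # 50% of archive per month
    },
    "mirroring (backup of the other superDAAC)": {
        "platforms": ["data multiplexing", "database/archive", "OC-3 links"],
        "nominal_rate_MBps": None,         # not stated as a single figure
    },
}

for name, flow in FLOWS.items():
    print(f"{name}: {flow['platforms']} @ {flow['nominal_rate_MBps']} MB/s")
```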
Figure 4-1: SuperDAAC Hardware Platforms
Figure 4-2 shows the associated data flows for eager processing
of the raw feed. The input data stream contains both the raw
feed and the data that is being backed up from the other superDAAC.
The compute platforms process only the raw feed data and store
half of the results in the data archive. These data are backed
up to the other superDAAC.
Figure 4-2: Eager Processing Data Flow
Figure 4-3 shows the associated data flows for processing interactive
queries to the database. Data is pulled from the archive in 100-MB
chunks. The appropriate data subset is generated. The processing
is then split between the superDAAC and the peerDAACs, with 1/3
of the processing being done at the superDAAC.
Figure 4-3: Lazy Processing Data Flow
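One implication of pulling data in 100-MB chunks is the number of tape drives tied up in picking. The estimate below assumes, purely for illustration, that the entire 30-MB/s lazy access load is served by tape picking at the stated rate of one 100-MB data set every 2 minutes per drive; in practice the disk cache absorbs part of this load.

```python
# Back-of-the-envelope: tape drives needed if the lazy-query load were
# served entirely by picking 100-MB chunks from tape.

import math

chunk_mb = 100.0
pick_seconds = 120.0                       # one 100-MB data set every 2 minutes
lazy_access_mbps = 30.0                    # peak lazy access rate from Table 4-1

per_drive_mbps = chunk_mb / pick_seconds   # ~0.83 MB/s of picked data per drive
drives_needed = math.ceil(lazy_access_mbps * pick_seconds / chunk_mb)

print(f"{per_drive_mbps:.2f} MB/s per drive -> about {drives_needed} drives picking at peak")
```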
Figure 4-4 shows the associated data flows for the data streaming
system that supports queued requests. Half of the data archive
is streamed through the data-subsetting platform at each superDAAC.
A data subset is generated that is equal in size to 2% of the
accumulated raw feed. This, in turn, is processed to create new
products. Half of the resulting data is stored back in the archive.
Figure 4-4: Streaming Data Flow
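In absolute terms, and assuming the 180-TB accumulated raw feed of Table 4-1 as the base (and that the new products are roughly the size of the data subset, which Table 4-1 does not state explicitly), the monthly volumes work out roughly as follows:

```python
# Monthly streaming run at one superDAAC, using Table 4-1 inputs.

TB = 1e12

subset = 0.02 * 180 * TB        # 2% of the 180-TB accumulated raw feed -> 3.6 TB
stored_back = 0.5 * subset      # assuming the new products are about the size
                                # of the subset, half returns to the archive

print(f"monthly data subset     {subset / TB:.1f} TB")
print(f"stored back in archive  {stored_back / TB:.1f} TB")
```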
A detailed design of superDAAC hardware that specializes this
architecture to 6 collections of 1994 COTS hardware components
is discussed in Appendix 6.
4.3 SuperDAAC Cost Analysis
In this section, we present 2 example configurations that are
discussed in detail in Appendix 6 and meet the design requirements
of Section 4.1. Table 4-2 describes the superDAAC configurations.
Table 4-2: SuperDAAC Configurations
Type of platform | Storage device | Database platform | Compute server | Interconnect technology | Request queueing system
WS/NTP | IBM NTP 3495 | DEC 7000 | DEC 7000 | ATM | CS 6400
Vector/NTP | IBM NTP 3495 | CS 6400 | Supercomputer | HIPPI | CS 6400
Both use an IBM tape silo (the NTP 3495) and a CRAY Superserver
made up of SPARC processors (the CS-6400) as the request queueing system.
In the workstation-oriented system (WS), a DEC Alpha machine
(DEC 7000) forms the basis for the compute server and the database
platform. A large collection of interconnected machines is required
to satisfy the load, and an ATM switch is used to provide connectivity.
The second configuration is more conservative and uses a CRAY
C90 as a compute server (instead of a network of workstations)
and a CRAY CS-6400 as a DBMS engine. Table 4-3 indicates the
1994 hardware cost of each configuration.
Table 4-3: SuperDAAC Hardware Components
Type of platform | Max number of days to read archive | Number of data servers | Number of compute servers | Number of tape robots | Data server cost ($M) | Disk cache cost ($M) | Archive tape cost ($M) | Compute platform cost ($M) | Total hardware cost ($M)
WS/NTP | 27.7 | 18 | 106 | 8 | $29.5 | $31.5 | $12.8 | $55.7 | $129.5
Vector/NTP | 27.7 | 2 | 4 | 8 | $31.0 | $31.5 | $12.8 | $128.0 | $203.3
The table indicates the number of each component needed, the aggregate
cost of each component, and the total hardware cost.
Notice that the conservative design is $203M, while the network
of workstations saves dramatically on the compute server and costs
$129.5M.
In our opinion, these 2 designs bracket the reasonable costs of
a real system and can be used safely for the cost analysis in
Section 5.
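As a simple sanity check on Table 4-3, and on the $259M to $407M range quoted at the start of this section, the component costs can be summed directly. The figures below are the 1994 dollar amounts from Table 4-3:

```python
# Sum the per-component costs from Table 4-3 and scale to 2 superDAACs.

CONFIGS = {
    "WS/NTP":     {"data server": 29.5, "disk cache": 31.5,
                   "archive tape": 12.8, "compute": 55.7},
    "Vector/NTP": {"data server": 31.0, "disk cache": 31.5,
                   "archive tape": 12.8, "compute": 128.0},
}

for name, parts in CONFIGS.items():
    one = sum(parts.values())
    print(f"{name}: ${one:.1f}M per superDAAC, ${2 * one:.1f}M for 2")
# -> $129.5M / $259.0M  and  $203.3M / $406.6M (about $407M)
```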
4.4 PeerDAAC Cost Analysis
A modified version of the design criteria used for the superDAACs
can be applied to the design of peerDAACs. The same hardware
systems are used: data storage devices, database server platforms,
network switches, and compute servers.
However, we assume 2 different load requirements for peerDAACs:
- A minimal peerDAAC that can archive 0.5% of the raw data and
do associated lazy evaluation of queries. This would be a peerDAAC
for a single research group or small collection of groups.
- A larger peerDAAC that can archive 2% of the raw data and
do associated lazy processing. This would be a peerDAAC for a
substantial collection of users. (A rough sizing sketch for both options follows this list.)
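To give these percentages a rough scale, the sketch below evaluates both plausible readings of "raw data": the 180-TB accumulated raw feed of Table 4-1, or the full 2-PB archive including level 1 and 2 products. The choice of base is our assumption; the design criteria above do not state it.

```python
# Approximate peerDAAC archive sizes implied by the two load levels,
# under two possible readings of "raw data".

TB, PB = 1e12, 1e15
bases = {"raw feed (180 TB)": 180 * TB, "full archive (2 PB)": 2 * PB}

for label, base in bases.items():
    print(f"{label}: minimal (0.5%) {0.005 * base / TB:5.1f} TB, "
          f"large (2%) {0.02 * base / TB:6.1f} TB")
```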
In Table 4-4, we indicate the sizing of 2 minimal peerDAAC configurations,
one based on the IBM RS/6000 and the DEC TL820 tape silo, and
the second based on a DEC Alpha and a DEC DLT tape stacker. Note
that the total price of a minimal peerDAAC varies between $2.04M
and $3.52M.
Table 4-4: Minimal PeerDAAC
Type of platform | Max number of days to read archive | Number of data servers | Number of compute servers | Number of tape robots | Data server cost ($M) | Disk cache cost ($M) | Archive tape cost ($M) | Compute platform cost ($M) | Total hardware cost ($M)
IBM RS/6000/TL820 | 12.5 | 2 | 1 | 2 | $1.35 | $0.11 | $0.22 | $0.37 | $2.04
DEC 7000/DLT | 0.7 | 3 | 1 | 56 | $2.01 | $0.11 | $0.85 | $0.55 | $3.52
In Table 4-5, we scale the 2 options upward to satisfy the requirements
of a large peerDAAC. The cost climbs to between $2.86M and $4.22M.
These numbers will be used in Section 5 for downstream peerDAAC
cost estimates. The details of peerDAAC configuration appear
in Appendix 6.
Table 4-5: Large PeerDAAC Costs
Type of platform | Max number of days to read archive | Number of data servers | Number of compute servers | Number of tape robots | Data server cost ($M) | Disk cache cost ($M) | Archive tape cost ($M) | Compute platform cost ($M) | Total hardware cost ($M)
IBM RS/6000/TL820 | 12.5 | 2 | 2 | 6 | $1.03 | $0.43 | $0.66 | $0.74 | $2.86
DEC 7000/NTP | 3.0 | 2 | 2 | 1 | $1.38 | $0.43 | $1.31 | $1.10 | $4.22