\begin{center}
{\bf Advanced Architecture Computers*} \\
\vspace{.4in}
{\em Jack J. Dongarra and Iain S. Duff} \\
\vspace{.15in}
({\em dongarra@mcs.anl.gov} and {\em na.duff@na-net.stanford.edu}) \\
\vspace{.15in}
Mathematics and Computer Science Division \\
Argonne National Laboratory \\
Argonne, Illinois 60439-4844 \\
\vspace{.15in}
Computer Science and Systems Division \\
Building 8.9 \\
Harwell Laboratory \\
Oxfordshire OX11 0RA \\
England
\end{center}
\vspace{.4in}
{\bf Abstract:} We describe the characteristics of several recent computers that employ vectorization or parallelism to achieve high performance in floating-point calculations. We consider both top-of-the-range supercomputers and computers based on readily available and inexpensive basic units. In each case we discuss the architectural base, novel features, performance, and cost. We intend to update this report regularly, and to this end we welcome comments.
\vspace{.3in}
\noindent {\bf Keywords}
\par\noindent vector processors, array processors, parallel architectures, supercomputers, high-performance computers
\section{Introduction}
In the past few years several machines have been announced that use some form of parallelism to achieve performance in excess of that attainable directly from the underlying technology of the constituent chips. To a large degree the availability of low-cost chips as building blocks has given rise to many of these new machines. After listening to numerous technical and sales presentations on these new computers, we became overwhelmed and confused by the characteristics of each product and its relative strengths and weaknesses. In an effort to clarify these issues - both for ourselves and for other computational scientists - we have written this report summarizing the range of machines available, the architectures employed, and the principal features of each machine.
In Section 2, we list the computers considered and discuss the criteria we have used to select them. We present a rough classification based on architectural features and their niche in the marketplace. This classification divides the machines into five categories: supercomputers, minisupercomputers, vector add-ons or vector-assisted mainframes, parallel processors, and high-performance graphics workstations. Each category is discussed in turn in Sections 3 through 7. More detailed information on the machines is provided in Appendix B. The guidelines used in preparing the detailed descriptions are given in Section 8.
In some cases, our data are incomplete and nonuniform. This situation reflects the technical level of the presentations, the documentation available to us, the stage of development of the product being described, and the comments received from vendors on draft copies of our document. We welcome comments and criticisms that might help to remedy any deficiencies. This is the second edition of the report. We intend to continue updating it to reflect both the changing marketplace and further information on currently listed machines.
\section{Summary and Classification of Machines Considered}
In the past few years there has been an unprecedented explosion in the number of different computers in the marketplace. This explosion has been fueled partly by the availability of powerful and cheap building blocks and partly by the availability of venture capital. There have been two main directions to this explosion.
One has been the personal computer and workstation market, and the other the development and marketing of computers using advanced architectural concepts. In this report we restrict our study to the latter group, with particular interest in architectures that use some form of parallelism to increase performance over that of the basic chip. We also restrict our attention to machines that are available commercially, and thus exclude research projects in universities and government laboratories and products still at the design stage. We would, however, welcome being alerted to ongoing activities. We have necessarily had to exclude information obtained under non-disclosure agreements. We will update this report as such information is released through product announcements.
A much-referenced and useful taxonomy of computer architectures was given by Flynn (1966). He divided machines into four categories:
\begin{flushleft}
(i) SISD - single instruction stream, single data stream\\
(ii) SIMD - single instruction stream, multiple data stream\\
(iii) MISD - multiple instruction stream, single data stream\\
(iv) MIMD - multiple instruction stream, multiple data stream\\
\end{flushleft}
Although these categories give a helpful coarse division, we find immediately that the current situation is more complicated, with some architectures exhibiting aspects of more than one category. Many of today's machines are really hybrid designs. For example, the CRAY X-MP has up to four processors (MIMD), but each processor uses pipelining (SIMD) for vectorization; a short code sketch at the end of this section illustrates these two levels of parallelism. Moreover, where there are multiple processors, the memory can be local, global, or a combination of these. There may or may not be caches and virtual memory systems, and the interconnections can be by crossbar switches, multiple bus-connected systems, time-shared bus systems, etc.
We thus choose a different method of subdividing and classifying the machines from that used in our original report (Dongarra and Duff 1987). As before, we identify the supercomputers separately and discuss these in Section 3. However, we split the other machines according to their niche in the marketplace rather than their connectivity or mode of data access or data transfer. Minisupercomputers can be defined as junior versions of supercomputers that offer a similar interface to the larger machines but with lower performance and reduced costs. We consider machines in this class in Section 4. Some powerful vector computers do not fall into either of the previous classes but are based on an enhancement to a mainframe computer through the addition of an array processor or an integrated vector facility. We discuss both types of computer in Section 5. In Section 6, we consider machines that rely primarily on parallelism rather than pipelined vector processing, and divide these into two categories depending on whether we regard them as good experimental vehicles for studying parallelism and parallel algorithms or whether we consider them as potential supercomputers of the future. In Section 7 we summarize the high-performance graphics workstations that do not themselves qualify for the previous categories but that are clearly in a different class from regular top-of-the-line workstations.
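To make concrete the distinction drawn above between pipelined (SIMD) vectorization and MIMD multiprocessing, the following simple sketch (written by us in C purely as an illustration, and not representing the programming model of any particular machine) computes a matrix-vector product $y = Ax$. The inner loop is the kind of regular, independent arithmetic that a pipelined vector unit exploits within a single processor, while the independent outer-loop iterations could be divided among the processors of an MIMD machine.
\begin{verbatim}
/* Illustration only: y = A*x for an n-by-n matrix A.
 * The inner loop (over j) is the regular, independent arithmetic
 * that a pipelined vector unit exploits within one processor;
 * the n outer-loop iterations are independent of one another and
 * could be shared among the processors of an MIMD machine.
 */
#include <stdio.h>

#define N 4

void matvec(int n, double a[N][N], double x[N], double y[N])
{
    int i, j;

    for (i = 0; i < n; i++) {        /* independent rows: MIMD candidate */
        double sum = 0.0;
        for (j = 0; j < n; j++)      /* vectorizable inner loop: SIMD    */
            sum += a[i][j] * x[j];
        y[i] = sum;
    }
}

int main(void)
{
    double a[N][N] = {{1, 0, 0, 0}, {0, 2, 0, 0}, {0, 0, 3, 0}, {0, 0, 0, 4}};
    double x[N] = {1.0, 1.0, 1.0, 1.0};
    double y[N];
    int i;

    matvec(N, a, x, y);
    for (i = 0; i < N; i++)
        printf("y[%d] = %g\n", i, y[i]);
    return 0;
}
\end{verbatim}
On a hybrid machine such as the CRAY X-MP, the inner loop would typically be vectorized on each processor while the outer-loop iterations are divided among the processors.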
\section{Supercomputers}
Supercomputers are by definition the fastest and most powerful general-purpose scientific computing systems available at any given time. They offer speed and capacity significantly greater than those of mainframe computers, defined as top-of-the-range widely available machines built primarily for commercial use. The term supercomputer became prevalent in the early 1960s, with the development of the CDC 6600. That machine, first marketed in 1963, boasted a performance of 1 Megaflops (millions of floating-point operations per second). During the next fifteen years, the peak performance of supercomputers grew at a rapid rate, and since 1980 that trend has accelerated. The projected 1995 machine is expected to have a maximum speed of 200 Gigaflops, more than 200,000 times that of the CDC 6600 (see Table 1).
\begin{center}
Table 1. Performance Trends in Scientific Supercomputing
\end{center}
\vspace{.1in}
\begin{tabular}{l|l l l l}
Year & Machine & Speed & \multicolumn{2}{l}{Speed Increase} \\
 & & & 10 years & 20 years \\
\hline
1963 & CDC 6600 & 1 MFLOPS & - & - \\
1969 & CDC 7600 & 4 MFLOPS & 4 & - \\
1979 & CRAY-1 & 160 MFLOPS & 100 & - \\
1983 & CYBER 205 & 400 MFLOPS & 100 & 400 \\
1986 & CRAY-2 & 2 GFLOPS & 500 & 2000 \\
1990-1995 & - & 200 - 1000 GFLOPS & 1000 & 250,000 \\
\end{tabular}
\vspace{.1in}
Many companies have devoted their resources to producing the fastest and most powerful machines on the market. Their strategy has been to develop a few state-of-the-art machines that enable scientists and engineers to tackle problems previously considered computationally infeasible. From these commercial ventures we have seen the development of vector and, more recently, parallel computers capable of solving complex numerical and nonnumerical problems. The second generation, with higher speed and more parallelism, is already under development. In Table 2, we summarize the currently available supercomputers.
\begin{center}
Table 2. Supercomputers
\end{center}
\vspace{.1in}
\begin{tabular}{l | r r c r}
Machine & Maximum Rate, & Memory, & OS & Number \\
 & in MFLOPS & in Mbytes & & of Processors \\
\hline
CRAY-1 & 160 & 32 & Own & 1 \\
CRAY X-MP & 941 & 512 & Own/UNIX & 4 \\
CRAY Y-MP & 2667 & 256 & Own/UNIX & 8 \\
CRAY-2 & 1951 & 4096 & UNIX & 4 \\
CYBER 205 & 400 & 128 & Own & 1 \\
ETA-10G & 5714(a) & 2048(b) & UNIX/VSOS & 8 \\
ETA-10E & 3810(a) & 2048(b) & UNIX/VSOS & 8 \\
ETA-10Q & 526(a) & 512(b) & UNIX/VSOS & 2 \\
Fujitsu VP-400E & 1714 & 1024 & Own & 1 \\
Fujitsu VP-200E & 857 & 1024 & Own & 1 \\
Fujitsu VP-100E & 429 & 1024 & Own & 1 \\
Fujitsu VP-50E & 286 & 1024 & Own & 1 \\
Fujitsu VP-30E & 133 & 1024 & Own & 1 \\
Hitachi S-820/80 & 2000 & 512(c) & Own & 1 \\
Hitachi S-810/20 & 857 & 512(c) & Own & 1 \\
NEC SX-2A & 1300 & 1024(d) & Own & 1 \\
NEC SX-1A & 650 & 1024(d) & Own & 1 \\
NEC SX-1E & 324 & 1024(d) & Own & 1 \\
\end{tabular}
\vspace{.1in}
\begin{tabbing}
aaa\=bbb\= \kill
\>(a) For 64-bit processing on 2 pipelines with linked triad and overlapped\\
\>\> scalar processing\\
\>(b) Also 16 MWord (128 Mbyte) local memory for each processor\\
\>(c) Also a 12-Gbyte extended memory\\
\>(d) Also an 8-Gbyte extended memory\\
\end{tabbing}
\vspace{.1in}
The actual price of the systems in Table 2 depends on the configuration, with most manufacturers offering systems in the \$5 million to \$20 million range. All use ECL logic with LSI, except the CRAY X-MP and the CRAY-1, which use SSI, and the ETA-10, which uses CMOS ALSI (Advanced Large Scale Integration); all use pipelining and/or multiple functional units to achieve vectorization/parallelization within each processor.
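As an indication of how the peak rates in Table 2 follow from pipelining and multiple functional units, the theoretical peak is simply the number of floating-point results that can be completed per clock period, summed over the processors and divided by the cycle time. Taking, purely as nominal illustrative figures, a cycle time of 8.5 ns and one addition plus one multiplication completed per cycle on each of the four processors of a CRAY X-MP, we obtain
\[
\mbox{peak rate} \;=\; \frac{\mbox{results per cycle} \times \mbox{number of processors}}
                            {\mbox{cycle time}}
                 \;=\; \frac{2 \times 4}{8.5 \mbox{ ns}}
                 \;\approx\; 941 \mbox{ Mflops},
\]
which is consistent with the figure quoted in Table 2. Such rates are upper bounds; sustained performance on real codes is normally well below the theoretical peak.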
Cray and ETA are the only supercomputer manufacturers to offer multiple-processor machines, although other vendors have announced multiprocessor machines for delivery in the near future. The form of synchronization on both the Cray and ETA machines is essentially event handling. Both the Fujitsu and the Hitachi systems are IBM System/370 compatible. We have included the CRAY-1 computer in the above table largely as a benchmark, since it could not now be considered a supercomputer in terms of performance and is no longer manufactured by Cray. The Fujitsu machines are marketed in Europe and North America by Amdahl (the 500E to 1400E range) and by Siemens (the VP-50 to 400 range).
\section{Minisupercomputers}
Below the supercomputer market, a new class of near-supercomputers or minisupercomputers has emerged. These systems typically feature strong vector or advanced scalar capabilities and have been used for traditional high-performance technical computing applications. Priced well below supercomputers, generally from \$100,000 to no more than \$1 million, minisupercomputers are frequently sold when budgets are limited to this price range or when stand-alone capabilities are required. Early leaders in the field of minisupercomputing were Alliant, Convex, and Scientific Computer Systems. More recently, this market has experienced high growth, and many new products and companies have emerged, including Multiflow and Gould (see Table 3).
\begin{center}
Table 3. Minisupercomputers
\end{center}
\vspace{.1in}
\begin{tabular}{l | c c c }
Vendor & Theoretical Peak & LINPACK & First Shipment \\
 & Performance & Performance & \\
 & Mflops (64 bits) & Mflops & \\
\hline
Alliant FX/8 & 94 & 7.6 & 1985 \\
Alliant FX/80 & 188 & 8.5 & 1987 \\
Astronautics & 90 & 7.1 & 1988 \\
Convex C1 & 20 & 7.3 & 1984 \\
Convex C2 & 200 & 16 & 1987 \\
FPS 500 & -- & -- & 1988 \\
Multiflow Trace 28/200 & 60 & 10 & 1987 \\
\end{tabular}
\vspace{.1in}
\section{Enhanced Mainframes}
An alternative in the near-supercomputer category is the add-on array processor. Companies such as Floating Point Systems and Star Technology are actively marketing these add-on products in an effort to attract current supercomputer users. In a related vein, vector-processing enhancements are now being marketed for commercial mainframes. These vector enhancements allow machines produced for general-purpose applications to offer users increased numerical capability. In some cases, the vector capability is extended to more than one processor in multiprocessing mode. Companies currently offering such vector-processing capabilities include Control Data, Hitachi (marketed in the West by NAS and COMPAREX), Honeywell, IBM, and UNISYS. We summarize some of the machines in this category in Table 4.
\begin{center}
Table 4. Power-assisted mainframes
\end{center}
\vspace{.1in}
\begin{tabular}{l|c c c c}
Machine & Maximum Rate, & Memory, & OS & Number of \\
 & Mflops & Mbytes & & Processors \\
\hline
CDC 180 990 & 125 & 256 & NOS/VE & 1-2 \\
FPS M64/140 & 187 & 128 & Own & 1 \\
IBM 3090S/VF & 696 & 256 (a) & Own & 1 - 6 \\
NAS AS/91X0 & ? & 64 & Own & 1 or 2 \\
Unisys 1190/ISP & 266 & 128 & Own & 1,2,4 (c) \\
\end{tabular}
\vspace{.15in}
\begin{tabbing}
aaa\= \kill
\> (a) Also a 2-Gbyte extended memory\\
\> (b) In 32-bit arithmetic\\
\> (c) Only 1 or 2 ISPs can be attached\\
\end{tabbing}
\vspace{.1in}
\section{Parallel Machines}
While most of the supercomputers and minisupercomputers utilize vector processing to provide performance, a number of new companies are developing parallel processing systems. Such systems range from smaller (8- to 30-processor) machines like the Sequent or Encore to massively parallel systems like the Thinking Machines CM-2, with up to 65,536 processors. Others in this area include Floating Point Systems, Myrias, BBN Advanced Computing, and DEC; and they may be joined soon by IBM, which has indicated that it will offer a product in this category by 1989.
While it is certainly true that the parallel architectures fall into two camps depending on whether or not they are potential supercomputers, it is less easy to assign a particular machine to one of these classes. We have, however, made a partly subjective judgment and compare the parallel architectures in two tables. Table 5 summarizes those parallel architectures that are designed for experimentation with parallel constructs, and Table 6 lists machines with potential for future elevation to the status of a supercomputer.
\pagebreak
\begin{center}
Table 5. Experimental parallel machines
\end{center}
\vspace{.1in}
\begin{tabular}{l|c c c }
Machine & Chip & Max. Parallelism & Connection \\
\hline
Elxsi 6400 & ECL & 12 & bus \\
Encore Multimax & 32332/32081 & 20 & bus \\
(optional Weitek 1164/1165) & & & \\
Flex/32 & 32032/32081 & 20 & bus \\
Sequent Symmetry S81 & 80386/80387 & 30 & bus \\
\end{tabular}
\vspace{.1in}
\begin{center}
Table 6. Potential supercomputers
\end{center}
\vspace{.1in}
\begin{tabular}{l|c c c}
Machine & Chip & Parallelism & Connection \\
\hline
Active Memory (DAP) & CMOS & 4096 (SIMD) & near-neighbor \\
BBN Butterfly TC 2000 & 88000 & 256 & Banyan network \\
CYBERPLUS & Own & 256 & ring \\
Intel iPSC/2 & 80386/80387 & 128 & hypercube \\
IP-1 & Own & 33 & cross-bar \\
Meiko & Transputer & No limit (a) & user-configurable \\
Myrias SPS-2 & 68020/68882 & 512 minimum & hierarchical bus \\
NCUBE & VLSI & 1024 & hypercube \\
TMC CM-2 & VLSI & 65536 (SIMD) & hypercube \\
\end{tabular}
\vspace{.15in}
\begin{tabbing}
aaa\= \kill
\> (a) Maximum system delivered to date has 1024 processors\\
\end{tabbing}
Because of the widely differing architectures of the machines in Tables 5 and 6, it is not really advisable to give one or even two values for the memory. In some instances there is an identifiable global memory; in others there is a fixed amount of memory per processor. Additionally, it may be possible to configure memory as either local or global. A value for the maximum speed is even less meaningful than in the previous tables, since a high Megaflop rate is not necessarily the objective of these machines and the actual speed will depend on the algorithm and application.
\section{High-Performance Graphics Workstations}
Finally, the supercomputer market has been expanded by the introduction of supercomputing workstations and single-user high-performance graphics systems, such as those from Apollo, Ardent, Stellar, and Silicon Graphics. We summarize these machines in Table 7.
\begin{center}
Table 7. High-performance graphics workstations
\end{center}
\vspace{.1in}
\begin{tabular}{l|c c c}
Machine & Chip & Peak performance, & Memory, \\
 & & Mflops & Mbytes \\
\hline
Apollo DN10000 & Own & ? & ? \\
Ardent TITAN & MIPS/Weitek & 64 & 128 \\
Silicon Graphics IRIS GT & MIPS/Weitek & 100 & 16 \\
Stellar GS2000 & Own/Weitek & 80 & 128 \\
\end{tabular}
\vspace{.15in}
\section{Template for Machine Description}
As we mentioned in the introduction, the level of technical information on each machine varied significantly. We have, however, attempted to organize the available information in a consistent manner. In Table 8, we give the template used in presenting the data in the appendixes.
\begin{center}
Table 8. Template for Description of Machines
\end{center}
\begin{verbatim}
Name of machine, manufacturer, backers, etc.
Architecture
    Basic chip used
    Local, shared memory, or both
    Connectivity (for example, grid, hypercube)
    Range of memory sizes available; virtual memory
    Floating point unit (IEEE standard?)
Configuration
    Stand-alone or range of front-ends
    Peripherals
Software
    UNIX or other?
    Languages available
    Fortran characteristics
        F77
        Extensions
        Debugging facilities
        Vectorizing/parallelizing capabilities
Applications
    Run on prototype
    Software available
Performance
    Peak
    Benchmarks on codes and kernels
Status
    Date of delivery of first machine, beta sites, etc.
    Expected cost (cost range)
    Proposed market (numbers and class of users)
Contact: technical and sales
\end{verbatim}
\newpage
\begin{tabular}{l r}
Machine & Page \\
\end{tabular}