From GOLUB@SU-SCORE.ARPA Tue Mar 11 16:33:34 1986
Received: from SU-SCORE.ARPA (su-score.arpa.ARPA) by anl-mcs.ARPA (4.12/4.9)
	id AA06972; Tue, 11 Mar 86 16:32:38 cst
Message-Id: <8603112232.AA06972@anl-mcs.ARPA>
Return-Path: <KARP@IBM-SJ.ARPA>
Received: from IBM-SJ.ARPA by SU-SCORE.ARPA with TCP; Tue 11 Mar 86 12:14:05-PST
Date: 11 Mar 86 11:27:25 PST
From: KARP@IBM-SJ.ARPA
To: duff@anl-mcs.arpa, na.dis@su-score
Resent-Date: Tue 11 Mar 86 13:38:38-PST
Resent-From: Gene H. Golub  <GOLUB@SU-SCORE.ARPA>
Resent-To: :;
Resent-Message-Id: <12189932408.47.GOLUB@SU-SCORE.ARPA>
Status: R

             Reply to Rice and Bajaj's comments on
             Karp's parallel processing challenge

You have raised some useful points in your note. I believe the
final paragraph in my challenge makes my intent clear. I would
like a demonstration that a massively parallel computer will be
useful for general purpose, scientific computing. Throughout
the challenge I make it clear that I will be quite reasonable
in what I consider a successful demonstration. In fact, I have
lined up 5 qualified but disinterested people to adjudicate any
disputes. I believe that the conditions I set are reasonable,
but they may need some clarification.

I believe the challenge raises some important questions, but I
may not have been as precise as I should have been. I would
like to get these points clarified before anything further goes
out to the general community. Perhaps we can construct an
addendum that will clear up any inconsistencies.

Let me reply to your points.

1. BEST ALGORITHM - Your point is well taken. My intent here is
that a reasonable algorithm be used on the sequential machine.
For example, if a major part of the time for the job is taken
in a symmetric eigenvalue routine, the parallel machine might
run fastest with a Jacobi iteration. I would expect the
sequential run to use the symmetric QR rather that the Jacobi
method. If two experts differ on which method is "best", I am
willing to accept either. However, you should be able to
convince a reasonable third party that you have made an honest
effort to be fair.

2. APPLICATION - Part of the reason I am letting the offer
stand for 10 years is the time it will take to port application
codes to parallel machines. However, I would hope that people
building 1,000 processor systems will eventually attempt to
solve real problems on them. I expect that the first step in
porting an application to a parallel processor will be to get a
sequential version going. If this version is at all reasonable
for one processor, I will be glad to accept it for the
sequential run. In any event, few people will be willing to buy
massively parallel systems until their utility is proven.  Since
it is not unusual for people to put in a great deal of effort
to port a code for a benchmark, I don't think the comparison
I am asking for is unreasonable.

3. EMBARRASINGLY PARALLEL - The intent of this restriction is
to prevent someone claiming the prize who has set up 200 PCs
running multiple, independent data streams. All that this run
would prove is that more CPUs increases throughput. If there is
no interaction between the processors in solving the problem,
then it doesn't count. Indeed, the method of lines is not
excluded. However, it must run 200 times faster than a suitable
algorithm on one processor. The one processor algorithm can
also be the method of lines if that is deemed a reasonable way
to solve the problem, but should be something else if the
method of lines is inappropriate.

4. WHAT IS AN ANSWER? - No problem here. An answer is what a
working scientist or engineer accepts as suitable output for
the problem. Clearly, the answers need not be identical. The
amount of output required and its accuracy should be determined
by the person interested in the output. Since an application is
defined as a batch job submitted to a supercomputer, it is
reasonable to let the person submitting that job decide if the
answers are good enough.

5. WHAT IS CRIPPLING? - Crippling is well-defined in the
original statement. Since speed-up is defined as the time taken
by one of your processors divided by the time taken by N of
your processors, it is important that the single processor be
given adequate resources. I would like to compare properly
balanced machines, but I understand the problems. A good
example is the response I received from the IBM Ultra group. I
have agreed that the one processor run can use the memory on
all the processors as long as they make a reasonable effort to
remove network delays that would not exist if all available
memory was local. At no point do I suggest comparing machines
with different base processors such as Cray vs NCUBE.

In summary, several of the points you raise concern loopholes
in the rules. I am really interested in a comparison run by
people of good faith who will be fair. In fact, I have already
decided that if someone satisfies my conditions without
convincing me that I will be able to use the machine for my
problems, I will renew the offer. Of course, since the money is
my personal funds, I reserve the right to terminate a run on my
bank account.

Alan


.