From GOLUB@SU-SCORE.ARPA Tue Mar 11 16:33:34 1986 Received: from SU-SCORE.ARPA (su-score.arpa.ARPA) by anl-mcs.ARPA (4.12/4.9) id AA06972; Tue, 11 Mar 86 16:32:38 cst Message-Id: <8603112232.AA06972@anl-mcs.ARPA> Return-Path: Received: from IBM-SJ.ARPA by SU-SCORE.ARPA with TCP; Tue 11 Mar 86 12:14:05-PST Date: 11 Mar 86 11:27:25 PST From: KARP@IBM-SJ.ARPA To: duff@anl-mcs.arpa, na.dis@su-score Resent-Date: Tue 11 Mar 86 13:38:38-PST Resent-From: Gene H. Golub Resent-To: :; Resent-Message-Id: <12189932408.47.GOLUB@SU-SCORE.ARPA> Status: R Reply to Rice and Bajaj's comments on Karp's parallel processing challenge You have raised some useful points in your note. I believe the final paragraph in my challenge makes my intent clear. I would like a demonstration that a massively parallel computer will be useful for general purpose, scientific computing. Throughout the challenge I make it clear that I will be quite reasonable in what I consider a successful demonstration. In fact, I have lined up 5 qualified but disinterested people to adjudicate any disputes. I believe that the conditions I set are reasonable, but they may need some clarification. I believe the challenge raises some important questions, but I may not have been as precise as I should have been. I would like to get these points clarified before anything further goes out to the general community. Perhaps we can construct an addendum that will clear up any inconsistencies. Let me reply to your points. 1. BEST ALGORITHM - Your point is well taken. My intent here is that a reasonable algorithm be used on the sequential machine. For example, if a major part of the time for the job is taken in a symmetric eigenvalue routine, the parallel machine might run fastest with a Jacobi iteration. I would expect the sequential run to use the symmetric QR rather that the Jacobi method. If two experts differ on which method is "best", I am willing to accept either. However, you should be able to convince a reasonable third party that you have made an honest effort to be fair. 2. APPLICATION - Part of the reason I am letting the offer stand for 10 years is the time it will take to port application codes to parallel machines. However, I would hope that people building 1,000 processor systems will eventually attempt to solve real problems on them. I expect that the first step in porting an application to a parallel processor will be to get a sequential version going. If this version is at all reasonable for one processor, I will be glad to accept it for the sequential run. In any event, few people will be willing to buy massively parallel systems until their utility is proven. Since it is not unusual for people to put in a great deal of effort to port a code for a benchmark, I don't think the comparison I am asking for is unreasonable. 3. EMBARRASINGLY PARALLEL - The intent of this restriction is to prevent someone claiming the prize who has set up 200 PCs running multiple, independent data streams. All that this run would prove is that more CPUs increases throughput. If there is no interaction between the processors in solving the problem, then it doesn't count. Indeed, the method of lines is not excluded. However, it must run 200 times faster than a suitable algorithm on one processor. The one processor algorithm can also be the method of lines if that is deemed a reasonable way to solve the problem, but should be something else if the method of lines is inappropriate. 4. WHAT IS AN ANSWER? - No problem here. An answer is what a working scientist or engineer accepts as suitable output for the problem. Clearly, the answers need not be identical. The amount of output required and its accuracy should be determined by the person interested in the output. Since an application is defined as a batch job submitted to a supercomputer, it is reasonable to let the person submitting that job decide if the answers are good enough. 5. WHAT IS CRIPPLING? - Crippling is well-defined in the original statement. Since speed-up is defined as the time taken by one of your processors divided by the time taken by N of your processors, it is important that the single processor be given adequate resources. I would like to compare properly balanced machines, but I understand the problems. A good example is the response I received from the IBM Ultra group. I have agreed that the one processor run can use the memory on all the processors as long as they make a reasonable effort to remove network delays that would not exist if all available memory was local. At no point do I suggest comparing machines with different base processors such as Cray vs NCUBE. In summary, several of the points you raise concern loopholes in the rules. I am really interested in a comparison run by people of good faith who will be fair. In fact, I have already decided that if someone satisfies my conditions without convincing me that I will be able to use the machine for my problems, I will renew the offer. Of course, since the money is my personal funds, I reserve the right to terminate a run on my bank account. Alan .