#!/bin/sh
# shar:	Shell Archiver  (v1.22)
#
#	Run the following text with /bin/sh to create:
#	  README
#	  Makefile
#	  getrtc.s
#
sed 's/^X//' << 'SHAR_EOF' > README &&
X** DRAFT ** * ** DRAFT ** * ** DRAFT ** * ** DRAFT ** * ** DRAFT **
X                          21 Sept. '92
X 
XIntroduction & Background
X-------------------------
XThis README describes a 3-D volume-renderer developed by staff of
Xthe Cornell National Supercomputer Facility (CNSF).  It uses the
Xray-casting algorithm (not to be confused with ray-tracing)
Xdescribed in:
X 
X   Levoy, Mark (1988). Display of surfaces from volume data.
X   IEEE Computer Graphics & Applications 8(3):29-37.
X 
XThe original, serial implementation of Levoy's algorithm at the
XCNSF was done by Daniel Kartch in autumn '88.  Catherine Devine
Ximproved the user interface, added function and generally made
Xthe program more flexible and useful.  Hugh Caffey did tuning and
Xvarious parallel implementations, including this one using pvm.
XUCLA got an older, serial version of the program from the CNSF
Xand made it available to its users.  Some parts of this README
Xrelating to the use of the program were extracted from:
X 
X   Vrend, A Volumetric Rendering Program. Perspective 15(2): 52-67.
X   (A publication of the Office of Academic Computing of U.C.L.A.)
X 
XThe pvm-ized, FORTRAN version of the program is divided into three
Xpipelined "components":
X  (1) pvrmain - the serial, parent code that deals with all set-up,
X      reading of data, and management of children; ONE instance
X  (2) pallslabs - 'parallel all slabs'; this is the computationally
X      intensive back end; it has approx. 60% vector content;
X      MULTIPLE instances
X  (3) pvrplot - takes output from pallslabs and generates run-length
X      encoded records (in an RGB bitmap) using C routines; ONE instance
X 
X 
X 3-d                               -> pallslabs \              rle
X data -> pvrmain(read3d) -> vrrays -> pallslabs -> pvrplot -> bitmap
X                                   -> pallslabs /
X 
XAny of these three components can be run on any system on which
Xpvm is installed.  (The version of pvm on which these programs
Xwere tested before their contribution to netlib was pvm 2.4.1.)
XThese programs have been run extensively on all combinations of
XIBM RISC System/6000 workstations and IBM 3090-600s (running
XAIX/370 MP; a.k.a. PAIX).  Display of output images has been
Xvia rletoxim and xim in which the RLE bitmaps are driven into an
XX window.  (These xim programs are also available via netlib as
X"ximstuff".)
X 
XIntended Audience
X-----------------
XYou should already know the basics of writing and running pvm
Xapplications.  You should also have some means of displaying
Xrun-length encoded RGB bitmaps.  (At the CNSF, a set of X-windows
Xtools is used.  These are also available via netlib as "ximstuff".)
XThis volume-rendering application is provided as: (1) an example of 
Xa non-trivial, production-quality pvm application designed to be used
Xon any combination of workstations and/or multiprocessor mainframe
Xmachines; and (2) a real production code to be used by anyone wishing
Xto render iso-value shells from 3-d data sets.  As well, we include
Xsome timing and utility routines that you may find more generally 
Xuseful with pvm.
X 
XDisclaimer
X----------
XSource code and associated files for this application are
Xprovided as-is.  Those who acquire these files are free to modify
Xand use them in any way they wish, provided that they always
Xacknowledge the files' source, as outlined above.  The authors
Xreserve the right to deal with problems, suggestions and complaints
Xas they see fit - including not at all.  Please direct any
Xcorrespondence to:
X            Hugh Caffey   (caffey@tc.cornell.edu)
X            Cornell National Supercomputer Facility
X            Cornell University
X            Ithaca, NY  14853-3801
X 
X 
XHow to Build & Run the Volume Renderer
X--------------------------------------
XThis application was originally developed as a FORTRAN program.
XRecently, three small C routines were added for output.
XWith three possible exceptions, the code should be quite portable,
Xi.e. there are few system-dependent routines or constructs and
Xno "canned" routines are used.  The possible exceptions:
X (1) timing routines.  These are always system dependent.  The
X     routines included with this distribution work well on
X     IBM RS/6000 workstations but not at all on any other system.
X     (Hint for Sun/SPARC users: you can probably use the time()
X     routine instead of getrtc() and the dtime() routine instead
X     of mclock());
X (2) use of d or D in column 1 to enable/disable debugging
X     statements easily (Sun/SPARC users: use the '-xld' compiler
X     option).  If the compiler(s) you use do not support
X     d or D in column 1, simply change all of these to
X     c, C or *; and
X (3) use of underscores in the declaration of pvm routines.
X     If you look at many of the C source routines for the f2c
X     library for pvm (probably in /usr/local/src/pvm2.41/f2c
X     or some such), you'll see the form of the entry point that
X     your system's link-editor expects.  For example, the proper
X     form for the fenroll routine on an RS/6000 is simply fenroll;
X     on a Sun/SPARC system, it would be fenroll_.  It should be
X     sufficient just to make sure that the correct form is
X     specified in the EXTERNAL declarations for these routines,
X     e.g.  external fenroll_, finitiatem_, etc.
X 
XA small test data file and problem set-up are included in this
Xdistribution.  The file sph0 contains 13x13x13 real values (used
Xas single precision) that were synthesized for the purposes of
Xtesting and debugging the code.
X 
XSteps to build the executable modules and to run the small test
Xproblem:
X 
X(1) Edit Makefile to specify any compiler options and path
Xnames you may need (see (2) below).  Makefile is currently
Xset up to build three modules executable on RS/6000s using xl
XFORTRAN and cc.
X 
X(2; optional) Set up any special directories.  You should be able
Xto leave all files where you've unpacked them and still build and
Xrun this application.  Alternately, you can set up some special
Xdirectories.  One convenient arrangement is to set up the
Xfollowing:
X    ~myhome/timers - timing routines
X    ~myhome/utils  - general utility routines
X    ~myhome/data   - 3-d rectilinear data files for input
X    ~myhome/frames - rle bitmaps (output) from the program
X    ~myhome/pvm/hosts - host files for use by the local pvm daemon
XMost of the following discussion assumes that you have set up these
Xdirectories.
X 
X(3) You'll next need to copy files into these directories as follows:
XInto  ~myhome/timers  copy getrtc.s, cput.f, wallt.f
XInto  ~myhome/utils   copy chunks.f, gethosts.f, int2char.f,
X                           chnk.incl, copn.c, ccls.c, rleout.c
XInto  ~myhome/data    copy sph0
X 
X(4) Create binaries of timing and other utility routines:
XThis step is system-dependent.  If you're using *only* RS/6000
Xworkstations, the following should work as-is.
X + cd to ~myhome/utils;  issue chmod u+x utils.bld;  issue utils.bld
X + cd to ~myhome/timers; issue chmod u+x timers.bld; issue timers.bld
X 
XOtherwise, you should:
X 
X + edit sysvers.pvm.incl to make it report an appropriate message
X 
X + cp begwholetime.rs6k.incl to begwholetime.your_machine.incl
X + cp endwholetime.rs6k.incl to endwholetime.your_machine.incl
X 
X + cp begcpuslab.rs6k.incl to begcpuslab.your_machine.incl
X + cp endcpuslab.rs6k.incl to endcpuslab.your_machine.incl
X 
X + cp begcpuplot.rs6k.incl to begcpuplot.your_machine.incl
X + cp endcpuplot.rs6k.incl to endcpuplot.your_machine.incl
X 
XThen, substitute calls to the appropriate timing routines in the
X*.your_machine.incl files and in wallt.f and cput.f
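X 
XThe six copies listed above can be scripted.  The following is a
Xminimal sketch in /bin/sh, assuming it is run in the directory that
Xholds the *.rs6k.incl files.  The function name
Xcopy_timing_templates and the machine tag "sparc" are illustrative
Xonly and are not part of this distribution:
X 
```shell
# Copy each RS/6000 timing template to a machine-specific name.
# $1 = machine tag to use in the new file names (hypothetical, e.g. sparc)
copy_timing_templates() {
    for stem in begwholetime endwholetime \
                begcpuslab endcpuslab \
                begcpuplot endcpuplot
    do
        if [ -f "$stem.rs6k.incl" ]; then
            cp "$stem.rs6k.incl" "$stem.$1.incl"
        else
            # Report a missing template but keep going with the rest.
            echo "missing template: $stem.rs6k.incl" >&2
        fi
    done
}
```
X 
XFor example:  copy_timing_templates sparc
XThe copies still need the calls to the timing routines substituted
Xby hand.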
X 
X(5) Next, edit several source files as follows:
XIn  chunks.f,  change './chnk.incl'  to  '~myhome/utils/chnk.incl'
XDo exactly the same in gethosts.f, pallslabs.decl.pvm.incl,
Xpvrmain.decl.pvm.incl, pvrplot.decl.pvm.incl and vrrays.decl.pvm.incl.
X 
X(6) Edit vrparam.incl.  Ensure that NPIX is a small value, e.g.
X64, for the first run.
X 
X(7) Run make to build the three programs:
X            pvrmain, pallslabs and pvrplot.
X 
X(8) Set up appropriate pvm host files (if you haven't already got
Xthem).  A small, additional constraint imposed by this
Xapplication (in subroutine gethosts) is that each node's name
Xmust fit into columns 1-35 of the line it occupies.
X 
X(9) Make 6-8 copies of the test data file, sph0:
X  cp sph0 sph1
X  cp sph0 sph2
X  cp sph0 sph3
X      ... etc.
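X 
XThe same copies can be made with a short loop.  This is a sketch
Xonly, assuming sph0 is in the current directory; make_copies is a
Xhypothetical helper name:
X 
```shell
# Make numbered copies of a data file.
# $1 = source file, $2 = file-name prefix, $3 = highest suffix to create
make_copies() {
    i=1
    while [ "$i" -le "$3" ]
    do
        cp "$1" "$2$i"
        i=$((i + 1))
    done
}
```
X 
XFor example:  make_copies sph0 sph 7   (creates sph1 through sph7)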
X 
X(10) Make sure that your .rhosts file on each of the nodes you
Xintend to use includes all needed host names and login names.
X 
X(11) If you are using a windowing environment of some sort, e.g.
XX-windows, open two sessions - one per window - on the node that
Xwill act as host, i.e. the one that will run pvrmain.
X 
X(12) Edit the file runsphere.  Fill in appropriate paths and
Xhost file names.  Make runsphere executable.
X 
X(13) Edit the file sphere.in.  This file can be re-directed as input
Xfrom stdin.  As currently written, it will specify that the input
Xdata files are in the working directory and that output bitmap
Xfiles are to be written to the home directory of the node that
Xexecutes pvrplot.  (Unless pvrplot is run on the parent node, in
Xwhich case the output will come to the working directory.) Modify
Xthese records appropriately to suit your environment.
X 
XSpecify the frame numbers to render; currently, it is set to
Xgenerate frames named 1, 2 and 3 (using data sets sph0, sph1 & sph2).
X 
X(14) In one window, execute runsphere.  When you get the message
X"pvm is ready" or some such, enter "conf" to check your pvm
Xconfiguration.  If all is in order, select the other window ...
X 
X(15) ... and start the program:  pvrmain < sphere.in
X 
X(16) To display the output bitmaps in an X window, first run
Xxhost to add the name of the host machine from which your
XX server accepts connections.  Next, enter
X             setenv DISPLAY yourdisplay:0.
XFinally, enter
X      rletoxim: rletoxim -i sph51.PIXELS -w 320 -h 240 | xim
Xto display frame 1 (the other two should look identical because
Xall data sets were identical).
X 
X 
X3-d Input data:
X--------------
XData files for input must be three dimensional, rectilinear
Xarrays of REALs; these values will be stored as 4-byte elements.
XIf your data set is sparse or not in a regular grid, you must
Xinterpolate to generate a regular grid.
X 
XThe specification and use of input files is fairly elaborate in
Xthis program because this version of the program was originally
Xdeveloped to work co-operatively with a geophysical simulation
Xrunning on another system by capturing the simulation's 3-d
Xoutput files as they were written.  These output files formed a
Xtime series of temperature or pressure profiles.  Coupling of the
Xrendering program to the remotely executing simulation was crude
Xand done via subroutine read3d.  read3d simply checks for the
Xexistence (using the inquire statement) of the file having the
X*current* frame number.  When this file exists, read3d then opens
Xand reads the file having the *previous* frame number.  This
Xsynchronization mechanism (it even includes code to 'spin-wait'
Xwith time-out) ensures that the rendering program, which
Xgenerally renders images faster than the simulation can produce
Xdata to be rendered, does not get ahead of the simulation.  This
Xis crude but has the advantages of being fairly general and
Xrequiring no modifications to the program that produces the data.
XThe main costs of this approach seem to be: (1) the potential
Xconfusion resulting from the seeming mis-match of frame numbers
Xand data file names; and (2) CPU cycles wasted in read3d when it
Xmust wait for the next file to exist.
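X 
XThe rule can be illustrated outside FORTRAN.  The following /bin/sh
Xsketch is not the read3d source; it only mimics the behaviour
Xdescribed above.  The function name, the one-second polling
Xinterval and the time-out argument are assumptions:
X 
```shell
# Wait until the file for the current frame exists, then name the
# file for the previous frame as the one to read.
# $1 = path plus file-name prefix, $2 = current frame number,
# $3 = time-out in seconds (hypothetical parameter)
wait_for_frame() {
    waited=0
    while [ ! -f "$1$2" ]
    do
        if [ "$waited" -ge "$3" ]; then
            return 1        # timed out: the producer has not caught up
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The current frame's file exists, so read the previous one.
    echo "$1$(($2 - 1))"
}
```
X 
XFor example, once /u/myhome/data/sph1 exists,
X    wait_for_frame /u/myhome/data/sph 1 60
Xprints /u/myhome/data/sph0.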
X 
XFor example, to render frames using the first three data files
Xwould require that, at the prompt to 'Enter first frame, last
Xframe & frame step' you enter 1, 3, 1.  The logic in subroutine
Xread3d would further require that there be three data files
Xhaving suffixes of 0, 1 & 2.  Accordingly, input file specifiers
Xare built from three parts:
X    (i) path to the directory containing the files
X        , e.g.  /u/myhome/data/
X   (ii) prefix of file name
X        , e.g.  sph
X  (iii) sequence number
X        , e.g. 2
XSo, in the example input data set included with these programs,
Xone could copy the file sph0 to create files sph1 and sph2.  In
Xorder to render images of sph0, sph1 and sph2, one would have to
Xenter 1, 3, 1 at the frame prompt.
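X 
XThe off-by-one relation between frame numbers and file suffixes
Xcan be sketched as follows.  The function names are hypothetical;
Xthe only rule taken from this README is that frame n reads the
Xfile whose sequence number is n-1:
X 
```shell
# Build the input file specifier for one frame.
# $1 = directory path (with trailing slash), $2 = prefix, $3 = frame number
frame_file() {
    echo "$1$2$(($3 - 1))"
}

# List the files read for a range of frames.
# $1 = path, $2 = prefix, $3 = first frame, $4 = last frame, $5 = frame step
list_frame_files() {
    f=$3
    while [ "$f" -le "$4" ]
    do
        frame_file "$1" "$2" "$f"
        f=$((f + $5))
    done
}
```
X 
XFor example, list_frame_files /u/myhome/data/ sph 1 3 1 names the
Xfiles sph0, sph1 and sph2 in /u/myhome/data/.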
X 
XThere are three basic ways to specify the various set-up values
Xneeded to render images:
X (1) Automatically.  Subroutine setshl can generate a set of
X     'first approximation' settings.  It reads the first data
X     file, finds the minimum and maximum values and then specifies
X     5 iso-value shells corresponding to the 5 even divisions of
X     the range (maximum - minimum).  Colors, opacities, etc. are
X     also scaled accordingly.  This is often a good way to start
X     with a new data set with which you may be unfamiliar.  To do
X     this, first select the 'Set shell levels and colors' menu
X     option.  Next, select the 'Linear autoshell calculation' option.
X (2) By "hand", i.e. by selecting and entering individual values
X     for each of the variables listed in the menus.  This is
X     tedious but the best way to "tune" the settings to get the
X     images exactly as you want.  After the settings are to
X     your liking, you can then save them in a set-up file (see (3)).
X (3) By reading a special set-up file.  Such files have a suffix
X     of ".VRPARM".  To create a set-up file, select the
X     'Save settings' menu option.  This will save current settings
X     in a file having a user-specified prefix.  To use a set-up
X     file, select the 'Load settings' option.
X 
XA suggested protocol for rendering images from new or unfamiliar
Xdata sets is first to set a relatively low resolution (set NPIX
Xto 64 or 128 in vrparam.incl).  Next, specify only one frame
Xto render and use the linear autoshell settings.  Save these
Xsettings in a .VRPARM file.  For such a low-resolution image of
Xonly one frame, there's little point in using more than 2-3
Xnodes.  View the resulting image and decide what changes need to
Xbe made.  Make these changes to the .VRPARM file.  Continue this
Xcycle of inspection and refinement until the image is about
Xright.  Now, increase the resolution by making NPIX larger, say,
X512.  Check the resulting image.  If it's what you want, set up
Xfor a full production run by setting NPIX to the desired
Xresolution and specifying all the frames you'll want.
X 
Xpvm-Related Inputs:
X------------------
XVerbose output from chunks: v or V tells the chunks subroutine to
Xprint the chunk boundaries to be used in this run.
X 
XFull path/name of pvm host file: the program reads your host file
Xto get the number and names of nodes to be used in this run.
X 
XCompute 'slabs' on an MP node: y|Y indicates that pallslabs is to
Xbe run on a multi-processor node, e.g. Cray, Convex, Alliant, etc.
XIf you enter y or Y, the next prompt will ask you to enter the number 
Xof instances of pallslabs to use.
X 
XEnter an assignment factor: to use the chunks routine (in which
Xsome number of nearly equal-sized chunks will be computed) this
Xshould be a small, positive integer (1-10 or so).  It is the number of
Xtimes, on average, that each instance of pallslabs will be
Xassigned a chunk of work to do.  The number of chunks is computed
X(in chunks.f) as
X                 iassgn * nslab_procs
X  (where nslab_procs is the number of instances of pallslabs)
XLarger values of iassgn yield larger numbers of chunks and better
Xload balance (up to a point) but also more communication.
XSmaller values of iassgn yield fewer chunks, poorer load balance
Xand less communication.
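X 
XAs a quick illustration of the trade-off, the sketch below
Xtabulates the chunk count for a few assignment factors.  The
Xformula is the one quoted above; the function name num_chunks and
Xthe choice of 4 instances of pallslabs are assumptions made for
Xthe example:
X 
```shell
# Number of chunks as described above: iassgn * nslab_procs.
# $1 = iassgn (assignment factor), $2 = nslab_procs
num_chunks() {
    echo $(($1 * $2))
}

# Show how the chunk count grows with iassgn for 4 instances of pallslabs.
for iassgn in 1 2 4 8
do
    echo "iassgn=$iassgn  chunks=$(num_chunks "$iassgn" 4)"
done
```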
X 
XA special value of iassgn, 9999, can be used to select an alternate
Xdomain decomposition routine, subroutine chnks.  In this routine,
Xa halving algorithm is used to generate a series of chunks of 
Xdecreasing size.  See the explanatory notes in chnks.f.
X 
XThe amount of memory required by this application depends mainly
Xupon the size of the 3-d input data array (XSIZE*YSIZE*ZSIZE) and
Xupon the resolution of the image desired.  Resolution is
Xdetermined by the number of rays cast through the data field and
Xis specified by NPIX2.  The approximate memory requirement for
Xpvrmain and pallslabs is:
X 
X400K + NPIX2*54 + ((XSIZE+2)*(YSIZE+2)*(ZSIZE+2) )*4 bytes
X 
X(All of these parameters are set in vrparam.incl.)
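X 
XThe estimate is easy to evaluate before a run.  In the sketch
Xbelow, 400K is taken to mean 400*1024 bytes and NPIX2 is taken to
Xbe NPIX*NPIX; both are assumptions, and vr_mem_bytes is a
Xhypothetical name:
X 
```shell
# Approximate memory requirement (bytes) for pvrmain and pallslabs.
# $1 = NPIX2, $2 = XSIZE, $3 = YSIZE, $4 = ZSIZE
vr_mem_bytes() {
    echo $((400 * 1024 + $1 * 54 + ($2 + 2) * ($3 + 2) * ($4 + 2) * 4))
}

# Test problem: NPIX = 64 (NPIX2 assumed to be 64*64); sph0 is 13x13x13.
npix=64
vr_mem_bytes $((npix * npix)) 13 13 13
```
X 
XFor the test problem this prints 644284, i.e. well under 1MB.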
X 
XOne of the potential advantages of running parallel applications on
Xscalable, distributed memory systems is that one can, by using only as
Xmuch storage as is actually needed on each node, run much larger problems
Xthan on serial systems.  The current implementation of this application
Xdoes not do this.  There are two reasons:
X (1) It doesn't *need* to.  An RS/6000 workstation with 64MB of
X     real memory can, without paging, run the largest, highest-
X     resolution problem of this application; and
X (2) The program is written to be general in the sense of being able
X     to use a range of different numbers of processors and computational
X     chunks - including the serial case.  This means that all arrays
X     are dimensioned to their full size, whether the full size is
X     actually required or not.
X 
XGeneral Volume-Rendering Inputs:
X-------------------------------
XValues of NPIX (set in vrparam.incl) must be integral powers of 2.
XTypical values include:
X    64   - for testing and debugging runs; *very* low resolution
X   128   - low resolution for checking & modifying set-up values
X   256
X   512   - medium-to-high resolution
X  1024   - high resolution for publication quality images
X 
X(Of course, there is little point in generating a very
Xhigh-resolution image based upon a small or sparse data set.  Use
Xsome common sense with this.)
X 
Xviewd - sets the viewing distance - the distance from the observer's
Xeye (default:  2*sqrt( XSIZE**2 + YSIZE**2 + ZSIZE**2 ) )
X 
Xlookat - the 3-d point in the center of the viewed field.
X(default: (XSIZE/2, YSIZE/2, ZSIZE/2) )
X 
Xview angle & up angle - these determine which of the 6 faces of the
X3-d rectilinear data array faces the viewer and the orientation of
Xthat face.
X(defaults: view angle: vtheta = 0.0 rad.
X                         vphi = 0.0 rad.
X             up angle:  upang = pi rad. )
X 
Xscreen height - the amount of magnification of the view; smaller
Xvalues give larger images.  This is actually different from
Xchanging viewing distance: changes in screen height have little
Xeffect on the distortion of perspective whereas changes in viewing
Xdistance do.
X(default:  max(XSIZE,YSIZE,ZSIZE) )
X 
Xopacity cutoff threshold -
X(default: thresh = 0.0078 )
X 
Xnumber of shells - the number of iso-value shells to be rendered.
X(autoshell default: numshl = 5 )
X 
Xnear & far clipping distances - the points in space for the
Xbeginning (near) and ending (far) of the rays.  Basically, you
Xwill not be able to see anything closer than 'near' or further
Xaway than 'far'.  Naturally, computational requirements are
Xproportional to (far - near).  To render ALL of an object in the
X3-d data field, set near and far in front of and behind the first
Xand last values in the field, respectively.  To render a SLICE of
Xthe object, set near and far close together.
X(defaults: near = -viewd
X            far =  viewd )
X 
Xnear & far fade distances - these set an artificial 'fade' region
Xto simulate atmospheric haze; haze is sometimes a useful visual
Xdepth cue.
X(defaults:  nrfade = -viewd
X            frfade =  viewd )
X 
Xray segment size - sets the thickness of 'slabs' or the length of
Xa section of a ray to be sampled for values.  The default value,
X1, represents one sample per data grid point.  A smoother, higher
Xresolution image could result by oversampling.  To do this, set
Xthe ray segment size to a value less than 1, e.g. 0.5.  The
Xeffect of this is to interpolate between points in the depth
Xplane and to double computing time.
X(default: delta = 1.0 )
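X 
XThe cost of oversampling can be estimated with a one-line
Xcalculation.  The sketch below assumes that the number of segments
Xsampled along each ray is (far - near) / delta; this is inferred
Xfrom the descriptions in this README, not taken from the source
Xcode.  The name num_slabs and the clipping distances used in the
Xexample are hypothetical:
X 
```shell
# Estimated number of ray segments (slabs) along one ray.
# $1 = near clipping distance, $2 = far clipping distance, $3 = delta
num_slabs() {
    awk -v near="$1" -v far="$2" -v delta="$3" \
        'BEGIN { printf "%d\n", (far - near) / delta + 0.5 }'   # round to nearest
}

# Halving delta doubles the number of segments, and so the work per ray.
echo "delta=1.0  slabs=$(num_slabs -45 45 1.0)"
echo "delta=0.5  slabs=$(num_slabs -45 45 0.5)"
```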
X 
Xsave settings - this will prompt you for a file name prefix (the
Xsuffix will be '.VRPARM') and save all current settings in that
Xfile.
X 
Xload settings - this will prompt you for a file name prefix (the
Xsuffix must be '.VRPARM') and load the settings in it.
X 
Xambient & diffuse reflection coefficients - ambient lighting here
Xis a pseudo-light source that originates from all directions.
XWith the diffuse reflection coefficient, you can also specify a
Xpoint source of simulated light.  These coefficients take values
Xin the range 0.0 - 1.0.
X(defaults: ambient = reflct(1) = 0.35
X           diffuse = reflct(2) = 0.90 )
X 
Xspecular reflection coefficient and exponent - the specular
Xcoefficient varies between 0.0 and 1.0; larger values cause
Xbrighter highlights.  The exponent determines the simulated
Xroughness of the surface.  An appropriate range for exponent is
Xabout 20 to 150 where 20 gives large, diffuse highlights and 150
Xgives glassy, sharp highlights.
X(defaults: coeff. = reflct(3) = 0.4
X             exp. = reflct(4) = 50.0 )
X 
Xlight angle - sets the direction of the point light source (in
Xspherical co-ordinates).
X(default: ltheta = pi/4 rad.
X            lphi = 0.0 rad.  )
X 
Xset shell levels and colors - gives a special menu to allow you to set
X              iso-surface values, colors and opacity.
X 
X 
XShell Colors and Opacity:
XOpacity for each iso-value shell to be rendered ranges from 0.0
X(invisible) to 1.0 (solid).
X 
XColors are specified by the "hue, lightness, saturation" scheme:
X"Hue" is specified as an angle around a color wheel where:
X     0 = blue
X   300 = blue-green
X   240 = green
X   180 = yellow
X   120 = red
X 
X"Lightness" refers to intensity and is specified in the range  0
X(black) to 32,767 (white).  Color is most pure with a lightness value
Xof 16,384.  By varying the lightness, you can generate colors such as
Xbrown, pink or light blue.
X 
X"Saturation" refers to spectral purity and is specified in the
Xrange 0 (least color; dull gray) to 32,767 (pure spectral color).
X 
XObject Orientation
XIn the following descriptions of the data object, the X and Y axes
Xare as conventionally defined, i.e. X runs "left and right" in the
Xplane of the screen and Y runs "up and down" in the plane of the
Xscreen.  The Z axis runs "into and out of" the screen.
X 
XUp angle - specifies rotation of the object around the Z axis
XView angle  - actually consists of two angles: angle1, which rotates
X              the object around the Y axis; and angle2, which rotates
X              the object around the Z axis.
X 
XAll rotation angles must be specified in radians.  Here are some
Xuseful degrees -> radians conversions:
X     ----------------- -----------------
X       deg.    rad.      deg.    rad.
X     ----------------- -----------------
X         0   0.00000     180   3.14159
X        30   0.52360     210   3.66519
X        45   0.78540     225   3.92699
X        60   1.04720     240   4.18879
X        90   1.57080     270   4.71239
X       120   2.09440     300   5.23599
X       135   2.35619     315   5.49779
X       150   2.61799     330   5.75959
X     ----------------- -----------------
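X 
XFor angles not in the table, the conversion is rad = deg * pi / 180.
XA small /bin/sh helper (the name deg2rad is hypothetical) that
Xuses awk for the floating-point arithmetic:
X 
```shell
# Convert degrees to radians, printed to 5 decimal places.
# $1 = angle in degrees
deg2rad() {
    # atan2(0, -1) evaluates to pi in awk.
    awk -v d="$1" 'BEGIN { printf "%.5f\n", d * atan2(0, -1) / 180 }'
}
```
X 
XFor example:  deg2rad 45   prints 0.78540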
X 
X 
XUnits of Computation and Notes on Parallelism & Performance
X-----------------------------------------------------------
XDefinitions of some terms used in this README:
X 
Xnode - a computing system having one or more processors but
X      only one network address; we distinguish between
X      uni-processor (UP) and multi-processor (MP) nodes.
X      This application was developed to use a set of
X      UP nodes, e.g. RS/6000, Sun, etc. and/or
X      MP nodes, e.g. IBM 3090, Cray, etc.
Xprocessor - a CPU in a node
Xframe - one complete image.  A given run of this application
X       produces one or more independent frames each of which
X       is the result of computing one or (more typically)
X       many independent chunks.  Each chunk consists of
X       computations involving one or (more typically) many
X       independent rays.
Xray - pretty much what it sounds like.  In ray-casting, rays originate
X     at an "eye point" and pass through the 3-d data space parallel
X     to each other.  They are logically and computationally
X     independent (cf. fancier ray tracing algorithms).  The
X     resolution of iso-value surfaces or shells from the data occurs
X     when many such rays (typically 16K - 1M) pass through a series
X     of thin 'slabs'.  Values of the 3-d data in the region around
X     the resulting ray segments are searched using trilinear interpolation.
Xchunk - a 'package' or task of computation for a contiguous group
X       of rays.  The number of chunks and their boundaries are computed 
X       (the computational domain is decomposed) in one or the other
X       of two subroutines: subroutine chunks() and subroutine chnks().
X       Boundaries are stored in the arrays, ilower() & iupper().
Xslab - a collection of NPIX*chunk size ray segments; equivalently,
X      one of the thin planes, orthogonal to the rays, that the rays
X      cross as they pass from the eyepoint through the near clipping
X      distance to the far clipping distance.  A ray passes through
X      ( far - near ) / delta slabs of thickness 'delta' each.
X      Computation of values for a given ray segment in a slab
X      depends only upon values in the previous segment.
X 
XParallelism is done at the level of rays within a frame.  There
Xare NPIX**2 rays per frame.  Rather than compute for NPIX**2 rays
Xin NPIX**2 separate chunks, however, rays are computed in chunks
Xof many rays each.  Two alternate domain decomposition routines
Xare available: chunks and chnks.  The algorithms used in these
Xroutines are described in chunks.f and chnks.f.  Both are very
Xeasy to use and fairly general.  While the domain decomposition
Xis static (requiring user input 'up front' for number of processes
Xand the 'assignment factor'), the actual scheduling of
Xcomputational work on the children is dynamic using, as it does,
Xthe pool-of-tasks or queue-picking model.
X 
XAn alternate or additional way to implement the parallelism would
Xbe at the level of frames which are also computationally
Xindependent.  And parallelism at this higher level would be more
Xcoarse-grained, i.e. have less communication and CPU overhead.
XMoreover, this approach would scale well.  But the computational
Xload among frames could vary considerably (depending upon the
Xinput data).  This could lead to poor load balance among
Xcomputational nodes.
X 
XThis implementation of the ray-casting algorithm should scale
Xreasonably well up to a moderate number of processors, say,
X10-15.  Beyond this, a major communication bottleneck would
Xmanifest itself at the point where pvrplot receives stripes of
Xthe array, screen(), computed by pallslabs.  (This is in
Xscrn_strp.rcv.pvm.incl.)  If the program is to be used to generate
X*many* frames, say, more than 20 or so, a major scalability
Xenhancement would be to create multiple instances of subroutine
Xvrrays, each one of which would use its own group of up to 10
Xinstances of pallslabs and single instance of pvrplot.  Such a
Xversion would have two levels of parallelism: concurrent chunks
Xwithin a frame (as in the current implementation) and concurrent
Xframes.
X 
XAnother consideration for scalability is that this program
Xachieves a modest performance increase by overlapped, pipelined
Xexecution of pallslabs and pvrplot when more than one frame is
Xcomputed.  The relative amount of elapsed time needed for
Xmultiple instances of pallslabs to complete their computation and
Xthat needed for the single instance of pvrplot to complete its
Xoutput depends upon the resolution of the image (NPIX) and upon
Xthe dimensions of the output file.  There are two extremes in a
Xcontinuum to consider here:
X (1) low resolution (NPIX = 64 or 128) and large output
X     format (1280 x 1024).  In this case, the elapsed time to
X     compute screen() will be nearly the same as (possibly *less*
X     than) that to write the output file.  There would be little
X     point in using more than a very few instances of pallslabs.
X 
X (2) high resolution (NPIX = 512 or 1024) and small output
X     format (320 x 240).  In this case, the time for pallslabs
X     to compute screen() would be 2-3 orders of magnitude larger
X     than that to write the output file.  Here, one might use
X     up to several hundred instances of pallslabs (except that
X     communication overhead and the bottleneck in pvrplot
X     would come into play first).
X 
XThe current implementation of this program emphasises the
Xability to generate a series of frames each one of which is
Xbased upon a different data file as in, e.g. a time series.
XIt would also be useful to have the program simply keep a single
Xdata file and to generate a series of frames representing the
Xviewer's movement around the data object as in, e.g. a rotation.
X 
XThe cluster of RS/6000 machines on which this application was
Xdeveloped and currently runs uses AFS (from Transarc Corp.,
XPittsburgh, PA) for the sharing of filesystems.  This has made
Xdevelopment and use of pvm applications a bit easier.  For
Xexample, the sharing of filesystems has reduced the amount of
Xcopying of files (via ftp or other mechanisms) back and forth.
XOne such example of this is found in the script used to copy
Xexecutable pvm components into the appropriate directory and to
Xstart up the daemons.  A suitably general version of the script
Xlooks like this:
X 
X# pvm daemons running on each node see *the same* files:
Xcp ~myhome/path_to_vrend_stuff/pallslabs   ~myhome/pvm/RIOS/.
Xcp ~myhome/path_to_vrend_stuff/pvrplot     ~myhome/pvm/RIOS/.
X 
X# ... but /tmp is always a *local* filesystem, i.e. not shared
X/tmp/pvm/pvmd -i  ~myhome/pvm/hosts/hosts5
X 
X 
XImplementation Notes
X--------------------
X 
XForking children (see frk.pvm.incl):
X-----------------------------------
Xpvm provides three ways of binding ('initiating') instances of
Xexecutable modules to nodes.  This program uses the finitiatem()
Xroutine to bind an instance of a module on a *particular* node
Xspecified by its network address.
X 
XNodes and their network addresses are learned during execution by
Xa call to subroutine gethosts which prompts for and reads the pvm
Xhost file being used.  The program next prompts the user to
Xspecify whether the component, pallslabs, is to be run on a
Xuni-processor node or a multi-processor node.  If MP, it then
Xprompts for the number of instances of pallslabs to initiate.
XHaving read the host file, the program knows the number of nodes
Xspecified.
X 
XThere are a few modest constraints on the structure of the host
Xfile beyond the existing one that the 'master' node, i.e. the
Xone on which the parent code and its pvmd are running, be the
Xfirst record in the host file.  These additional constraints
Xspecify a convention for which nodes are to run which components
Xof the application.  These are that:
X 
XEach node's name must fit into columns 1-35 of the line it occupies.
X 
XThe first node name in the host file *always* runs a single
Xinstance of pvrmain.
X 
XIf the node to run pallslabs is an MP node:
X 
X... and there is only 1 node in the host file,
Xa single instance of pvrplot will be initiated on the node and
Xthe user-specified number of instances of pallslabs will be
Xinitiated on the node.
X 
X... and there are 2 nodes in the host file,
Xthe user is prompted to specify whether the single instance of
Xthe plotting component, pvrplot, is to be run on the "local" node
X(i.e. node 1) or the remote node (node 2).  That is, the user
Xmust decide whether to let pvrplot and pallslabs run on the same
Xnode or different nodes.
X 
X... and there are more than 2 nodes in the host file,
Xa single instance of pvrplot will be initiated on node 2 and one
Xor more instances of pallslabs (depending upon the number
Xpreviously specified by the user) will be initiated on node 3.
XAny nodes in the host file beyond number 3 will be unused.
X 
XIf the node to run pallslabs is a UP node:
X 
X... and there is only 1 node in the host file,
Xa single instance of pvrplot will be initiated on the node and
Xa single instance of pallslabs will be initiated on the node.
X(N.B.: This is the serial case, but it still passes messages and
Xtherefore incurs the baseline message-passing overhead of this
Xapplication.)
X 
X... and there are 2 nodes in the host file,
Xa single instance of pvrplot will be initiated on node 2 and
Xa single instance of pallslabs will be initiated on node 2.
X 
X... and there are more than 2 nodes in the host file,
Xa single instance of pvrplot will be initiated on node 2 and
Xa single instance of pallslabs will be initiated on each of
Xthe remaining nodes in the host file.
X 
X 
XThe logic for all this is found in frk.pvm.incl.  This
Xsemi-automatic approach to binding processes to nodes and
Xprocessors gives the user a good deal of control over where
Xcomponents run, possibly at the expense of some of the run-time
Xflexibility that the finitiate() routine would offer.
X 
XA few coding tricks and conventions used:
X** Variable & array names are in lower case; PARAMETERs are in
Xupper case and are generally found in .incl files.
X 
X** Use of d|D in column 1 for debugging (most modern FORTRAN
Xcompilers, e.g. xl FORTRAN, have this feature; IBM's VS FORTRAN
X2.5 does not, however).  This is a very handy way to
Xenable/disable debugging via WRITE() statements.  Accordingly,
Xthe source code for this application is rather heavily
Xinstrumented with WRITE(6,*) statements for tracing.  Note also
Xthat the testing of return codes from pvm calls is done in this
X'debug enabled' mode only.  If you have philosophical qualms
Xabout this, change the 'd' in these statements to a space and
Xlose a little performance.
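X 
XAs an illustration only (this fragment is not taken from the
Xprogram's source), a debug line might look like the following.
XThe routine name dowork and the variable rc are hypothetical.  The
Xlisting is indented here for readability; in fixed-form source
Xthe 'd' must sit in column 1 and ordinary statements begin in
Xcolumn 7.
X 
X   c     hypothetical example of the d-in-column-1 convention
X         integer rc
X         call dowork(rc)
X   c     the next line is compiled only in 'debug enabled' mode
X   d     if (rc .lt. 0) write(6,*) 'dowork failed, rc = ', rc
X 
XWith xl FORTRAN, such lines are, as far as we know, compiled
Xwhen the -D option is given (the supplied Makefile includes -D
Xin FFLAGS) and treated as comments otherwise.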
X 
X** Version 2.4.1 of pvm (and probably future ones) automatically
Xdetects whether the sender and receiver of a message are of the
Xsame or different "architecture".  If the same, floating-point
Xnumbers packed with putnfloat and unpacked with getnfloat are
Xsent as-is.  If different, these numbers are converted to an
Xintermediate representation by XDR and converted back at the
Xreceiving end.  This is to ensure that machine architectures
Xhaving differing floating-point formats, e.g. IEEE, System/370,
XCray, interpret and use floating-point messages correctly.
XThere is a small but noticeable performance penalty here.  It
Xcan be avoided by using putbytes/getbytes instead of
Xputnfloat/getnfloat, but only when sender and receiver share the
Xsame floating-point format, since raw bytes are not converted.
X 
X** Extensive use of include files, especially for calls to pvm
Xlibrary routines; this should make porting to other parallel
Xenvironments, e.g. Express, somewhat easier.
X 
SHAR_EOF
chmod 0666 README || echo "restore of README fails"
sed 's/^X//' << 'SHAR_EOF' > Makefile &&
X# Makefile for pvm version (2.4.1) of CNSF's 3-d volume-renderer
X# for RS/6000.  - Hugh Caffey;  12 August '92
X
XCC       = cc
XCFLAGS   =    -c -DRIOS
XFC       = xlf
XFFLAGS   = -O -D
X# libpvm.a contains the C pvm routines
X# libf2c.a contains FORTRAN "wrapper" routines for the C pvm routines
XFLIBS    = -L/usr/local/lib -lf2c -lpvm
X 
X# You may want to set up your own directories for "timers" and
X# "utils".  Here, I have defined these macros to be the working
X# directory.
XTIMDIR   = .
XUTLDIR   = .
XCIODIR   = .
X 
XTIMOBJS  = $(TIMDIR)/getrtc.o $(TIMDIR)/cput.o $(TIMDIR)/wallt.o
XUTLOBJS  = $(UTLDIR)/chunks.o $(UTLDIR)/gethosts.o $(UTLDIR)/int2char.o \
X           $(UTLDIR)/chnks.o
X 
XCIOOBJS  = $(CIODIR)/copn.o $(CIODIR)/ccls.o $(CIODIR)/rleout.o
X
XMAINOBJS = dinit.o fixit.o hlsrgb.o pvrmain.o rgbval.o setshl.o \
Xin3dfmt.o startit.o varld.o varsve.o read3d.o vrrays.o
X
XPALLOBJS = pallslabs.o plndst.o
X 
XPVRPLOBJS = pvrplot.o fixit.o encode.o
X
Xall:		pvrmain pallslabs pvrplot
X
Xpvrmain:	$(MAINOBJS) $(TIMOBJS) $(UTLOBJS)
X	$(FC) -o $(@) $(FFLAGS) $(MAINOBJS) $(TIMOBJS) $(UTLOBJS) $(FLIBS)
X
Xpallslabs:	$(PALLOBJS) $(TIMOBJS)
X	$(FC) -o $(@) $(FFLAGS) $(PALLOBJS) $(TIMOBJS) $(FLIBS)
X
Xpvrplot:	$(PVRPLOBJS) $(TIMOBJS) $(UTLOBJS) $(CIOOBJS)
X	$(FC) -o $(@) $(FFLAGS) $(PVRPLOBJS) $(TIMOBJS) $(UTLOBJS) $(CIOOBJS) $(FLIBS)
X
Xclean:
X	rm -f $(PALLOBJS) $(MAINOBJS) $(PVRPLOBJS) core pvrmain pallslabs pvrplot *.lst
SHAR_EOF
chmod 0666 Makefile || echo "restore of Makefile fails"
sed 's/^X//' << 'SHAR_EOF' > getrtc.s &&
X#
X#        TITLE 'Read the Real-Time Clock,  Upper and Lower'
X# This assembly language routine is from an anonymous source
X# working for a Major Vendor of computer systems.  It is specific to
X# the RISC System/6000 line of workstations from IBM.
X#
X# It is not supported by the Major Vendor, nor by the CNSF.
X#
X	.csect .getrtc[PR]
X	.globl .getrtc[PR]
Xloop:    mfspr 4,4           # Get the upper half of the real time clock
X         mfspr 5,5           # Get the lower half of the real time clock
X         mfspr 0,4           # Re-read the upper half of the real time clock
X         cmp   0,0,4         # Compare the two readings of rtcu.
X         bc    4,2,loop      # Try again if they differ (rtcu changed mid-read).
X         stsi  4,3,8         # Save the clock halves in caller's area.
X         br                  # Return.
SHAR_EOF
chmod 0666 getrtc.s || echo "restore of getrtc.s fails"
exit 0
