tpetsc.md - pism - [fork] customized build of PISM, the parallel ice sheet model (tillflux branch)
 git clone git://src.adamsgaard.dk/pism
PETSc: An overview for PISM users {#petscID}
=====

## Preamble

In v0.5, because of abstractions especially in IceModelVec and its derived
classes, programmers building derived classes of IceModel now need minimal
knowledge of PETSc and especially PETSc DMs.  What *is* important to know is
that solving the SSA in parallel is relatively deep technology.  Very
specifically, because effective parallel iterative linear algebra requires
preconditioning, *the solution of the SSA stress balance depends on the number
of processors*.  This will also be true of high-resolution parallel solution
of all other membrane-including stress balances (Blatter, Stokes).

## Introduction

The PETSc library [@ref petsc-user-ref] provides essential support for distributed
arrays and linear solvers in a parallel computing environment.  "PETSc" stands for
"Portable, Extensible Toolkit for Scientific Computation."  It is a suite of data
structures and routines in C for the scalable parallel solution of partial
differential equations and their scientific applications.  Large parts of PETSc
relate especially to finite-difference-type regular, rectangular grids, but PETSc
has been used for unstructured grids as well.

PETSc employs the MPI standard for all message-passing communication, but it is
deliberately at a higher level of abstraction than MPI.  We find that using PETSc
protects the programmer from explicit consideration of message passing about 90%
of the time.

Documentation for PETSc is at

http://www.mcs.anl.gov/petsc/.

PISM is a C++ program and PETSc is a C library, but all PISM calls to PETSc use
the PETSc C API, not the C++ API.


### PETSc types

The most important variables deep inside PISM are of these PETSc types:
  - `DM`
  - `Vec`
  - `Mat`
  - `KSP`

In fact most of the PETSc types merely declare pointers, as these are abstract
data types.  Variables must be created with calls to functions like `DMDACreate2d()`,
`VecCreate()`, etc., and destroyed when no longer needed.  These concepts are
largely buried inside IceGrid (esp. the `DM` type) and IceModelVec (esp. the `Vec`
type), but the `Mat` and `KSP` types are exposed in the IceModel code which solves
the SSA.
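
The create/use/destroy pattern looks roughly like the following sketch.  This is
hypothetical standalone code, not taken from PISM, and it assumes a PETSc >= 3.2
API (e.g. `VecDestroy()` taking a pointer):

    /* Minimal sketch of the PETSc create/use/destroy pattern (not PISM code). */
    #include <petscvec.h>

    int main(int argc, char **argv) {
      Vec            v;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

      ierr = VecCreate(PETSC_COMM_WORLD, &v); CHKERRQ(ierr);   /* v is only a handle until created */
      ierr = VecSetSizes(v, PETSC_DECIDE, 100); CHKERRQ(ierr); /* 100 global entries, layout chosen by PETSc */
      ierr = VecSetFromOptions(v); CHKERRQ(ierr);
      ierr = VecSet(v, 1.0); CHKERRQ(ierr);

      ierr = VecDestroy(&v); CHKERRQ(ierr);                    /* every created object must be destroyed */
      return PetscFinalize();
    }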


### Distributed arrays and vectors

PETSc has an abstract data type called a Distributed Array, which is a kind of
Distributed Mesh (type `DM`).  A `DM` contains information about the grid and
stencil.  It sets up *ghosted* values which are intended to duplicate the
values on neighboring processors so communication can be done in big batches.

Vectors (type `Vec`) are created *using* a `DM` with `DMCreateLocalVector()` and similar
procedures.  These vectors are distributed across the processors according to the information
in the `DM`.  Just for emphasis: *PISM distributes itself across the processors by calling
PETSc procedures which create a `DM`*.

There are two parameters of note when creating a `DMDA`: *stencil type* and
*stencil width.*

The stencil types are `DMDA_STENCIL_STAR` and `DMDA_STENCIL_BOX`.
They are generalizations of the five-point and nine-point stencils typical of two-dimensional
discretizations of second-order partial derivatives.  Using `DMDA_STENCIL_STAR` means
that ghosted points are needed only along the coordinate axes, while `DMDA_STENCIL_BOX`
indicates that ghosted points are needed in the box-shaped region surrounding each point,
and thus each processor.  (On processor *N,* information owned by a neighboring processor
but duplicated onto *N* is called *ghosted.*  A picture is worth a thousand words here, so
find the appropriate picture in the PETSc manual!)  We only need `BOX`-style stencils when gradient
terms must be evaluated on a staggered grid (*h* in SIA regions and @f$\bar{u},\bar{v}@f$
in the computation of effective viscosity in SSA regions).  Keeping all other
two-dimensional vectors on a `STAR`-type stencil would reduce the necessary communication
slightly, but for now all two-dimensional arrays are based on `BOX`.

The stencil width indicates how many points in each direction will be needed.  We never
need a stencil width greater than 1.
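
As a sketch of how these two choices enter the PETSc API (hypothetical values, not
PISM's IceGrid code; it assumes PETSc >= 3.8, where the boundary enum is `DM_BOUNDARY_*`
and `DMSetUp()` is required), a 2D `DMDA` with a `BOX` stencil of width 1 might be
created like this:

    /* Hypothetical sketch: create a 2D DMDA with a BOX stencil of width 1. */
    #include <petscdmda.h>

    PetscErrorCode create_da_example(PetscInt Mx, PetscInt My, DM *da) {
      PetscErrorCode ierr;

      ierr = DMDACreate2d(PETSC_COMM_WORLD,
                          DM_BOUNDARY_PERIODIC, DM_BOUNDARY_PERIODIC, /* boundary types (assumed) */
                          DMDA_STENCIL_BOX,                           /* ghosts at diagonal neighbors too */
                          Mx, My,                                     /* global grid size */
                          PETSC_DECIDE, PETSC_DECIDE,                 /* processors in each direction */
                          1,                                          /* degrees of freedom per grid point */
                          1,                                          /* stencil width */
                          NULL, NULL, da); CHKERRQ(ierr);
      ierr = DMSetFromOptions(*da); CHKERRQ(ierr);
      ierr = DMSetUp(*da); CHKERRQ(ierr);
      return 0;
    }

Replacing `DMDA_STENCIL_BOX` with `DMDA_STENCIL_STAR` is the only change needed to
drop the diagonal ghosts.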

The three-dimensional distributed arrays are distributed across processors in the same way
as two-dimensional arrays, in the sense that *each column of the ice (or of the bedrock in the
thermal model) is owned by exactly one processor*.  From the point of view of parallelization,
our problem is two-dimensional.  All three-dimensional arrays have stencil type `STAR` in the
horizontal directions.

One point of confusion (we admit ...) is the redefinition of the @f$x,y,z@f$ axes.  Contrary
to the PETSc default, our @f$z@f$ axis changes most rapidly through memory while the
@f$x@f$ axis changes most slowly.  The desirable consequence, however, is that our C
arrays are addressed as `u[i][j][k]`, where `i,j,k` are the coordinate
indices in the directions @f$x,y,z@f$ respectively.


### Accessing the processor's part of a DM-managed Vec

`DM`-based vectors must be accessed by a call to `DMDAVecGetArray()` before the access and
a call to `DMDAVecRestoreArray()` after.  This just means that one gets a valid pointer
to the actual memory behind the abstract data type `Vec`.  The pointer has type
`double**` for a 2-dimensional array and type `double***` for a 3-dimensional
array.

The resulting pointer can be addressed using normal (to the extent that C is "normal" in
this regard ...) multidimensional array indexing.  Furthermore, the indices themselves
allow one to pretend that the given processor is addressing the entire array, even though only a
chunk is stored on the local processor.  *No message passing occurs here.*  Instead, the crucial
idea is that the `DM` knows the valid index ranges for each processor.  That's why
essentially all loops over two-dimensional arrays in PISM look like this:


      for (PetscInt i = grid.xs; i < grid.xs + grid.xm; ++i) {
        for (PetscInt j = grid.ys; j < grid.ys + grid.ym; ++j) {
          ... vH(i,j) ...
        }
      }


The integers `grid.xs`, `grid.xm`, `grid.ys`, `grid.ym` are the numbers that come
from a call to `DMDAGetLocalInfo()` just after the `DM` is created; normally
a programmer just uses IceGrid.
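
As a sketch of where those numbers come from (hypothetical code, not PISM's IceGrid;
`loop_over_owned_points` is an invented name), one can ask the `DM` directly:

    /* Hypothetical sketch: query the locally-owned index ranges from a DMDA. */
    #include <petscdmda.h>

    PetscErrorCode loop_over_owned_points(DM da) {
      DMDALocalInfo  info;
      PetscErrorCode ierr;

      ierr = DMDAGetLocalInfo(da, &info); CHKERRQ(ierr);
      /* info.xs, info.ys: first locally-owned index in each direction;
         info.xm, info.ym: number of locally-owned points in each direction */
      for (PetscInt i = info.xs; i < info.xs + info.xm; ++i) {
        for (PetscInt j = info.ys; j < info.ys + info.ym; ++j) {
          /* ... work on the locally-owned point (i,j) ... */
        }
      }
      return 0;
    }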


### "Local" versus "global" Vecs

PETSc `DM`-based vectors can be "local" or "global".   (This weird PETSc language should
probably be translated as "has ghosts" or "sans ghosts".)  "Local" vectors include
space for the ghosted points.  That is, when `DMDAVecGetArray()` is called, the resulting
array can be indexed on all of the local portion owned by the processor, *plus* the
ghosted points.  The processor owns *copies* of the values at grid points owned by the neighboring
processors, according to the stencil type and stencil width in the `DM`.  The processor should
treat these ghost values as "read-only", because of what happens at the communication stage
(which we describe in a moment).

*Example.*  We create a local `Vec` `v` which we want to have a `STAR`-type stencil.  Assuming
`da` is a `DM` created with `DMDA_STENCIL_STAR`, then we do this to create `v` and get an array
pointer `arr` into the local portion of it:

    DMCreateLocalVector(da, &v);
    DMDAVecGetArray(da, v, &arr);

Now the processor can access `arr[i][j]` for every

    grid.xs <= i <= grid.xs + grid.xm - 1

and

    grid.ys <= j <= grid.ys + grid.ym - 1

But this is just the regular (owned by the processor) portion.  In addition, all of these
are valid for every `i` and `j` in the above ranges:

    arr[i+1][j],   arr[i-1][j],   arr[i][j+1],   arr[i][j-1].

(*If* the `da` were of type `DMDA_STENCIL_BOX` then, in addition,

    arr[i+1][j+1],   arr[i-1][j+1],   arr[i+1][j-1],   arr[i-1][j-1].

would all be valid.)  Once the processor is done modifying its portion of `v` (i.e. not including
the ghosts, although it may read them), we would want to update the ghost values so that
all processors agree on the values at those grid points:

    DMDAVecRestoreArray(da, v, &arr);
    DMDALocalToLocalBegin(da, v, INSERT_VALUES, v);
    DMDALocalToLocalEnd(da, v, INSERT_VALUES, v);

(Note we release the array pointer before the communication stage, although that may not be
essential.)

`DMDALocalToLocalBegin()` and then `DMDALocalToLocalEnd()` are called to update (communicate) the
ghosted values before the next stage at which they will be needed.  This pair of commands
performs a stage of actual (MPI) message passing.  Only enough values are passed around to
update the ghosts.  The size of the messages is relatively small, and *latency is more important
(in slowing performance) than bandwidth*.  The significance of `STAR` versus `BOX` is not
in the size of the extra memory, but in the extra messages which must be passed to the "diagonal"
neighboring processors.

Global vectors do not hold ghosted values.  Certain array operations are more appropriate to
this kind of vector, like viewing arrays in graphics windows.  (Of course, viewing an array
distributed across all the processors requires message passing.  I think PETSc sends all the
values to processor 0 to display them.)  In any case, when ghosts are not needed there is
no need to allocate space for them, and a "global" `Vec` is appropriate.  Local vectors
typically need to be mapped to global vectors before viewing or using in a linear system, but
this is an entirely "local" operation which does not require message passing (`DMLocalToGlobalBegin/End()`).
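
Continuing the example above (so `da`, `v`, and an error-checking `ierr` are assumed
to exist already; this is a sketch, not PISM code), that mapping looks like:

    /* Sketch: copy the ghosted local Vec v into a ghost-free global Vec g. */
    Vec g;
    ierr = DMCreateGlobalVector(da, &g); CHKERRQ(ierr);
    ierr = DMLocalToGlobalBegin(da, v, INSERT_VALUES, g); CHKERRQ(ierr);
    ierr = DMLocalToGlobalEnd(da, v, INSERT_VALUES, g); CHKERRQ(ierr);
    /* g can now be viewed, or used as a right-hand side in a linear system */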

At risk of repetition, most of the above distinction between "local" and "global"
is buried in the IceModelVec abstraction.

### Solving linear systems

PETSc is designed for solving large, sparse systems in parallel.  Iterative methods are
the name of the game, especially Krylov subspace methods such as conjugate gradients and
GMRES [@ref TrefethenBau, @ref Saad].  For consistency, all methods use the same
Krylov-subspace-method-with-preconditioner `KSP` interface, even though some are really direct
methods.

The place in PISM where this is already used is in the solution of the linearized SSA velocity
problem.  See the documentation for SSAFD and SSAFEM in this *Manual.*

One starts by declaring an object of type `KSP`; in PISM this is done in IceModel::createVecs().
Various options can be set by the program, but if the program calls `KSPSetFromOptions()`
before any linear systems are solved, which PISM does, then user options like
`-ksp_type`, `-pc_type`, and `-ksp_rtol` can be read to control (respectively)
which Krylov method is used, which preconditioner is used, and what relative
tolerance will be used as the convergence criterion of the Krylov solver.  The default is
GMRES(30) with ILU preconditioning.

(As a general mechanism, PETSc has an options database which holds command-line options.
All PISM options use this database.  See IceModel::setFromOptions().)

To solve the system, a matrix must be attached to the `KSP`.  The first time
`KSPSolve()` is called, the matrix is factored by the preconditioner, and that factorization
is reused when the system is solved with additional right-hand sides.  In the case of PISM and the
solution of the balance of momentum equations for the SSA, the matrix changes because the
equations are nonlinear (the effective viscosity depends on the strain rates).
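
A sketch of the whole `KSP` sequence (hypothetical function, not the PISM SSA code;
it assumes PETSc >= 3.5, where `KSPSetOperators()` takes only the matrix and the
preconditioning matrix):

    /* Hypothetical sketch: solve A x = b with a KSP configured from the command line. */
    #include <petscksp.h>

    PetscErrorCode solve_example(Mat A, Vec b, Vec x) {
      KSP            ksp;
      PetscErrorCode ierr;

      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr); /* system matrix doubles as preconditioning matrix */
      ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);     /* honors -ksp_type, -pc_type, -ksp_rtol, ... */
      ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
      return 0;
    }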

The default matrix format is similar to the *Matlab* `sparse` format.  Each processor
owns a range of rows.  Elements in matrices and vectors can be set using `MatSetValues()`
and `VecSetValues()`.  These routines use a "global" indexing which is not based on the regular
grid and stencil choices in the `DM`.  One can set any value on any processor, but then a lot of
message passing has to occur; fortunately PETSc manages all of that.  Values are
cached until one calls `MatAssemblyBegin()` followed by `MatAssemblyEnd()` to
communicate the values.
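
A sketch of that set-then-assemble pattern (hypothetical code, not PISM's SSA assembly;
the row is assumed to be an interior row so that `row-1` and `row+1` are valid global
indices):

    /* Hypothetical sketch: insert one row of coefficients, then assemble the matrix. */
    #include <petscmat.h>

    PetscErrorCode set_row_example(Mat A, PetscInt row) {
      PetscErrorCode ierr;
      PetscInt       cols[3] = {row - 1, row, row + 1};
      PetscScalar    vals[3] = {-1.0, 2.0, -1.0};

      /* global indices; PETSc routes the values to the owning processor */
      ierr = MatSetValues(A, 1, &row, 3, cols, vals, INSERT_VALUES); CHKERRQ(ierr);

      /* cached values are communicated only at assembly time */
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
      return 0;
    }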

### Additional PETSc utility functions

Quite elaborate error tracing and performance monitoring is possible with PETSc.  All
functions return a `PetscErrorCode` which should be checked by the macro `CHKERRQ()`.
Therefore runtime errors normally print sufficient traceback information.
If this information is not present (because the error did not get traced back), you
may need to use a debugger, which is accessible with the (PETSc) command-line options
`-start_in_debugger` and `-on_error_attach_debugger`.  Option `-log_summary`
is useful to get some diagnostics written to the terminal.
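
The error-checking convention looks like this in a sketch (hypothetical function, not
taken from PISM):

    /* Hypothetical sketch of the PETSc error-checking convention. */
    #include <petscvec.h>

    PetscErrorCode compute_something(Vec v) {
      PetscErrorCode ierr;

      /* on failure, CHKERRQ adds this line to the traceback and returns the error code */
      ierr = VecSet(v, 0.0); CHKERRQ(ierr);
      return 0;
    }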

The `PetscViewer` interface allows PETSc objects to be displayed.  One
can "view" a `Vec` in a graphical (X) window, but a "view" can also save
the `Vec` to a binary file on disk, print it in plain text to the
terminal (standard out), or even send it to a running instance of
*Matlab.*  In PISM the X views are available using the `-view`
option.  See the "User's Manual".  When viewing `Vec`s graphically in
multiprocessor jobs, the display may have to be set on the command
line, for instance as `-display :0` or similar; this must be given as
the final option.  For example,

    mpiexec -n 2 pisms -view thk -display :0

allows a two-processor run to view the ice thickness.
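
Programmatically, "viewing" to a binary file is a short sketch (hypothetical code and
file name, not how PISM writes its output):

    /* Hypothetical sketch: write a Vec to a PETSc binary file via a viewer. */
    #include <petscviewer.h>
    #include <petscvec.h>

    PetscErrorCode save_vec_example(Vec v) {
      PetscViewer    viewer;
      PetscErrorCode ierr;

      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "v.dat", FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
      ierr = VecView(v, viewer); CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);
      return 0;
    }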