PETSc: An overview for PISM users {#petscID}
=====

## Preamble

In v0.5, because of abstractions especially in IceModelVec and its derived
classes, programmers building derived classes of IceModel now need minimal
knowledge of PETSc and especially PETSc DMs. What *is* important to know is
that solving the SSA in parallel is relatively deep technology. Very
specifically, because effective parallel iterative linear algebra requires
preconditioning, *the solution of the SSA stress balance depends on the number
of processors*. This will also be true of high-resolution parallel solution
of all other membrane-including stress balances (Blatter, Stokes).

## Introduction

The PETSc library [@ref petsc-user-ref] provides essential support for distributed
arrays and linear solvers in a parallel computing environment. "PETSc" stands for
"Portable, Extensible Toolkit for Scientific Computation." It is a suite of data structures
and routines in C for the scalable parallel solution of partial differential equations and their
scientific applications. Large parts of PETSc relate especially to finite-difference-type regular,
rectangular grids, but PETSc has been used for unstructured grids as well.

PETSc employs the MPI standard for all message-passing communication, but it is deliberately at a
higher level of abstraction than MPI. We find that using PETSc protects the programmer from
explicit consideration of message passing about 90% of the time.

Documentation for PETSc is at

http://www.mcs.anl.gov/petsc/.

PISM is a C++ program and PETSc is a C library, but all PISM calls to PETSc use
the PETSc C API, not the C++ API.

### PETSc types

Most important variables deep inside PISM are of these PETSc types:
- `DM`
- `Vec`
- `Mat`
- `KSP`

In fact most of the PETSc types merely declare pointers, as these are
abstract data types. Variables must be created with calls to functions like `DMDACreate2d()`,
`VecCreate()`, etc., and destroyed when no longer needed. These concepts are
largely buried inside IceGrid (esp. the DM type) and IceModelVec (esp. the Vec type),
but the Mat and KSP types are exposed in the IceModel code which solves the SSA.
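
For orientation, here is a minimal sketch (not taken from PISM) of that create/use/destroy
pattern for a plain `Vec`; exact calling details vary slightly between PETSc versions, and
the destroy functions take a pointer in recent versions:

    Vec v;
    VecCreate(PETSC_COMM_WORLD, &v);     /* create the (empty) Vec object */
    VecSetSizes(v, PETSC_DECIDE, 1000);  /* local size chosen by PETSc, global size 1000 */
    VecSetFromOptions(v);                /* honor options such as -vec_type */
    VecSet(v, 0.0);                      /* ... use the vector ... */
    VecDestroy(&v);                      /* release it when no longer needed */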

### Distributed arrays and vectors

PETSc has an abstract data type called a Distributed Array, which is a kind of
Distributed Mesh (type `DM`). `DM`s contain information about the grid and
stencil. They set up *ghosted* values which are intended to duplicate the
values on neighboring processors so communication can be done in big batches.

Vectors (type `Vec`) are created *using* a `DM` with `DMCreateLocalVector()` and similar
procedures. These vectors are distributed across the processors according to the information
in the `DM`. Just for emphasis: *PISM distributes itself across the processors by calling
PETSc procedures which create a `DM`*.

There are two parameters of note when creating a `DMDA`: *stencil type* and
*stencil width*.

The stencil types are `DMDA_STENCIL_STAR` and `DMDA_STENCIL_BOX`.
They are generalizations of the five-point and nine-point stencils typical of two-dimensional
discretizations of second-order partial derivatives. Using `DMDA_STENCIL_STAR` means
that ghosted points are needed only along the coordinate axes, while `DMDA_STENCIL_BOX`
indicates that ghosted points are needed in the box-shaped region surrounding each point,
and thus each processor. (On processor *N,* information owned by a neighboring processor
but duplicated onto *N* is called *ghosted.* A picture is worth a thousand words here, so
find the appropriate picture in the PETSc manual!) We only need `BOX`-style stencils when gradient
terms must be evaluated on a staggered grid (*h* in SIA regions and @f$\bar{u},\bar{v}@f$
in the computation of effective viscosity in SSA regions). Keeping all other
two-dimensional vectors on a `STAR`-type stencil would reduce the necessary communication
slightly, but for now all two-dimensional arrays are based on `BOX`.

The stencil width indicates how many points in each direction will be needed. We never
need a stencil width greater than 1.
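
For concreteness, the following is a hedged sketch (not actual PISM code) of creating a
two-dimensional `DMDA` with a `BOX` stencil of width 1. Here `Mx` and `My` stand for the
global grid dimensions, and the `DMDACreate2d()` argument list shown follows recent PETSc
versions; older versions differ in the argument list and the need for `DMSetUp()`.

    DM da;
    DMDACreate2d(PETSC_COMM_WORLD,
                 DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,  /* non-periodic in x and y */
                 DMDA_STENCIL_BOX,                    /* ghosts at diagonal neighbors too */
                 Mx, My,                              /* global grid dimensions */
                 PETSC_DECIDE, PETSC_DECIDE,          /* let PETSc split the grid among processors */
                 1,                                   /* one degree of freedom per grid point */
                 1,                                   /* stencil width */
                 NULL, NULL,                          /* default ownership ranges */
                 &da);
    DMSetUp(da);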

The three-dimensional distributed arrays are distributed across processors in the same way
as two-dimensional arrays, in the sense that *each column of the ice (or of the bedrock, in
the thermal model) is owned by exactly one processor*. From the point of view of
parallelization, our problem is two-dimensional. All three-dimensional arrays have stencil
type `STAR` in the horizontal directions.

One point of confusion (we admit ...) is the redefinition of the @f$x,y,z@f$ axes. Contrary
to the PETSc default, our @f$z@f$ axis changes most rapidly through memory while the
@f$x@f$ axis changes most slowly. The desirable consequence, however, is that our C
arrays are addressed as `u[i][j][k]`, where `i,j,k` are the coordinate
indices in the directions @f$x,y,z@f$ respectively.
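
As an illustration of this ordering, here is a hedged sketch (not PISM code) of a triple loop
over a three-dimensional `DMDA`-managed `Vec`; the names `da3`, `v3`, `Mz`, and the `grid.*`
ownership ranges (explained in the next section) are assumptions for illustration only:

    double ***u;
    DMDAVecGetArray(da3, v3, &u);
    for (PetscInt i = grid.xs; i < grid.xs + grid.xm; ++i) {    /* x: slowest through memory */
      for (PetscInt j = grid.ys; j < grid.ys + grid.ym; ++j) {  /* y */
        for (PetscInt k = 0; k < Mz; ++k) {                     /* z: fastest through memory */
          u[i][j][k] = 0.0;
        }
      }
    }
    DMDAVecRestoreArray(da3, v3, &u);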


### Accessing the processor's part of a DM-managed Vec

`DM`-based vectors must be accessed by a call to `DMDAVecGetArray()` before the access and
a call to `DMDAVecRestoreArray()` after. This just means that one gets a valid pointer
to the actual memory, for the abstract data type `Vec`. The pointer has type
`double**` for a 2-dimensional array and type `double***` for a 3-dimensional
array.

The resulting pointer can be addressed using normal (to the extent that C is "normal" in
this regard ...) multidimensional array indexing. Furthermore, the indices themselves
allow one to pretend that the given processor is addressing the entire array, even though only a
chunk is stored on the local processor. *No message passing occurs here.* Instead, the crucial
idea is that the `DM` knows the valid index ranges for each processor. That's why
essentially all loops over two-dimensional arrays in PISM look like this:

    for (PetscInt i = grid.xs; i < grid.xs + grid.xm; ++i) {
      for (PetscInt j = grid.ys; j < grid.ys + grid.ym; ++j) {
        ... vH(i,j) ...
      }
    }

The integers `grid.xs`, `grid.xm`, `grid.ys`, `grid.ym` are just the numbers that come
from a call to `DMDAGetLocalInfo()` made just after the `DM` is created, but normally
a programmer just reads them from IceGrid.
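
A hedged sketch of how those numbers are obtained directly from PETSc (assuming `da` is the
`DM` in question):

    DMDALocalInfo info;
    DMDAGetLocalInfo(da, &info);
    /* info.xs, info.ys : first locally-owned index in x and y      */
    /* info.xm, info.ym : number of locally-owned points in x and y */
    /* In PISM these numbers end up as grid.xs, grid.xm, grid.ys, grid.ym. */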


### "Local" versus "global" Vecs

PETSc `DM`-based vectors can be "local" or "global". (This weird PETSc language should
probably be translated as "has ghosts" or "sans ghosts".) "Local" vectors include
space for the ghosted points. That is, when `DMDAVecGetArray()` is called, the resulting
array can be indexed on all of the local portion owned by the processor, *plus* the
ghosted points. The processor owns *copies* of the values at grid points owned by the neighboring
processors, according to the stencil type and stencil width in the `DMDA`. The processor should
treat these ghost values as "read-only", because of what happens at the communication stage
(which we describe in a moment).

*Example.* We create a local `Vec` `v` which we want to have a `STAR`-type stencil. Assuming
`da` is a `DM` created with `DMDA_STENCIL_STAR`, we do this to create `v` and get an array
pointer `arr` into the local portion of it:

    DMCreateLocalVector(da, &v);
    DMDAVecGetArray(da, v, &arr);

Now the processor can access `arr[i][j]` for every

    grid.xs <= i <= grid.xs + grid.xm - 1

and

    grid.ys <= j <= grid.ys + grid.ym - 1

But this is just the regular (owned by the processor) portion. In addition, all of these
are valid for every `i` and `j` in the above ranges:

    arr[i+1][j], arr[i-1][j], arr[i][j+1], arr[i][j-1].

(*If* the `da` were of type `DMDA_STENCIL_BOX` then, in addition,

    arr[i+1][j+1], arr[i-1][j+1], arr[i+1][j-1], arr[i-1][j-1].

would all be valid.) Once the processor is done modifying its portion of `v` (i.e. not including
the ghosts, although it may read them), we would want to update the ghost values so that
all processors agree on the values at those grid points:

    DMDAVecRestoreArray(da, v, &arr);
    DMDALocalToLocalBegin(da, v, INSERT_VALUES, v);
    DMDALocalToLocalEnd(da, v, INSERT_VALUES, v);

(Note we release the array pointer before the communication stage, although that may not be
essential.)

`DMDALocalToLocalBegin()` and then `DMDALocalToLocalEnd()` are called to update (communicate) the
ghosted values before the next stage at which they will be needed. This pair of commands
performs a stage of actual (MPI) message passing. Only enough values are passed around to
update the ghosts. The size of the messages is relatively small, and *latency is more important
(in slowing performance) than bandwidth*. The significance of `STAR` versus `BOX` is not
in the size of the extra memory, but in the extra messages which must be passed to the "diagonal"
neighboring processors.

Global vectors do not hold ghosted values. Certain array operations are more appropriate to
this kind of vector, like viewing arrays in graphics windows. (Of course, viewing an array
distributed across all the processors requires message passing. I think PETSc sends all the
values to processor 0 to display them.) In any case, when ghosts are not needed there is
no need to allocate space for them, and a "global" `Vec` is appropriate. Local vectors
typically need to be mapped to global vectors before viewing or using in a linear system, but
this is an entirely "local" operation which does not require message passing (`DMLocalToGlobalBegin/End()`).
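
A hedged sketch (reusing `da` and the local `Vec` `v` from the example above) of copying the
owned values of a local vector into a freshly created global one:

    Vec vglobal;
    DMCreateGlobalVector(da, &vglobal);
    DMLocalToGlobalBegin(da, v, INSERT_VALUES, vglobal);
    DMLocalToGlobalEnd(da, v, INSERT_VALUES, vglobal);
    /* vglobal (which has no ghosts) can now be viewed or used in a linear
       system; destroy it with VecDestroy(&vglobal) when done. */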

At risk of repetition, most of the above distinction between "local" and "global"
is buried in the IceModelVec abstraction.

### Solving linear systems

PETSc is designed for solving large, sparse systems in parallel. Iterative methods are
the name of the game, especially Krylov subspace methods such as conjugate gradients and
GMRES [@ref TrefethenBau, @ref Saad]. For consistency, all methods use the same
Krylov-subspace-method-with-preconditioner `KSP` interface, even though some are really direct
methods.

The place in PISM where this is already used is in the solution of the linearized SSA velocity
problem. See the documentation for SSAFD and SSAFEM in this *Manual*.

One starts by declaring an object of type `KSP`; in PISM this is done in IceModel::createVecs().
Various options can be set by the program, but if the program calls `KSPSetFromOptions()`
before any linear systems are solved, which PISM does, then user options like
`-ksp_type`, `-pc_type`, and `-ksp_rtol` can be read to control (respectively)
which Krylov method is used, which preconditioner is used, and what relative
tolerance will be used as the convergence criterion of the Krylov solver. The default is
GMRES(30) with ILU preconditioning.
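
For example, a run might select the conjugate gradient method with a block Jacobi
preconditioner and a tighter relative tolerance entirely from the command line (an
illustrative combination, not a recommendation):

    mpiexec -n 2 pisms -ksp_type cg -pc_type bjacobi -ksp_rtol 1e-10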

(As a general mechanism, PETSc has an options database which holds command line options.
All PISM options use this database. See IceModel::setFromOptions().)

To solve the system, a matrix must be attached to the `KSP`. The first time
`KSPSolve()` is called, the matrix will be factored by the preconditioner and then reused
when the solver is called with additional right-hand sides. In the case of PISM and the
solution of the balance of momentum equations for the SSA, the matrix changes because the
equations are nonlinear (the effective viscosity depends on the strain rates).
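
A hedged sketch of the bare `KSP` workflow, assuming a matrix `A` and vectors `x` and `b`
already exist (note that `KSPSetOperators()` took an extra argument in older PETSc versions):

    KSP ksp;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);    /* A is the system matrix and also the matrix
                                      from which the preconditioner is built */
    KSPSetFromOptions(ksp);        /* honor -ksp_type, -pc_type, -ksp_rtol, ... */
    KSPSolve(ksp, b, x);           /* solve A x = b */
    KSPDestroy(&ksp);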

The default matrix format is similar to the *Matlab* `sparse` format. Each processor
owns a range of rows. Elements in matrices and vectors can be set using `MatSetValues()`
and `VecSetValues()`. These routines use a "global" indexing which is not based on the regular
grid and stencil choices in the `DMDA`. One can set any value on any processor, but then a lot
of message passing has to occur; fortunately PETSc manages all of that. Values are
cached until one calls `MatAssemblyBegin()` followed by `MatAssemblyEnd()` to
communicate the values.
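
A hedged sketch of setting one row of a matrix `A` and then assembling it; the indices and
values here are purely illustrative:

    PetscInt    row = 10;
    PetscInt    cols[3] = {9, 10, 11};
    PetscScalar vals[3] = {-1.0, 2.0, -1.0};
    /* insert one row's worth of entries, using global indices */
    MatSetValues(A, 1, &row, 3, cols, vals, INSERT_VALUES);
    /* cached values are communicated, and the matrix becomes usable, only after assembly */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);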

### Additional PETSc utility functions

Quite elaborate error tracing and performance monitoring is possible with PETSc. All
functions return a `PetscErrorCode` which should be checked by the macro `CHKERRQ()`.
Therefore runtime errors normally print sufficient traceback information.
If this information is not present (because the error did not get traced back), you
may need to use a debugger, which is accessible with the command line (PETSc) options
`-start_in_debugger` and `-on_error_attach_debugger`. The option `-log_summary`
is useful to get some diagnostics written to the terminal.
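
The resulting idiom, sketched below, is that the return value of every PETSc call is captured
and passed through `CHKERRQ()`, so a failure produces a traceback naming the file and line
(the function and the value 273.15 are made up for illustration):

    PetscErrorCode setToFreezing(Vec v) {
      PetscErrorCode ierr;
      ierr = VecSet(v, 273.15); CHKERRQ(ierr);  /* any nonzero ierr triggers the traceback */
      return 0;
    }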

The `PetscViewer` interface allows PETSc objects to be displayed. One
can "view" a `Vec` in a graphical (X) window, but the "view" can also be
saving the `Vec` to a binary file on disk, printing it in plain text to the
terminal (standard out), or even sending it to a running instance of
*Matlab.* In PISM the X views are available using the `-view`
option; see the "User's Manual". When viewing `Vec`s graphically in
multiprocessor jobs, the display may have to be set on the command
line, for instance as `-display :0` or similar; this must be given as
the final option. For example,

    mpiexec -n 2 pisms -view thk -display :0

allows a two-processor run to view the ice thickness.
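
Beyond X windows, here is a hedged sketch of saving a global `Vec` (called `vglobal` here) to
a PETSc binary file; the file name is made up:

    PetscViewer viewer;
    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "thk.dat", FILE_MODE_WRITE, &viewer);
    VecView(vglobal, viewer);      /* write the parallel Vec into a single binary file */
    PetscViewerDestroy(&viewer);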