.. include:: ../../global.txt

.. _sec-input-output:

Input and output
----------------

PISM is a program that reads NetCDF files and then outputs NetCDF files.
:numref:`tab-input-output-options` summarizes command-line options controlling the most
basic ways to input and output NetCDF files when starting and ending PISM runs.

.. list-table:: Basic NetCDF input and output options
   :name: tab-input-output-options
   :header-rows: 1
   :widths: 1,2

   * - Option
     - Description

   * - :opt:`-i`
     - Chooses a PISM output file (NetCDF format) to initialize or restart from. See
       section :ref:`sec-initboot`.

   * - :opt:`-bootstrap`
     - Bootstrap from the file given by :opt:`-i`, using heuristics to "fill in" missing
       fields. See section :ref:`sec-initboot`.

   * - :opt:`-ssa_read_initial_guess false`
     - Turns off reading the ``ubar_ssa`` and ``vbar_ssa`` velocities saved by a previous
       run using the ``ssa`` or ``ssa+sia`` stress balance (see section
       :ref:`sec-stressbalance`).

   * - :opt:`-o`
     - Chooses the output file name. Default name is ``unnamed.nc``.

   * - :opt:`-o_size` ``size_keyword``
     - Chooses the size of the output file to produce. Possible sizes are

       - ``none`` (*no* output file at all),
       - ``small`` (only variables necessary to restart PISM),
       - ``medium`` (the default, includes diagnostic quantities listed in the
         configuration parameter :config:`output.sizes.medium`, if they are available in
         the current PISM setup),
       - ``big_2d`` (same as ``medium``, plus variables listed in
         :config:`output.sizes.big_2d`), and
       - ``big`` (same as ``big_2d``, plus variables listed in
         :config:`output.sizes.big`).
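
As a hedged illustration of the options above, the following sketch assembles a restart
command and prints it instead of running it. The executable name ``pismr`` and both file
names are assumptions made for this example; substitute the ones used in your setup.

.. code-block:: bash

   # Dry run: build the command line and print it instead of executing it.
   # Assumptions: the executable name (pismr) and both file names are
   # placeholders chosen for this example.
   pism_exec="pismr"
   input_file="previous_run.nc"
   output_file="restart.nc"

   # -i selects the file to restart from, -o names the output file, and
   # -o_size small keeps only the variables needed to restart PISM.
   cmd="$pism_exec -i $input_file -o $output_file -o_size small"
   echo "$cmd"

Replacing ``small`` with another size keyword from the table changes only which variables
are written, not how the run is started.
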

:numref:`tab-stdout` lists the controls on what is printed to the standard output.
Note the ``-help`` and ``-usage`` options for getting help at the command line.

.. list-table:: Options controlling PISM's standard output
   :header-rows: 1
   :name: tab-stdout
   :widths: 1,2

   * - Option
     - Description

   * - :opt:`-help`
     - Brief descriptions of the many PISM and PETSc options. The run occurs as usual
       according to the other options. (The option documentation does not get listed if
       the run didn't get started properly.) Use with a pipe into ``grep`` to get
       usefully-filtered information on options, for example ``pisms -help | grep cold``.

   * - :opt:`-info`
     - Gives information about PETSc operations during the run.

   * - :opt:`-list_diagnostics`
     - Prints a list of all available diagnostic outputs (time series and spatial) for the
       run with the given options. Stops run after printing the list.

   * - :opt:`-log_summary`
     - At the end of the run gives a performance summary and also a synopsis of the PETSc
       configuration in use.

   * - :opt:`-options_left`
     - At the end of the run shows an options table which will indicate if a user option
       was not read or was misspelled.

   * - :opt:`-usage`
     - Short summary of PISM executable usage, without listing all the options, and
       without doing the run.

   * - :opt:`-verbose`
     - Increased verbosity of standard output. Usually given with an integer level;
       0,1,2,3,4,5 are allowed. If given without argument then sets level 3, while
       ``-verbose 2`` is the default (i.e. equivalent to no option). At the extremes,
       ``-verbose 0`` produces no stdout at all, ``-verbose 1`` prints only warnings and a
       few high priority messages, and ``-verbose 5`` spews a lot of usually-undesirable
       stuff. ``-verbose 3`` output regarding initialization may be useful.

   * - :opt:`-version`
     - Show version numbers of PETSc and PISM.

The following sections describe more input and output options, especially related to
saving quantities during a run, or adding to the "diagnostic" outputs of PISM.

.. _sec-pism-io-performance:

PISM's I/O performance
^^^^^^^^^^^^^^^^^^^^^^

When working with fine grids (a grid spacing of 2 km or finer on the whole-Greenland
scale, for example), the time PISM spends writing output files, spatially-varying
diagnostic files, or backup files can become significant.

For fast file I/O the order of dimensions of a NetCDF variable in an output file has to
match the order used by PISM in memory, so we use the ``time,y,x,z`` storage order instead of
the more convenient (e.g. for NetCDF tools) order ``time,z,y,x``.

To transpose dimensions in an existing file, use the ``ncpdq`` ("permute dimensions
quickly") tool from the NCO_ suite. For example, run

.. code-block:: none

   ncpdq -a time,z,zb,y,x bad.nc good.nc

to turn ``bad.nc`` (with any inconvenient storage order) into ``good.nc`` using the
``time,z,y,x`` order.

PISM also supports parallel I/O using parallel NetCDF_, PnetCDF_, or ParallelIO_, which
can give better performance in high-resolution runs.

Use the command-line option :opt:`-o_format` (parameter :config:`output.format`) to choose
the approach to use when writing to output files (see :numref:`tab-output-format`). The
``netcdf4_parallel`` method requires parallel NetCDF, ``pnetcdf`` requires PnetCDF, and
the ``pio_...`` methods require ParallelIO built with parallel NetCDF and PnetCDF. Section
:ref:`sec-install-pism-cmake-options` explains how to select these libraries when
building PISM.

.. note::

   When built with parallel NetCDF or PnetCDF (or both) PISM attempts to choose the best
   way to *read* from input files and this logic appears to work well. This is why there
   is no ``-i_format``.

.. csv-table:: Methods of writing to output files
   :name: tab-output-format
   :header: ``-o_format`` argument, Description

   ``netcdf3``, (default); serialized I/O from rank 0 (NetCDF-3 file)
   ``netcdf4_parallel``, parallel I/O using NetCDF (HDF5-based NetCDF-4 file)
   ``pnetcdf``, parallel I/O using PnetCDF (CDF5 file)
   ``pio_pnetcdf``, parallel I/O using ParallelIO (CDF5 file)
   ``pio_netcdf4p``, parallel I/O using ParallelIO (HDF5-based NetCDF-4 file)
   ``pio_netcdf4c``, serial I/O using ParallelIO (*compressed* HDF5-based NetCDF-4 file)
   ``pio_netcdf``, serial I/O using ParallelIO (using data aggregation in ParallelIO)

The ParallelIO library can aggregate data in a subset of processes used by PISM. To choose
a subset, set

- :config:`output.pio.n_writers`: the number of "writers",
- :config:`output.pio.base`: the index of the first writer, and
- :config:`output.pio.stride`: the interval between writers.
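
To make the roles of these three parameters concrete, the sketch below lists the ranks
that would act as writers. It assumes that writer ``k`` runs on rank ``base + k * stride``,
which is our reading of the descriptions above rather than a statement taken from the
ParallelIO documentation; the values are hypothetical.

.. code-block:: bash

   # Sketch: list the ranks that would act as writers.
   # Assumption: writer k runs on rank base + k * stride.
   n_writers=4   # output.pio.n_writers (hypothetical value)
   base=0        # output.pio.base (hypothetical value)
   stride=30     # output.pio.stride (hypothetical value)

   writers=""
   k=0
   while [ "$k" -lt "$n_writers" ]; do
     writers="$writers $((base + k * stride))"
     k=$((k + 1))
   done
   writers="${writers# }"   # drop the leading space
   echo "writer ranks: $writers"

With these values the writers are spread evenly over a 120-process run (ranks 0, 30, 60,
and 90).
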

.. note::

   The CDF5 file format is a large-variable extension of the NetCDF-3 file format
   developed by the authors of PnetCDF. This format is supported by NetCDF since version
   4.4.

We recommend performing a number of test runs to determine the best choice for your
simulations.

In our test runs on 120 cores (whole Greenland setup on a 900 m grid) ``pio_pnetcdf`` with
:config:`output.pio.n_writers` set to the number of cores used by PISM (120) gave the best
performance.
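
A command line resembling this test setup might look like the output of the sketch below,
which prints the command instead of running it. The MPI launcher ``mpiexec``, the
executable name ``pismr``, and the file names are assumptions made for this example.

.. code-block:: bash

   # Dry run: print a command resembling the test setup described above.
   # Assumptions: the MPI launcher (mpiexec), the executable name (pismr),
   # and the file names are placeholders.
   cores=120
   output_format="pio_pnetcdf"   # one of the -o_format arguments

   cmd="mpiexec -n $cores pismr -i input.nc -o output.nc -o_format $output_format"
   echo "$cmd"

   # output.pio.n_writers would additionally be set to $cores, using
   # whichever mechanism you use to set configuration parameters.

Changing ``output_format`` to another argument of :opt:`-o_format` is an easy way to
compare methods in a series of test runs.
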

.. note::

   It is important to make sure that PISM's output files are written to a parallel file
   system and that this file system is configured to achieve optimal performance.

   On Lustre_ (a common parallel file system) the theoretical throughput when writing to
   a file depends on the number of *object storage targets* used to store it: if a target
   can write 500 MiB/s, a file spread over 2 could be written at 1000 MiB/s assuming that
   we are writing to both of them at the same time, and so on.

   **For maximum speed we want to distribute an output file over all available targets.**

   To do this:

   1. Create a directory that will contain PISM output files (``output_directory`` below).
   2. Run

      .. code-block:: bash

         lfs setstripe -c -1 output_directory

      This sets the "stripe count" to ``-1``, which means "all".

   Now all files in ``output_directory`` and all its sub-directories can use all
   available targets.
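
The throughput estimate in the note above is simple proportional scaling, sketched below.
The 500 MiB/s rate is the illustrative figure from the note; the target counts are
hypothetical, and the result is a theoretical upper bound that assumes all targets are
written to simultaneously.

.. code-block:: bash

   # Idealized estimate: throughput = per-target rate * number of targets.
   # The 500 MiB/s rate is the illustrative figure from the note above;
   # the target counts are hypothetical.
   per_target=500   # MiB/s
   for targets in 1 2 4 8; do
     echo "$targets target(s): $((per_target * targets)) MiB/s"
   done

   # The two-target case from the note.
   two_targets=$((per_target * 2))
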