.. include:: ../../global.txt

.. _sec-input-output:

Input and output
----------------

PISM reads NetCDF files and writes NetCDF files. :numref:`tab-input-output-options`
summarizes the command-line options controlling the most basic ways to read and write
NetCDF files when starting and ending PISM runs.

.. list-table:: Basic NetCDF input and output options
   :name: tab-input-output-options
   :header-rows: 1
   :widths: 1,2

   * - Option
     - Description

   * - :opt:`-i`
     - Chooses a PISM output file (NetCDF format) to initialize or restart from. See
       section :ref:`sec-initboot`.

   * - :opt:`-bootstrap`
     - Bootstraps from the file set using :opt:`-i`, applying heuristics to "fill in"
       missing fields. See section :ref:`sec-initboot`.

   * - :opt:`-ssa_read_initial_guess false`
     - Turns off reading the ``ubar_ssa`` and ``vbar_ssa`` velocities saved by a previous
       run using the ``ssa`` or ``ssa+sia`` stress balance (see section
       :ref:`sec-stressbalance`).

   * - :opt:`-o`
     - Chooses the output file name. The default name is ``unnamed.nc``.

   * - :opt:`-o_size` ``size_keyword``
     - Chooses the size of the output file to produce. Possible sizes are

       - ``none`` (*no* output file at all),
       - ``small`` (only variables necessary to restart PISM),
       - ``medium`` (the default, includes diagnostic quantities listed in the
         configuration parameter :config:`output.sizes.medium`, if they are available in
         the current PISM setup),
       - ``big_2d`` (same as ``medium``, plus variables listed in
         :config:`output.sizes.big_2d`), and
       - ``big`` (same as ``big_2d``, plus variables listed in
         :config:`output.sizes.big`).

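As a rough illustration of how the options in :numref:`tab-input-output-options` combine,
the sketch below restarts from a previous run and saves a restart-only file. The
executable name ``pismr``, the ``mpiexec`` launcher, the file names, and the run-length
option ``-y`` are assumptions made for this example; adjust them to your setup.

.. code-block:: bash

   # Restart from old_run.nc, run for 100 model years (assumed option -y),
   # and save only the variables needed to restart again.
   mpiexec -n 4 pismr -i old_run.nc -y 100 -o new_run.nc -o_size small
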
:numref:`tab-stdout` lists the controls on what is printed to the standard output.
Note the ``-help`` and ``-usage`` options for getting help at the command line.

.. list-table:: Options controlling PISM's standard output
   :header-rows: 1
   :name: tab-stdout
   :widths: 1,2

   * - Option
     - Description

   * - :opt:`-help`
     - Brief descriptions of the many PISM and PETSc options. The run occurs as usual
       according to the other options. (The option documentation is not listed if the
       run did not start properly.) Pipe the output into ``grep`` to filter the
       information on options, for example ``pisms -help | grep cold``.

   * - :opt:`-info`
     - Gives information about PETSc operations during the run.

   * - :opt:`-list_diagnostics`
     - Prints a list of all available diagnostic outputs (time series and spatial) for the
       run with the given options. Stops the run after printing the list.

   * - :opt:`-log_summary`
     - At the end of the run, gives a performance summary and a synopsis of the PETSc
       configuration in use.

   * - :opt:`-options_left`
     - At the end of the run, shows an options table which indicates if a user option
       was not read or was misspelled.

   * - :opt:`-usage`
     - Short summary of PISM executable usage, without listing all the options, and
       without doing the run.

   * - :opt:`-verbose`
     - Sets the verbosity of standard output. Usually given with an integer level;
       levels 0 through 5 are allowed. Given without an argument it sets level 3, while
       ``-verbose 2`` is the default (i.e. equivalent to no option). At the extremes,
       ``-verbose 0`` produces no stdout at all, ``-verbose 1`` prints only warnings and a
       few high-priority messages, and ``-verbose 5`` prints a large amount of detail
       that is rarely useful. The ``-verbose 3`` output regarding initialization may be
       useful.

   * - :opt:`-version`
     - Show version numbers of PETSc and PISM.

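A possible way to combine some of these options while checking a command line is
sketched below. The ``pisms`` executable appears in the table above; the short run length
set with ``-y`` is an assumption made to keep the example quick.

.. code-block:: bash

   # Print more detail during initialization, write no output file, and
   # report unused or misspelled options at the end of the run.
   # -y 10 (run length in model years) is an assumed option, not listed above.
   pisms -y 10 -verbose 3 -o_size none -options_left
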
The following sections describe more input and output options, especially related to
saving quantities during a run, or adding to the "diagnostic" outputs of PISM.

.. _sec-pism-io-performance:

PISM's I/O performance
^^^^^^^^^^^^^^^^^^^^^^

When working with fine grids (resolutions of 2 km and finer on the whole-Greenland scale,
for example), the time PISM spends writing output files, spatially-varying diagnostic
files, or backup files can become significant.

For fast file I/O the order of dimensions of a NetCDF variable in an output file has to
match the order used by PISM in memory, so we use the ``time,y,x,z`` storage order instead of
the more convenient (e.g. for NetCDF tools) order ``time,z,y,x``.

To transpose dimensions in an existing file, use the ``ncpdq`` ("permute dimensions
quickly") tool from the NCO_ suite. For example, run

.. code-block:: none

   ncpdq -a time,z,zb,y,x bad.nc good.nc

to turn ``bad.nc`` (with any inconvenient storage order) into ``good.nc`` using the
``time,z,y,x`` order.

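One way to check the result is to inspect the file header with ``ncdump``, which is part
of the NetCDF utilities and is not mentioned above. The ``grep`` pattern is only an
illustration; ``good.nc`` is the file produced by the command above.

.. code-block:: bash

   # Print the header only (-h) and keep the declarations of variables
   # stored in the time,z,y,x order.
   ncdump -h good.nc | grep "(time, z, y, x)"
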
PISM also supports parallel I/O using parallel NetCDF_, PnetCDF_, or ParallelIO_, which
can give better performance in high-resolution runs.

Use the command-line option :opt:`-o_format` (parameter :config:`output.format`) to choose
the approach to use when writing to output files (see :numref:`tab-output-format`). The
``netcdf4_parallel`` method requires parallel NetCDF, ``pnetcdf`` requires PnetCDF, and
the ``pio_...`` methods require ParallelIO built with parallel NetCDF and PnetCDF. Section
:ref:`sec-install-pism-cmake-options` explains how to select these libraries when
building PISM.

.. note::

   When built with parallel NetCDF or PnetCDF (or both) PISM attempts to choose the best
   way to *read* from input files and this logic appears to work well. This is why there
   is no ``-i_format``.

.. csv-table:: Methods of writing to output files
   :name: tab-output-format
   :header: ``-o_format`` argument, Description

   ``netcdf3``, (default); serialized I/O from rank 0 (NetCDF-3 file)
   ``netcdf4_parallel``, parallel I/O using NetCDF (HDF5-based NetCDF-4 file)
   ``pnetcdf``, parallel I/O using PnetCDF (CDF5 file)
   ``pio_pnetcdf``, parallel I/O using ParallelIO (CDF5 file)
   ``pio_netcdf4p``, parallel I/O using ParallelIO (HDF5-based NetCDF-4 file)
   ``pio_netcdf4c``, serial I/O using ParallelIO (*compressed* HDF5-based NetCDF-4 file)
   ``pio_netcdf``, serial I/O using ParallelIO (using data aggregation in ParallelIO)

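For instance, a run writing its output through PnetCDF might be started as sketched
below. The executable name ``pismr``, the ``mpiexec`` launcher, the number of processes,
and the file names are assumptions; this only works if PISM was built with PnetCDF.

.. code-block:: bash

   # Select the PnetCDF-based writer from the table above
   # (other run options omitted for brevity).
   mpiexec -n 120 pismr -i input.nc -o output.nc -o_format pnetcdf
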
The ParallelIO library can aggregate data in a subset of processes used by PISM. To choose
a subset, set

- :config:`output.pio.n_writers`: the number of "writers",
- :config:`output.pio.base`: the index of the first writer,
- :config:`output.pio.stride`: the interval between writers.

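To get a feeling for how these three parameters select processes, the sketch below lists
the ranks that would act as writers. It assumes that writer number ``k`` (counting from
zero) runs on rank ``base + k * stride``; this matches the usual meaning of "base" and
"stride" in ParallelIO but is not stated above. The helper ``writer_ranks`` is
hypothetical and is not part of PISM.

.. code-block:: bash

   # Hypothetical helper: print the MPI ranks acting as writers, assuming
   # writer k (k = 0, 1, ...) runs on rank base + k * stride.
   writer_ranks() {
       local n_writers=$1 base=$2 stride=$3
       local k ranks=""
       for ((k = 0; k < n_writers; k++)); do
           ranks+="$((base + k * stride)) "
       done
       echo "${ranks% }"
   }

   # 4 writers, starting at rank 0, every 30th rank of a 120-process run:
   writer_ranks 4 0 30   # prints "0 30 60 90"

With 120 processes this choice spreads the four writers evenly over the ranks; setting
the stride to 1 would instead place them on four consecutive ranks.
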
.. note::

   The CDF5 file format is a large-variable extension of the NetCDF-3 file format
   developed by the authors of PnetCDF. This format is supported by NetCDF since version
   4.4.

We recommend performing a number of test runs to determine the best choice for your
simulations.

In our test runs on 120 cores (whole-Greenland setup on a 900 m grid) ``pio_pnetcdf`` with
:config:`output.pio.n_writers` set to the number of cores used by PISM (120) gave the best
performance.

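Such test runs could be organized as in the sketch below, which repeats a short run once
per output format. The executable name ``pismr``, the launcher, the file names, and the
run-length option ``-y`` are assumptions; the list of formats should include only those
supported by your PISM build. Note that ``time`` reports the duration of the whole run,
not of the I/O alone.

.. code-block:: bash

   # Repeat the same short run with different output formats and compare
   # the wall-clock times reported by the shell.
   for format in netcdf3 netcdf4_parallel pnetcdf pio_pnetcdf; do
       echo "== ${format} =="
       time mpiexec -n 120 pismr -i input.nc -y 1 \
            -o "test_${format}.nc" -o_format "${format}"
   done
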
.. note::

   It is important to make sure that PISM's output files are written to a parallel file
   system and that this file system is configured to achieve optimal performance.

   On Lustre_ (a common parallel file system) the theoretical throughput when writing to
   a file depends on the number of *object storage targets* used to store it: if a target
   can write 500 MiB/s, a file spread over 2 targets could be written at 1000 MiB/s,
   assuming that we are writing to both of them at the same time, and so on.

   **For maximum speed we want to distribute an output file over all available targets.**

   To do this:

   1. Create a directory that will contain PISM output files (``output_directory`` below).
   2. Run

      .. code-block:: bash

         lfs setstripe -c -1 output_directory

      This sets the "stripe count" to ``-1``, which means "all".

      Now all files in ``output_directory`` and all its sub-directories can use all
      available targets.
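
The throughput estimate in the note above is a simple product, sketched below. The helper
``ideal_rate`` is hypothetical, and it assumes that every target sustains the same rate
and that all targets are written to simultaneously, so it gives an upper bound rather
than a prediction.

.. code-block:: bash

   # Hypothetical helper: ideal aggregate write rate (MiB/s) for a file
   # striped over a number of targets with the same per-target rate.
   ideal_rate() {
       local per_target=$1 targets=$2
       echo $((per_target * targets))
   }

   ideal_rate 500 2    # prints 1000
   ideal_rate 500 16   # prints 8000

   # On a system with the Lustre client tools installed, the default striping
   # of the directory can be inspected with:
   #   lfs getstripe -d output_directory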