Introduction to MVS 4 - Volumes, Catalogs and Datasets
======================================================

OK, here's the TL;DR:

Dataset = file
Volume = disk (but can also be a tape) that stores datasets
Catalog = directory that maps datasets to volumes

Let's start with the first one...

Datasets
--------

Data in MVS is stored in datasets. Not files, but datasets. I believe
IBM prefers 'data sets'. Whatever. The thing is, from an application's
point of view, a 'file' can consist of more than one dataset,
depending on how you set things up at run time.

I/O in MVS is record-based, in contrast to the byte-based facilities
you find in Unix and Windows. Yes, yes, I *know* you can write
software to read and write by record within an application, but this
is different. There's nothing in those other OSes to stop you from
writing byte-at-a-time to a file consisting of nicely-formatted
records, and hosing the whole thing. MVS *prevents* an application
from doing this by enforcing the dataset format at the OS
level. Attempts to read or write incorrectly-formatted data will fail
(or be padded/truncated if appropriate). However, it obviously can't
help you if your app is writing correctly-formatted garbage...
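
To make the pad/truncate/fail behaviour concrete, here's a toy Python
sketch of fixed-length records. It is only an illustration: real MVS
does this inside the OS, not in application code. The 80-byte record
length, the function name and the 'strict' switch are all made up for
the example, and padding with ASCII blanks is an assumption too.

```python
# Toy model only: MVS enforces record formats at the OS level, not in
# application code like this.
LRECL = 80  # assumed fixed record length in bytes (hypothetical value)


def to_fixed_record(data: bytes, lrecl: int = LRECL, strict: bool = False) -> bytes:
    """Return data as exactly one fixed-length record."""
    if len(data) > lrecl:
        if strict:
            # Reject the write outright instead of silently truncating.
            raise ValueError(f"record is {len(data)} bytes, limit is {lrecl}")
        return data[:lrecl]         # truncate the overflow
    return data.ljust(lrecl, b" ")  # pad short records with blanks


record = to_fixed_record(b"HELLO, WORLD")
print(len(record))  # 80
```

Whatever the application hands over, what lands in the dataset is a
whole record of the agreed length, or nothing at all.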

The price of this is that creating datasets takes a bit more care and
attention, and that applications and the datasets they access need to
agree on the format of the data.

There are a number of different types of dataset, varying by
organisation and record format [1].

Datasets are given a name when they are created, which must follow
these rules:

o Name segments ('qualifiers') of no more than 8 characters in
  length, separated by periods
o Whole name not more than 44 characters in length (including periods)
o Using only alphanumerics and 'national characters' ('#', '@', '$')
o First character of each segment must be alphabetic or national

For example:

IBMUSER.ACC1PRG4.COBOL
SUNDOG.$$README
A.PRETTY.DAMN.STUPID.DS.NAME
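
The naming rules above fit in a few lines of Python. This is a
minimal sketch that checks only the four rules listed here. The
function name is made up, and treating names as upper case only is my
assumption, not something the rules above state.

```python
import re

# One qualifier: 1 to 8 characters, the first alphabetic or national,
# the rest alphanumeric or national. Upper case only is an assumption.
QUALIFIER = re.compile(r"[A-Z#@$][A-Z0-9#@$]{0,7}")


def is_valid_dsname(name: str) -> bool:
    """Check a dataset name against the four rules listed above."""
    if not 1 <= len(name) <= 44:  # length limit includes the periods
        return False
    # Every period-separated qualifier must match the pattern in full.
    return all(QUALIFIER.fullmatch(q) for q in name.split("."))


for name in ("IBMUSER.ACC1PRG4.COBOL", "SUNDOG.$$README", "1BAD.NAME"):
    print(name, is_valid_dsname(name))
```

The three example names above pass. '1BAD.NAME' fails because its
first qualifier starts with a digit.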

The exception to this is the 'partitioned data set' (PDS), which
behaves like a dataset containing other datasets ('members'). A member
is named in parentheses after the PDS name:

IBMUSER.GOPHER.COBOL(GOPHCL)

Here, the dataset named 'IBMUSER.GOPHER.COBOL' is a PDS, and the
dataset actually being referred to is the 'GOPHCL' member contained
within it.
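
Pulling such a reference apart is simple enough. A small Python
sketch (the function name is hypothetical, and it does no validation
of either part):

```python
from typing import Optional, Tuple


def split_member(reference: str) -> Tuple[str, Optional[str]]:
    """Split 'DSN(MEMBER)' into (dsn, member); member is None if absent."""
    if reference.endswith(")") and "(" in reference:
        # Drop the closing parenthesis, then split at the opening one.
        dsn, _, member = reference[:-1].partition("(")
        return dsn, member
    return reference, None


print(split_member("IBMUSER.GOPHER.COBOL(GOPHCL)"))
print(split_member("SUNDOG.$$README"))
```

The first call yields the PDS name and the member name separately;
the second has no member, so the second element is None.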

Enough about datasets, or I'll be at this all day.

Volumes
-------

Volumes in MVS are the media on which the data is stored. All volumes,
whether tape or disk, have an alphanumeric ID of up to 6 characters -
the volume serial number. Tape volumes need somewhat different
handling to disk, so I'll stick with disk volumes here.

Disk volumes in MVS are not your everyday hard drives. The disks used
on more familiar systems (fibre channel, SCSI, SAS, SATA, etc.) are
known as 'fixed-block' devices: every block on the physical device is
the same size (commonly 512 bytes, though 4096 is becoming more
common). Not so in MVS - even now z/OS cannot use fixed-block devices,
and instead must be presented with 'CKD' ('Count Key Data', i.e.
variable block size) disk devices to be happy. The link at [2] has way
more information on this than you'd ever want to know. Having said
that, CKD disks are no longer manufactured, and all storage attached
to MVS systems nowadays consists of fixed-block devices emulating CKD.

When a disk is initialised for use by MVS, the volume serial number is
written to it, along with the VTOC ('Volume Table of Contents'
[3]). Provided the volume is online, it is now ready for use -
datasets can be created on that volume.

Catalogs
--------

So, we've got a bunch of volumes and we've created a pile of datasets
on them. How do we access them?

MVS uses a system of 'catalogs' to manage the locations of datasets,
so that given the name of a particular dataset, it knows which volume
to access in order to read or write data to it (provided the dataset
in question has been cataloged).

There are two types of catalog - master catalogs, and user
catalogs. There *must* be one master catalog defined to the
system. There can be zero or more user catalogs.

The master catalog is the first place MVS looks when trying to locate
a dataset (given only a dataset name). It goes something like this:

If we need to access the SYS1.PARMLIB dataset, MVS goes away and looks
in the master catalog for a 'SYS1.PARMLIB' entry. As it's one of the
datasets used by MVS itself, it's right there in the master catalog -

SYS1.PARMLIB -> OS39RA

Found it, it's on volume OS39RA. MVS checks the VTOC on volume OS39RA,
and finds out exactly where on the volume the dataset is. Job done.

"Wait a minute!" you say, "Surely having thousands of datasets in a
big lookup table is just terribly inefficient and a management
nightmare?" This is where user catalogs come in. User catalogs work
like this:

You remember about dataset names and 'qualifiers'? We can create
'aliases' in the master catalog, which group datasets by their first
qualifier, and then datasets sharing that qualifier can be cataloged
in separate 'user' catalogs.

Say we have a... I dunno, a COBOL compiler to install on the
system. All the datasets that make up the compiler (compiler
executables, libraries, library source and so on) have a common
high-level qualifier (the first segment of the dataset name), for
example 'COBOL703':

COBOL703.COMPILER.BIN
COBOL703.LIBS.BIN
COBOL703.LIBS.SOURCE
...etc.

We define a user catalog 'USERCAT.COMPILERS' (which we might use for
all compilers/assemblers/debuggers for instance), and then create an
alias in the master catalog as follows:

COBOL703 -> USERCAT.COMPILERS

And then all datasets beginning 'COBOL703' can be cataloged in the new
user catalog. The MVS dataset search then goes:

Check master catalog for 'COBOL703.LIBS.SOURCE'
Follow alias for COBOL703 to 'USERCAT.COMPILERS' user catalog
Check USERCAT.COMPILERS for 'COBOL703.LIBS.SOURCE' volume serial number
Get 'COBOL703.LIBS.SOURCE' location from volume VTOC
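
The search steps above can be sketched in Python with a few
dictionaries. This is a toy model only: real catalogs and VTOCs are
on-disk structures, not dicts. The volume 'COM001' for
COBOL703.LIBS.SOURCE and the 'location ...' strings are placeholders
I've made up for the example, as is the function name.

```python
# Toy in-memory model of the lookup; all contents are illustrative.
MASTER_CATALOG = {"SYS1.PARMLIB": "OS39RA"}   # dataset name -> volume
ALIASES = {"COBOL703": "USERCAT.COMPILERS"}   # first qualifier -> user catalog
USER_CATALOGS = {
    "USERCAT.COMPILERS": {"COBOL703.LIBS.SOURCE": "COM001"},
}
VTOCS = {                                     # volume -> {dataset: location}
    "OS39RA": {"SYS1.PARMLIB": "location aaa"},
    "COM001": {"COBOL703.LIBS.SOURCE": "location xxx"},
}


def locate(dsn):
    """Return (volume, location) for a cataloged dataset name."""
    # Step 1: look for the dataset itself in the master catalog.
    volume = MASTER_CATALOG.get(dsn)
    if volume is None:
        # Step 2: follow the alias for the first qualifier.
        user_catalog = USER_CATALOGS[ALIASES[dsn.split(".")[0]]]
        # Step 3: get the volume serial number from the user catalog.
        volume = user_catalog[dsn]
    # Step 4: the volume's VTOC says where the dataset actually lives.
    return volume, VTOCS[volume][dsn]


print(locate("SYS1.PARMLIB"))
print(locate("COBOL703.LIBS.SOURCE"))
```

A name that is in neither catalog simply raises KeyError here, which
stands in for "dataset not found".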

More than one alias can point to the same user catalog.

In the same way as devices can be attached/mounted/accessed in Unix
systems, disk devices can be moved between MVS systems. Attach the
device (including catalog), import the catalog, and MVS now knows
about the datasets on the imported volume.

NOTE: A dataset does not have to be cataloged. If you know the volume
on which a dataset resides, the volume + dataset name is sufficient to
locate it without accessing any catalog. HOWEVER! This also means it
is perfectly possible to have duplicate dataset names on different
volumes - only one of those datasets can be listed in the
catalogs. For example:

catalog:
COBOL703.COMPILER.BIN -> volume COM001

volume COM001 VTOC:
COBOL703.COMPILER.BIN -> location xxx

volume OLD321 VTOC:
COBOL703.COMPILER.BIN -> location yyy

Those two 'COBOL703.COMPILER.BIN' datasets may have different formats,
contents, access permissions and so on. Hmmm. BIG opportunities for
footgun moments here.
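
Here's the same situation as a toy Python sketch, to show how the
footgun goes off. The dictionaries mirror the example above; the
function name and its optional 'volume' argument are hypothetical.

```python
# Toy model of the duplicate-name example; not how MVS stores any of this.
CATALOG = {"COBOL703.COMPILER.BIN": "COM001"}
VTOCS = {
    "COM001": {"COBOL703.COMPILER.BIN": "location xxx"},
    "OLD321": {"COBOL703.COMPILER.BIN": "location yyy"},
}


def find(dsn, volume=None):
    """Locate a dataset via the catalog, or directly if a volume is given."""
    if volume is None:
        volume = CATALOG[dsn]  # cataloged lookup: the name alone is enough
    # With an explicit volume the catalog is never consulted.
    return volume, VTOCS[volume][dsn]


print(find("COBOL703.COMPILER.BIN"))
print(find("COBOL703.COMPILER.BIN", volume="OLD321"))
```

Same name, two different datasets, and which one you get depends
entirely on whether you named a volume.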

There's a *lot* more to MVS storage than this whistle-stop tour would
indicate, but it's enough for an overview of what's going on.

[1] https://en.wikipedia.org/wiki/Data_set_(IBM_mainframe)
[2] https://en.wikipedia.org/wiki/Count_key_data
[3] https://en.wikipedia.org/wiki/Volume_Table_of_Contents