Introduction to MVS 4 - Volumes, Catalogs and Datasets
======================================================

OK, here's the TL;DR:

  Dataset = file
  Volume  = disk (but can also be a tape) that stores datasets
  Catalog = directory that maps datasets to volumes

Let's start with the first one...


Datasets
--------

Data in MVS is stored in datasets. Not files, but datasets. I believe
IBM prefers 'data sets'. Whatever. The thing is, from an application's
point of view, a 'file' can consist of more than one dataset, depending
on how you set things up at run time.

I/O in MVS is record-based, in contrast to the byte-based facilities
you find in Unix and Windows. Yes, yes, I *know* you can write software
to read and write by record within an application, but this is
different. There's nothing in those other OSes to stop you from writing
byte-at-a-time to a file consisting of nicely-formatted records, and
hosing the whole thing. MVS *prevents* an application from doing this
by enforcing the dataset format at the OS level. Attempts to read or
write incorrectly-formatted data will fail (or be padded/truncated if
appropriate). However, it obviously can't help you if your app is
writing correctly-formatted garbage...

The price of this is that a bit more care and attention is required
when you create datasets, and that applications and the datasets they
access need to agree on the format of the data.

There are a number of different types of dataset, varying by
organisation and record format [1].

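To make the record-level enforcement described above a bit more
concrete, here is a toy model in Python. It is only a sketch: the
class name, the 'lrecl' parameter and the pad-or-reject switch are made
up for illustration, and this is not how MVS itself implements
anything. It just shows the idea that the container, not the
application, decides what a valid record looks like.

```python
class FixedRecordDataset:
    """Toy stand-in for a dataset with fixed-length records.

    Hypothetical illustration only; not an MVS API.
    """

    def __init__(self, lrecl, pad_short=True):
        self.lrecl = lrecl          # record length in bytes (assumed name)
        self.pad_short = pad_short  # pad short records instead of rejecting
        self.records = []

    def write(self, record: bytes) -> None:
        # Over-long records are always rejected in this sketch.
        if len(record) > self.lrecl:
            raise ValueError(
                f"record is {len(record)} bytes, limit is {self.lrecl}"
            )
        # Short records are either padded with blanks or rejected.
        if len(record) < self.lrecl:
            if not self.pad_short:
                raise ValueError(
                    f"record is {len(record)} bytes, expected {self.lrecl}"
                )
            record = record.ljust(self.lrecl, b" ")
        self.records.append(record)

    def read(self, index: int) -> bytes:
        # Reads always hand back one whole record, never a stray byte.
        return self.records[index]


ds = FixedRecordDataset(lrecl=8)
ds.write(b"ABC")        # stored padded to 8 bytes
ds.write(b"12345678")   # stored as-is
```

Writing b"123456789" to 'ds' would raise an error rather than quietly
spilling into the next record, which is the point being made above.
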
Datasets are given a name when they are created, which must follow
these rules:

  o Name segments ('qualifiers') of no more than 8 characters in
    length, separated by periods
  o Not more than 44 characters in length (including periods)
  o Using only alphanumerics and 'national characters' ('#', '@', '$')
  o First character of each segment must be alphabetic or national

For example:

  IBMUSER.ACC1PRG4.COBOL
  SUNDOG.$$README
  A.PRETTY.DAMN.STUPID.DS.NAME

The exception is a reference to a member of a 'partitioned data set'
(PDS), which behaves like a dataset containing other datasets
('members'):

  IBMUSER.GOPHER.COBOL(GOPHCL)

Here, the dataset named 'IBMUSER.GOPHER.COBOL' is a PDS, and the
dataset actually being referred to is the 'GOPHCL' member contained
within it.

Enough about datasets, or I'll be at this all day.


Volumes
-------

Volumes in MVS are the media on which the data is stored. All volumes,
whether tape or disk, have a 6-character alphanumeric ID - the volume
serial number. Tape volumes need somewhat different handling from disk,
so I'll stick with disk volumes here.

Disk volumes in MVS are not your everyday hard drives. The disks used
on more familiar systems (fibre channel, SCSI, SAS, SATA, etc.) are
known as 'fixed-block' devices. The blocks on these devices are of a
fixed and uniform size (commonly 512 bytes, but 4096 is becoming more
common). Not so in MVS - even now z/OS cannot use fixed-block devices,
and instead must be presented with 'CKD' ('Count Key Data', i.e.
variable block size) disk devices to be happy. The link at [2] has way
more information on this than you'd ever want to know. Having said
that, CKD disks are no longer manufactured, and all storage attached to
MVS systems nowadays is fixed-block devices emulating CKD.

When a volume is initialised for use by MVS, the volume serial number
is written to the disk, along with the VTOC ('Volume Table of Contents'
[3]). Provided the volume is online, it is now ready for use - datasets
can be created on that volume.

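Looking back at the Datasets section for a moment: the naming rules
listed there are mechanical enough to check in a few lines of Python.
This is a rough sketch, not a definitive validator. The function name
is made up, and two things are assumptions on my part rather than rules
stated above: that names are upper case only (as in the examples), and
that a PDS member name follows the same rules as a single qualifier.

```python
import re

# One qualifier: 1 to 8 characters, first is alphabetic or national,
# the rest are alphanumeric or national. Upper case only (assumption).
QUALIFIER = r"[A-Z#@$][A-Z0-9#@$]{0,7}"

# Dataset name, optionally followed by a member name in parentheses.
# Treating the member like a qualifier is an assumption of this sketch.
DSNAME_RE = re.compile(
    rf"({QUALIFIER}(?:\.{QUALIFIER})*)(?:\(({QUALIFIER})\))?"
)


def parse_dsname(text):
    """Return (dataset_name, member_or_None), or raise ValueError."""
    match = DSNAME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"badly formed dataset name: {text!r}")
    dsname, member = match.group(1), match.group(2)
    # The 44-character limit covers the qualifiers and the periods.
    if len(dsname) > 44:
        raise ValueError(f"dataset name longer than 44 characters: {text!r}")
    return dsname, member


plain = parse_dsname("SUNDOG.$$README")
member_ref = parse_dsname("IBMUSER.GOPHER.COBOL(GOPHCL)")
```

Something like parse_dsname("1BAD.NAME") raises ValueError, because the
first qualifier starts with a digit.
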
Catalogs
--------

So, we've got a bunch of volumes and we've created a pile of datasets
on there. How do we access them?

MVS uses a system of 'catalogs' to manage the locations of datasets, so
that given the name of a particular dataset, it knows which volume to
access in order to read or write it (provided the dataset in question
has been cataloged).

There are two types of catalog - master catalogs and user catalogs.
There *must* be one master catalog defined to the system. There can be
zero or more user catalogs.

The master catalog is the first place MVS looks when trying to locate a
dataset (given only a dataset name). It goes something like this:

If we need to access the SYS1.PARMLIB dataset, MVS goes away and looks
in the master catalog for a 'SYS1.PARMLIB' entry. As it's one of the
datasets used by MVS itself, it's right there in the master catalog -

  SYS1.PARMLIB -> OS39RA

Found it, it's on volume OS39RA. MVS checks the VTOC on volume OS39RA,
and finds out exactly where on the volume the dataset is. Job done.

"Wait a minute!" you say, "Surely having thousands of datasets in a big
lookup table is just terribly inefficient and a management nightmare?"
This is where user catalogs come in.

User catalogs work like this:

You remember about dataset names and 'qualifiers'? We can create
'aliases' in the master catalog, which group datasets by qualifier, and
then datasets with common qualifiers can be cataloged in separate
'user' catalogs.

Say we have a... I dunno, a COBOL compiler to install on the system.
All the datasets that make up the compiler (compiler executables,
libraries, library source and so on) have a common high-level qualifier
(the first segment of the dataset name), for example 'COBOL703':

  COBOL703.COMPILER.BIN
  COBOL703.LIBS.BIN
  COBOL703.LIBS.SOURCE
  ...etc.

We define a user catalog 'USERCAT.COMPILERS' (which we might use for
all compilers/assemblers/debuggers, for instance), and then create an
alias in the master catalog as follows:

  COBOL703 -> USERCAT.COMPILERS

All datasets beginning 'COBOL703' can then be cataloged in the new user
catalog. The MVS dataset search then goes:

  Check master catalog for 'COBOL703.LIBS.SOURCE'
  Follow alias for COBOL703 to 'USERCAT.COMPILERS' user catalog
  Check USERCAT.COMPILERS for 'COBOL703.LIBS.SOURCE' volume serial number
  Get 'COBOL703.LIBS.SOURCE' location from volume VTOC

More than one alias can point to the same user catalog.

In the same way as devices can be attached/mounted/accessed in Unix
systems, disk devices can be moved between MVS systems. Attach the
device (including catalog), import the catalog, and MVS now knows about
the datasets on the imported volume.

NOTE: A dataset does not have to be cataloged. If you know the volume
on which a dataset resides, the volume + dataset name is sufficient to
locate it without accessing any catalog.

HOWEVER! This also means it is perfectly possible to have duplicate
dataset names on different volumes - only one of those datasets can be
listed in the catalogs. For example:

  master catalog:
      COBOL703.COMPILER.BIN -> volume COM001

  volume COM001 VTOC:
      COBOL703.COMPILER.BIN -> location xxx

  volume OLD321 VTOC:
      COBOL703.COMPILER.BIN -> location yyy

Those two 'COBOL703.COMPILER.BIN' datasets may have different formats,
contents, access permissions and so on. Hmmm. BIG opportunities for
footgun moments here.

There's a *lot* more to MVS storage than this whistle-stop tour would
indicate, but it's enough for an overview of what's going on.


[1] https://en.wikipedia.org/wiki/Data_set_(IBM_mainframe)
[2] https://en.wikipedia.org/wiki/Count_key_data
[3] https://en.wikipedia.org/wiki/Volume_Table_of_Contents
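
P.S. For anyone who thinks better in code, the catalog search from the
Catalogs section can be modelled with a few Python dictionaries. This
is a toy sketch of the lookup order only; real catalogs are nothing
like dictionaries. The function name, the 'zzz' location and the choice
to put the COBOL703 datasets on volume COM001 via the user catalog are
assumptions made for the example.

```python
# Toy data, loosely following the examples in the text.
MASTER_CATALOG = {"SYS1.PARMLIB": "OS39RA"}
ALIASES = {"COBOL703": "USERCAT.COMPILERS"}   # high-level qualifier -> user catalog
USER_CATALOGS = {
    "USERCAT.COMPILERS": {
        "COBOL703.COMPILER.BIN": "COM001",
        "COBOL703.LIBS.SOURCE": "COM001",     # volume is an assumption
    },
}
VTOCS = {
    "OS39RA": {"SYS1.PARMLIB": "zzz"},        # 'zzz' is a placeholder
    "COM001": {"COBOL703.COMPILER.BIN": "xxx", "COBOL703.LIBS.SOURCE": "www"},
    "OLD321": {"COBOL703.COMPILER.BIN": "yyy"},   # uncataloged duplicate
}


def locate(dsname, volser=None):
    """Return (volume serial, location on volume) for a dataset name."""
    if volser is None:
        # 1. Look for the full name in the master catalog.
        volser = MASTER_CATALOG.get(dsname)
        if volser is None:
            # 2. Otherwise follow the alias for the high-level qualifier.
            hlq = dsname.split(".")[0]
            usercat = ALIASES.get(hlq)
            if usercat is None:
                raise LookupError(f"{dsname} is not cataloged")
            # 3. Look the name up in that user catalog.
            volser = USER_CATALOGS[usercat].get(dsname)
            if volser is None:
                raise LookupError(f"{dsname} is not in {usercat}")
    # 4. The VTOC on the volume says where the dataset actually lives.
    return volser, VTOCS[volser][dsname]


cataloged = locate("COBOL703.COMPILER.BIN")
uncataloged = locate("COBOL703.COMPILER.BIN", volser="OLD321")
```

Passing the volume explicitly skips the catalogs altogether, which is
how the same name ends up resolving to two different datasets.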