Return-Path:
Date: Mon, 7 Oct 91 07:08:51 IST
From: List Processor (1.2)
Subject: File: "LISTLPUN MEMO" being sent to you
To: bzs@WORLD.STD.COM

LISTEARN List Processor, Release 1.0
-------------------------------------------
LISTEARN 1.0 (c) EARN Association 1989 is derived from:
LISTSERV 1.5o (c) Eric Thomas 1986,1987,1988,1989

+-----------------------------------------------------------+
|  Revised LISTSERV: LISTSERV-Punch Implementation Guide    |
+-----------------------------------------------------------+
|                                                           |
|  Document number: R01-007-0 (October 4th, 1987)           |
|                                                           |
|  Author: Eric Thomas                                      |
|                                                           |
|  Document fileid: "LISTLPUN MEMO" (from "Info LPunch")    |
+-----------------------------------------------------------+


Preface

This manual is a Reference Guide for application programmers who have
to implement the LISTSERV-Punch file transfer format on their operating
system. It provides a complete technical description of the
LISTSERV-Punch format, along with a brief summary of some relevant
information about the CMS file system. In addition, sample conversion
programs in PASCAL and C are provided to assist the application
programmer in implementing a satisfactory LISTSERV-Punch conversion
program on his system.

NOTE: The LISTSERV-Punch file transfer format was introduced with
      release 1.4 of Revised LISTSERV. Although the LISTSERV-Punch
      format might be expanded in the future to provide additional
      functions, full upwards compatibility is guaranteed provided
      that the user-written conversion program strictly conforms to
      the standard defined in this document. Deviations from the
      norm (e.g. not ignoring the remainder of the ID control card)
      might lead to problems with future releases of LISTSERV.


Conventions
-----------

The following typographical conventions have been adopted in this
document to improve its readability:

|  o Recent changes in the publication are indicated by a vertical bar
|    in the left margin.

!  o Intermediate changes between two releases of the document
!    ("Pre-releases") are flagged with an exclamation point in the
!    left margin. Features described in this fashion should be
!    considered as not documented and not officially supported until
!    the exclamation point is removed.

>  o Temporary restrictions or circumventions are marked with a
>    "greater than" sign in the left margin. This sign may also be used
>    to signal obsolete features for which support will be dropped in
>    the next release.


LISTSERV-PUNCH FORMAT DESCRIPTION
_________________________________

****************
* Introduction *
****************

The term "LISTSERV-Punch" refers to a "file transmission format" which
is used by LISTSERV when sending files with records longer than 80
characters to computing systems which do not have the ability to
process Netdata, CARD DUMP or DISK DUMP format. It was designed to
accommodate file transmission through mail-only gateways. Its primary
features are:

   o Ability to transfer files of any record length up to 65535
     characters per record. This limit is arbitrary and corresponds to
     the longest string a high-level language compiler can usually
     handle.

   o Ability to transfer files without automatically padding or
     stripping records of trailing blanks. That is, leading or trailing
     blanks are not removed by the file transfer process itself.

   o Simplicity of the decoding program. Sample conversion programs in
     PASCAL and C are provided at the end of this document. The PASCAL
     program was written and tested in about 45 minutes, and any PASCAL
     programmer should be able to adapt it very quickly to work on his
     own system. The C program should work unmodified under any
     standard UNIX system, but might need to be improved to allow the
     specification of input and output parameters and/or
     system-dependent switches for the various filing system calls.

   o Acceptable network efficiency: very few extra lines are
     generated, and trailing blanks are stripped before transmission.

   o Possibility for a human reader to get a general idea of the
     contents of the file without having to resort to a conversion
     program.


****************************
* Format of the INPUT file *
****************************

A LISTSERV-Punch formatted file (also called "card deck", with the
term "card" being used to refer to an 80-character file record) has
the following overall layout:

+--------------------------------------------------------------------+
|                                                                    |
|          junk line                                                 |
|              .                                                     |
|              .                                                     |
|          junk line                                                 |
|                                                                    |
|          ID/ control card                                          |
|                                                                    |
|          data line #1                                              |
|              .                                                     |
|              .                                                     |
|          data line #n                                              |
|                                                                    |
|          END/ control card                                         |
|                                                                    |
|          junk line                                                 |
|              .                                                     |
|              .                                                     |
|          junk line                                                 |
|                                                                    |
|                  Figure 1. Input card deck format                  |
+--------------------------------------------------------------------+


Junk lines
----------

Anything before the "ID/" card (e.g. mail header, comments) is
ignored. Similarly, anything after the "END/" card (e.g. mailer
keywords, "Acknowledge-To:" field) is discarded. Encountering the end
of the file without any ID or END card is an error. However,
encountering an END card before any ID card is not an error since
anything before the first ID card must be ignored without even being
parsed.


Control lines
-------------

The ID card contains information about the file being sent, while the
END card is merely an end-of-stream indicator. The format of these two
cards is:

+--------------------------------------------------------------------+
|                                                                    |
|                   ID/filename filetype recfm lrecl reserved        |
|                   |  |        |        |     |     |               |
|                   V  V        V        V     V     V               |
| Column numbers:   1  4        13       22    24    29              |
|                                                                    |
|                   END/reserved                                     |
|                   |   |                                            |
|                   V   V                                            |
| Column numbers:   1   5                                            |
|                                                                    |
| Blanks shown between keywords are real. See later on for more      |
| information on the meaning of "filename", "recfm", etc.            |
|                                                                    |
|                   Figure 2. Control cards format                   |
+--------------------------------------------------------------------+

NOTE: The end of the ID card (from column 30 onwards, inclusive) and
      the end of the END card (from column 5 onwards, inclusive) are
      reserved fields which should be ignored by the conversion
      program. Failure to observe this rule might result in severe
      compatibility problems with future releases of LISTSERV.


Data lines
----------

Folding algorithm

Each record of the source file is broken into one or more physical
"cards" before being transmitted. The first card of each such group
indicates the number of cards in the group, counting itself as one
card (i.e. this is not the number of continuation cards but the total
number of cards), and possibly the length of the source record (only
if "recfm" is 'V' - see below). The remainder of the first card, as
well as the second and following cards in the group, are pure data
bytes. The resulting concatenated record will have to be padded with
blanks as determined by the record length indication, if needed.

Description of the format of a data line

Fixed-length records

+--------------------------------------------------------------------+
| recfm = F                                                          |
|                   ncards/data                                      |
|                   |      |                                         |
|                   V      V                                         |
| Column numbers:   1      2+Length(ncards)                          |
|                                                                    |
|          Figure 3. Data line format, fixed-length records          |
+--------------------------------------------------------------------+

Variable-length records

+--------------------------------------------------------------------+
| recfm = V                                                          |
|                   lrecl/ncards/data                                |
|                   |     |      |                                   |
|                   V     |      V                                   |
| Column numbers:   1     |      3+Length(lrecl)+Length(ncards)      |
|                         V                                          |
|                         2+Length(lrecl)                            |
|                                                                    |
|        Figure 4. Data line format, variable-length records         |
+--------------------------------------------------------------------+

NCARDS   is the total number of cards in the group (ncards >= 1).

LRECL    is the logical record length of the source file record
         associated with the group; it does not appear in recfm F
         files, for which the record length is defined in the ID card.


***************************************
* Summary of CMS file characteristics *
***************************************

Since LISTSERV operates in an IBM VM/SP CMS environment and the file
will have to be received on a different operating system, a very short
description of the CMS file system has been included here.


CMS file names
--------------

Under CMS, each file is identified by a "filename" and a "filetype",
both of them being strings of 1 to 8 characters taken from the
following set:

   A-Z a-z 0-9 #$@-+:_

However, LISTSERV will never send a file containing a lowercase
character in its filename or filetype, because many systems have to
convert them to uppercase or will convert network interactive messages
to uppercase, thus making it impossible for users to enter mixed case
file names in the commands they send to LISTSERV.


CMS file structure
------------------

CMS files consist of a series of "records", as opposed to UNIX or
MS-DOS files which consist of a series of bytes. When calling the
filing system to perform a write operation, you must provide a full
record. It is not possible to write "byte by byte" since this has no
meaning (you could of course write a series of one-byte records but
this would not create the file you expected). Each CMS file therefore
has, among other attributes:

   o A "number of records" field.

   o A "logical record length" (which is called lrecl). This is the
     size of the longest record in the file, and consequently the
     minimum size of the storage buffer required to read the file
     record by record.

   o A "record format" (which is called recfm). This "record format"
     is a single character, F for FIXED or V for VARIABLE. A recfm of
     "F" indicates that all the records in the file have exactly the
     same length (equal to lrecl), while "V" indicates that record
     lengths may differ.
     The filing system uses more efficient algorithms when handling
     recfm F files, but they require more disk space of course.


Implementation considerations
-----------------------------

Before you start writing the conversion program, you must answer the
following set of questions:

1. Which name will I use on my system for the newly converted file?

   The sample C program reads from standard input and writes the
   converted file to standard output, leaving the user responsible for
   providing adequate redirection on the command line. The sample
   PASCAL program, which is written for MS-DOS, uses the filename
   field as file name and the first three characters of the filetype
   as file extension, i.e. filename+'.'+Left(filetype,3). You may use
   a similar algorithm or select a constant output file name such as
   "LISTSERV.PUN".

2. How should recfm F files be written on my system?

   The sample program treats a recfm F file as a recfm V file which
   just happens to have records of identical length. A CR+LF sequence
   is therefore output at the end of each record, regardless of the
   recfm. Alternatively, the program could have been designed to write
   recfm F files "as is", i.e. without any CR+LF sequence at the end
   of the record. It all depends on the capabilities of your system
   and (above all) of your system's editor.


SAMPLE PASCAL PROGRAM
_____________________

*********************
* Preliminary notes *
*********************

This sample PASCAL program was written for Turbo-PASCAL (a trademark
of Borland International). It is provided only as an example - no
warranty of any kind is made that the program will function properly
on your operating system. Detailed comments have been provided for all
the "system dependent" procedures. The program was designed for
transportability, not efficiency - there are a lot of places where
some optimization would greatly improve execution speed, but that was
not the objective of the program.
All the functions and procedures which are not part of the standard
library have been Capitalized, while standard functions have been
entered in lower case. These functions might exist on some of the
compilers, possibly under a different name. The compiler is assumed to
be able to handle the "string" type; if it doesn't, you will have to
write a few "string-handling" functions which emulate strings from an
array of char.

Finally, since not all terminals accept the special characters used by
PASCAL, the following conversion has been made:

   opening bracket     -->  (.
   closing bracket     -->  .)
   opening curly brace -->  (*
   closing curly brace -->  *)

NOTE: in Turbo-PASCAL, string concatenation is done using the "+"
      sign.

*----------------------------- Cut here --------------------------------*
(*
 * LISTSERV-Punch PASCAL conversion program, version number 03
 *
 * Written by Eric Thomas
 *
 *
 * This public domain program has been tested on an IBM PC-compatible
 * system on several small files that had to be keyed-in manually on
 * the PC since no PC-to-mainframe connection was available.
 *
 * The working version of the program had to be printed and re-keyed
 * in manually into the document you are reading. There can therefore
 * be several keying errors -- proofreading programs is very tiring
 * and consequently very difficult.
 *
 *
 * Synopsis:
 *
 * The input file from LISTSERV is assumed to be CARDS.DAT
 *
 * Output is directed to file "filename.filetype", with filetype
 * being truncated to 3 characters of course.
 *
 *
 * Problems:
 *
 * Send problem or bug reports to .
 * Metaphysical complaints about the Rules of Structured Programming
 * and their relationship to this blasphemous program should better
 * stay in their author's own warm, safe and cosy mailbox.
 *
 *)

program LPUNCH(input,cardfile,output,outfile);

(* Note: Turbo-PASCAL requires square brackets for the "string" type,
         while other compilers require parentheses.
         Since some terminals cannot display square brackets,
         parentheses will be used. *)

type cards=string(80);
     string5=string(5);     (* This is because Turbo is not a very power- *)
     string8=string(8);     (* ful compiler                               *)
     anystring=string(255); (* Substitute the maximum length of a "string"
                               var on your compiler *)

var cardfile:text;        (* Input file -- compiler dependent declaration *)
    outfile:file of char; (* compiler dependent declaration *)
    card:cards;
    filename,filetype:string8;
    recfm:char;
    i,ncards,lrecl,xlrecl:integer;

function Substr(s:anystring;start,size:integer):anystring;
(* This function must return the substring of "s" starting at
   character number "start" and of length "size". The resulting
   string must NOT be padded with blanks if start+size-1 > length of
   the string. *)
begin
  Substr:=Copy(s,start,size)  (* Turbo has it under another name *)
end;

function Length(s:anystring):integer;
(* This returns the length of the string, with 0 for a null string *)
begin
  Length:=ord(s(.0.))  (* Same as the built-in Turbo function *)
end;

procedure Opencards;
(* This procedure must open the input file, wherever it may be on
   your system, and display an error message and exit if there is an
   error *)
begin
  assign(cardfile,'CARDS.DAT');
  (*$i-*) reset(cardfile); (*$i+*)
  if ioresult <> 0 then
    begin
      writeln('Error opening input file');
      halt  (* HALT is a GOTO just before the END. statement *)
    end
end;

procedure Closecards;
(* This procedure must close the input file. You may wish to have it
   delete the input file too *)
begin
  (*$i-*) close(cardfile);
  if ioresult = 0 then;  (* just reset turbo's hang... *)
  (*$i+*)
end;

function Getcard:cards;
var result:cards;
(* This function must read one 80-chars line (one "card") from input
   and return it as result. A premature EOF is an error and must
   cause termination of the program.
   The result must be EXACTLY 80 characters in length *)
begin
  (*$i-*) readln(cardfile,result); (*$i+*)
  if ioresult <> 0 then
    begin
      writeln('Premature EOF on input');
      halt
    end;
  while Length(result) <> 80 do result:=result+' ';
  Getcard:=result
end;

function Dec2bin(s:string5):integer;
var n,error:integer;
(* This function converts a string of numeric characters into its
   binary representation ("integer"). An error should cause
   termination *)
begin
  val(s,n,error);
  if error = 0 then Dec2bin:=n
  else
    begin
      writeln('Invalid decimal argument');
      halt
    end
end;

procedure Openout(fn,ft:string8);
(* This procedure must open the output file under whatever file-id
   you may choose *)
begin
  assign(outfile,fn+'.'+Substr(ft,1,3));
  (*$i-*) rewrite(outfile); (*$i+*)
  if ioresult <> 0 then
    begin
      writeln('Error opening output file');
      halt
    end
end;

procedure Closeout;
(* This procedure must close the output file. Errors should be
   detected since they can mean a disk-full condition while writing
   the last buffer, or suchlike *)
begin
  (*$i-*) close(outfile);
  if ioresult <> 0 then
    begin
      writeln('Error occurred while closing output file');
      halt
    end
  (*$i+*)
end;

function Gobbleword(var card:cards):string5;
var s:string5;
(* This function should be left "as is" *)
begin
  s:='';
  repeat
    s:=s+card(.1.);
    card:=Substr(card,2,80)
  until card(.1.) = '/';
  card:=Substr(card,2,80);  (* Delete the "/" sign too *)
  Gobbleword:=s
end;

procedure Outchar(c:char);
(* This outputs a character to "outfile". On byte-based file systems
   such as UNIX or MS-DOS, the character is just sent to standard
   output. On record-based systems it must be appended to a buffer
   (string(65535) or array of char) before the actual write is
   performed (see below) *)
begin
  (*$i-*) write(outfile,c); (*$i+*)
  if ioresult <> 0 then
    begin
      writeln('Error writing to output file');
      halt
    end
end;

procedure Endofrecord;
(* This procedure is called when the end of a record is reached.
   On record-based file systems, the buffer must be written; on
   byte-based systems a CR+LF (or similar) must be output for recfm V
   files, and possibly for recfm F files (your option) *)
begin
  Outchar(chr(13));  (* CR *)
  Outchar(chr(10))   (* LF *)
end;

procedure Writeout(s:cards;var l:integer);
var i:integer;
(* This one must be left "as is" *)
begin
  if Length(s) > l then s:=Substr(s,1,l);
  for i:=1 to Length(s) do Outchar(Substr(s,i,1));
  l:=l-Length(s)
end;

(* Main program -- nothing needs to be changed *)
begin
  Opencards;
  repeat
    card:=Getcard
  until Substr(card,1,3) = 'ID/';
  filename:=Substr(card,4,8);
  filetype:=Substr(card,13,8);
  recfm:=Substr(card,22,1);
  lrecl:=Dec2bin(Substr(card,24,5));
  Openout(filename,filetype);
  repeat
    card:=Getcard;
    if Substr(card,1,4) <> 'END/' then
      begin
        if recfm = 'V' then xlrecl:=Dec2bin(Gobbleword(card))
        else xlrecl:=lrecl;
        ncards:=Dec2bin(Gobbleword(card));
        repeat
          Writeout(card,xlrecl);
          ncards:=ncards-1;
          if ncards <> 0 then card:=Getcard
        until ncards = 0;
        for i:=1 to xlrecl do Outchar(' ');
        Endofrecord
      end
  until Substr(card,1,4) = 'END/';
  Closecards;
  Closeout;
  writeln('File "',filename,' ',filetype,
          '" has been successfully converted.')
end.
*----------------------------- Cut here --------------------------------*


SAMPLE C PROGRAM
________________

*********************
* Preliminary notes *
*********************

This sample C program is provided only as an example - no warranty of
any kind is made that the program will function properly on your
operating system. Detailed comments have been provided for all the
"system dependent" procedures. All the functions and procedures which
are not part of the standard library have been Capitalized, while
standard functions have been entered in lower case. The program should
operate properly as is on any standard UNIX system, although some
system-dependent additions might be necessary in some cases (e.g.
special options on the output file fopen call, or addition of
command-line switches).

Finally, since not all terminals accept the special characters used by
C, the following conversion has been made:

   opening bracket     -->  (:
   closing bracket     -->  :)
   opening curly brace -->  (*
   closing curly brace -->  *)
   backslash           -->

You should be able to reverse this conversion easily with your
favourite text editor before compiling the program.

*----------------------------- Cut here --------------------------------*
/*
 * LISTSERV-Punch C conversion program, version number 02
 *
 * Written by Eric Thomas
 *
 *
 * This public domain program has been tested on a VM system with the
 * huge BITEARN NODES file as input (over 23,000 LISTSERV-Punch format
 * records), as well as several other smaller files. No problem has
 * been encountered in the testing phase.
 *
 * The working version of the program has been electronically copied
 * into the document you are reading, untouched. There can therefore
 * be no keying error, or then it was in the original program too and
 * was not detected during the testing period.
 *
 *
 * Synopsis:
 *
 * The input file from LISTSERV is assumed to be the standard input.
 *
 * Output is directed to the standard output, but the program is
 * structured in such a way that this can be easily changed.
 *
 *
 * Problems:
 *
 * Send problem or bug reports to .
 * Philosophical lectures about the grandiose organization of C and
 * how it has been violated in this program should be directly
 * forwarded to /dev/null for a prompt answer.
 *
 */

#include <stdio.h>

char card(:81:),        /* Input record image */
     filename(:9:),
     filetype(:9:),
     number(:6:),       /* Scratch string to hold a number */
     recfm;
int i,ncards,lrecl,xlrecl;
FILE *outfile;

/* This function does not have to be tailored */
Substr(outstring,instring,start,length)
char *outstring,*instring;
int start,length;
(*
  register int rd=0,wr=0;
  register char c;

  --start;
  while(rd < start) (*
    if (instring(:rd++:)=='\0') (*
      /* Start is beyond the end of the input: return an empty string */
      outstring(:0:)='\0';
      return;
    *)
  *)
  while(length-- > 0) (*
    if ((c=instring(:rd++:)) != '\0') outstring(:wr++:)=c;
    else break;
  *)
  outstring(:wr:)='\0';
  return;
*)

/* This function must read one 80-chars line (one "card") from input
   and return it as result. A premature EOF is an error and must
   cause termination of the program. The result must be EXACTLY 80
   characters in length */
Getcard(string)
char *string;
(*
  register int i=0,c;

  while(i < 80) (*
    if (((c=getchar()) != EOF) && (c != '\n')) string(:i++:)=c;
    else break;
  *)
  if (c == EOF) (*
    fprintf(stderr,"Premature EOF on input.\n");
    exit(100);
  *)
  if (i==80) (*
    if (getchar() != '\n') (*
      fprintf(stderr,"Input file contains records larger than 80.\n");
      exit(100);
    *)
  *)
  while (i < 80) (* string(:i++:)=' '; *)
  string(:i:)='\0';
  return;
*)

/* This function does not have to be tailored */
Dec2bin(result,string)
char *string;
int *result;
(*
  /* sscanf returns the number of items converted: exactly 1 on success */
  if (sscanf(string,"%d",result) == 1) return;
  fprintf(stderr,"Invalid decimal argument -- '%s'.\n",string);
  exit(100);
*)

/* This procedure must open the output file under whatever file-id
   you may choose */
Openout(fn,ft)
char *fn,*ft;
(*
  /* Our implementation uses standard output as file pointer */
  outfile=stdout;
  /* The following instruction must be uncommented for use under most
     VM C's if the output file is to have a lrecl > 80 */
  /* outfile=fopen("LPUNCH OUTPUT A (recfm v lrecl 65535","w"); */
  return;
*)

/* This procedure must close the output file.
   Errors should be detected since they can mean a disk-full
   condition while writing the last buffer, or suchlike */
Closeout()
(*
  /* The chosen implementation does nothing and relies on the
     operating system to close the standard output file. This is not
     good programming practice but at least it's transportable. */
  return;
*)

/* This function should be left "as is" */
Gobbleword(inpstring,outstring)
char *inpstring,*outstring;
(*
  register int rd=0,wr=0;

  while (inpstring(:rd:) != '/') (* outstring(:wr++:)=inpstring(:rd++:); *)
  outstring(:wr:)='\0';
  for(wr=0;inpstring(:++rd:) != '\0';) (* inpstring(:wr++:)=inpstring(:rd:); *)
  inpstring(:wr:)='\0';
  return;
*)

/* This function does not have to be tailored */
Writeout(string,lenptr)
char *string;
int *lenptr;
(*
  int i=0;

  while((string(:i:) != '\0') && ( *lenptr > 0)) (*
    putc(string(:i++:),outfile);
    --( *lenptr);
  *)
*)

/* Main program -- nothing needs to be changed */
main()
(*
  do (* /* Read up to and including 'ID/' card */
    Getcard(card);
  *) while ((card(:0:)!='I') || (card(:1:)!='D') || (card(:2:)!='/'));
  Substr(filename,card,4,8);
  Substr(filetype,card,13,8);
  recfm=card(:21:);
  Substr(number,card,24,5);
  Dec2bin(&lrecl,number);
  Openout(filename,filetype);
  for (;;) (*
    Getcard(card);
    if ((card(:0:)=='E') && (card(:1:)=='N') &&
        (card(:2:)=='D') && (card(:3:)=='/')) break;
    if (recfm=='V') (*
      Gobbleword(card,number);
      Dec2bin(&xlrecl,number);
    *)
    else xlrecl=lrecl;
    Gobbleword(card,number);
    Dec2bin(&ncards,number);
    do (*
      Writeout(card,&xlrecl);
      if (--ncards != 0) Getcard(card);
    *) while(ncards!=0);
    for(i=1;i <= xlrecl;i++) (* putc(' ',outfile); *)
    putc('\n',outfile);
  *)
  Closeout();
*)
*----------------------------- Cut here --------------------------------*


*************************************
* Appendix A. The LISTSERV Library  *
*************************************

     o User's guide . . . . . . . . . . . . . . . . . . . .  (U01-001)
     o List Manager's guide . . . . . . . . . . . . . . . .  (M01-002)
     o Installation guide . . . . . . . . . . . . . . . . .  (S01-003)
     o Application Programmer's guide . . . . . . . . . . .  (A01-004)
     o Maintenance guide . . . . . . . . . . . . . . . . . . (S01-005)
     o File Server Functions . . . . . . . . . . . . . . . . (U01-006)
-->  o Listserv-Punch Implementation . . . . . . . . . . . . (R01-007)
     o File Maintainer's guide . . . . . . . . . . . . . . . (M01-008)
     o BITNET-Oriented Presentation . . . . . . . . . . . .  (P01-009)
     o Public Utilities Reference . . . . . . . . . . . . .  (A01-010)
     o Licensed Utilities Reference . . . . . . . . . . . .  (S01-011)
     o Database Functions . . . . . . . . . . . . . . . . .  (U01-012)


LISTSERV Document Numbers
-------------------------

                           U 01 - 006 - 0
                           _ __   ___   _
                           | |     |    |
Document Class ------------+ |     |    |
                             |     |    |
Product Number --------------+     |    |
                                   |    |
Publication Number ----------------+    |
                                        |
Revision Number ------------------------+

Document Class

The Document Class indicates for which category of persons the
publication was written. The current classes are:

A   Documents intended for Application Programmers. These
    publications are usually very technical.

M   Documents intended for Software Managers, i.e. operators, "list
    owners", "file maintainers", et al.

P   General Presentation documents intended for persons who do not
    have any particular knowledge of the product. These are generally
    non-technical documents.

R   Reference documents defining protocols used by the product. These
    documents are very technical and are intended for people who have
    to write interfaces for the product or attempt to port it to an
    operating system or environment for which it was not originally
    written.

S   Documents intended for Systems Programmers, i.e. the persons
    responsible for the installation and operation of the product.

U   Documents intended for General Users.

Product Number

The Product Number is a unique number associated with the product to
which the publication relates.
Number 01 refers to LISTSERV, number 02 corresponds to the NETINFO
sub-product, etc.

Publication Number

This is a unique number associated with the publication. Publication
Numbers are assigned sequentially, disregarding the Document Class.
There is a different set of Publication Numbers for each product.

Revision Number

This number is incremented at every release change in the publication.
Fractional numbers indicate intermediate changes between two releases.
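
The document number layout described above (class letter, two-digit
product number, three-digit publication number, revision number) can be
split into its fields with a few lines of C. The following is only a
minimal sketch: the structure "docnum" and the function "parse_docnum"
are hypothetical names chosen for illustration and are not part of
LISTSERV. It assumes the compact form "R01-007-0" used on the title
page, and it keeps the revision as text since fractional revisions are
allowed. Short references without a revision field, such as "(U01-001)"
in the library list, are deliberately rejected by this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical structure: field names are illustrative only. */
struct docnum {
    char cls;          /* Document Class: A, M, P, R, S or U        */
    int  product;      /* Product Number, e.g. 1 for LISTSERV       */
    int  publication;  /* Publication Number                        */
    char revision[8];  /* Revision Number as text (may be "0.5")    */
};

/* Returns 1 if text has the form <class>NN-NNN-<revision>, else 0. */
int parse_docnum(const char *text, struct docnum *out)
{
    int used = 0;

    /* %n records how many characters were consumed so that trailing
       garbage after the revision field can be rejected. */
    if (sscanf(text, "%c%2d-%3d-%7[0-9.]%n",
               &out->cls, &out->product, &out->publication,
               out->revision, &used) != 4)
        return 0;
    if (text[used] != '\0')
        return 0;
    if (out->product < 0 || out->publication < 0)
        return 0;
    /* Only the six classes listed in this appendix are accepted. */
    return strchr("AMPRSU", out->cls) != NULL;
}
```

Calling parse_docnum("R01-007-0", &d) on the number of this manual
should report class R, product 1, publication 7 and revision "0".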