23-Apr-86 19:32:57-PST,5277;000000000001
Date: Wed, 23 Apr 86 20:04:12 EST
From: Edward_Vielmetti%UMich-MTS.Mailnet@MIT-MULTICS.ARPA
To: info-ibmpc@USC-ISIB.ARPA
Subject: Breakup.doc

Documentation for BREAKUP
December, 1983
Charles Roth

BREAKUP is a utility for "breaking up" large files on MSDOS systems.
It is intended primarily as a companion utility for The FinalWord
editor, but it may have other uses as well. BREAKUP was written in C
by Charles Roth, and is in the public domain.

Many text editors for microcomputers can only deal with files of a
certain maximum size. We know, by Murphy's Law, that some files will
always be larger than any given size. Thus, there is a need to be able
to break up large files in a convenient way. Also, when moving a
collection of files from one machine to another, it is sometimes
easier to concatenate all of the files together, ship them as one
file, and then break them up again.

BREAKUP uses Unix-ish command arguments to allow the user to break up
a file in a variety of ways. Each "command" is actually a pair of
arguments that specify the next place to break the file. The user can
tell BREAKUP to break after so many bytes, or after so many lines, or
when a particular string is encountered. The general syntax looks
like:

    BREAKUP File.Ext -C1 A1 -C2 A2 -C3 A3 etc....

where "file.ext" is the name of the file to be broken, and each
"-Cn An" pair specifies a breaking point (as described below). The
pieces of the broken-up file are put in the files File.000, File.001,
and so on. Note that there is usually one more file than the number
of breaking points specified.
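The naming scheme and the "one more file than breaking points" rule can be sketched roughly as follows. This is an illustration in Python, not the BREAKUP source (which is written in C). The function names piece_name and split_after_bytes are hypothetical, and the sketch assumes every breaking point falls inside the file.

```python
import os


def piece_name(filename, index):
    """Return the name of piece number `index`, e.g. File.Ext -> File.000."""
    stem, _ext = os.path.splitext(filename)   # drop the original extension
    return "%s.%03d" % (stem, index)          # three-digit piece number


def split_after_bytes(data, counts):
    """Split `data` after each byte count in turn.

    Whatever is left after the last breaking point becomes the final
    piece, so N breaking points give N + 1 pieces. (Assumption: BREAKUP
    may behave differently when a breaking point lies past end of file.)
    """
    pieces = []
    pos = 0
    for n in counts:
        pieces.append(data[pos:pos + n])
        pos += n
    pieces.append(data[pos:])                 # "everything else"
    return pieces


if __name__ == "__main__":
    data = b"abcdefghij"
    for i, piece in enumerate(split_after_bytes(data, [3, 3])):
        print(piece_name("File.Ext", i), piece)
```

With two breaking points of 3 bytes each, the ten-byte input above ends up in three pieces named File.000, File.001 and File.002.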
The command specifiers for the "-Cn An" pairs are:

    -B  nnnn     Break after nnnn Bytes, where nnnn is a decimal number
    -L  nnnn     Break after nnnn Lines, where nnnn is a decimal number
    -S  string   Break after the next occurrence of "string"
    -LB nnnn     Break at the first end-of-line after nnnn bytes
    -LS string   Break at the first end-of-line after the next
                 occurrence of "string"
    -R           Repeat the last command specifier indefinitely

EXAMPLES:

    BREAKUP File.Ext -b 1000 -b 1000

breaks "file.ext" into three pieces. File.000 would contain the first
1000 bytes, File.001 would contain the second 1000 bytes, and File.002
would contain everything else that was in File.Ext.

    BREAKUP File.Ext -l 1000 -r

would chop File.Ext after every 1000 lines. (The last piece might be
smaller than 1000 lines, of course.)

    BREAKUP File.Ext -l 200 -s Mom -s "Apple Pie"

breaks File.Ext at 3 points: at the (end of the) 200th line; at the
next occurrence of the string "Mom" in the text; and at the first
occurrence of the string "Apple Pie" after "Mom". (Quotes are optional
and are not part of the string searched for. They are required if the
string contains one or more blanks.)

NOTES:

1) Breaking at a point is inclusive. That is, breaking at 200 bytes
   means the first piece will contain the 200th byte. Ditto for lines
   and strings, i.e. breaking at "Mom" means the piece will end with
   "Mom".

2) The size of a file in bytes has two slightly different meanings.
   To programs written in C (BREAKUP, FinalWord) the end of a line is
   marked by a single character. Inside MSDOS, the end of a line is
   marked by the two characters Carriage-Return and Line-Feed. Thus,
   breaking off a piece at 100 bytes may result in a file that
   (according to DIR) is slightly larger.

3) The -s strings may include control characters. Of course, you
   can't just type the control characters as part of the -s string;
   MSDOS will try to interpret them right away.
   So instead, BREAKUP uses a special notation (borrowed from the C
   language) for control characters that always begins with a "\"
   (backslash). Similarly, since " and \ already mean something
   special, we must have a way to represent a single " or \. These
   special notations are listed below.

       \ddd   is the character with the OCTAL value ddd. Must be
              3 digits.
       \\     is a single backslash
       \"     is a double-quote character
       \n     is a newline (end-of-line character)

   The last sequence is particularly useful. Breaking at -s "\nA"
   would mean "break at the next place where there is an A at the
   beginning of the line".

   Warning: do NOT try to break at a null character, i.e. \000. Since
   the C string routines use \0 as a string terminator, BREAKUP will
   not understand its use as a breakpoint.

4) BREAKUP prints out the filenames of the pieces as they are
   produced. You can redirect this output to a file, if you wish, by
   placing >filename after the list of breakpoint specifiers. (You do
   not need MSDOS 2.x to do this.)
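The backslash notation in note 3 and the inclusive string break in note 1 can be sketched roughly as follows. This is an illustration in Python, not the BREAKUP source. The function names decode_escapes and break_after_string are hypothetical. Rejecting unknown escapes with an error, and returning the whole text when the string is not found, are assumptions of this sketch rather than documented behavior. The sketch also does not model the restriction on \000.

```python
def decode_escapes(text):
    """Translate \\ddd, \\\\, \\" and \\n in `text` into single characters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)                    # ordinary character
            i += 1
            continue
        digits = text[i + 1:i + 4]
        nxt = text[i + 1:i + 2]
        if len(digits) == 3 and all(c in "01234567" for c in digits):
            out.append(chr(int(digits, 8)))   # \ddd: exactly 3 octal digits
            i += 4
        elif nxt in ("\\", '"'):
            out.append(nxt)                   # \\ or \"
            i += 2
        elif nxt == "n":
            out.append("\n")                  # \n: end-of-line character
            i += 2
        else:
            # Assumption: unknown or incomplete escapes are an error.
            raise ValueError("unknown escape at position %d" % i)
    return "".join(out)


def break_after_string(text, pattern):
    """Split `text` so the first piece ends with `pattern` (inclusive)."""
    start = text.find(pattern)
    if start < 0:
        return text, ""                       # assumption: no match, no break
    end = start + len(pattern)
    return text[:end], text[end:]


if __name__ == "__main__":
    pattern = decode_escapes(r"\nA")          # newline followed by "A"
    first, rest = break_after_string("Zebra\nApple\nAnt\n", pattern)
    print(repr(first), repr(rest))
```

Because the break is inclusive, the first piece in the usage example ends with the newline and the "A" that follows it, and the second piece starts with "pple".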