User's Manual for the ARJ archiver program, September 1991

ARJ software and manual copyright (c) 1990,91 by Robert K Jung.
All rights reserved.

ARJ version 2.21 release

** IMPORTANT NEWS ****************************************************

Users of ARJ should read the WHATSNEW.DOC and UPDATE.DOC which contain information about the latest improvements to ARJ.

**********************************************************************

TOPICS COVERED IN THIS DOCUMENT
-------------------------------

    INTRODUCTION
    TERMINOLOGY
    MAJOR FEATURES OF ARJ
    ARCHIVER BENCHMARKING
    RELEASE NOTES
    ARJR AND DEARJ PROGRAMS
    INSTALLATION
    QUICK START TO USING ARJ
    HOW TO CREATE AN EXECUTABLE SELF-EXTRACTING ARJ ARCHIVE
    CONVERTING OTHER ARCHIVE FILES TO ARJ FORMAT
    HOW TO USE ARJ
    ARJ LIMITATIONS
    DIFFERENCES BETWEEN ARJ AND LHARC
    IMPORTANT NOTES
    TIPS TO USING ARJ EFFICIENTLY
    USING ARJ WITHIN OTHER PROGRAMS
    ARJMENU PROGRAM
    USING ARJ AS A BACKUP PROGRAM
    THE FILESPEC "..."
    ARJ ERROR SITUATIONS
    ARJ DOS ERRORLEVELS
    ARJ USER ACTION PROMPTS
    ARJ ENVIRONMENT VARIABLE
    ARJ COMMAND LINE SYNTAX
    ARJ COMMANDS
    ARJ SWITCH OPTIONS
    ARJ_SECURITY ENVELOPE
    KNOWN ARJ ISSUES/PROBLEMS
    ARJ TECHNICAL SUPPORT
    ARJ AVAILABILITY
    DISTRIBUTORS
    ACKNOWLEDGEMENTS
    USAGE AND DISTRIBUTION POLICY
    FINAL COMMENTS


INTRODUCTION:

ARJ is the result of my desire to use my interest in compression technology to produce an archiver for personal use on my PC and on minicomputers that provides power and excellent flexibility.

This all started with the invention of an LZ77 brute force hashing algorithm that outperformed all other LZ77 algorithms. Having an invention in hand gave me the confidence to enter into the compression fray.

ARJ is prototyped in ANSI C and only uses ANSI C standard libraries. All platform dependent functions (file date-time modified, file attributes, etc.) are contained in a single environment source file using #ifdefs to enable one to maintain a single version of the source code for multiple different machines.
The MS-DOS production version of ARJ has manually optimized compression, extraction, CRC, and output routines (in assembler).

There are plans to port versions of ARJ to other platforms in the coming year pending sufficient time and funding.


TERMINOLOGY:

The following terms are used throughout this manual.

ARCHIVE - This is a file containing one or more files in a compressed or non-compressed state and containing file related information such as filename and date-time last modified, etc.

ARJ FILE - This is an archive created by ARJ.

COMPRESSION - The process of encoding redundant information into data requiring less storage space.

COMPRESSION PERCENTAGE/RATIO - The percentage compression reported by ARJ is a variation of one of the TWO standard methods of expressing compression ratio in the technical literature. ARJ uses the compressed size / original size ratio. The other method is the inverse ratio. When ARJ reports 96% as the compression ratio, that means that the compressed file is 96 percent of the original size (very little compression). Other archivers use their own methods. LHARC uses the same ratio as ARJ.

EXTRACTION or UNCOMPRESSION - The process of recreating the exact information that was previously compressed.

SELF-EXTRACTION MODULE (SFX) - This is an archive that is an executable file that is capable of extracting self-contained files.

TEXT MODE - In text mode, ARJ inputs the file using the C library text mode which translates the carriage return, linefeed control characters of MS-DOS to a single linefeed character. This saves space and provides the option for cross platform file extraction. On another platform, the host C library would change the single linefeed to the host text newline separator sequence. In addition, for platforms such as PRIMOS which set bit 8 in ASCII text characters, ARJ sets/resets bit 8 according to the platform extracted to. All input is stripped to 7-bit ASCII.
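As a rough sketch of what the TEXT MODE entry describes, the following Python mimics the newline translation and the 7-bit stripping. The function names and the `set_bit8` parameter are made up for illustration. ARJ itself relies on the C library's text mode, so this is an approximation of the visible effect, not ARJ's code.

```python
def to_archive_text(data: bytes) -> bytes:
    """Approximate text-mode input: CR LF becomes LF, then bit 8 is cleared."""
    unix_newlines = data.replace(b"\r\n", b"\n")
    # Masking with 0x7F strips every byte to 7-bit ASCII.
    return bytes(byte & 0x7F for byte in unix_newlines)


def from_archive_text(data: bytes, newline: bytes = b"\r\n",
                      set_bit8: bool = False) -> bytes:
    """Approximate extraction: LF becomes the host newline sequence.

    set_bit8 is a hypothetical flag standing in for a host (such as
    PRIMOS) that expects bit 8 set; applying it to every byte,
    newline included, is an assumption.
    """
    host_text = data.replace(b"\n", newline)
    if set_bit8:
        host_text = bytes(byte | 0x80 for byte in host_text)
    return host_text
```

Under these assumptions each MS-DOS line stored in text mode is one byte shorter than the original, which is where the small space saving comes from. The stripping also shows why text mode is unsuitable for binary files: the high bit cannot be recovered on extraction.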

VOLUMES - These are ARJ archives that are in sequence and have been created by a single ARJ command. Files in the volumes may span volumes in a split format. These volumes are usable archives.


MAJOR FEATURES OF ARJ:

Currently ranks as the best in compression in terms of size reduction of the currently available archivers including PKZIP 1.10, PAK 2.51, ARC 7.0 (ARC PLUS), LHARC 1.13c, LHA 2.13 and the new ZOO 2.10. ARJ is particularly effective with database files, graphics files, and large documents. With the "-jm" or "-jm1" option, ARJ usually compresses even smaller at a cost of time.

With the appropriate compression options, ARJ is also FASTER compressing while still compressing tighter than all of the aforementioned archivers. (Against PKZIP, use ARJ -m3).

Archive and individual file comments with option of inputting comments from a file.

ARJ has MS-DOS 3.x international language support for the proper casing of filenames and text.

32 bit CRC file integrity check.

DOS volume label support.

Empty directory support.

Test new archive before overwriting the original archive option.

Multiple volume archives with one ARJ command. This allows the user to backup a full hard disk drive to multiple floppies. Recovery of individual files is convenient because each volume is an individual archive except for split files. No need to use SLICE with ARJ.

File re-ordering facility with the option of sorting by file size, file extension, CRC value, date-time modified, filename, pathname, compression ratio, file attribute and more.

String searching with context display within archive files.

Built-in facility to recover files from broken archives.

Self-extraction feature that is internal to the ARJ runfile. The SFX module is full-featured with a built-in help screen.

Internal string data integrity check in ARJ to resist hacking a la LHARC to ICE.

Archive security envelope feature to resist tampering with secured archives. This feature disallows ANY changes to a secured archive.
Not even comments can be changed.

Password option to encrypt archived files.

Text mode data compression option to enable movement of text files from one host machine to another. Text mode also results in slightly greater file size reduction on MS-DOS machines.

File extraction to screen in a paged mode to permit browsing through an archive.

Specification of the files to be added to an archive via one or more list files. In addition, ARJ can generate a list file.

Specification of files to be excluded from processing by ARJ.

Sub-directory recursion during compression and extraction.


ARCHIVER BENCHMARKING:

This is information for those who plan to publish benchmark test results comparing ARJ with other file archivers.

The ARJ -jm -m1 compression is intended to demonstrate the best that ARJ can do in terms of size reduction.

The ARJ -m1 (default) compression is intended to compete with LHA 2.12.

The ARJ -m2 compression is intended to compete with PKZIP 1.10.

The ARJ -m4 is provided as a very fast compression method suitable for use in disk backups.

The ARJ -e option is necessary during size benchmarks because ARJ by default stores the entire specified pathname in the archive as opposed to other archivers which strip path specs.

The very size of the ARJ runfile adds significantly to the compression and extraction times when testing smaller archives. It would be fairer to use larger archives or run the archivers from a RAMDISK. Also, compressed executables affect the test times, so that all archivers should be tested in uncompressed form.

The ARJ comment header adds bytes to the size of an ARJ archive. The larger file headers also add bytes to the size of an ARJ archive. So, in size benchmarks compressing a large number of small files, the header size difference will be evident.


RELEASE NOTES:

The only difference between the registered version and the shareware version is the version/copyright message. This release states that it is NOT for commercial use.
Commercial, institutional and government users must purchase a site license to obtain a registered version of ARJ for their use. However, commercial, institutional and government users may use ARJ for evaluation purposes for a period of 30 days. See the LICENSE.DOC for details.

While evaluating ARJ, you should use the "-jt" (test archive) option to verify new ARJ archives of your data.

This version has been tested under DOS 2.11, 3.3, 4.01, and DOS 5.0.

The executables ARJ.EXE and REARJ.EXE can be compressed by DIET, LZEXE, and PKLITE (not PKLITE 1.05).

Here is a suggested command that will test ARJ on all of your files:

    ARJ a -r -jt -y "-vasdel a:\vol.*" a:\vol c:\*.*


ARJR AND DEARJ PROGRAMS:

The new programs ARJR and DEARJ are available to registered and licensed users of ARJ. ARJR is the ARJ program minus the help screen and SFX modules. DEARJ is the ARJR program minus the archive creation/modification functions. See the LICENSE.DOC and ORDERFRM.DOC for more information.

UNARJ and DEARJ are NOT the same program.


INSTALLATION:

I assume that you have a copy of the self-extracting ARJ module named ARJ221.EXE.

Typing ARJ221 [RETURN] at the DOS command prompt will initiate the self-extraction feature. ARJ221 will by default extract its files to the current directory. When ARJ221 starts, you will see several lines of text describing ARJ and then a line asking if you wish to continue extraction. Entering "yes" or "y" will continue the extraction. If there are any duplicate filenames in the current directory, the program will prompt you for overwriting. You can say "yes", "no", or "quit".

To install the ARJ software, simply copy ARJ.EXE, REARJ.EXE, REARJ.CFG, and ARJSORT.COM to one of the directories named in your DOS PATH statement found in your AUTOEXEC.BAT. On many PCs, this directory may be C:\DOS or C:\BIN. With MS-DOS 3.0 and above, you can use path notation "\BIN\ARJ e archive" to use ARJ.

You may, of course, prefer to use ARJ 1.00 or higher to extract the contents of ARJ221.EXE file manually.

Example:

    ARJ e ARJ221.EXE \temp\


QUICK START TO USING ARJ:

Please note that switch options may be placed anywhere in the command line.

To create an ARJ archive containing all of the files in the current directory:

    ARJ a archive

To create an ARJ archive containing all files with the ".DOC" extension in the current directory:

    ARJ a archive *.DOC

To create an ARJ archive containing all of the files in the named directory and all files in subdirectories of the named directory:

    ARJ a -r archive named_directory\*.*

To create an archive containing files without path specs:

    ARJ a -e archive named_directory\*.*

For maximum compression, use the "-jm" or "-jm1" options. For better speed, use the -m2 option.

    ARJ a -r -jm archive named_directory\*.*
    ARJ a -r -m2 archive named_directory\*.*

To create an ARJ archive containing the full specified pathnames of the stored files including any drive and root specs:

    ARJ a -r -jf archive C:\top_directory\*.*

To backup your hard disk to multiple volume archives on drive A with archive testing and archive bit resetting:

    ARJ a -r -jf -jt -a1 -b2 -vvas A:backup C:\*.*

To extract all of the files in an archive to the current directory:

    ARJ e archive

To extract all of the files in an archive to a named directory:

    ARJ e archive named_directory\

To extract all files with the ".DOC" extension to the current directory:

    ARJ e archive *.DOC

To extract all of the files in an archive recreating the original directory structure:

    ARJ x archive original_directory_name\

The ending "\" character is optional.

To extract all of the files in an archive containing absolute pathnames to the original paths:

    ARJ x -jf archive

To list all of the files in an archive:

    ARJ l archive


HOW TO CREATE AN EXECUTABLE SELF-EXTRACTING ARJ ARCHIVE

The command "ARJ f -je archive ..." will create a full featured self-extracting archive from an already built archive.
No error message will be displayed concerning not finding "...".

The command "ARJ f -je1 archive ..." will create a smaller self-extracting archive.

    Syntax:  ARJ f -je archive ...      produces archive.exe

Under DOS systems other than 2.11, 3.2, 3.3, 4.0, and 5.0 you may have to rename the self-extract module to ARJSFX.EXE to do the extraction.

See the "-je" option for more information.


CONVERTING OTHER ARCHIVE FILES TO ARJ FORMAT

Included with this software is the program REARJ. This program can be used to individually or collectively convert archive files from other formats to the ARJ format.

    REARJ *.ZIP *.ARC *.LZH

will convert all ZIP, ARC, and LZH archives in the current directory to the ARJ format.

See the REARJ.DOC for more information about REARJ.


HOW TO USE ARJ:

If you type ARJ [return], you will see a simple help screen.

If you type ARJ -? [return], you will see more detailed help information.


ARJ LIMITATIONS:

ARJ will accept up to:

    64 filenames/wildnames on command line
    16000 filenames resulting from wildnames
    8000 filenames/wildnames to exclude
    8000 ARJ filenames resulting from wildnames
    2048 character comments (up to 25 lines or 1 file)

For compressing, ARJ requires approximately 282,000 bytes plus the memory necessary to store all of the pathnames to be archived when using the default compression method (-m1).

For extracting, ARJ requires approximately 166,000 bytes plus.

The program DEARJ (available to registered users) requires approximately 123,000 bytes plus.

There is no limitation on the number of files that can be stored in one archive. However, each add command can only add a maximum of 16000 files at a time depending upon memory availability. I expect that a normal maximum of 5000 to 8000 filenames can be handled without running out of memory during the compress phase.

If you do not have enough memory, you should use the "-l" switch to dump the filenames to a list file.
You can then break the list file into smaller files and use multiple ARJ commands to archive all of the files.

Example:

    ARJ a -r -lname.lst archive \*.*

If the above command fails due to lack of memory, split the name.lst file into smaller pieces named name1.lst, name2.lst, etc. Then execute:

    ARJ a archive !name1.lst
    ARJ a archive !name2.lst
    .
    .

ARJ currently does NOT differentiate between wildnames like "C:*.*" and "C:\*.*". ARJ would expand each of those two wildnames into a list that could be up to twice as long as necessary.

When updating an archive, ARJ creates a temporary file named ARJTEMP.$nn in the current directory or work directory. While ARJ is scanning a wildcard filespec, ARJ will change the name of the target archive to ARJTEMP.$nn while the scan is proceeding to avoid including the archive itself in an add or move command. Also, as a result, you cannot add a file named ARJTEMP.$nn to an ARJ archive. Please note that the name of this temporary file may change at a future revision of ARJ.


DIFFERENCES BETWEEN ARJ AND LHARC:

The archive formats are NOT compatible. The compression and decompression algorithms are NOT compatible. ARJ only supports its own archive format.

ARJ by default stores the full specified pathname of files archived minus any drive letter and root symbol.

The "e" and "x" commands will by default extract all of the files in the archive without using date time stamps to select files. You should specify "-u -y" to duplicate LHARC functionality.

The "f" command in ARJ requires the -r switch to be identical to the LHARC f command.

The ARJ archive suffix is ".ARJ".


IMPORTANT NOTES:

When using the "-w" working directory switch, ARJ does not check on space availability before overwriting the original archive if it exists. Be sure that you have enough disk space for the new archive before using the "-w" switch. If ARJ aborts in this situation because of disk space, ARJ will keep the temporary archive.

By default, ARJ does not see hidden or system files. ARJ will process system and hidden files when you specify the "-a" switch.

Like LHARC and PKZIP, ARJ requires extra disk space to UPDATE an archive file. ARJ will backup the original archive while it creates the new archive, so enough room must be available for both archives at the same time.

Currently, ARJ will not extract overwriting a readonly file.


TIPS TO USING ARJ EFFICIENTLY

When archiving to diskettes, you should use the "-w" option to set a working directory on your RAMDRIVE or hard disk drive to speed up building the archive.

Using the "-js" option saves time by not compressing archives.

You should use the "-jt" option when archiving to diskettes or when you really want to be sure that ARJ will be able to extract what you have archived. There are cases where your hardware or memory resident software will corrupt your work, so the "-jt" option is excellent insurance.

You should use the "-e" option whenever you do not need to store pathnames in an archive that you are creating. This will save space.

Convert an ARJ archive into a self-extracting archive with a command like the following:

    ARJ f archive ... -je

To capture a comment from an ARJ archive, use the following command:

    ARJ e archive ... -zcomment.txt

ARJ has several compression methods that provide size/time tradeoffs. Method 4 "-m4" is about twice as fast as method 1. The "-jm1" and "-jm" options modify the "-m1" and "-m2" options to provide even greater compression at a cost in time.


USING ARJ WITHIN OTHER PROGRAMS

Since ARJ uses over 280,000 bytes of memory during compression, it is difficult to use ARJ in a large application program unless that program swaps itself out of memory when it executes DOS commands like ARJ. However, there is at least one shareware program available that will automatically swap your large application program out of memory whenever it shells out to DOS to execute a command.
The program SHROOM by Davis Augustine should be able to solve this memory problem for you. The latest version is named SHROM17D.ZIP on Channel1 BBS. According to the SHROOM documentation, you can reach the author at:

    CompuServe id 72230,3053

    Davis Augustine
    P.O. Box 390178
    Cambridge, MA 02139

This is not an endorsement of the product SHROOM.

The easiest way I have found to use this product is to type:

    SHROOM COMMAND

SHROOM -v COMMAND will let you see SHROOM in action when you shell out to execute a DOS command.


ARJMENU PROGRAM

A new program called ARJMENU by Michael McCombs will be released shortly. As far as I know, it is the only menu-driven interface program that supports ALL of the features of ARJ. This program is aimed at users who hate command line interfaces. ARJMENU allows the user to pick and choose ARJ options. The user does not have to remember the ARJ switch syntax. This program supports ARJ version 2.21.

You can reach the author at:

    Internet/ARPANet: mccombs@sumax.seattleu.edu

    Michael McCombs
    517 Ninth Ave. #310
    Seattle, WA. 98104


USING ARJ AS A BACKUP PROGRAM

ARJ can be used as a substitute for a backup program. However, it does not have the diskette critical error handling or data recovery facilities of a FASTBACK, etc. So you should be sure of the reliability of your diskettes.

The following partial command lines illustrate a full backup command, an incremental update command, and a restore command. The only parts missing are the names of the files to backup/restore.

    ARJ a A:backup -r -vvas -a1 -b2 -i1 -js -jt -jiC:\backup.inx -wC:\ -m4
    ARJ a A:backup -r -vvas -a1 -b1 -i1 -js -jt -jiC:\backup.inx -wC:\ -m4
    ARJ x A:backup -vv -jycn

You should familiarize yourself with the above switches so that you can modify the above command lines as needed.

If you have a RAMDRIVE large enough, you should change the "-w" option to point to the RAMDRIVE.

If you have enough free hard disk space, you can build all of the diskette volumes on the hard disk for later copying to diskette. In this case, you will need to change the name of the archive to "C:backup" or similar. The "-vvas" option should be changed to "-v360", "-v720" or whatever is appropriate for your diskette size. Please note that 360, 720, 1200, and 1440 are abbreviations for the standard diskette sizes. Other sizes will require your entering the entire number. Another change is to add the option "-y" which will turn off the "Ok to proceed ..." prompt. Lastly, if the "-w" option is pointing to the hard disk, you should remove the "-w" option entirely.

    ARJ a C:backup -r -v360 -m4 -y

***IMPORTANT*** Only a maximum of 100 volumes can be built on disk at one time because of the volume suffix rolling over at *.A99 to *.A00. A workaround to this limitation is to archive your hard disk with several ARJ commands building different volume names.

Both backup commands will pause for a "system command". You can execute DOS commands at this point. This is a suitable place to do a "dir a:" to make sure that your disk is formatted and has enough free space on it. You may need to execute "format a:" or "del a:\". A very useful command might be "QDR A:". QDR is a utility from Vern Buerg. You will need to type "exit" to allow ARJ to continue.

If the backup fails after completing one or more diskettes, you can restart at the next archive after the last successful volume. You will need to examine the information in the "backup.inx" file to find the name of the file that is to start this archive. It will usually be the same as the last filename in the previous volume. You will also need the byte position to start in this same file. That can be determined from the information in "backup.inx". You can then retype the exact same backup command as before with a few changes.
You will append the right ".Ann" suffix to the archive name and you will add the options "-jx" and "-jn" with the proper arguments.

For example, suppose the above backup command failed during diskette two on filename "DOS\MODE.COM", which was started at byte 125. This would be the correct command:

    ARJ a A:backup.A01 -r -vvas -a1 -m4 -jx125 -jnDOS\MODE.COM

The most error prone step is determining the correct "-jn" option. Be sure to spell the filename exactly the same as it appears in the "backup.inx" file.

If the restore fails after one or more diskettes, simply retype the same command as before but add the right ".Ann" suffix to the archive name. If ARJ has aborted because of a disk full on a file split between volumes, you will have to restart at the first volume that contains that file.


THE FILESPEC "..."

Several times in this document and the UPDATE.DOC file, there is mention of the filespec "...". This filespec is chosen so as to not match any existing filename. ARJ will NOT generate an error or warning for not matching "..." specifically.


ARJ ERROR SITUATIONS:

ADD:

If a user specified file is not found during an add, ARJ will continue processing, and will keep the archive and terminate with an error condition.

In a disk full condition or any other file i/o error, ARJ will promptly terminate with an error condition and delete the temporary archive file unless the user has specified the "-jk" switch.

MOVE:

ARJ will only delete files that have been successfully added to the archive. If you have specified the "-jt" (test) switch, ARJ will abort on any error. If you specify the "-jk" switch, ARJ will not delete the temporary archive upon an abort.

EXTRACT:

In a disk full condition or any other file i/o error, ARJ will promptly terminate with an error condition and delete the current output file.

CRC ERRORS OR BAD HUFFMAN CODE:

In the case where an ARJ archive has been corrupted ARJ will report a CRC error or a Bad Huffman code error.
These corruptions can be the result of an unreliable diskette, a computer memory problem, a file transfer glitch, or incompatible CACHING software.

Most of these errors are the result of file transfer glitches and bad diskettes. A few are the result of an incompatible interaction with SUPER PCKWIK 3.3 advanced diskette support.


ARJ DOS ERRORLEVELS:

    0 ->   success
    1 ->   warning (specified file to add to archive not found,
           specified file to list, extract, etc., not found,
           or answering negatively to "OK to proceed to next
           volume..." prompt)
    2 ->   fatal error
    3 ->   CRC error (header or file CRC error)
    4 ->   ARJ-SECURITY error or attempt to update an ARJ-SECURED archive
    5 ->   disk full or write error
    6 ->   can't open archive or file
    7 ->   simple user error (bad parameters)
    8 ->   not enough memory


ARJ USER ACTION PROMPTS:

ARJ prompts the user for action at certain times. There are several types of prompts. One is for yes/no permission, another is for a new filename, another is for archive comments, and one other is for search strings.

The yes/no prompts will also accept "quit" for program termination and "always" to bypass further user prompts.

Since ARJ uses STDIN for user input, be careful about typing ahead anticipating prompts. ARJ may prompt you for an unexpected action and use your earlier input.

The "-jy" option lets you change the prompting modes to single character query mode. See the section on "-jy" for more information.


ARJ ENVIRONMENT VARIABLE:

ARJ will first look for an environment variable named ARJ_SW and use its value as switch options for ARJ. If ARJ finds such an environment variable, it will display a message to that effect. You can inhibit ARJ from using this environment variable by using the "-+" option.

    SET ARJ_SW=

    Example:  SET ARJ_SW=-w\temp -k -e

Do NOT add any blanks after the variable name ARJ_SW.

As in LHARC, command line switches can be selected to override ARJ_SW settings.
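
The two forms an ARJ_SW value can take (switch text, as in the example above, or the name of a configuration file, as this section describes) can be sketched roughly as follows. This is only an illustration of the documented lookup rule, not ARJ's implementation. The function name `resolve_arj_sw`, the `load_text` callback, and the case-insensitive match on the command letter are all assumptions.

```python
def resolve_arj_sw(value, command, load_text):
    """Return the switch text that ARJ_SW would contribute for one command.

    value     -- contents of the ARJ_SW environment variable
    command   -- single-letter ARJ command, e.g. "a" or "e"
    load_text -- hypothetical callback that returns a file's text by name
    """
    value = value.strip()
    if not value:
        return ""
    if value[0] in "-/":
        # Text beginning with a switch character is used as switches directly.
        return value
    # Otherwise the value names a file. Look for a line whose first column
    # holds the command letter and use the text that follows it.
    for line in load_text(value).splitlines():
        # Ignoring case here is an assumption, not documented behavior.
        if line[:1].lower() == command.lower():
            return line[1:].strip()
    # No matching line: no ARJ_SW switches are applied.
    return ""
```

Passing the file reader in as a callback keeps the sketch independent of any real file system, so it can be tried with an in-memory string holding the configuration lines.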

ARJ will allow you to use a different switch character "-" or "/" in ARJ_SW and in the command line except when using the "-ju" (unix) option.

If the ARJ_SW environment variable specifies a filename (text not beginning with a switch character), ARJ will open that filename and scan it looking for a line of text that begins in column 1 with the same letter as the ARJ command being executed. The following text is processed as the ARJ_SW switches. This allows each ARJ command to have its own switch settings. If no such command text is found, then no switch settings are processed as the ARJ_SW switches.

    SET ARJ_SW=C:\ARJ\ARJ.CFG

C:\ARJ\ARJ.CFG contains:

    a -jm1 -jt -i1
    l -jp
    e -i1
    c -zcomment.txt

In the above example, any ARJ "a" commands will use "-jm1 -jt -i1" as the ARJ_SW switch options.

The ARJ_SW variable or the ARJ_SW configuration file switch settings may NOT have quoted switches such as "-vasformat a:".


ARJ COMMAND LINE SYNTAX:

ARJ [-[-|+|