EPUB2TXT by Ben Collver
=======================

* Description
* DOS Instructions
* Unix Instructions
* Requirements
* Known Limitations
* Regression testing
* Source code

Description
===========

epub2txt.awk converts EPUB files to plain text.  It can run on DOS
and Unix.

Output:

* index.html - HTML content, restructured for readability.
  HTML content requires LFN support on DOS.
* plaintxt/index.txt - Plain text with UTF-8 encoding
* plaintxt/index.dos - Plain text with CP437 encoding, hard wrapped

DOS Instructions
================

* LFN (Long File Name) support is required to run this script on DOS.
  If your DOS lacks LFN support, run:

      doslfn.com

* Make sure the EPUB2TXT environment variable points to the directory
  where epub2txt is installed.

      SET EPUB2TXT=C:\epub2txt

* Change to the directory where you want the output to go

      cd \book

* Run the script

      \epub2txt\epub2txt.bat \Documents\book.epub

It can take a long time to process, depending on the size of the EPUB
file and how puny the DOS machine is.  Be patient and perhaps go do
something else for a while.

WARNING: Don't use GAWK.EXE to run this script on DOS.  When I tested
this script using DJGPP GAWK.EXE on FreeDOS, it was prone to FAT
corruption.  It seemed to be affected by the BUFFERS= setting and the
memory manager, but I could not find a stable, working configuration.
So I used NAWK32.EXE instead.

Unix Instructions
=================

* Make sure all the required utilities are in your path

* Change to the directory where you want the output to go

      cd ~/book

* Run the script

      ~/epub2txt/epub2txt.awk ~/Documents/book.epub

It can take a while to process.

Requirements
============

I have already included the required utilities to run this script on
DOS.  I list them in parentheses below.

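On Unix, one quick way to confirm that the commands named in this
section are in your path is a small shell loop around "command -v".
This is only a sketch and not part of epub2txt: the function name
check_commands is made up for illustration, and the command names are
copied from the list in this section.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with epub2txt.
# Prints the names of the given commands that are NOT found in PATH,
# separated by spaces.  Prints an empty line if all are found.
check_commands() {
    missing=""
    for cmd in "$@"
    do
        if ! command -v "$cmd" >/dev/null 2>&1
        then
            missing="$missing $cmd"
        fi
    done
    # Strip the leading space before printing
    echo "${missing# }"
}

# Command names taken from the requirements list in this section
missing=$(check_commands cp awk find unzip utf8tocp webdump xml2tsv.awk)
if [ -n "$missing" ]
then
    echo "Missing from PATH: $missing"
else
    echo "All required commands found"
fi
```

If anything is reported missing, install it or add its directory to
PATH before running epub2txt.awk.
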
This script requires the following commands in your path:

* cp          (gnucp.exe, from DJGPP cp.exe)
* awk         (nawk32.exe)
* find        (gnufind.exe, from DJGPP find.exe)
* unzip       (https://infozip.sf.net/)
* utf8tocp    (gopher://tilde.pink/1/~bencollver/files/dos/util/utf8tocp/)
* webdump     (gopher://codemadness.org/1/phlog/webdump/)
* xml2tsv.awk (xml2tsv.bat)

On DOS, this script also requires the following in your path:

* comp.com    (from FreeDOS)
* deltree.com (from FreeDOS)
* doslfn.com  (from FreeDOS)
* redir.exe   (from DJGPP)

Notes:

* This script uses a modified version of utf8tocp in order to
  transliterate "unknown" Unicode codepoints to meaningful CP437 and
  ASCII equivalents.

Known Limitations
=================

* No support for SVG images
* No precautions against malicious/pathological EPUB files

Regression testing
==================

See tests/readme.txt for details on regression testing.

On DOS, the test scripts rely on deltree.com

Source code
===========

Download or view the source code at: