HTMSTRIP.TXT                         1                         Mar 10, 2001

WinXX NOTICE:  As with  most  DOS-based  utilities,  this  program  doesn't
understand the weird subdirectories, long filenames, and invalid characters
that are possible under Windows.  This is not a bug.  It's just the way DOS
applications work.  For compatibility reasons,  most  versions  of  Windows
maintain a short filename for each file (in WinNT,  you  can  probably  say
"DIR /X" in DOS to find them).  The short filenames are the ones  with  "~"
characters in them (like "MYFILE~1.TXT").  The  short  filenames  are  what
these utilities will process.

The HTMSTRIP.EXE program attempts to  read  HTML  pages,  remove  the  HTML
coding, and write the file out as something more useful.  Features of  this
program:

  * Ideal way to prep HTML documents for later  retransmission  via  e-mail
    (which doesn't support the fonts, pictures, etc).  Beats out Netscape's
    Save As Text option hands down.
  * Can be run across an entire  subdirectory  (for  example,  your  entire
    cache subdirectory), and will only process the HTML documents  that  it
    finds. (There are some options on this.)
  * Removes all embedded HTML commands.
  * Recodes the standard HTML  "entity  references"  (so  "&copy;"  becomes
    "(c)"). The actual replacements are coded in a  user-modifiable  lookup
    file.
  * Handles standard indent, heading, selection groups, menus, tables, etc.
  * Supports most  of  what's  in  the  HTML  3.2  specification  and  some
    additional 4.0 features.
  * Reflows all text as appropriate.
  * Can provide a character-translation table to filter out characters that
    only work under Windows.
  * Can indicate bolding, underlining, etc with user-specified characters.
  * Optionally,  will  replace  Link,  Image,  and  Input  references  with
    user-definable text representations.
  * Optionally, alerts you to possible errors in the HTML code itself.
  * Supports ISO 8859/1 8-bit single-byte (Windows), 7-bit DOS  ASCII,  and
    8-bit DOS ASCII character sets.
  * Optionally creates a logfile of file activity.
  * Pressing escape stops the program early.

HTML  codes  are   enclosed   within   <...>   indicators.    For    upward
compatibility reasons, Web  browsers  ignore  any  codes  that  they  don't
understand and just process the ones they can handle.

HTMSTRIP removes all HTML codes.  It also handles the standard HTML "&xxx;"
"entity references" (for example, "&copy;" is replaced by "(c)").  You  can
add or  change  these  replacements  as  desired  by  using  the  INI  file
(documented later).
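
As a rough illustration of the entity replacement described above, here is
a minimal Python sketch.  It is not HTMSTRIP's own code.  The table name
ENTITY_TABLE and every entry other than "&copy;" are assumptions made up
for the example; the real replacements live in the INI file.

```python
import re

# Hypothetical lookup table; only "&copy;" -> "(c)" comes from this document.
ENTITY_TABLE = {
    "copy": "(c)",
    "amp": "&",
    "lt": "<",
    "gt": ">",
}

def replace_entities(text, table=ENTITY_TABLE):
    """Replace "&xxx;" references found in the table; leave unknown ones alone."""
    def lookup(match):
        return table.get(match.group(1), match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", lookup, text)

print(replace_entities("Copyright &copy; 2001"))
```

Unknown references are passed through untouched in this sketch; whether
HTMSTRIP does the same is not stated here.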


Quickie instructions:

Okay!  You hate to read.  I know that.  And there aren't any cute  pictures
in this documentation and, like everything I write, it's way  too  long  to
keep your attention for  long.   So,  let's  bottom  line  it;  what's  the
quickest way to use this program without learning any of the options?


Let's presume you're running under  Windows.   Take  the  HTMSTRIP.EXE  and
HTMSTRIP.INI files from the HTMSTymm.ZIP file and copy  them  to  the  same
subdirectory somewhere.  (They should be in the same  subdirectory  already
since that's how uncompressing them would  have  gone.)  This  subdirectory
should be in your path.  If you're not sure what your path is, hop  to  DOS
and type "SET".  There should be a line  shown  that  says  something  like
"PATH=C:\;C:\DOS;C:\WINDOWS".  I wouldn't advise copying  HTMSTRIP.EXE  and
HTMSTRIP.INI to your WINDOWS subdirectory.  Maybe your root?

Get on the Web and save the source of an HTML document to your  hard  disk.
This is done from the Netscape Navigator by bringing up a page  and  saying
"Save As...".  Remember the file name and what subdirectory you  saved  the
document to.   Just  for  example's  sake,  let's  say  the  file  name  is
"UPEPIS.HTM".

Hop to DOS.  (You can run HTMSTRIP from the Run option in Windows but  it's
easier to explain this  way.)  Make  the  directory  where  you  saved  the
document your default subdirectory.  (This is usually done with a series of
"CD" commands in DOS.)

Now, type:

        HTMSTRIP

You didn't pass in any parameters so HTMSTRIP will request the name of  the
file to process.  Enter the name of the HTML file.  In our case, this would
appear like:

        Enter filespec to process? UPEPIS.HTM

Presuming you did everything correctly, the HTMSTRIP program will read  the
HTML file and tell you it created a new file with  the  file  extension  of
".OUT" (in our case, "UPEPIS.OUT").

That was pretty easy.  Now, hop back into Windows and bring the new file up
in your text editor (use Write or something else that uses  TrueType  fonts
instead of NotePad).  With luck, you'll see the file looking similar to how
it did when you were viewing it under your Web browser.  The difference  is
that it's now a properly-formatted text document which fits on  the  screen
and can be e-mailed to someone.

Hop back into DOS.  Type "HTMSTRIP /?".  You'll see there are  a  bunch  of
other parameters that you can pass in.  If  you're  not  pleased  with  the
output file that was created, you might want to read  the  quick  on-screen
description of each option and then consult the HTMSTRIP.TXT file for  more
instructions about anything that sounds interesting.   (Since  the  program
doesn't alter the source HTML file, you can keep rerunning the program with
different parameters to see how they affect things.) Chances are, you won't
want to revise any of the system defaults at least initially.  If you  find
yourself consistently needing to change some options,  you  might  want  to
edit the HTMSTRIP.INI  file  to  specify  those  new  defaults.   Read  the
BRUCEINI.TXT file for information on overriding defaults.

Note that the instructions tell you you can use  wildcards  for  the  input
file name.  You can do something like "HTMSTRIP *.HTM" and it will  process
every file with an ".HTM" extension in your default subdirectory.


HTML codes:

HTMSTRIP is also tuned to specially handle several embedded HTML codes from
HTML versions up through 3.2.  You can go to http://www.w3.org to get  more
information about the HTML standards.  The supported codes are:

                           HTML codes supported

                 Supported
Element          Attributes   Description

<!-- ... -->                  Comments (skip)
<A ...>...</A>                External link
                 HREF=site      Start of hypertext link
                 ID=anchor      Establishes target for hypertext links
                 NAME=anchor    Establishes target for hypertext links
<AREA>                        Client-side image hotspot
                 HREF=site      Hypertext link
                 ALT=text       What to display if text-only environment
<B>..</B>                     Bold text
<BASE ...>                    Establishing a root directory
                 HREF=site      Prefix to add to unqualified sites
<BLOCKQUOTE>...</BLOCKQUOTE>  Indented block of text (same as <BQ>...</BQ>)
<BR>                          Forced line break
<CAPTION>...</CAPTION>        Title for a table block
<CENTER>...</CENTER>          Centering text block
<DD>                          Term definition
<DIR>...</DIR>                Directory list of items (obsolete)
<DL>...</DL>                  Definition list block
<DT>                          First term of definition list/glossary
<EM>...</EM>                  Emphasize text
<H1> to <H6>...</H1> to </H6> Heading items
<HR>                          Horizontal rule
<I>..</I>                     Italicize text
<IMG ...>                     Image
                 SRC=site       Location of the image
                 ALT=text       What to display if text-only environment
<INPUT ...>                   User input
                 TYPE=CHECKBOX  Type of input -- shown as [_]
                 TYPE=HIDDEN    Type of input -- suppress
                 TYPE=RADIO     Type of input -- shown as (_)
                 CHECKED        Makes [X] or (X)
                 SIZE=n         Specifies length for input fields
                 VALUE=text     Specifies default value for input fields
<LI>                          Menu/Ordered/Unordered/Directory list item
<MAP>...</MAP>                Client-side image map
<MENU>...</MENU>              Menu listing block (obsolete)
<OL>...</OL>                  Ordered listing block (numbering's skipped)
<OPTION>                      Used for single/multiple choice menus
<P>                           Paragraph indicator
<PRE>...</PRE>                Preserve spacing block (preformatted text)
<SCRIPT>...</SCRIPT>          JavaScript blocks are ignored
<SELECT>...</SELECT>          Block for single/multiple choice menu
                 MULTIPLE       Allow for multiple selections
<STRONG>..</STRONG>           "Strong" text (HTMSTRIP treats this tag
                              the same as <B>..</B>)
<TABLE>...</TABLE>            Table block
<TD>...</TD>                  Table data (cell)
                 ALIGN=spec     How to align the cell (default is LEFT)
                 COLSPAN=n      How many columns to span with this cell
                 ROWSPAN=n      How many rows to span with this cell
<TH>...</TH>                  Table heading
                 ALIGN=spec     How to align the cell (default is CENTER)
                 COLSPAN=n      How many columns to span with this cell
                 ROWSPAN=n      How many rows to span with this cell
<TITLE>...</TITLE>            Title item
<TR>...</TR>                  Table row
<U>..</U>                     Underlining text
<UL>...</UL>                  Unordered listing

If you run across other codes that become vital, let me know and  I'll  see
about handling them somehow.


How to get HTML files:

Some people who are using regular Web  browsers  like  Mosaic  or  Netscape
don't realize that they're automatically saving HTML files  to  their  hard
disk throughout every Web session.  That's because  just  about  every  Web
browser saves the most-recently accessed files from the Web (including HTML
source code, GIF's, and JPG's) on your hard disk and reads them from  there
instead of requiring you to download them every  time  you  go  back  to  a
previous page.  This is typically settable by you under  "Preferences"  and
"Cache" on your Web browser.


I usually set my Web browser to have a huge cache,  maybe  10MB.   Anything
beats downloading the same pages over again even at 28.8K.  And I make sure
that I do not have anything specified like "clear cache at the end of every
session".  Then I just go through  the  files  in  the  cache  subdirectory
afterward and reprocess them.

Two disadvantages to a cache...  It takes up hard disk space but, hey,  the
Web browser is typically in Windows so why are you surprised.   The  second
disadvantage is that if the page actually  changes  between  sessions,  you
typically won't notice the new page as long as it remains in your cache. If
you think a cached page should have changed but the old copy still appears,
you can typically ask your  Web  browser  to  reload  the  page.   On  some
browsers, this is shown as an arrow in the form of a circle.

HTMSTRIP can process  the  entire  cache  subdirectory.   It  automatically
detects non-HTML files for you and processes accordingly.  It  creates  new
text file versions of just the HTML pages it finds.

Another great way to get HTML pages is to use the URL-minder service at:

        http://www.netmind.com

This is a free service which automatically tells you whenever a Web  page's
contents change.  If you use the advanced features, you can  have  the  Web
page's HTML code sent to you as a file attachment (it's easier than dealing
with the "embed" option too).  Then you can run HTMSTRIP on  the  resulting
file.


Specifying parameters:

Parameters for this program can be set in the  following  ways.   The  last
setting encountered always wins:
  - Read from an *.INI file (see BRUCEINI.TXT file),
  - Through the use of an environmental variable  (SET  HTMSTRIP=whatever),
    or
  - From the command line (see "Syntax" below)
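
The "last setting encountered always wins" rule can be pictured with a
small Python sketch.  It assumes the three sources are read in the order
listed above (INI file, then environment variable, then command line); the
function name and the sample setting names are hypothetical.

```python
def merge_settings(ini, env, cmdline):
    """Apply the three sources in order; the last value seen for a name wins."""
    merged = {}
    for source in (ini, env, cmdline):
        merged.update(source)
    return merged

settings = merge_settings(
    {"WIDTH": "80", "CP": "1"},   # from the *.INI file
    {"WIDTH": "72"},              # from SET HTMSTRIP=...
    {"CP": "3"},                  # from the command line
)
print(settings)
```

Here the environment variable overrides the INI width and the command line
overrides the INI character set.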

HTMSTRIP also allows you to define:
  - How "entity references" (things like "&reg;") are shown
  - How "symbolic references" (things like "[input]" and "<B>") are shown
  - Which characters should be filtered into other characters (things  like
    showing "’" as "'" -- none of these should actually appear on Web pages
    by the way)

These are explained in sections at the end of this documentation.


Syntax:

    HTMSTRIP [ filespec | (filelist) | @listfile ] [ /Cpath ] [ outfile ]
      [ /EXT=.xxx ] [ /COPY=path ] [ /CREATE ] [ /ALL ] [ /ATTR=attribs ]
      [ /WIDTH=n ] [ /FORCE ] [ /RULE=s ] [ /BORDER=c ] [ /BUFF=n ]
      [ /SPACES ] [ /RSPACE ] [ /WARNINGS ] [ /-TABLE ] [ /-INDENT ]
      [ /CPn ] [ /A=spec ] [ /IMG=spec | /IMGALT=spec ]
      [ /MAP=spec | /MAPALT=spec ] [ /ALTONLY ] [ /HTTP=cc ] [ /-INPUT ]
      [ /Linitfile ] [ /FILTER | /FILTER=filename ]
      [ /LOG=logfile ] [ /T=temp_dir ] [ /MONO ]
      [ /Iinitfile | /-I ] [ /-ENV ] [ /? ] [ /?HEX ]

where:

"filespec" tells the routine which file or files are to be processed.   The
specification can include path and wildcards if  desired.   Typically,  the
file names are  *.HTM  files.   If  no  input  specification  (filespec  or
@listfile) is provided, you'll be prompted for one.   If  no  extension  is
provided, ".HTM" is presumed.  (If you want to process a  file  which  does
not have an extension, include the trailing period on the file  name,  such
as "HTMSTRIP HTTP_WWW." with the period in there.)

"(filelist)" allows you to specify multiple files to be processed from  the
command line.  File names should be separated by spaces.  They may  include
drive, path, and wildcard information.  Remember that a command line in DOS
cannot exceed 127 characters so you're limited as  to  how  many  different
file specifications you can provide in this fashion.

"@listfile" allows you to have a variety of file specifications saved in  a
text file named "listfile".  Each line in the file should  consist  of  one
file specification, each of which can  include  a  path  and  wildcards  if
desired.  Blank lines and lines  beginning  with  semi-colons,  colons,  or
quotes are ignored.  If no input specification (filespec or  @listfile)  is
provided, you'll be prompted for one.
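
A minimal Python sketch of how such a list file could be read follows.  It
is only an illustration: the function name is hypothetical, and it assumes
that "quotes" covers both single and double quotes and that leading blanks
on a line are ignored.

```python
def read_listfile(lines):
    """Return the file specifications found in the lines of a list file."""
    specs = []
    for line in lines:
        spec = line.strip()
        # Blank lines and lines starting with ";", ":", or a quote are ignored.
        if not spec or spec[0] in ";:\"'":
            continue
        specs.append(spec)
    return specs

sample = ["; my cache files", "", "C:\\CACHE\\*.HTM", ": a note", "A*.HTM\n"]
print(read_listfile(sample))
```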

"/Cpath" specifies that the cache is found in  a  particular  subdirectory.
This allows you to specify a default  location  in  your  *.INI  file  (see
BRUCEINI.TXT) and just specify something like "A*.HTM"  for  the  files  to
process.  Note, however, that if you don't use *.INI files, it's easier  to
just pass in the input file path with the "filespec"  parameter  ("HTMSTRIP
*.HTM /C\CACHE" and "HTMSTRIP \CACHE\*.HTM" are  the  same).   Defaults  to
your current default path.  If the input filespec includes  drive  or  path
information, this will override the /Cpath specification.

"outfile" is the name of the output file to create.  It is overwritten
without prompting if it exists already.  If no output file name is
provided, the routine uses the input file name with an extension of .OUT.
(The default .OUT extension can be overridden using the /EXT=.xxx
parameter.)  An outfile cannot be specified if wildcards or @listfile are
used for the input file specification.

"/EXT=.xxx" allows you to specify a different default  file  extension  for
the output file.  This parameter only matters  if  you  do  not  explicitly
specify an output file name.  Initially defaults to "/EXT=.OUT".
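
The file naming rules (".HTM" presumed on input, ".OUT" or the /EXT value
on output) can be sketched in Python as follows.  The function names are
hypothetical and this is only a reading of the rules described here, not
the program's actual logic.

```python
import ntpath  # DOS-style path handling on any platform

def input_name(filespec):
    """Add ".HTM" when no extension is given; a trailing period means none."""
    base = ntpath.basename(filespec)
    if base.endswith("."):
        return filespec[:-1]
    if "." not in base:
        return filespec + ".HTM"
    return filespec

def output_name(infile, outfile=None, ext=".OUT"):
    """Use the explicit outfile if given, else swap the extension for ext."""
    if outfile:
        return outfile
    root, _ = ntpath.splitext(infile)
    return root + ext

print(output_name(input_name("UPEPIS")))
```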


"/COPY=path" specifies that the output files (for example,  BRUCE.OUT  when
the input was BRUCE.HTM) are to be created in the  specified  subdirectory.
By default, the program creates the output files in the same  path  as  the
input files.  If the subdirectory does not exist, you  will  be  asked
whether to create it, unless the /CREATE parameter is specified.

"/CREATE" automatically creates the output subdirectory  if  /COPY=path  is
specified.  The default is "/-CREATE"; if the subdirectory  is  not  there,
the program prompts whether it should be created or not.

"/ALL" says that if the program encounters what it thinks is  just  a  text
file, it should take the file and try to fix up CR/LF problems (Unix  files
end with LF's instead of CR/LF which is what DOS needs) and that's it. This
can be somewhat risky since it may misdiagnose a file but it should be safe
if you're running it on your cache  subdirectory.   Initially  defaults  to
"/-ALL" which means it won't process it unless it thinks it's an HTML file.

"/-ALL" says to skip any file that the program thinks is not an HTML  file.
This is initially the default.
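
The CR/LF fix-up that /ALL applies to plain text files amounts to
something like the Python sketch below.  The function name is hypothetical,
and the sketch assumes that line endings which are already CR/LF should be
left as they are.

```python
def fix_line_endings(data):
    """Convert bare LF (Unix) line endings to CR/LF without doubling CRs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

print(fix_line_endings(b"unix line\nalready dos\r\n"))
```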

"/ATTR=attribs" allows you to specify a combination of attributes that  you
want considered.  You can specify  any  combination  of  R  (read-only),  H
(hidden), S (system), or A (archive bit).  Precede  any  character(s)  with
"-" to exclude instead of include.  Unlike with the DOS  DIR  command,  the
inclusions and exclusions are subject to  "OR"  conditions;  /ATTR=HS  will
retrieve any file that is either hidden or a system file or both.  You  can
specify "/ATTR=ALL"  to  specify  that  all  files  are  to  be  processed.
Initially defaults to /ATTR=-H-S (skip hidden or system files).

"/WIDTH=n" specifies the desired line length for wrapping  long  lines  and
also for centering.  Initially defaults to "/WIDTH=80".
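
To give a feel for what the width setting controls, here is a small Python
sketch that wraps and centers text to a given width using the standard
textwrap module.  It only illustrates the idea; HTMSTRIP's own reflow rules
are not documented here and the function names are hypothetical.

```python
import textwrap

def reflow(text, width=80):
    """Collapse runs of white space and wrap the text at the given width."""
    return textwrap.fill(" ".join(text.split()), width=width)

def center(text, width=80):
    """Center a short line within the width, dropping trailing blanks."""
    return text.center(width).rstrip()

print(reflow("This   is a long line of text that needs wrapping.", width=20))
print(center("A Title", width=20))
```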

"/FORCE" says that the specified  width  must  be  adhered  to.   The  only
exception to this is that tables may force a width expansion if  the  cells
simply  can't  fit  on  the  page  otherwise.   Using  /FORCE  means   that
<PRE>...</PRE> blocks may be wrapped (typically a no-no) and some words  in
tables may get split up if the entire word can't fit in the  computed  cell
width.  The latter is especially a  problem  if  there  are  lots  of  cell
columns in a table; there isn't much room for  the  actual  data  when  the
cells themselves take up so much space.  Initially defaults to "/-FORCE".

"/-FORCE" says that the desired widths can be ignored  if  table  cells  or
<PRE>...</PRE>  blocks  would  look  more  natural  without  it.   This  is
initially the default.


"/RULE=s" specifies that a string is to be repeated the width of the  line.
This is used to separate sections.  The string can be:
  * a single character (for example: /RULE=- ),
  * multiple characters (for example: /RULE="- " ),
  * it can contain decimal and hexadecimal characters
    (like "/RULE=\066\097\116"--see BRUCEHEX.TXT),
  * "/RULE=NULL" or "/-RULE" (both generate blank lines like "/RULE=\32")
  * "/RULE" which picks a setting for you based on /BORDER=c:
      * If /BORDER=S, is the same as "/RULE=\196" or "/RULE=─"
      * If /BORDER=D, is the same as "/RULE=\205" or "/RULE=═"
      * If other,     is the same as "/RULE=\045" or "/RULE=-"
  * Left out in which case the default "/RULE=-" is used
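
The rule line can be pictured with the Python sketch below.  It handles
only plain strings and the decimal \nnn form shown above (the hexadecimal
form is described in BRUCEHEX.TXT and is left out), reads the codes as DOS
code page 437 characters, and assumes the repeated string is simply cut off
at the line width.  The function names are hypothetical.

```python
import re

def decode_rule(spec):
    """Expand \\nnn decimal character codes, read as code page 437 bytes."""
    def expand(match):
        code = int(match.group(1))
        if code > 255:
            return match.group(0)  # not a single-byte code; leave it alone
        return bytes([code]).decode("cp437")
    return re.sub(r"\\(\d{1,3})", expand, spec)

def rule_line(spec="-", width=80):
    """Repeat the rule string across the line, cutting it off at the width."""
    pattern = decode_rule(spec)
    repeats = width // len(pattern) + 1
    return (pattern * repeats)[:width]

print(rule_line("- ", width=20))
print(rule_line(r"\196", width=20))
```
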
Personally, if your  printer  supports  IBM  graphics  characters,  I  find
"/RULE=\196" to be the most pleasing of the rule lines.

"/BORDER=c" specifies the type of border to use.  The possible choices  for
"c" are:
  D  -- double line around row and stub, single lines elsewhere
  S  -- single line
  T  -- text character line    -- this is the default
  B  -- blanks (spaces)
  N  -- none
  DV -- double line is used around row and stub, lines are skipped in
        horizontal rows within the table itself
  SV -- same as DV except single line
  TV -- same as DV except text lines
Examples of the various border types:

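     <D>ouble       <S>ingle        <T>ext         <B>lank        <N>one
   ╔═══╦═══╤═══╗  ┌───┬───┬───┐  +---+---+---+
   ║ 1 ║ 2 │ 3 ║  │ 1 │ 2 │ 3 │  | 1 | 2 | 3 |    1   2   3      1   2   3
   ╠═══╬═══╪═══╣  ├───┼───┼───┤  +---+---+---+                   4   5   6
   ║ 4 ║ 5 │ 6 ║  │ 4 │ 5 │ 6 │  | 4 | 5 | 6 |    4   5   6      7   8   9
   ╟───╫───┼───╢  ├───┼───┼───┤  +---+---+---+
   ║ 7 ║ 8 │ 9 ║  │ 7 │ 8 │ 9 │  | 7 | 8 | 9 |    7   8   9
   ╚═══╩═══╧═══╝  └───┴───┴───┘  +---+---+---+

                      <DV>           <SV>           <TV>
                  ╔═══╦═══╤═══╗  ┌───┬───┬───┐  +---+---+---+
                  ║ 1 ║ 2 │ 3 ║  │ 1 │ 2 │ 3 │  | 1 | 2 | 3 |
                  ╠═══╬═══╪═══╣  ├───┼───┼───┤  +---+---+---+
                  ║ 4 ║ 5 │ 6 ║  │ 4 │ 5 │ 6 │  | 4 | 5 | 6 |
                  ║ 7 ║ 8 │ 9 ║  │ 7 │ 8 │ 9 │  | 7 | 8 | 9 |
                  ╚═══╩═══╧═══╝  └───┴───┴───┘  +---+---+---+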

Note that the /BORDER=D,  /BORDER=DV,  and  /BORDER=SV  presume,  as  shown
above, that the table has a one-column stub and a  one-row  column  header.
This is normal for many tables but  not  all.   When  in  doubt,  stick  to
something like /BORDER=S or /BORDER=T.

"/BUFF=n" specifies how many spaces to  position  on  either  side  of  the
vertical bars in the tables.  Defaults to /BUFF=1.
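
As an illustration of the text border (/BORDER=T) together with /BUFF=n,
the Python sketch below draws a small table.  It assumes every row has the
same number of cells and left-aligns each cell; the function name is
hypothetical and HTMSTRIP's real table layout does considerably more.

```python
def text_table(rows, buff=1):
    """Draw rows of cells with "+", "-", and "|" and buff spaces of padding."""
    columns = len(rows[0])
    widths = [max(len(str(row[c])) for row in rows) for c in range(columns)]
    pad = " " * buff
    rule = "+" + "+".join("-" * (w + 2 * buff) for w in widths) + "+"
    lines = [rule]
    for row in rows:
        cells = [pad + str(cell).ljust(w) + pad for cell, w in zip(row, widths)]
        lines.append("|" + "|".join(cells) + "|")
        lines.append(rule)
    return "\n".join(lines)

print(text_table([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

With the default buff of 1 this reproduces the <T>ext example shown above.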


"/SPACES" retains extra  vertical  spacing  between  sections.   There  are
frequently lots of extra blank lines that appear in the output file  either
due to specific HTML requests or to ensure proper reformatting.  Specifying
/SPACES allows these to stay there.

"/-SPACES" removes these extra blank lines.   This  also  tries  to  remove
empty columns in tables as well as some blank  rows  in  tables.   This  is
initially the default.

"/RSPACE" requires that a blank line appear  before  and  after  horizontal
rule (<HR>) indicators.  Using this option with /SPACES may cause  multiple
blank lines around horizontal rules.  Initially defaults to "/-RSPACE".

"/-RSPACE" doesn't force a blank line around  horizontal  rule  indicators.
This is initially the default.

"/WARNINGS" displays on-screen warnings when HTMSTRIP finds either internal
problems in the document or things it can't handle.  Realistically, they're
not all that important because the program is working  around  them  anyway
but you might want to use them to help make suggestions to  the  webmaster.
If you create a logfile (using the "/LOG=filename" parameter), the warnings
are automatically written out to that file independently of the "/WARNINGS"
setting.  Initially defaults to "/-WARNINGS".

"/-WARNINGS" turns  off  the  warning  messages.   This  is  initially  the
default.

"/TABLE" says to process text within table declaration sections  as  tables
whenever the program can.  There are some maximum cell length limits in the
program and some tabular text will be dumped as straight ASCII text anyway.
This is initially the default.

"/-TABLE" says to  process  text  within  table  declaration  sections   as
straight text, removing it from the tabular structure entirely.  There  are
other cases where page authors  have  switched  to  tables  for  formatting
purposes and, when you convert the resulting page, you're left with  mostly
spaces.  Initially defaults to "/TABLE".

"/-INDENT" removes block indent sections from the output file.  By default,
five    spaces    are    inserted    before    each    line    within     a
<BLOCKQUOTE>...</BLOCKQUOTE> block.  These can be nested so you can end  up
with a lot of white space in your document.   "/-INDENT"  turns  them  off.
Initially defaults to "/INDENT".

"/INDENT" retains  the  <BLOCKQUOTE>...</BLOCKQUOTE>  indenting.   This  is
initially the default.


"/CPn" specifies what character  pageset  to  use  when  processing  entity
references.  For example, should the &deg; entity be displayed as the  word
"degree", the 8-bit DOS character "°", or the Windows character "°"?  (This
setting does not impact things like tables and rulers, which you're setting
yourself.) "n" can be 1, 2, or 3:

  /CP1 specifies that the program should use the 7-bit DOS  character  set.
       This is the most universally recognized character set out there  and
       should work for printing, e-mail, etc.  It does not  handle  foreign
       characters or miscellaneous symbols like "©" so these are translated
       into rough ASCII equivalents.  (Consider this  font  to  be  for  us
       culturally-ignorant Americans who wouldn't recognize an accent  mark
       if it bit us.) Since this  is  the  lowest-common-denominator  font,
       it's initially the default for this routine.  Add /CP2  or  /CP3  to
       your HTMSTRIP.INI file if you want to change on a regular basis.
  /CP2 specifies that the program should use the 8-bit DOS  character  set.
       This works within DOS applications but doesn't read correctly  under
       Windows programs.
  /CP3 specifies  that  the  program  should  use  the  ISO  8859/1   8-bit
       single-byte graphic character set.  This set  works  within  Windows
       applications but may not e-mail correctly.

"/A=spec" tells the program how to handle <A...> hypertext links.  See  the
"How hypertext links (<A>)  are  displayed"  discussion  below.   Initially
defaults to "/A=NONE".

"/A" is the same thing as "/A=FSITE".

"/-A" is the same thing as "/A=NONE".

"/IMG=spec" tells the program how to handle <IMG...> links.  These are used
for embedded graphics.  "/IMG=spec" says to use the "src=" specification in
place of any "alt=" specification.  See the "How image elements (<IMG>) are
displayed" discussion below.  Initially defaults to "/IMG=NONE".

"/IMG" is the same thing as "/IMG=FSITE".

"/-IMG" is the same thing as "/IMG=NONE".

"/IMGALT=spec" is mutually exclusive with the "/IMG=spec" specification and
again tells the program how to handle <IMG...> links.  "/IMGALT=spec"  says
to use an "alt=" specification if one is provided before falling  back  and
using the "src=" specification.  If "/ALTONLY"  is  specified,  the  "src="
specification will not be used at all.  See the "How image elements (<IMG>)
are displayed" discussion below.  Initially defaults to "/IMGALT=NONE".

"/IMGALT" is the same thing as "/IMGALT=FSITE".

"/-IMGALT" is the same thing as "/IMGALT=NONE".


"/MAP=spec"  and  "/MAPALT=spec"  work  the   same   as   "/IMG=spec"   and
"/IMGALT=spec" (except /MAP=SYMBOL and /MAPALT=SYMBOL  are  not  supported)
but they apply to <AREA> specifications within a <MAP>...</MAP> block.  See
the "How client-side imagemaps (<MAP> and <AREA>) are displayed" discussion
below.  Initially defaults to "/MAP=NONE" (which is the same as "/-MAP").

"/MAP" is the same as "/MAP=FSITE".

"/ALTONLY" specifies that if an ALT=alias reference exists in  an  <IMG...>
or <AREA...> link, then the alias should be embedded  in  the  output  text
(appearing within brackets)  but,  otherwise,  all  <IMG...>  or  <AREA...>
references are to be ignored  in  the  input  file.   See  the  "How  image
elements (<IMG>) are displayed" and "How client-side Imagemaps  (<MAP>  and
<AREA>)  are  displayed"  discussions   below.    Initially   defaults   to
"/-ALTONLY".

"/-ALTONLY" allows <IMG...> and <AREA...> references to be added to the
output file even if an ALT=alias reference is not specified.  This is
initially the default.  See the "How image elements (<IMG>) are displayed"
and "How client-side imagemaps (<MAP> and <AREA>) are displayed"
discussions below.

"/HTTP=cc" specifies the two characters that are to appear around the  text
substitution  in   /A=spec,   /IMG=spec,   /IMGALT=spec,   /MAP=spec,   and
/MAPALT=spec items.   Initially  defaults  to  "/HTTP=[]";  the  site  name
(typically) is preceded by " [" (including the leading space) and  followed
by "]  "  (including  the  trailing  space).   If  only  one  character  is
specified, it is the same as repeating it  so  "/HTTP=*"  is  the  same  as
"/HTTP=**".  You can  also  specify  /HTTP=NULL  but  the  leading/trailing
spaces will still be used.
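
The way /HTTP=cc wraps a substituted name can be sketched in Python as
follows.  The function name is hypothetical; the sketch just follows the
rules stated above (two characters, one character repeated, or NULL with
the outer spaces kept).

```python
def wrap_site(site, cc="[]"):
    """Surround a site name with the /HTTP=cc characters plus outer spaces."""
    if cc.upper() == "NULL":
        left = right = ""
    elif len(cc) == 1:
        left = right = cc
    else:
        left, right = cc[0], cc[1]
    return " " + left + site + right + " "

print(repr(wrap_site("elsie.htm")))
```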

"/-INPUT" skips any indication of the <INPUT> flags.  Initially defaults to
"/INPUT".

"/INPUT" shows <INPUT> flags.   This  allows  the  "<INPUT>  =  5<@+>"  (or
however you have it defined) from HTMSTRIP.INI to be  activated.   This  is
initially the default.

"/L" says to read "&xxx;" entity references and  "<A>"  etc  symbol  lookup
codes from your /Iinitfile file.  This is initially the default.

"/Linitfile" says to read the  "&xxx;"  entity  references  and  "<A>"  etc
symbol lookup codes from the specified file "initfile".  Specifying another
file is primarily useful if you want to have a  master  *.INI  file  and  a
separate code lookup table.  Initially defaults to "/L".

"/-L" says to not process any entity references  or  symbol  lookup  codes.
Initially defaults to "/L".


"/FILTER" specifies that the program is to replace specific  characters  in
the input  file.   See  the  "Defining  Character-Translations"  discussion
below.  When this parameter is in effect, the program looks  for  character
translations in the entity reference  file  (/Linitfile),  which  typically
defaults to your initialization file (/Iinitfile).  This is  initially  the
default.

"/FILTER=filename" specifies that  a  filter  is  to  be  applied  and  all
character replacements are in  the  file  "filename".   See  the  "Defining
Character-Translations" discussion below.

"/-FILTER" says to not bother removing the nonprintable characters from the
output.  Initially defaults to "/FILTER".

"/LOG=logfile" specifies that the program should create a simple  log  file
showing what files were processed  when  and  what  (if  any)  errors  were
encountered.  If the logfile exists already, it will be appended to  (lines
will be added to the end of it).  If no drive or  path  is  specified,  the
file will be created in your default drive or path.  Initially defaults  to
"/-LOG" (don't create a logfile).

"/-LOG" says to not create a log  file  at  all.   This  is  initially  the
default.

"/LOG" is the same as "/LOG=HTMSTRIP.LOG".

"/T=temp_dir" specifies where to write the temporary files that the routine
needs.  Examples are "/T=C:"  and  "/T=C:\TEMP".   If  not  specified,  the
routine writes to the following in sequence:

  - the value of any TEMP, then TMP, environmental variable
  - C:\TEMP
  - C:\
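
The fallback order for the temporary directory could be pictured with the
Python sketch below.  How the program decides that a candidate is usable
is not stated here, so the is_usable test is a hypothetical stand-in
supplied by the caller, and an explicit /T value is simply taken as given.

```python
def pick_temp_dir(option, env, is_usable):
    """Pick a temporary directory using the fallback order described above."""
    if option:
        return option  # an explicit /T=temp_dir is taken as given
    for candidate in (env.get("TEMP"), env.get("TMP"), "C:\\TEMP", "C:\\"):
        if candidate and is_usable(candidate):
            return candidate
    return None

print(pick_temp_dir(None, {"TMP": "D:\\TMP"}, lambda path: True))
```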

"/MONO" (or "/-COLOR") does not try to override screen  colors.   Initially
defaults to "/COLOR".

"/COLOR" (or "/-MONO") allows screen colors  to  be  overridden.   This  is
initially the default.

"/Iinitfile" says to  read  an  initialization  file  with  the  file  name
"initfile".  The file specification *must* contain a period.  If  no  drive
or path information is specified, the  program  will  search  for  initfile
beginning in your default subdirectory and then going throughout  your  DOS
path.  The use of an initialization file is optional.   Initially  defaults
to "/IHTMSTRIP.INI".

"/-I" (or "/INULL") says to skip loading  the  initialization  file.   Note
that this also drops loading the file that translates things  like  "&xxx;"
so you should specify /Linitfile if you drop the other file.


"/ENV" says to look for %var% occurrences in the command line  and  try  to
resolve any apparent environmental variable references.   See  BRUCEINI.TXT
for more information.  This is initially the default.

"/-ENV" says to skip resolving apparent %var% occurrences  in  the  command
line.  Initially defaults to "/ENV".
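
As a rough illustration of what resolving %var% occurrences means, here is
a minimal Python sketch.  The real rules are in BRUCEINI.TXT; the function
name resolve_env_refs is hypothetical, and leaving unknown names untouched
is an assumption, not documented behavior.

```python
import re

def resolve_env_refs(command_line, env):
    """Replace each %NAME% with env[NAME]; unknown names are left as typed.

    DOS stores variable names in uppercase, so the name is uppercased
    before the lookup (an assumption about how the match is done).
    """
    def substitute(match):
        name = match.group(1).upper()
        return env.get(name, match.group(0))
    return re.sub(r"%([A-Za-z_][A-Za-z0-9_]*)%", substitute, command_line)
```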

"/?" or "/HELP" or "HELP" shows you the syntax for the command.

"/?HEX" gives you a hexadecimal and decimal conversion table.


Return codes:

HTMSTRIP returns the following ERRORLEVEL codes:

        0 = no problems, all files processed
      251 = could not find a file to process
      253 = operation aborted by pressing Escape
      255 = syntax problems, or /? requested
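
If you call HTMSTRIP from a script, the codes above can be turned into
messages.  The following Python sketch is one way to do it; the names
RETURN_CODES, describe_errorlevel, and run_htmstrip are made up for the
example, and it assumes HTMSTRIP can be found on your path.

```python
import subprocess

# The ERRORLEVEL values documented above.
RETURN_CODES = {
    0: "no problems, all files processed",
    251: "could not find a file to process",
    253: "operation aborted by pressing Escape",
    255: "syntax problems, or /? requested",
}

def describe_errorlevel(code):
    """Return the documented meaning of an ERRORLEVEL value."""
    return RETURN_CODES.get(code, "undocumented return code %d" % code)

def run_htmstrip(infile, outfile, *switches):
    """Run HTMSTRIP (assumed to be on the path) and describe the result."""
    result = subprocess.run(["HTMSTRIP", infile, outfile] + list(switches))
    return describe_errorlevel(result.returncode)
```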




How hypertext links (<A>) are displayed:

Hypertext links are placed in an HTML page to indicate  that  if  the  user
clicks in the defined "hot" area, they will be taken to an appropriate page
(or another section of the same page).  A typical hypertext link  would  be
something like:

        To enter Elsie's Picture Page, click
        <a href="http://www.erols.com/waynesof/elsie.htm">here!</a>

Using a browser like Netscape Navigator  or  Internet  Explorer,  the  user
would see this as:

        To enter Elsie's Picture Page, click here!

and "here!" would normally be underlined and perhaps be in red.

How HTMSTRIP reveals hypertext links is based on the /A=spec parameter. The
values of "spec" are mutually exclusive:

      /A=FSITE    says to show the site name, using its full  url  address,
                  and embed this name in the body of the text page
      /A=FSITEFN  says to show the site name, using its full  url  address,
                  and place this site name in a footnote section at the end
                  of the text page
      /A=SITE     says to show the site name, but only the part  after  the
                  last "/" or "\", and embed this name in the body  of  the
                  text page
      /A=SITEFN   says to show the site name, but only the part  after  the
                  last "/" or "\", and place this site name in  a  footnote
                  section at the end of the text page
      /A=SYMBOL   says to use the specified <A> symbol  (initially  defined
                  as "(link)" in the HTMSTRIP.INI file)
      /A=NONE     (or /-A) says that nothing is to be shown  for  hypertext
                  links.  This is initially the default.

Given:

        To enter Elsie's Picture Page, click
        <a href="http://www.erols.com/waynesof/elsie.htm">here!</a>

Setting             Yields
-------             ------
/A=FSITE            [http://www.erols.com/waynesof/elsie.htm]
/A=FSITEFN          [1] http://www.erols.com/waynesof/elsie.htm (footnote)
/A=SITE             [elsie.htm]
/A=SITEFN           [1] elsie.htm       (footnote)
/A=SYMBOL           (link)
/A=NONE                                 (is not shown)
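
The table can be restated as a short Python sketch.  It is a reading of
the table, not the program's code: the names render_link and last_segment
are hypothetical, and it assumes that the footnote settings leave a "[n]"
marker in the body and put the name in the footnote section.

```python
def last_segment(url):
    """Return the part of the address after the last "/" or "\\"."""
    return url.replace("\\", "/").rsplit("/", 1)[-1]

def render_link(href, spec, footnotes, symbol="(link)"):
    """Return the body text for one link under /A=spec.

    footnotes is a list that collects the names for the footnote
    section when spec is FSITEFN or SITEFN.
    """
    spec = spec.upper()
    if spec == "NONE":
        return ""
    if spec == "SYMBOL":
        return symbol
    if spec not in ("FSITE", "FSITEFN", "SITE", "SITEFN"):
        raise ValueError("unknown spec: %r" % spec)
    name = href if spec.startswith("F") else last_segment(href)
    if spec.endswith("FN"):
        footnotes.append(name)
        return "[%d]" % len(footnotes)
    return "[%s]" % name
```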




How image elements (<IMG>) are displayed:

Image elements are put in HTML code to indicate that a graphical  image  is
to be inserted at this point.  A typical  image  element  might  look  like
this:

        <IMG SRC="../movies/Anaconda/assets/title.gif" border=0
        alt="Anaconda - click to enter">

Presuming the Web browser has graphics enabled, the browser will  load  the
title.gif graphic from the specified site (the  example  here  is  using  a
relative reference to the site instead  of  an  absolute  one--don't  worry
about it).

If the browser does not support graphics, the text displayed in the  "alt="
parameter (in our case "Anaconda - click to enter") will be displayed.   If
no "alt=" parameter is provided, some symbol like "(image)" will be  filled
in instead.  You'll frequently see the "alt=" text  if  the  graphic  takes
a while to load; most browsers show the text to give you a  heads-up  about
what's being loaded.  Under some of the newer  browsers,  the  "alt="  text
will be displayed if you move your cursor over the image.

Note that unlike the "src="  specification,  the  "alt="  specification  is
strictly optional under the HTML specifications.  Some images will have one
defined, most won't.

How HTMSTRIP displays image elements is based on three parameters:
"/IMG=spec", "/IMGALT=spec", and "/ALTONLY".

"/IMG=spec" and "/IMGALT=spec" are mutually exclusive.  The "spec" values
work almost identically to those of the "/A=spec" parameter (above).

If "/IMG=spec" is used, the "SRC=" attribute is used exclusively.

If "/IMGALT=spec" is used, the "ALT="  attribute  is  used  if  it  exists,
otherwise,  the  "SRC="  attribute  is  used  unless  "/ALTONLY"  is   also
specified.

If "/IMGALT=spec" and "/ALTONLY" are used, then  the  "ALT="  attribute  is
used.  If one does not exist in the  image  element,  a  symbol  (typically
"(image)") is used instead.

The values of "spec" are mutually  exclusive  and  are  documented  in  the
"/A=spec" section above.  Initially, the parameters default to  "/IMG=NONE"
(and "/IMGALT=NONE") and "/-ALTONLY".  This results in nothing being  shown
for image elements.



Given:

        <IMG SRC="../movies/Anaconda/assets/title.gif" border=0
        alt="Anaconda - click to enter">

Setting             Yields
-------             ------
/IMG=FSITE          [../movies/Anaconda/assets/title.gif]
/IMG=FSITEFN        [1] ../movies/Anaconda/assets/title.gif (footnote)
/IMG=SITE           [title.gif]
/IMG=SITEFN         [1] title.gif       (footnote)
/IMG=SYMBOL         (image)
/IMG=NONE                               (is not shown)

Setting             Yields (if "alt=" specification present)
-------             ----------------------------------------
/IMGALT=FSITE       [Anaconda - click to enter]
/IMGALT=FSITEFN     [1] Anaconda - click to enter (footnote)
/IMGALT=SITE        [Anaconda - click to enter]
/IMGALT=SITEFN      [1] Anaconda - click to enter (footnote)
/IMGALT=SYMBOL      [Anaconda - click to enter]
/IMGALT=NONE                            (nothing shown)

Given an image element without an "alt=" specification:

        <IMG SRC="../movies/Anaconda/assets/title.gif">

If  /-ALTONLY  is  operating,  /IMGALT=spec  is  treated   identically   to
/IMG=spec; the "src=" specification is used.  However, if  /ALTONLY  is  in
effect:

Setting             Yields (if no "alt=" spec and /ALTONLY in effect)
-------             --------------------------------------------------
/IMGALT=FSITE                           (nothing shown)
/IMGALT=FSITEFN                         (nothing shown)
/IMGALT=SITE                            (nothing shown)
/IMGALT=SITEFN                          (nothing shown)
/IMGALT=SYMBOL      (image)
/IMGALT=NONE                            (nothing shown)
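
The three tables above boil down to a small decision procedure.  Here is
a Python sketch of it, following the tables as printed.  The names
image_text and last_segment are made up, the footnote settings are left
out to keep it short, and none of this is taken from the program itself.

```python
def last_segment(path):
    """Return the part of the reference after the last "/" or "\\"."""
    return path.replace("\\", "/").rsplit("/", 1)[-1]

def image_text(src, alt, spec, use_alt=False, alt_only=False,
               symbol="(image)"):
    """Return the text shown for one <IMG> element.

    use_alt=False models /IMG=spec; use_alt=True models /IMGALT=spec.
    alt_only models /ALTONLY.  alt is None when no "alt=" is present.
    """
    spec = spec.upper()
    if spec == "NONE":
        return ""
    if use_alt and alt:
        return "[%s]" % alt                    # "alt=" text wins
    if use_alt and alt_only:
        return symbol if spec == "SYMBOL" else ""
    if spec == "SYMBOL":
        return symbol
    if spec == "FSITE":
        return "[%s]" % src
    if spec == "SITE":
        return "[%s]" % last_segment(src)
    raise ValueError("spec not handled in this sketch: %r" % spec)
```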




How client-side imagemaps (<MAP> and <AREA>) are displayed:

Client-side imagemaps are used for sites that display something like a  map
and allow you to click on different parts of the map and go to a  different
place.  As your mouse cursor moves over the  image,  it  typically  changes
from an arrow to a hand  and  back  again  as  it  hits  these  pre-defined
hotspots.  In general, imagemaps are relatively hard to set up and  they're
not in wide use.

Here's an example of an imagemap taken from the ZDNet site:

  <MAP NAME="botnav">
  <AREA SHAPE=RECT COORDS="0,12,83,26" HREF=http://www.zdnet.com/
     ALT="ZDNet Home Page">
  <AREA SHAPE=RECT COORDS="84,12,150,26" HREF=javascript:popup()
     ALT="ZDNet Site Map">
  <AREA SHAPE=RECT COORDS="151,12,246,26"
     HREF=http://xlink.zdnet.com/cgi-bin/texis/xlink/xlink/welcome.html
     ALT="Search ZDNet">
  <AREA SHAPE=RECT COORDS="248,12,327,26"
     HREF=http://xlink.zdnet.com/cgi-bin/texis/xlink/xlink?config=whatsnew
     ALT="What's New on ZDNet">
  <AREA SHAPE=RECT COORDS="328,12,383,26" HREF=/adverts/adinfo/
     ALT="ZDNet Advertising Info">
  <AREA SHAPE=RECT COORDS="384,12,468,26"
     HREF=http://www.zdnet.com/cgi-bin/contact ALT="ZDNet Contact Us">
  </MAP>
  <img WIDTH=468 HEIGHT=26
     SRC="http://www.zdnet.com/graphics/nav/botnav.gif"
     border=0 align=top vspace=7 usemap="#botnav">

This imagemap says to bring in a graphic image ("botnav.gif") and  use  the
separately-defined map ("botnav") to identify certain hotspots.

HTMSTRIP doesn't display graphics at all so you're not going to get a great
feeling for how the imagemap would be  displayed  using  the  program.   It
will, however, show you what links are called upon if that's what you  want
it to do.  The way that it  handles  these  is  with  the  "/MAP=spec"  and
"/MAPALT=spec" parameters.  They work very similarly to how "/IMG=spec" and
"/IMGALT=spec" are handled including  how  /ALTONLY  affects  things.   The
initial default for HTMSTRIP is "/MAP=NONE" which means the image maps  are
skipped entirely.



Given:

  <MAP>
  <AREA SHAPE=RECT COORDS="0,12,83,26" HREF=http://www.zdnet.com/index
     ALT="ZDNet Home Page">
  </MAP>

Setting             Yields
-------             ------
/MAP=FSITE          Map: {  [http://www.zdnet.com/index]  }
/MAP=FSITEFN        [1] http://www.zdnet.com/index     (footnote)
/MAP=SITE           Map: {  [index]  }
/MAP=SITEFN         [1] index                          (footnote)
/MAP=SYMBOL                (results in error message)
/MAP=NONE                  (nothing shown)

Setting             Yields (if "alt=" specification present)
-------             ----------------------------------------
/MAPALT=FSITE       Map: {  [ZDNet Home Page]  }
/MAPALT=FSITEFN     [1] ZDNet Home Page    (footnote)
/MAPALT=SITE        Map: {  [ZDNet Home Page]  }
/MAPALT=SITEFN      [1] ZDNet Home Page    (footnote)
/MAPALT=SYMBOL             (results in error message)
/MAPALT=NONE               (nothing shown)

Given an imagemap without an "alt=" specification:

  <MAP>
  <AREA SHAPE=RECT COORDS="0,12,83,26" HREF=http://www.zdnet.com/index>
  </MAP>

If  /-ALTONLY  is  operating,  /MAPALT=spec  is  treated   identically   to
/MAP=spec; the "href=" specification is used.  However, if /ALTONLY  is  in
effect:

Setting             Yields (if no "alt=" spec and /ALTONLY in effect)
-------             --------------------------------------------------
/MAPALT=FSITE       Map: {  }
/MAPALT=FSITEFN     Map: {  }
/MAPALT=SITE        Map: {  }
/MAPALT=SITEFN      Map: {  }
/MAPALT=SYMBOL             (results in error message)
/MAPALT=NONE               (nothing shown)




Defining entity references:

HTMSTRIP will process an entity reference definition file if one is  found.
This table can be in your standard *.INI file (for  example,  HTMSTRIP.INI)
if desired or it can be a separate  file  specified  using  the  /Linitfile
parameter.

Entity references  are  how  non-standard  characters  like  the  copyright
character are handled in HTML pages.  Entity references  are  indicated  as
"&xxx;" where "xxx" is either a code or a number preceded by a pound  sign.
The copyright symbol, for example, is indicated in HTML as "&copy;".

A default HTMSTRIP.INI is provided with over 300 entity reference  lookups.
To define or change these lookups, the INI file should include a series  of
lines in the following format:

        &xxx; = _outstr1_outstr2_outstr3_

where "&xxx;" is the HTML sequence and "outstr1", "outstr2", and  "outstr3"
are what you want to replace it with.  There are  three  available  lookup
strings to match the setting for the character pageset parameter ("/CPn"):
* The first character(s) ("outstr1")  correspond  to  the  characters  used
  under 7-bit DOS (/CP1).  Files created using this character  set  can  be
  e-mailed to anyone and look identical under DOS  and  Windows.   Foreign
  characters  and  symbols  are  translated  into  fairly  boring,  generic
  characters.
* The second character(s) ("outstr2") correspond  to  the  characters  used
  under 8-bit DOS (/CP2).  Files created using this character set look fine
  under DOS but look sick under Windows.
* The third character(s) ("outstr3")  correspond  to  the  characters  used
  under the ISO 8859/1 8-bit single-byte graphic  character  set  (/CP3).
  Files created using this character set look fine under Windows  but  look
  bad under DOS.

For example:

        &AElig;    = _AE_\146_\198_

will use "AE" if /CP1 is in effect, character 146 (the 8-bit DOS "AE"
ligature) if /CP2 is in effect, and character 198 (the ISO 8859/1 "AE"
ligature) if /CP3 is in effect.  The 8-bit characters are written here as
decimal values (in the form \nnn) because, typed as raw characters, at
least one of them would look incorrect to you whether you're viewing this
help file under Windows or DOS.  See the discussion about ENTITY.HTM below
in order to see how the different character sets are viewed under
different environments.

In cases where the characters are identical between all character sets, you
can just include the lookup once:

        &amp;      = &

The same lookup value will be  used  regardless  of  what   character   set
you're under.
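
To make the line format concrete, here is a minimal Python sketch that
splits a lookup line into its three strings and picks one by character
set.  The names parse_entity_line and lookup are hypothetical.  The sketch
assumes an "outstr" never contains an underscore, drops a trailing "/*"
comment, and leaves any \nnn or &Hxx values undecoded.

```python
def parse_entity_line(line):
    """Split '&xxx; = value' into (entity, [outstr1, outstr2, outstr3])."""
    text = line.split("/*", 1)[0]          # drop a trailing comment, if any
    entity, _, value = text.partition("=")
    entity, value = entity.strip(), value.strip()
    if len(value) >= 2 and value.startswith("_") and value.endswith("_"):
        fields = value[1:-1].split("_")
        if len(fields) != 3:
            raise ValueError("expected three outstr fields: %r" % line)
    else:
        fields = [value] * 3               # one value shared by all sets
    return entity, fields

def lookup(fields, code_page):
    """Pick the string for /CP1, /CP2, or /CP3 (code_page is 1, 2, or 3)."""
    return fields[code_page - 1]
```

For example, a line such as "&x; = _one_two_three_" yields "two" when /CP2
is in effect, while "&amp; = &" yields "&" for all three settings.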



The  "outstr"  portions  can  consist  of  regular  non-space  ASCII   text
characters (like "A" or "z") as well as hexadecimal  values  (in  the  form
&Hxx) or decimal values (in the form \nnn).  (See the  BRUCEHEX.TXT  file.)
They can also be the word "NULL" which translates the string into  nothing.
You cannot use a space or equal sign in "outstr"; use  the  hexadecimal  or
decimal representations instead.  The table does not  have  to  be  in  any
specified order.  Lines can end with "/*" followed  by  a  comment  if  you
want.  Examples:

        &cent;     = _cents_\155_\162_   /* Cent symbol
        &copy;     = _(c)_(c)_\169_      /* Copyright symbol
        &deg;      = _degree_\248_\176_  /* Degree symbol
        &emsp;     = \032                /* Thick space

Remember that "&xxx;" entity references  (yes,  I  hate  that  phrase)  are
case-sensitive in HTML.  "&deg;" will not find "&Deg;".

There seems to be a trend of late to relax some of the  replacement  coding
requirements in Web pages.  The ";" is now, apparently, becoming  optional.
Numeric replacements (for example, "&#32;") seem to no longer  require  the
leading pound sign.  Therefore, HTMSTRIP looks for both of these variations
for any appropriate lookup.  "&copy;" will find "&copy" and  "&#153;"  will
find "&153".  The lookup itself has to be entered in the  formally  correct
way though.
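
The relaxed matching described above might look something like this in
Python.  This is a sketch of the idea only.  The name find_entity is made
up, and the table is assumed to be a dictionary keyed by the formally
correct spelling of each entity reference.

```python
def find_entity(token, table):
    """Look up an entity as written in a page.

    Tolerates a missing ";" and, for numeric references, a missing "#".
    The comparison stays case-sensitive.  Returns None if nothing matches.
    """
    candidates = [token]
    if not token.endswith(";"):
        candidates.append(token + ";")
    body = token[1:].rstrip(";")
    if body.isdigit():
        candidates.append("&#" + body + ";")   # "&153" -> "&#153;"
    for key in candidates:
        if key in table:
            return table[key]
    return None
```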

You can see how these files will be processed under each character  pageset
by testing out the ENTITY.HTM file that is provided with  the  HTMSTymm.ZIP
file.  This contains all of the entity references defined  in  HTMSTRIP.INI
as of March 1997.

To try all three of the character sets, issue the following commands:

        HTMSTRIP ENTITY.HTM ENTITY.DOS /CP1
        HTMSTRIP ENTITY.HTM ENTITY.IBM /CP2
        HTMSTRIP ENTITY.HTM ENTITY.WIN /CP3

Then view the resulting files under the DOS EDIT command as well  as  under
the Windows Notepad program.

You may also find the ASCII character set document (ASCII.TXT)  useful  for
seeing how different fonts affect characters.  View this  file  in  Notepad
using the Courier font, then change your default font to Terminal, and  see
it again.



Defining the Symbolic References:

You are also allowed to redefine the strings  that  are  used  for  several
symbolic references in the entity reference file.   For  example,  if  your
source code contains  an  <IMG>  (inline  image)  reference,  HTMSTRIP  can
indicate this by putting some text in place of  the  image.   (HTMSTRIP  is
text only so it's not going to put the actual image in  there.)  The  first
three replacements shown below are conditional based on other parameters:

* The  <A>  indicator  replaces  hyperlink  references  if   /A=SYMBOL   is
  specified.
* The <IMG> indicator replaces inline image references  if  /IMG=SYMBOL  or
  /IMGALT=SYMBOL is specified.
* The <INPUT> indicator replaces input fields if  /INPUT  is  left  as  the
  default.
* <I> replaces italics-on and </I> replaces italics-off.
* <U> replaces underline-on and </U> replaces underline-off.
* <B> replaces bold-on and </B> replaces bold-off.  It  also  replaces  the
  strong-on <STRONG> and strong-off </STRONG> tags favored under  the  HTML
  4.0 specification.
* <EM> replaces emphasis-on and </EM> replaces emphasis-off.
* <TITLE> ... </TITLE> indicates how to handle the document's title.
* <H1> ... </H1> indicates how to handle the level 1 headings.   Similarly,
  <H2> ... </H2> through <H6> ...  </H6>  indicate  how  to  handle   those
  levels of headings.

The default indicators are the following:

        Symbol                  Meaning              Default Value
        <A>                     hyperlinks           -> (link)
        <IMG>                   inline image         -> (image)
        <INPUT>                 input fields         -> 5<@+>
        <I> and </I>            italics on/off       -> (null)
        <U> and </U>            underline on/off     -> (null)
        <B> and </B>            bold on/off          -> (null)
        <EM> and </EM>          emphasis on/off      -> (null)
        <TITLE> and </TITLE>    document title       -> (null)
        <H1> through <H6> and
          </H1> through </H6>   level 1-6 headings   -> (null)

You can redefine any and all of these entity references in the same  lookup
file.  These substitutions are specified more or  less  like  the  previous
substitutions.  For example:

        <A>      = (link)
        <IMG>    = (image)
        <INPUT>  = 5<@+>
        <U>      = _
        </U>     = _
        <B>      = *
        </B>     = *



Unlike with the other lookups, the left  side  is  not  case  sensitive  so
"<a>=(link)" works just fine.  Hexadecimal  and  decimal  replacements  are
again acceptable (see BRUCEHEX.TXT file).  You might, for example, want  to
redefine some of them like this:

        <A>      = \251     /* Replaces with a square-root symbol (under DOS)
        <IMG>    = \015     /* Replaces with a sun symbol (little flash cube)
        <INPUT>  = ?        /* Replaces with a question mark

The replacements aren't always perfect.  Web browsers don't visibly
italicize or bold spaces, so the following will look perfectly fine under
Netscape or Internet Explorer:

        The<B> Minnow </B>was Gilligan's ship.

However, if you have the following in your INI file:

        <B>     = *
        </B>    = *

The text will show up as:

        The* Minnow *was Gilligan's ship.

Which makes  it  look  like  the  wrong  words  are  emphasized.   This  is
unfortunate but it's the way things work.

If you normally print the results of  everything  from  HTMSTRIP,  you  can
probably find the print codes that are appropriate for  your  printer  that
will change the text in the way you want.

For example, if you're using a Hewlett-Packard  LaserJet  printer,  printer
codes are shown in the User's  Manual  which  can  do  different  types  of
bolding, underlining, etc.  You have to make sure that  you  turn  off  the
settings with the </xx> option (e.g.  </B>) though.  The  following  should
work on  many  HP  LaserJets  (check  your  manual  and  replace  with  the
appropriate codes if not):

        <I>      = \027(s1S   /* Turns italicizing on
        </I>     = \027(s0S   /* Turns italicizing off (restores upright)
        <U>      = \027&d0D   /* Turns underlining on
        </U>     = \027&d@    /* Turns underlining off
        <B>      = \027(s2B   /* Turns demi-bolding on
        </B>     = \027(s0B   /* Turns bolding off
        <EM>     = \027(s1B   /* Turns semi-bolding on
        </EM>    = \027(s0B   /* Turns bolding off (restores normal weight)

Note that the  program  counts  all  characters  (including  these  special
print-setting characters which don't  themselves  print)  when  it  reflows
text.  Also note that, on the HP at least, underlining underlines spaces as
well as characters, including indents.

Any symbolic references that you do not  redefine  will  default  to  their
original values.



The <INPUT> item is a bit of  a  special  case.   It  has  several  special
options, and they are all present in the default value.

<INPUT> is used to indicate that the HTML page prompted for,  typically,  a
bit of text.  In the actual HTML page, this might be coded as:

        <INPUT NAME=q size=45 maxlength=200 VALUE="">

Ignoring most of the parameters, the "size=45" parameter says that the  Web
browser is to present an input line to the user which is 45  characters  in
length.  "VALUE=""" indicates that no default value is  provided  for  this
input.

The default symbolic reference for handling an <INPUT> request is:

        <INPUT>  = 5<@+>

Each item of the assignment is explained below:

        <INPUT>  specifies the <INPUT> replacement
        5        means the maximum input length (SIZE=x) to be provided
                 is 5 characters; the value can be any number between 1
                 and 255; this rule is sometimes waived (see below)
        < and >  are extra text characters that will appear
        @        says to fill in the default value (VALUE="" above) if
                 one is provided
        +        says to expand the input field based on a specified
                 length (SIZE=45 above); if no SIZE= is provided on the
                 page, a default of SIZE=5 will be used; expansion is
                 done using underscore characters

With the above settings, if the program encountered this:

        <INPUT NAME=q size=45 maxlength=200 VALUE="">

It would actually write out the input references as:

        <___>

Similarly, if the program encountered this:

        <INPUT TYPE=submit VALUE=Submit>

It would write out this:

        <Submit>

On the other hand, the program will expand the field beyond  the  specified
maximum length if "@" (value) is requested and it's too large to fit in the
specified field length.  If the program encountered this:

        <INPUT TYPE=TEXT VALUE="This is my sample" SIZE=10>

It would write out this:

        <This is my sample>
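
The three examples above are consistent with the following Python sketch.
It is a guess at the rules, not the program's code: the name render_input
is made up, and it assumes the maximum length counts the extra text
characters (which is what the "<___>" example suggests) and that "@" and
"+" sit together between any leading and trailing text.

```python
import re

def render_input(template, value="", size=None):
    """Render one <INPUT> using a template such as "5<@+>".

    value is the VALUE= text ("" if none); size is SIZE= (None if absent).
    """
    m = re.match(r"(\d*)([^@+]*)([@+]*)(.*)$", template)
    max_len = int(m.group(1) or 5)          # assumed default when omitted
    prefix, flags, suffix = m.group(2), m.group(3), m.group(4)
    text = value if "@" in flags else ""
    width = 0
    if "+" in flags:
        total = min(size if size else 5, max_len)
        width = max(total - len(prefix) - len(suffix), 0)
    width = max(width, len(text))           # a long value widens the field
    return prefix + text.ljust(width, "_") + suffix
```

A template with no "@" or "+" at all, such as "?", simply comes back
unchanged.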



Defining Character-Translations (The Filter Table):

HTMSTRIP allows you to translate specified characters as the text is  read.
This is mainly useful for filtering out characters that are defined only
under Windows.  This should not be an issue because HTML is supposed to
be platform independent; the Web designer (or the software used for the
page) should have been smart enough to insert the proper entity reference
instead.

For example, "Disney\146s" (where \146 stands for the raw Windows
character decimal 146, a curly apostrophe) shows up on the Disney site for
some reason.  The filter table will translate this as "Disney's".  Also,
way too many Web designers use the raw character decimal 169 (as in
"\169 1996") as a copyright symbol; they're supposed to use the entity
reference "&#169;" instead.  The filter table will translate this as
"c 1996".

There is a default character-translation table built into the entity lookup
file (HTMSTRIP.INI).  This will typically be loaded  automatically  by  the
program.  You can update the translations in the lookup  file  or  you  can
create  your  own  filter  file   and   invoke   it   by   specifying   the
"/FILTER=filename" parameter.  In most cases, however, you will not need to
modify this table.

The filter table is an ASCII text file which consists of a series of  lines
in the following format:

        inchar = outchar

where "inchar" is the character to change from and  "outchar"  is  what  to
change the character to.  Both portions can consist  of  regular  non-space
ASCII text characters (like "A" or "z") as well as hexadecimal  values  (in
the form &Hxx) or decimal values (in  the  form  \nnn).   Both  sides  must
reference a single character (exactly one character  is  always  translated
into exactly one character).  You cannot use  a  space  or  equal  sign  in
either  "inchar"   or   "outchar";   use   the   hexadecimal   or   decimal
representations instead.  The table does not have to be  in  any  specified
order.  Lines can end with "/*" followed by a comment if you want.

Hexadecimal and decimal equivalents are explained in BRUCEHEX.TXT.

Examples:

             a    = A     /* Translate lowercase "a" into capital "A"
             \032 = _     /* Translate space (decimal 032, &H20 too) into
                             underscore
             \027 = \032  /* Translate escape character into a space
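
Here is a minimal Python sketch of how a filter table in this format could
be read and applied.  The names decode_char, load_filter, and apply_filter
are hypothetical.  The sketch treats each value as a character code
point, so text should be decoded one byte per character (for example as
Latin-1) if values above 127 are involved.

```python
def decode_char(token):
    """Decode one token: a literal character, \\nnn decimal, or &Hxx hex."""
    if token.startswith("\\") and token[1:].isdigit():
        return chr(int(token[1:]))
    if token.upper().startswith("&H") and len(token) > 2:
        return chr(int(token[2:], 16))
    if len(token) == 1:
        return token
    raise ValueError("not a single character: %r" % token)

def load_filter(lines):
    """Build a translation table from "inchar = outchar" lines."""
    table = {}
    for line in lines:
        text = line.split("/*", 1)[0]       # drop a trailing comment
        if "=" not in text:
            continue                        # blank or comment-only line
        left, _, right = text.partition("=")
        table[ord(decode_char(left.strip()))] = decode_char(right.strip())
    return table

def apply_filter(text, table):
    """Translate text one character at a time."""
    return text.translate(table)
```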

Some leading characters in INI files are  treated  specially  within  Wayne
Software programs.   INI  lines  that  begin  with  any  of  the  following
characters may lead to odd results:  "[", "/", "&", "\", ";", ":", "<", and
",".  To avoid problems, use hexadecimal  or  decimal  representations  for
these characters.  For example, use \047 or &H2F if you  want  to  override
the definition of "/".




Author:

This program was written by Bruce Guthrie of Wayne Software.   It  is  free
for use and redistribution provided relevant documentation is kept with the
program, no changes are made to the program or documentation, and it is not
bundled with commercial programs or charged  for  separately.   People  who
need to bundle it in for-sale packages must pay a $50 registration  fee  to
"Bruce Guthrie" at the following address.

Additional information about this and other Wayne Software programs can  be
found in the file BRUCE.TXT which should be included in  the  original  ZIP
file.  The recent change  history  for  this  and  the  other  programs  is
provided in the HISTORY.ymm file which should be in the same ZIP file where
"y" is replaced by the last digit of the year and "mm"  is  the  two  digit
month of the release; HISTORY.611 came out in November 1996.

The ZIP file that contained this program and its  associated  documentation
files is named with a two-digit year followed by  a  two-digit  month.   So
HTMS0001.ZIP came out in January 2000.

Comments and suggestions can be sent to the  following.   Realistically,  I
will not be revising this DOS-based program  much  in  the  future  but  it
doesn't hurt to suggest things anyway.

                Bruce Guthrie
                Wayne Software
                113 Sheffield St.
                Silver Spring, MD 20910

                e-mail: WayneSof@erols.com   fax: (301) 588-8986
                http://www.erols.com/waynesof

Please provide an Internet e-mail address on all correspondence.

A number of books have been published with HTMSTRIP stuck on  a  CD-ROM  in
the back.  If you do this with any of  my  utilities,  I  would  appreciate
receiving a free copy of any book published.


