  html2sgml documentation
  Peter Antman
  Tue Aug 26 12:26:46 MET DST 1997

  11..  RREEAADDMMEE

  html2sgml is a program which converts HTML to SGML according to
  linuxdoc.dtd.  With a file in linuxdoc.dtd format you can create
  nicely typeset books, well-structured HTML documents and so forth.
  linuxdoc.dtd is the format used in the Linux HOWTOs, for example.

  html2sgml is tuned to work well with Applix HTML, and will convert any
  footnotes present in the Applix Word file that was used to produce
  the HTML.

  To use html2sgml you need Perl.  To use the image conversion routines
  you also need giftopnm, ppmtopgm and pnmtops.

  To do something useful with the resulting file you also need
  linuxdoc-sgml or its successor, sgml-tools
  (http://www.xs4all.nl/~cg/sgmltools/).

  11..11..  GGeettttiinngg hhttmmll22ssggmmll

  The homepage of html2sgml is
  http://www.abc.se/~m9339/prog/html2sgml.html.  It can also be fetched
  by FTP from ftp://ftp.mc.hik.se/pub/users/mia95anp/html2sgml/.  It has
  also been uploaded to ftp://ftp.redhat.com and
  ftp://sunsite.unc.edu/pub/linux/.

  11..22..  IInnssttaallllaattiioonn

  To install html2sgml, unpack the tar file and cd into the distribution
  directory.  Type

  make install

  This installs the programs html2sgml and mkbook, plus some files in
  the specified document directory, including extras: a couple of
  scripts that show examples of how you can merge several HTML files
  into one to use with html2sgml.  A manual page is installed too.

  Edit the makefile to change where to install and where Perl is on your
  system.  The defaults are /usr/bin/perl and prefix = /usr/local.

  11..33..  UUssaaggee

  See the manual page

  22..  MMaannuuaall ppaaggee

  22..11..  NNAAMMEE

  html2sgml -- convert html to sgml according to linuxdoc.dtd

  22..22..  SSYYNNOOPPSSIISS

  html2sgml _f_i_l_e_._h_t_m_l

  22..33..  DDEESSCCRRIIPPTTIIOONN

  _h_t_m_l_2_s_g_m_l is a fileconverter that converts html-files to sgml-files
  according to linuxdoc.dtd. It will ouput a file with the same name as
  the specified file but with the ending html changed to sgml.
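
  The naming rule above can be sketched in a few lines of Python.  This
  is only an illustration, not code from html2sgml itself (which is
  written in Perl); the function name sgml_output_name is hypothetical,
  and rejecting names that do not end in .html is an assumption, since
  the text does not say what the program does in that case.

```python
from pathlib import Path


def sgml_output_name(html_path: str) -> str:
    """Return the same file name with the ending html changed to sgml."""
    path = Path(html_path)
    # Assumption: only names ending in ".html" are accepted here; the
    # real program's behaviour for other endings is not documented.
    if path.suffix.lower() != ".html":
        raise ValueError(f"expected a file ending in .html, got {html_path!r}")
    return str(path.with_suffix(".sgml"))


print(sgml_output_name("report.html"))
```

  Running the sketch on report.html prints report.sgml.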

  It will not work on every HTML file because of the free format of
  HTML.  It is tuned to work well with HTML produced by the Applix HTML
  editor.  If it finds an Applix Word file in the same directory and
  with the same name as the specified file, it will include any
  footnotes from the aw file in the produced SGML file.

  _h_t_m_l_2_s_g_m_l will also try to convert all included images of type gif to
  postscript.
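
  As a rough sketch of what such a conversion can look like, the three
  required tools (giftopnm, ppmtopgm and pnmtops) can be chained in a
  pipeline.  The order of the stages, the .ps output name and the
  function names below are assumptions made for illustration; html2sgml
  may invoke the tools differently.

```python
import subprocess
from pathlib import Path
from typing import List


def gif_to_ps_commands(gif_path: str) -> List[List[str]]:
    """Build the pipeline stages: GIF -> PNM -> greyscale PGM -> PostScript."""
    # Assumed order; ppmtopgm and pnmtops read standard input when no
    # file argument is given.
    return [["giftopnm", gif_path], ["ppmtopgm"], ["pnmtops"]]


def convert_gif_to_ps(gif_path: str) -> str:
    """Run the pipeline and write the result next to the GIF as a .ps file."""
    ps_path = str(Path(gif_path).with_suffix(".ps"))  # assumed output name
    stages = gif_to_ps_commands(gif_path)
    procs = []
    with open(ps_path, "wb") as out:
        prev_stdout = None
        for index, argv in enumerate(stages):
            last = index == len(stages) - 1
            proc = subprocess.Popen(
                argv,
                stdin=prev_stdout,
                stdout=out if last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Close our copy of the pipe so the earlier stage sees
                # a broken pipe if a later stage exits early.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        codes = [proc.wait() for proc in procs]
    if any(code != 0 for code in codes):
        raise RuntimeError(f"conversion of {gif_path} failed: {codes}")
    return ps_path
```

  Calling convert_gif_to_ps("figure.gif") would write figure.ps,
  provided the three tools are installed and on the PATH.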

  By default html2sgml produces a document of type article.  To change
  to book you can use the script mkbook.  It also fills in a dummy name.
  If there is a title tag in the HTML file, it will be used as the title
  of the SGML file.  To change this you have to edit the SGML file by
  hand.

  If there is more than one H1 tag, these are used as the top-level
  sections: everything marked H1 will become a sect in the SGML, H2 will
  become sect1, and so forth.  If there is only one H1 or none, H2 will
  be used as the top level instead.  If there are no H* tags at all,
  then the document is broken by design :-)
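
  The heading rule can be written down as a small Python sketch.  It is
  an illustration under stated assumptions rather than the program's own
  logic: the function name section_tags is hypothetical, the tag names
  below sect1 (sect2 to sect4) are taken from the linuxdoc DTD and not
  from this text, and what happens to a single H1 when H2 becomes the
  top level is not covered.

```python
import re


def section_tags(html: str) -> dict:
    """Map HTML heading tags to linuxdoc section tags for one document."""
    h1_count = len(re.findall(r"<h1\b", html, flags=re.IGNORECASE))
    # More than one H1: H1 is the top level.  Otherwise H2 is used.
    top = 1 if h1_count > 1 else 2
    names = ["sect", "sect1", "sect2", "sect3", "sect4"]
    return {f"h{top + offset}": name for offset, name in enumerate(names)}


several = section_tags("<h1>One</h1><h2>Sub</h2><h1>Two</h1>")
single = section_tags("<H1>Title</H1><h2>First</h2><h2>Second</h2>")
print(several["h1"], several["h2"])
print(single["h2"], single["h3"])
```

  With several H1 tags the sketch maps H1 to sect and H2 to sect1; with
  a single H1 it maps H2 to sect and H3 to sect1.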

  The resulting SGML file can then be used by sgml-tools (formerly
  linuxdoc-sgml) to produce various other file formats, e.g. LaTeX,
  info and RTF.

  22..44..  TTIIPPSS

  _h_t_m_l_2_s_g_m_l should work fine with straight html, that is, when no
  special layout formating has been done. For example: it can handle
  html table tags, but it can not handle them well if they are used to
  produce layout.

  It works best with Applix HTML.  You can either write directly in
  Applix Word or import a document into Applix Word.  Try to use
  predefined styles for your document: you can create heading1,
  heading2, pre, quote and so forth.  Open Applix HTML and use
  File->Import words document.  You will then get the chance to tell
  Applix which HTML tags your defined styles should match, e.g.
  heading1 -> html_h1.  Then use Format -> HTML document setting, where
  you can fill in the title; here you can also select the option to
  export Applix images as GIF files.  This is worth doing because
  html2sgml can convert the GIF files to PostScript files, which can be
  used when (or if) converting to LaTeX.

  22..55..  BBUUGGSS AANNDD FFEEAATTUURREESS

  _h_t_m_l_2_s_g_m_l is still under development and will most probably contain
  bugs. It also contain som features. All possible HTML and sgml tags
  are not implemented. Unimplemented HTML tags will show up in the sgml
  file where you have to hand edit them away. Some tags in sgml are also
  unsupported. More specific: no math tags is implemented. You can check
  the resulting sgml file with the command _s_g_m_l_c_h_e_c_k to discover any
  leftover tags.
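
  For a quick look before running sgmlcheck, a short Python sketch can
  list lines that still contain a few typical HTML-only tags.  It does
  not replace sgmlcheck: the tag list below is a small hypothetical
  sample chosen for illustration, not the set of tags html2sgml leaves
  behind, and the function name is likewise made up.

```python
import re

# Hypothetical sample of HTML tag names with no linuxdoc counterpart.
HTML_ONLY_TAGS = {"font", "center", "br", "div", "span"}


def leftover_html_tags(sgml_text: str) -> list:
    """Return (line number, tag name) pairs for leftover HTML tags."""
    found = []
    for lineno, line in enumerate(sgml_text.splitlines(), start=1):
        # Match the name of every opening or closing tag on the line.
        for match in re.finditer(r"</?([A-Za-z][A-Za-z0-9]*)", line):
            name = match.group(1).lower()
            if name in HTML_ONLY_TAGS:
                found.append((lineno, name))
    return found


sample = "<sect>Intro\n<p>Text<br>\n<font size=2>x</font>"
for lineno, name in leftover_html_tags(sample):
    print(f"line {lineno}: leftover <{name}> tag")
```

  Each reported line is a place where the SGML file still needs editing
  by hand.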

  I have concentrated on making it work in English and in Swedish.  This
  means that there are a lot of characters that will probably not work
  properly, especially when converting Applix footnotes.  Look in the
  source and try to put in the missing characters if you have any
  problems.  And please send the new, improved version to me.

  22..66..  AAUUTTHHOORR

  Peter Antman (peter.antman@abc.se)

  22..77..  SSEEEE AALLSSOO

  sgml2latex(1), sgml2html(1), sgml2txt(1), sgml2info(1), sgml2rtf(1),
  sgml2lyx(1)

