          WWWOFFLE - World Wide Web Offline Explorer - Version 1.1
          ========================================================


The WWWOFFLE programs simplify World Wide Web browsing from computers that use
intermittent (dial-up) connections to the internet.

Description
-----------

The wwwoffled program is a simple proxy server with special features for use
with dial-up internet links.  This means that it is possible to browse web pages
and read them without having to remain connected.

While Online
    - Caching of pages that are viewed for review later.
    - Conditional fetching to only get pages that have changed.

While Offline
    - The ability to follow links and mark other pages for download.
    - Browser or command line interface to select pages for downloading.
    - Optional info on bottom of pages showing cached date and allowing refresh.
    - Works with pages containing forms.

Automated Download
    - Downloading of specified pages non-interactively.
    - Can automatically fetch inlined images in pages fetched this way.
    - Automatically follows links for pages that have been moved.

Provides
    - An introductory page with information and links to the built-in pages.
    - Multiple indices of pages stored in cache for easy selection.
    - Interactive or command line control of online/offline status.
    - User selectable purging of pages from cache based on hostname.
    - Interactive or command line option to fetch pages and links recursively.

General
    - Can be used with one or more external proxies based on hostname.
    - Configurable to still allow use on intranets while offline.
    - All options controlled using a simple configuration file.
    - Optional password control for management functions.


Configuring A Web Browser
-------------------------

To use the wwwoffle programs, your web browser must be set up to use wwwoffled
as a proxy.  The proxy hostname will be 'localhost' (or the name of the host
that wwwoffled is running on), and the port number will be the one that is used
by wwwoffled (default 8080).

Netscape V1: In the Options->Preferences dialog window, enter localhost as the
             proxy and 8080 as the port number.
Mosaic V2.6\
Lynx, Arena: Set the environment variable http_proxy to http://localhost:8080/
Emacs-W3   /

You will also need to disable the caching that the web browser performs itself
between sessions to get the best out of the program.
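
For the browsers that read the http_proxy environment variable, the setting
above can be made from a Bourne-style shell as sketched below.  The host and
port are the defaults given in this section; change them if wwwoffled runs on
another host or port.  The csh form (setenv) is not shown.

```shell
# Point http_proxy-aware browsers (Mosaic, Lynx, Arena, Emacs-W3) at wwwoffled.
# localhost and 8080 are the defaults described above.
http_proxy=http://localhost:8080/
export http_proxy
```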


Welcome Page
------------

There is a welcome page at URL 'http://localhost:8080/' that gives a very brief
description of the program and has links to the index pages, interactive control
page and the wwwoffle internet home page.


Index Of Cached Files
---------------------

To get the index of cached files, use the URL 'http://localhost:8080/index/'.
The index allows sorting by time of caching, time of last access or
alphabetically.  This index lists all of the hosts; selecting one of these
provides an index for that host.  In this index there is a link to each page,
and a refresh button.  The link gets the cached version; the refresh button
requests a new copy if offline, or gets it if online.  From the main index,
there is also a flat index showing all the pages in the cache that have been
modified in the last week (or a user specifiable time).  This is sorted by
modification time, and split into one day intervals.


Interactive Refresh Page
------------------------

Pages can be specified by using whatever method is provided by the browser that
is used.  As an alternative, there is an interactive refresh page, which allows
the user to enter a URL and then fetch it if it is not currently cached or
refresh it if it is in the cache.  There is also the option here to recursively
fetch the pages that are linked to by the page that is specified.  This
recursive fetching can be limited to pages from the same host or opened to fetch
pages from any web server.  This functionality is also provided in the
'wwwoffle' command line program.


Interactive Control Page
------------------------

The behaviour and mode of operation of the wwwoffle demon can be controlled from
an interactive control page at 'http://localhost:8080/control/'.  This has a
number of buttons that change the mode of the proxy server.  These provide the
same functionality as the 'wwwoffle' command line program.  To provide security,
this page can be password protected.


The Programs and Configuration File
-----------------------------------

There are two programs that make up this utility, with three distinct functions.

wwwoffle  - A program to interact with and control the HTTP proxy demon.

wwwoffled - A demon process that acts as an HTTP proxy.
wwwoffles - A server that actually does the fetching of the web pages.

The wwwoffles function is combined with the wwwoffled function into the
wwwoffled program from version 1.1 onwards.  This is to simplify the procedure
of starting servers, and allow for future improvements.

The configuration file, called wwwoffle.conf by default, contains all of the
parameters that are used to control the way the wwwoffled and wwwoffles
functions work.


WWWOFFLE - User control program
-------------------------------

The control program (wwwoffle) is used to control the action of the demon
program (wwwoffled), or to request pages that are not in the cache.

The demon program needs to know if the system is online or offline, when to
fetch the pages that have been previously requested and when to purge the cache
of old pages.


The first mode of operation is for controlling the demon process.  These are the
functions that are also available on the interactive control page.

wwwoffle -online        Indicates to the demon that the system is online.

wwwoffle -offline       Indicates to the demon that the system is offline.

wwwoffle -fetch         Commands the demon to fetch the pages that were
                        requested by browsers while the system was offline.
                        wwwoffle exits when the fetching is complete.
                        (This requires the demon to be told it is online).

wwwoffle -config        Cause the configuration file for the demon process to be
                        re-read.

wwwoffle -purge         Commands the demon to purge from the cache the pages
                        that are older than the number of days specified in the
                        configuration file, using modification or access time.

wwwoffle -c <config-file>
                        Can be used with the above options to specify the
                        configuration file that contains the password and server
                        hostname.  If no password is used then no -c option is
                        needed, and the -p option can be used as needed.

wwwoffle -p <host[:port]>
                        Can be used with the above options to specify the
                        hostname and port number that the demon program listens
                        to for control messages.
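
As an illustration of how the control options above fit together, the sketch
below wraps them in shell functions that could be called from a dial-up
connection script.  The function names and the WWWOFFLE variable are made up
for this example and are not part of the package; only the -online, -fetch and
-offline options come from the list above.

```shell
# Hypothetical helpers for a dial-up script; not part of WWWOFFLE itself.
# WWWOFFLE may be set beforehand to the full path of the control program.
WWWOFFLE=${WWWOFFLE:-wwwoffle}

# Call once the link is up: mark the demon online, then fetch queued requests.
# wwwoffle -fetch only returns when the fetching is complete.
link_up() {
    "$WWWOFFLE" -online && "$WWWOFFLE" -fetch
}

# Call when the link is about to be dropped.
link_down() {
    "$WWWOFFLE" -offline
}
```

A connection script might call link_up after dialling and link_down before
hanging up.  If a password is set, the -c option described above would need to
be added to each command.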


The second mode of operation is to specify URLs to get.

wwwoffle <URL>          Specifies to the demon a URL that must be fetched.
                        If online then it is fetched immediately, else the request
                        is stored for a later fetch.

wwwoffle -p <host[:port]>
                        Can be used to specify the hostname and port number that
                        the demon program listens to for HTTP proxy connections.

wwwoffle -i             Specifies that the URLs when fetched are to be parsed for
                        images and these are also to be fetched.

wwwoffle -r[<depth>]    Specifies that the URL when fetched is to have the links
                        followed and these pages also fetched (to a depth
                        specified by the optional depth parameter, default 1).
                        Only links on the same server are to be fetched.

wwwoffle -R[<depth>]    This is the same as the '-r' option except that all of
                        the links are to be followed, even those to other
                        servers.
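
The options of the second mode can be combined on one command line.  The sketch
below assumes that the options are given before the URL and that the depth is
attached directly to -r, as the '-r[<depth>]' notation suggests.  The function
name and the URL in the usage note are placeholders.

```shell
# Hypothetical helper; not part of WWWOFFLE itself.
WWWOFFLE=${WWWOFFLE:-wwwoffle}

# fetch_site URL [DEPTH]
# Request URL with its inlined images (-i) and follow links on the same
# server to DEPTH levels (-r<depth>, default 1 as in the option list above).
fetch_site() {
    url=$1
    depth=${2:-1}
    "$WWWOFFLE" -i "-r$depth" "$url"
}
```

For example, 'fetch_site http://www.example.com/ 2' would run
'wwwoffle -i -r2 http://www.example.com/'.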


The other mode of operation is to provide help in using the first two modes.

wwwoffle -h             Gives help about the command line options.


WWWOFFLED - Demon program
-------------------------

The demon program (wwwoffled) runs as an HTTP proxy and also accepts connections
from the control program (wwwoffle).

The demon program needs to maintain the current state of the system, online or
offline, as well as the other parameters in the configuration file.

As HTTP proxy requests come in, the program forks a copy of itself (the
wwwoffles function) to handle the requests.  The server program can also be
forked in response to the wwwoffle program requesting pages to be fetched.


wwwoffled -c <config-file>      Starts the demon with the named configuration
                                file.

wwwoffled -h                    Gives help about the command line options.


There are a number of error and informational messages that are printed to
standard error as the program runs.  The wwwoffles program also uses the same
standard error since it is forked from wwwoffled.

If you are using Linux then it is possible to log the errors to syslog as well
as standard error [thanks to Yannick Versley <sa6z225@public.uni-hamburg.de> for
that patch].  If you are not using Linux but you know how to get syslog to work
then try setting USE_SYSLOG to 1 in errors.c and send me the required patch (if
any).

To have wwwoffled started automatically when the computer is booted, add the
following to the file /etc/rc.d/rc.local:

# The WWWOFFLE HTTP proxy server.
if [ -x /usr/local/sbin/wwwoffled ]; then
    /usr/local/sbin/wwwoffled -c /var/spool/wwwoffle/wwwoffle.conf \
        >/dev/null 2>&1 &
fi


WWWOFFLES - Server program
--------------------------

The server (wwwoffles) starts by being forked from the demon (wwwoffled) in one
of three different modes.

Real  - When the system is online and acting as a proxy for a browser.
        All requests for web pages are handled by forking a new server which
        will connect to the remote host and fetch the page.  This page is then
        stored in the cache as well as being returned to the browser.  If the
        page is already in the cache then the remote server is asked for a newer
        page if one exists, else the cache one is used.

Fetch - When the system is online and fetching pages that have been requested.
        All web page requests in the outgoing directory are fetched by the
        server connecting to the remote host to get the page.  This page is then
        stored in the cache; there is no browser active.  If the page has been
        moved then the link is followed and that one fetched.

Spool - When the system is offline and acting as a proxy for a browser.
        All requests for web pages are handled by forking a server that will
        either return a cached page or store the request.  If the page is
        cached, it is returned to the browser, else a dummy page is returned
        (and stored in the cache), and the outgoing request is stored.
        If the cached page refers to a page that failed to be downloaded then it
        will be deleted from the cache.

Depending on the existence of files in the spool and other conditions, the mode
can be changed to one of several other modes.

RealNoCache - For requests for pages on the server machine or those specified
        not to be cached in the configuration file.

RealRefresh - Used by the refresh button on the index or the wwwoffle program
        to refetch a page while the system is online.

SpoolGet - Used when the page does not exist in the cache so a request needs to
        be stored for it in the outgoing directory.

SpoolRefresh - Used when the refresh button on the index or the wwwoffle program
        is used.  The existing spooled page (if there is one) is not
        overwritten, but a request is stored.

Local - When the server is started as real or spool, but the URL is local.
        This generates the page and returns it to the browser.  It contains a
        list of all of the files that are cached.


WWWOFFLE.CONF - Configuration file
----------------------------------

The configuration file (wwwoffle.conf) specifies all of the parameters that
control the operation of the proxy server.  The file is split into sections each
containing a series of parameters as described below.  The sections are
delimited in the file by having the section name alone on a line, a line
containing a single '{', the parameters in the section and a line containing a
single '}'.  Comments are marked by a '#' at the start of the line.
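
A minimal sketch of this layout is given below as a shell function that prints
a sample file.  It assumes that the ';' and the text following it in the
parameter lists below are descriptions rather than part of the file syntax.
The function name is made up for this example, and the values shown are the
documented defaults.

```shell
# Hypothetical helper: print a small sample configuration on standard output.
sample_conf() {
    cat <<'EOF'
# Sample wwwoffle.conf (values are the documented defaults).

StartUp
{
 http-port     = 8080
 wwwoffle-port = 8081
 spool-dir     = /var/spool/wwwoffle
}

Options
{
 fetch-images      = no
 index-latest-days = 7
}
EOF
}
```

Running 'sample_conf > wwwoffle.conf' would write the sample to a file that can
then be edited.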


StartUp - This contains the parameters that are used when the program starts,
          changes to these are ignored if the configuration file is re-read
          while the program is running.

  http-port     = <port>   ; An integer specifying the port for the HTTP proxy
                             (default=8080).
  wwwoffle-port = <port>   ; An integer specifying the port for wwwoffle control
                             connections (default=8081).
  spool-dir     = <dir>    ; The name of the spool directory
                             (default=/var/spool/wwwoffle).
  use-syslog    = yes | no ; Whether to use the syslog facility as well as
                             standard error for messages (default=no).
  password      = <word>   ; The password to be used for authentication of the
                             control message (default=none).

  Notes: For the password to work the configuration file must be set so that
         only authorised users can read it.
       : Syslog support is only compiled in on Linux by default, see the note in
         the wwwoffled section above for more information.


Options - Options that control how the program works.

  fetch-images      = yes | no ; Whether to fetch the images that are contained
                                 in pages that are requested while offline and
                                 downloaded later (default=no).
  index-latest-days = <age>    ; The number of days to display in the index of
                                 the latest pages (default=7 days).
  add-info-refresh  = yes | no ; At the bottom of all of the spooled pages the
                                 date that the page was cached and a refresh
                                 button is to be added (default=no).

  Notes: add-info-refresh uses the first named host in the LocalHost section as
         the server name for the refresh button.


LocalHost - A list of hosts that the host running the wwwoffled server may
            be known by.  This is so that the proxy does not need to contact
            itself to get the server local pages.

  <host> ; A hostname or IP address that in connection with the port number (in
           the StartUp section) specifies the wwwoffle proxy HTTP server.

  Notes: All of these hosts are also used the same way as those in the
         AllowedConnect and DontCache sections.
       : The first named host is the one used by the add-info-refresh option and
         by the -c option to wwwoffle.


AllowedConnect - A list of hosts that are allowed to connect to the server.

  <host> ; A hostname or IP address that is allowed to connect to the server.

  Notes: The host name matches from the right so a domain name matches all hosts
         in the domain, IP addresses match from the left.
       : All of the hosts in LocalHost are also allowed to connect.


DontCache - A list of hosts that are not to be cached by wwwoffled.

  <host> ; A hostname or IP address that is not to be cached by the server.

  Notes: The host name matches from the right so a domain name matches all hosts
         in the domain, IP addresses match from the left.
       : All entries here are assumed to be reachable even when offline.
       : All of the hosts in LocalHost are also not cached.


Proxy - This contains the names of the HTTP proxies to use external to the
        local machine.

  default = <host[:port]> ; The hostname and port on it to use as the default
                            proxy for all pages (default=none).
  <host>  = <host[:port]> ; The hostname and port on it to use as the proxy for
                            the hostname or IP address on the left hand side.

  Notes: The host name (on the left) matches from the right so a domain name
         matches all hosts in the domain, IP addresses match from the left.
       : A hostname that matches more than one entry here uses the proxy of
         the longest matching one.
       : You can use none or no hostname to indicate that a default or
         particular host is not to use a proxy.
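
The matching rule in the notes above can be sketched as a small shell function.
This is an illustration only, not the code used by wwwoffled: it treats a match
as a plain string comparison against the right-hand end of the hostname, which
is an assumption, and it does not handle IP addresses (which match from the
left).  The function name and the hostnames in the usage note are made up.

```shell
# longest_match HOST ENTRY...
# Print the longest ENTRY that matches the right-hand end of HOST,
# or an empty line if none matches.  Illustration only.
longest_match() {
    host=$1
    shift
    best=
    for entry in "$@"; do
        case $host in
            *"$entry")
                if [ "${#entry}" -gt "${#best}" ]; then
                    best=$entry
                fi
                ;;
        esac
    done
    printf '%s\n' "$best"
}
```

With entries for 'example.com' and 'docs.example.com', the host
'www.docs.example.com' matches both, and
'longest_match www.docs.example.com example.com docs.example.com' prints
'docs.example.com', so the proxy given for that entry would be used.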


Purge - The method used to decide which pages to purge, and the default and
        host specific maximum ages of the pages in days.

  use-mtime = yes | no ; Whether to decide which files to purge by last
                         modification time (mtime, yes) or by last access
                         time (atime, no) (default=no).
  default   = <age>    ; The default maximum age of pages in days (default=28).
  <host>    = <age>    ; The maximum age of pages from the named host or IP
                         address in days.

  Notes: The host name matches from the right so a domain name matches all hosts
         in the domain, IP addresses match from the left.
       : A hostname that matches more than one entry here uses the age of the
         longest matching one.
       : An age of zero means not to keep, negative not to delete.
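
The age rules can be sketched as a shell function that decides whether one page
should be purged.  This is an illustration of the notes above, not the code
used by wwwoffled.  The function name is made up, and treating 'older than' as
strictly greater than the maximum age is an assumption.

```shell
# should_purge MAX_AGE PAGE_AGE   (both in whole days)
# Succeeds (exit status 0) if a page of PAGE_AGE days should be purged
# under a maximum age of MAX_AGE.  Illustration only.
should_purge() {
    max_age=$1
    page_age=$2
    if [ "$max_age" -lt 0 ]; then
        return 1                      # negative: never delete
    elif [ "$max_age" -eq 0 ]; then
        return 0                      # zero: do not keep at all
    fi
    [ "$page_age" -gt "$max_age" ]    # otherwise: purge if older than the limit
}
```

Here PAGE_AGE would be measured from the last access or last modification time,
depending on the use-mtime setting.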


Author and Copyright
--------------------

The two programs wwwoffle and wwwoffled were written by Andrew M. Bishop in
1996,97 and are copyright Andrew M. Bishop 1996,97.

They can be freely distributed according to the terms of the GNU General Public
License (see the file `COPYING').

If you wish to submit bug reports or other comments about the programs then
email the author amb@gedanken.demon.co.uk and put wwwoffle in the subject line.
