.\" $Header: /home/amb/wwwoffle/RCS/wwwoffle.conf.man 2.38 1999/03/20 17:34:27 amb Exp $
.\"
.\"  WWWOFFLE - World Wide Web Offline Explorer - Version 2.4c.
.\"
.\"  Manual page for wwwoffle.conf
.\"
.\"  Written by Andrew M. Bishop
.\"
.\"  This file Copyright 1997,98,99 Andrew M. Bishop
.\"  It may be distributed under the GNU Public License, version 2, or
.\"  any higher version.  See section COPYING of the GNU Public license
.\"  for conditions under which this file may be redistributed.
.\"
.TH wwwoffle.conf 5 "March 13th, 1999"
.SH NAME
wwwoffle.conf \- The configuration file for the proxy server for the World Wide Web Offline Explorer.
.SH DESCRIPTION
The
.I wwwoffle.conf
file contains the configuration for the wwwoffled proxy HTTP server part of the
.I
World Wide Web Offline Explorer
program.
.LP
The file is split into sections, each of which has a format similar to a C
function (see the example below).  Each section has a name, followed on the next
line by an open brace '{', followed on the next lines by a list of names and
values separated by an equals sign '='.  The section ends with a close brace '}'
on a line of its own.  Comments are marked by a '#' at the start of the line.
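.LP
As a minimal sketch of this layout (the option shown is taken from the StartUp
section described below and its value is only illustrative):
.LP
 # A comment line
 StartUp
 {
  http-port = 8080
 }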
.LP
The
.B StartUp
section can contain the following:
.TP
.B http-port = <port>
The port number to use on the local host as the HTTP proxy (default=8080).
.TP
.B wwwoffle-port = <port>
The port number to use on the local host as the WWWOFFLE control port
(default=8081).
.TP
.B spool-dir = <dirname>
The name of the spool directory to use for the cache. A subdirectory is created
in this for each new web server that is contacted (default=/var/spool/wwwoffle).
.TP
.B run-uid = <username> | <uid> | none |
The username or numeric uid to run the WWWOFFLE server with.  To use this
option, the program must be started by root.
.TP
.B run-gid = <groupname> | <gid> | none |
The groupname or numeric gid to run the WWWOFFLE server with.  To use this
option, the program must be started by root.
.TP
.B use-syslog = yes | no
Whether the syslog facility is used to log the important error messages
(default=yes).
.TP
.B password = <word> | none |
The authorisation password that is required to use the wwwoffle program to
configure the server or use the interactive control page (default=none).  If
this is not present, or set to an empty string or 'none' then there is no
password required.  If there is a password set, then the -c option to wwwoffle
must be used, and the file wwwoffle.conf should be made readable only by
authorised users.
.TP
.B max-servers = <integer>
The maximum number of server processes that are started (default=8).
.TP
.B max-fetch-servers = <integer>
The maximum number of server processes that are forked to fetch pages that were
requested in offline mode (default=4).  The
.I max-fetch-servers
value must be less than
.I max-servers
or you will not be able to use WWWOFFLE interactively online while fetching.
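.LP
For illustration only, a StartUp section that gives up root privileges and
limits the number of server processes might look like the sketch below.  The
user name, group name and password are assumptions made up for this example,
not defaults.
.LP
 StartUp
 {
  run-uid           = daemon
  run-gid           = daemon
  password          = secret
  max-servers       = 8
  max-fetch-servers = 4
 }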
.LP
The
.B Options
section contains other options that configure the server.
.TP
.B log-level = debug | info | important | warning | fatal
Error messages with a priority equal to or greater than the one specified are
recorded on the output, either syslog or stderr (see wwwoffled(8)).
.TP
.B index-latest-days = <age>
The maximum age in days of pages to show in the index of the latest pages
(default=7).
.TP
.B request-changed = <time>
While online, pages will only be fetched if the cached version is older than
this specified time in seconds (default=600).  A negative value will force the
cached version to always be used in preference.
.TP
.B request-changed-once = yes | no
While online, pages will only be fetched if the cached version has not already
been fetched once this session (default=yes).  This option takes precedence over
the request-changed option.
.TP
.B pragma-no-cache = yes | no
Whether to request a new copy of a page if the request has 'Pragma: no-cache'
(default=yes).  This option should be set to 'no' if a 'broken' browser
re-requests all pages when browsing offline.
.TP
.B confirm-requests = yes | no
Whether to return a page requiring user confirmation instead of automatically
recording requests made while offline (default=no).
.TP
.B socket-timeout = <time>
The time in seconds that WWWOFFLE will wait for data to arrive on a socket
connection before timing out and giving an error (default=120 seconds).
.TP
.B connect-retry = yes | no
If a connection cannot be made to a remote server then try again after a short
delay (default=no).
.TP
.B ssl-allow-port = <integer>
A port number that can be used for Secure Socket Layer (SSL) connections,
e.g. https (default=none, for https use 443).  There can be more than one of
these entries to allow other ports.
.TP
.B no-lasttime-index = yes | no
Disables creation of the lasttime/prevtime indexes (default=no).
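.LP
As a hedged sketch, an Options section that records only warnings and fatal
errors, rechecks cached pages older than ten minutes, and permits SSL
connections on two ports could be written as below.  Port 8443 is a
hypothetical second port chosen for this example.
.LP
 Options
 {
  log-level       = warning
  request-changed = 600
  pragma-no-cache = yes
  ssl-allow-port  = 443
  ssl-allow-port  = 8443
 }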
.LP
The
.B FetchOptions
section contains options that configure the automated downloading of pages.
When pages are requested offline and downloaded later, there is a choice of
whether to fetch stylesheets, images, frames, scripts or other objects
referenced in the HTML.
.TP
.B stylesheets = yes | no
Fetch the style sheets from these pages as well (default=no).
.TP
.B images = yes | no
Fetch the images from these pages as well (default=no).
.TP
.B frames = yes | no
Fetch the frames from these pages as well (default=no).
.TP
.B scripts = yes | no
Fetch the scripts from these pages as well (default=no).
.TP
.B objects = yes | no
Fetch the objects (e.g. Java class files) from these pages as well (default=no).
.LP
The
.B ModifyHTML
section contains options that control how the HTML that is provided from the
cache is modified.  They all rely on the HTML being syntactically correct; if it
is not then the result is undefined.
.TP
.B enable-modify-html = yes | no
Enable the modifications in this section (has a speed penalty) (default=yes).
.TP
.B add-cache-info = yes | no
At the bottom of all of the spooled pages the date that the page was cached and
some buttons are to be added (default=no).
.TP
.B anchor-cached-begin = <HTML code>
Anchors (links) that are cached are to have the specified HTML inserted at the
beginning (default="").
.TP
.B anchor-cached-end = <HTML code>
Anchors (links) that are cached are to have the specified HTML inserted at the
end (default="").
.TP
.B anchor-requested-begin = <HTML code>
Anchors (links) that have been requested are to have the specified HTML inserted
at the beginning (default="").
.TP
.B anchor-requested-end = <HTML code>
Anchors (links) that have been requested are to have the specified HTML inserted
at the end (default="").
.TP
.B anchor-not-cached-begin = <HTML code>
Anchors (links) that are not cached or requested are to have the specified HTML
inserted at the beginning (default="").
.TP
.B anchor-not-cached-end = <HTML code>
Anchors (links) that are not cached or requested are to have the specified HTML
inserted at the end (default="").
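.LP
The sketch below shows one way these options might be combined so that cached
links appear in bold and uncached links in italics.  It assumes that the HTML
code is written unquoted after the equals sign; the tags chosen are only an
illustration.
.LP
 ModifyHTML
 {
  enable-modify-html      = yes
  add-cache-info          = yes
  anchor-cached-begin     = <b>
  anchor-cached-end       = </b>
  anchor-not-cached-begin = <i>
  anchor-not-cached-end   = </i>
 }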
.LP
The
.B LocalHost
section contains a list of possible names or IP addresses that the host running
wwwoffled may be known as.
.TP
.I hostname
The server may be known as
.I hostname
so it does not need to contact itself to get pages.  The entries must match
exactly.  All of the entries here are also used as if they were in the LocalNet
and AllowedConnectHosts sections.  None of the entries here are fetched via a
proxy.
.LP
The
.B LocalNet
section contains a list of host names or IP addresses that are not to be cached
because they are on the local network.
.TP
.I hostname
A server that matches
.I hostname
is on the local network and not to be cached.  The matching uses wildcards as
described in the WILDCARD section.  All entries here are assumed to be reachable
even when offline.  All of the entries in the LocalHost section are also not
cached as if they were here also.  None of the entries here are fetched via a
proxy.
.LP
The
.B AllowedConnectHosts
section contains a list of host names or IP addresses that are allowed to
connect to the server.
.TP
.I hostname
A host that matches
.I hostname
is allowed to connect to the server.  The matching uses wildcards as described
in the WILDCARD section.  All of the entries in the LocalHost section are also
allowed to connect.
.LP
The
.B AllowedConnectUsers
section contains a list of the users that are allowed to connect to the server.
.TP
.I <username>:<password>
The username and password of the users that are allowed to connect to the
server.  The username and password are both stored in plaintext format.  This
requires the use of browsers that handle the HTTP/1.1 standard.
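.LP
A minimal sketch of such a section follows; the user names and passwords are
made up for this example.
.LP
 AllowedConnectUsers
 {
  alice:secret
  bob:opensesame
 }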
.LP
The
.B DontCache
section contains a way of recognising URLs not to be cached.  A matching URL
will still be cached, however, if it is fetched non-interactively.
.TP
.B URL-SPECIFICATION
Don't cache files that match
.B URL-SPECIFICATION.
See the URL-SPECIFICATION section for details of the
.B URL-SPECIFICATION
option.
.LP
The
.B DontGet
section contains a way of recognising URLs not to be got.  This can be used to
reject junk adverts for example.
.TP
.B URL-SPECIFICATION
Do not get URLs that match
.B URL-SPECIFICATION.
The format is the same as in the DontCache section.
.TP
.B replacement = <URL>
A replacement URL that will be used in place of all URLs that match any of the
URL-SPECIFICATIONs in this section.
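.LP
As an illustrative sketch, the section below rejects everything from one
advertising host and anything under an adverts directory on any host.  The host
names and the replacement URL are hypothetical.
.LP
 DontGet
 {
  replacement = http://www.foo.com/blank.gif
  http://ads.foo.com
  *://*/adverts/*
 }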
.LP
The
.B DontGetRecursive
section contains a way of recognising URLs not to be got when getting
recursively.
.LP
See the DontCache section for a description of the options in this section.
.LP
The
.B DontRequestOffline
section contains a way of recognising URLs not to be requested by users when
offline.
.LP
See the DontCache section for a description of the options in this section.
.LP
The
.B CensorHeader
section contains a list of the header lines that are to be removed from the
request sent from the browser to the server.
.TP
.I header = <string> | none |
The lines in the request that start with
.I header
followed by a ':' are removed before being passed to the server if there is no
string on the right hand side; otherwise that string replaces the one from the
browser.  This option does not allow you to add headers that were not present in
the browser request.
.TP
.B referer-self = yes | no
Sets the Referer header to the same as the URL (default = no).
.TP
.B referer-self-dir = yes | no
Sets the Referer header to the URL directory name (default = no).  This option
takes precedence over referer-self if both are set.
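.LP
The following sketch removes two headers and replaces a third.  The header
names are standard HTTP request headers; the replacement string is only an
example.
.LP
 CensorHeader
 {
  From         =
  Cookie       =
  User-Agent   = WWWOFFLE
  referer-self = no
 }
.LP
Here the From and Cookie lines are dropped because nothing follows the equals
sign, while the User-Agent line sent by the browser is replaced by the given
string.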
.LP
The
.B FTPOptions
section contains the information that is required to be able to do anonymous ftp.
.TP
.B anon-username = <string>
Specifies the username to use to fetch files using ftp (default is "anonymous",
"ftp" is another option).
.TP
.B anon-password = <string>
Specifies the password to use to fetch files using ftp (default is determined
from the user running wwwoffled and the hostname, this may not work reliably
especially if you are behind a firewall).
.TP
.B auth-hostname = <host[:port]>
.PD 0
.TP
.B auth-username = <string>
.TP
.B auth-password = <string>
.PD
Specifies a triplet of hostname, username and password that allows
non-anonymous access to a specific server.  (These options must come in groups
of three.)  The auth-hostname must match exactly; no wildcards are used.
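.LP
A sketch of an FTPOptions section with one authenticated server is given below.
The e-mail style password, host name, user name and password are all
assumptions for this example.
.LP
 FTPOptions
 {
  anon-username = anonymous
  anon-password = user@foo.com
  auth-hostname = ftp.foo.com
  auth-username = alice
  auth-password = secret
 }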
.LP
The
.B MIMETypes
section is a list of the MIME types to associate with files that are not fetched
using HTTP.  This is required by browsers; most browsers come with a list that
can be used here.
.TP
.B default = <mime-type>/<subtype>
The default MIME type to use for files that do not match any of the other rules.
.TP
.I .<file-ext> = <mime-type>/<subtype>
The MIME type to use for files that match the file extension.
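.LP
For example, a short MIMETypes section might look like this sketch.  The types
listed are common registered MIME types, and the choice of default is only an
illustration.
.LP
 MIMETypes
 {
  default = text/plain
  .gif    = image/gif
  .html   = text/html
  .txt    = text/plain
 }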
.LP
The
.B Proxy
section contains a list of the hosts that are to be served via specified proxy
servers.  If no proxy is required then use 'none' or leave the proxy name blank.
.TP
.B default = <hostname[:port]> | none |
Specifies the default proxy that all requests are to use.
.TP
.I URL-SPECIFICATION = <hostname[:port]> | none |
For URLs that match
.I URL-SPECIFICATION
use the specified proxy.
See the URL-SPECIFICATION section for details of the
.B URL-SPECIFICATION
option.
.TP
.B auth-hostname = <host[:port]>
.PD 0
.TP
.B auth-username = <string>
.TP
.B auth-password = <string>
.PD
Specifies a proxy server host that requires proxy authentication by username and
password to use it.  (These options must come in groups of three.)  The
auth-hostname must match exactly; no wildcards are used.
.TP
.B ssl = <hostname[:port]> | none |
A proxy server that should be used for Secure Socket Layer (SSL) connections
e.g. https (default = none).
.LP
None of the entries in the LocalHost or LocalNet section are fetched using a
proxy.
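.LP
As a hedged sketch, the section below sends all requests through one
authenticated proxy except those for a named domain, which are fetched
directly.  The host names, port, user name and password are hypothetical.
.LP
 Proxy
 {
  default          = www-cache.foo.com:8080
  http://*.bar.com = none
  ssl              = www-cache.foo.com:8080
  auth-hostname    = www-cache.foo.com:8080
  auth-username    = alice
  auth-password    = secret
 }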
.LP
The
.B DontIndex
section contains a way of recognising URLs not to be indexed.
.TP
.B outgoing = URL-SPECIFICATION
Do not index any URLs that match
.I URL-SPECIFICATION
in the outgoing index.
.TP
.B latest = URL-SPECIFICATION
Do not index any URLs that match
.I URL-SPECIFICATION
in the lasttime/prevtime/latest indexes.
.TP
.B monitor  = URL-SPECIFICATION
Do not index any URLs that match
.I URL-SPECIFICATION
in the monitor index.
.TP
.B host     = URL-SPECIFICATION
Do not index any URLs that match
.I URL-SPECIFICATION
in the host indexes.
.TP
.B URL-SPECIFICATION
Do not index any URLs that match
.I URL-SPECIFICATION
in any of the indexes.
.LP
See the URL-SPECIFICATION section for details of the
.B URL-SPECIFICATION
option.
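.LP
The sketch below keeps GIF images out of the latest indexes, hides one host
from the host indexes, and keeps JPEG images out of every index.  The host name
is made up for this example.
.LP
 DontIndex
 {
  latest = *://*/*.gif
  host   = http://ads.foo.com
  *://*/*.jpg
 }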
.LP
The
.B Alias
section contains a list of aliases that are used to replace the server name and
path with another server name and path.  It can also be used for servers known
by two names.
.TP
.I URL-SPECIFICATION1 = URL-SPECIFICATION2
When a request matching
.I URL-SPECIFICATION1
is used, the request is modified into a request for
.IR URL-SPECIFICATION2 .
The two are also considered identical for the purposes of indexing, purging
and recursive fetching.
The
.I URL-SPECIFICATION
must not be a wildcard match.
.LP
A symbolic link in the cache between 
.I protocol1/hostname1
and
.I protocol2/hostname2
will have the same effect as specifying protocol1://hostname1: =
protocol2://hostname2: in the config file.  These links are only checked when
WWWOFFLE starts or when 'wwwoffle -config' is run.
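.LP
A minimal sketch of an alias that maps one server onto a directory of another
is shown below.  Both host names and the path are hypothetical, and neither
side uses a wildcard.
.LP
 Alias
 {
  http://www.foo.com/ = http://www.bar.com/foo/
 }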
.LP
The
.B Purge
section controls how the cache is purged.  It specifies the method used to
determine which pages to purge, the default age, the host specific maximum age
of the pages in days, and a maximum allowed cache size.  An age of zero means to
always delete when a purge is done; a negative age means never purge.  The
maximum cache size and minimum free space include the files that are from hosts
that are marked never to be purged but will not purge them.
.TP
.B use-mtime = yes | no
The decision of which pages to purge can be made on last access time (atime) or
last modification time (mtime) (default=no).
.TP
.B max-size = <size>
The maximum size of the cache in MB after purging, excluding the hosts that are
never to be purged, if this is zero then it does not apply (default=0).
.TP
.B min-free = <size>
The minimum amount of free disk space in MB after purging, excluding the hosts
that are never to be purged, if this is zero then it does not apply (default=0).
.TP
.B use-url = yes | no
If true then use the URL to decide on the purge age, otherwise use the protocol
and host only (default=no).
.TP
.B del-dontget = yes | no
If true then delete the files from hosts that are in the DontGet section
(default=no).
.TP
.B del-dontcache = yes | no
If true then delete the files from hosts that are in the DontCache section
(default=no).
.TP
.B default = <age>
The age to purge hosts that are not otherwise specified here (default=28).
.TP
.I URL-SPECIFICATION = ...
The age to purge hosts with URLs that match
.IR URL-SPECIFICATION ;
this does not include the path and extension parts, which are ignored.
See the URL-SPECIFICATION section for details of the
.B URL-SPECIFICATION
option.
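.LP
To illustrate the special ages described above, the sketch below never purges
one host and always deletes pages from another when a purge is done.  The host
names and the free space figure are assumptions for this example.
.LP
 Purge
 {
  use-mtime          = no
  min-free           = 50
  default            = 28
  http://www.foo.com = -1
  http://*.bar.com   = 0
 }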
.LP
.SH WILDCARD
A wildcard match is one that uses the '*' character to represent any group of
characters.
.LP
This is basically the same as the command line file matching expressions in DOS
or the UNIX shell, except that the '*' can match the '/' character.  A maximum
of 2 '*' characters can be used in any wildcard.
.LP
For example
.LP
 *.gif      matches  foo.gif and bar.gif
 *.foo.com  matches  www.foo.com and ftp.foo.com
 /foo/*     matches  /foo/bar.html and /foo/bar/foobar.html
.SH URL-SPECIFICATION
When specifying a host, protocol and pathname in many of the sections a
.B URL-SPECIFICATION
can be used; this is a way of recognising a URL.
.LP
For the purposes of this explanation a URL is considered to be made up of four
parts.
.TP
.B proto
The protocol that is used (e.g. 'http', 'ftp').
.TP
.B host
The server hostname (e.g. 'www.gedanken.demon.co.uk').
.TP
.B port
The port number on the host (e.g. default of 80 for HTTP).
.TP
.B path
The pathname on the host (e.g. '/bar.html') or a directory name (e.g. '/foo/').
.LP
For example, in the URL of the WWWOFFLE homepage
(http://www.gedanken.demon.co.uk/wwwoffle/) the protocol is 'http', the host is
\&'www.gedanken.demon.co.uk', the port is the default (in this case 80), and the
pathname is '/wwwoffle/'.
.LP
In general this is written as <proto>://<host>[:<port>]/<path>
.LP
Where [] indicates an optional feature, and <> indicate a user supplied name
or number.
.LP
Some common URL-SPECIFICATION options are the following:
.TP
.B *://*
Any protocol, Any host, Any port, Any path (This is the same as saying 'default').
.TP
.B *://*/<path>
Any protocol, Any host, Any port, Named path
.TP
.B *://*/*.<ext>
Any protocol, Any host, Any port, Any path ending in the named extension.
.TP
.B *://<host>
Any protocol, Named host, Any port, Any path
.TP
.B <proto>://
Named protocol, Any host, Any port, Any path
.TP
.B <proto>://<host>
Named protocol, Named host, Any port, Any path
.TP
.B <proto>://<host>:
Named protocol, Named host, Default port, Any path
.TP
.B <proto>://<host>:<port>
Named protocol, Named host, Named port, Any path
.LP
The matching of the host and the path uses the wildcard matching that is
described above.
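.LP
As an illustrative sketch, the forms above can be read as follows.  The host
names are made-up examples, not real servers.
.LP
 http://www.foo.com:8080  http, host www.foo.com, port 8080, any path
 ftp://                   ftp, any host, any port, any path
 *://*.foo.com            any protocol, hosts matching *.foo.com, any path
 *://*/*.gif              any protocol, any host, paths matching /*.gif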
.SH EXAMPLE
 StartUp
 {
  http-port     = 8080
  wwwoffle-port = 8081
  spool-dir     = /var/spool/wwwoffle
  use-syslog    = yes
  password      =
 }

 Options
 {
  index-latest-days = 14
  add-info-refresh  = no
  request-changed   = 3600
 }

 FetchOptions
 {
  images      = yes
  frames      = yes
 }

 LocalHost
 {
  wwwoffle.foo.com
  localhost
  127.0.0.1
 }

 LocalNet
 {
  *.foo.com
 }

 AllowedConnectHosts
 {
  *.foo.com
 }

 Proxy
 {
  http://foo.com/* = www-cache.foo.com:8080
 }

 Purge
 {
  default  = 28
  max-size = 10
  http://*.bar.com/*  = 7
 }
.SH FILES
CONFDIR/wwwoffle.conf The wwwoffled(8) configuration file.
.LP
SPOOLDIR The WWWOFFLE spool directory.
.SH SEE ALSO
wwwoffle(1), wwwoffled(8).
.SH AUTHOR
Andrew M. Bishop 1996,1997,1998,1999 (amb@gedanken.demon.co.uk)
