
Information regarding urlmonrc, the URL database for urlmon. 

(Skip down to the line containing the word "Syntax" if you are looking for
information on the new urlmonrc file format.) 

(In the following, whenever I say "checksum", I implicitly mean the
results of all filters, not just the checksum filter.  See FILTERS.txt.) 

Determining the urlmonrc file
-----------------------------

Throughout these documents, I use the term 'urlmonrc file' (or sometimes
'last_modified database') as a generic name for the database of URLs and
their modifications.  Here's how you can determine the actual file name.

urlmon maintains a database of URLs, timestamps, and checksums
(collectively "checkstamps").  The database defaults to "~/.urlmonrc", but
its location can be changed on the command line with -f, or changed
permanently by editing the script and changing the variable "$mods_file".

Actually, and this is new with version 2, if <progname> is the name urlmon
is invoked as, then the database file will be

  "~/.<progname>rc"

This lets you have multiple rc files, either by copying urlmon to a
different filename or by using a symbolic (or hard) link.  For example, on
UNIX:

ln -s urlmon linuxmon

will, when 'linuxmon' is executed, look for the file '~/.linuxmonrc'.  But
when executing 'urlmon', it will look for '~/.urlmonrc' as its rc file. 

The '-f' flag still works as before, but this is cooler. 
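The lookup rule above can be sketched in Python.  This only illustrates the
behaviour described here; it is not urlmon's own (Perl) code.  The function
name rc_file_path and its parameters are hypothetical, and the sketch does
not model a $mods_file that has been edited in the script.

```python
import os


def rc_file_path(argv0, f_option=None):
    """Return the rc file urlmon would use (hypothetical sketch)."""
    # An explicit -f argument overrides the name-based default.
    if f_option is not None:
        return os.path.expanduser(f_option)
    # Otherwise derive ~/.<progname>rc from the name the program was
    # invoked as, so a link named 'linuxmon' maps to ~/.linuxmonrc.
    progname = os.path.basename(argv0)
    return os.path.join(os.path.expanduser("~"), "." + progname + "rc")
```

For example, rc_file_path("/usr/local/bin/linuxmon") yields the path of
'.linuxmonrc' in your home directory.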

Invoking urlmon on the urlmonrc database
----------------------------------------

Invoking urlmon as described in the section entitled 'Basic usage' of the
README.txt file is a good way (well, the only way :) to add entries to the
database.  But the power comes with the '-l' switch, which causes urlmon
to read in the URLs in its database file and treat them as though they
were listed on the command line a la 'Basic usage'.  In its simplest form,
the command

	urlmon -l | mail userid@host.foo

executed regularly will keep 'userid' aware of the status of the URLs in
his or her URL database.  However, it will also send mail even when no
URLs have changed, so something slightly more complicated is needed.  (See
section 5 of README.txt for such a script.)

If you have a smart cron daemon, you can simply run 'urlmon -lc' from cron,
and the daemon will mail you the result _only if_ there is any output.


Viewing urlmonrc
----------------

'urlmon -p' prints out the contents of the database.  For each entry it
prints the timestamp in human-readable format (as opposed to seconds since
the epoch, which is readable by machines and UNIX gurus), or "(time not
available)" if a checksum is used for that URL, followed by the URL
itself.
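As a rough illustration of what 'urlmon -p' does with each entry, here is a
Python sketch.  The function name describe_entry, the output layout, the use
of UTC, and the assumption that FILTER=checksum marks a checksum entry are
all hypothetical; urlmon's real output may differ.

```python
from datetime import datetime, timezone


def describe_entry(attrs):
    """Format one parsed entry roughly as 'urlmon -p' is described."""
    url = attrs.get("URL", "")
    # Assumption: a checksum-based entry is recognized by FILTER=checksum,
    # in which case MOD holds a checksum rather than a time.
    if attrs.get("FILTER") == "checksum":
        return "(time not available) " + url
    seconds = int(attrs.get("MOD", "0") or 0)
    # Render seconds since the epoch as a human-readable UTC time.
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return when.strftime("%Y-%m-%d %H:%M:%S UTC") + " " + url
```

For example, an entry with MOD=889334867 corresponds to
1998-03-08 05:27:47 UTC.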



Comments in urlmonrc file
-------------------------

Giving urlmon a '-C' switch allows you to specify a comment for each URL
on the command line.  The comment will be placed in the urlmonrc file, so
that you can recall why the heck you put that particular URL in there in
the first place.  Please be careful with the syntax: each URL must be
immediately followed by its comment.  If you want spaces in a comment, be
sure to enclose it (just the comment) in quote marks, like this:

urlmon -C http://foo.bar.edu/file.html "A foobared file" http://foo.baz.edu/file.html "A foobazzed file"
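The URL-then-comment pairing rule can be illustrated with a small Python
sketch.  The function name is hypothetical, and it assumes the shell has
already removed the quote marks, so each comment arrives as one argument.

```python
def pair_urls_with_comments(args):
    """Split the arguments given with -C into (url, comment) pairs."""
    if len(args) % 2 != 0:
        raise ValueError("each URL must be immediately followed by its comment")
    # Even positions hold URLs; the following odd position is its comment.
    return list(zip(args[0::2], args[1::2]))
```

An odd number of arguments means some URL lacks its comment, which is
exactly the mistake the warning above is about.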


Syntax for the urlmonrc file (as of version 3.0)
------------------------------------------------

The new syntax for the urlmonrc file is designed to be easily extended
and to remain backwards and forwards compatible (at least to the point
that older versions of urlmon, as long as they are version 3.0 or newer,
can read newer versions of the format without choking).

The file contains one entry per line for each URL to be monitored.  Each
line consists of some number of attributes, separated by white space.  An
attribute has the form

	ATTR=value

where ATTR is the attribute name, and value is the value of that attribute
for this entry.  Do not put spaces in the value unless you surround the
value with double quotes.  Having ATTR= (blank) is allowed.
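A line in this format can be parsed with a few lines of Python.  This sketch
relies on the standard shlex module for quote handling, which is an
assumption: urlmon's own Perl parser may treat unusual quoting differently.
The function name parse_line is hypothetical.

```python
import shlex


def parse_line(line):
    """Parse one urlmonrc line into a dict of ATTR -> value (sketch)."""
    attrs = {}
    # shlex honours double quotes, so COMM="a b c" stays a single token.
    for token in shlex.split(line):
        # Split on the first '=' only, so URLs containing '=' survive.
        name, sep, value = token.partition("=")
        if not sep:
            continue  # not of the form ATTR=value; skip it
        attrs[name] = value  # ATTR= (blank) yields an empty string
    return attrs
```

Splitting on the first '=' matters because query strings such as
'?a=b' are common in URLs.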

As of version 4.0, the following attributes are recognized.  Any other
attribute will be ignored.

URL		-- required attribute
---

This is the key for the whole line.  It defines what URL should be
monitored.  The only lines that won't have this are comments and CODE
lines (see below). 

ex.

URL=http://camelot.syr.edu


MOD		-- required (more or less)
---

This is the stored data that, together with data pulled from the URL
itself, is used to determine whether a URL has been modified.  It is
required, but if it is missing it will be set to 0 (or the equivalent for
that data type) and updated later.

ex.

MOD=889334867
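The rules so far (unknown attributes ignored, URL required except on CODE
lines, a missing MOD treated as 0) can be sketched in Python.  This is a
hypothetical helper working on an already-parsed dict, not urlmon's code;
the set KNOWN simply lists the attributes documented in this file.

```python
# Attributes documented for version 4.0; anything else is dropped.
KNOWN = {"URL", "MOD", "COMM", "FILTER", "FARGS", "DISPLAY", "CODE"}


def normalize_entry(attrs):
    """Apply the URL/MOD rules to a parsed line (hypothetical sketch)."""
    # Unknown attributes are ignored rather than treated as errors.
    entry = {k: v for k, v in attrs.items() if k in KNOWN}
    if "CODE" in entry:
        return entry  # CODE lines carry no URL
    if "URL" not in entry:
        raise ValueError("missing required URL attribute")
    # A missing MOD is treated as 0 and updated after the next check.
    entry.setdefault("MOD", "0")
    return entry
```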

COMM		-- not required
----

This is a comment, if you want to give more information about the URL.  If
you put double quotes around it, it can contain spaces.

ex.

COMM="Hey there hi there ho there!"


FILTER		-- not required
------

This attribute tells urlmon what logic to use to determine whether the URL
has changed.  It is not required, but because of how the default-case code
works, urlmon will fill it in anyway, so don't be surprised when you see
it.  When no filter is specified, urlmon works as it always did: it tries
the timestamp method, but if there is no timestamp, or if checksums have
been forced, it uses the checksum method.

ex.

FILTER=none
FILTER=default
FILTER=timestamp
FILTER=checksum
FILTER=lines
FILTER=regexp

Note that the value 'none' is equivalent to 'default'.  In turn, 'default'
is equivalent to 'timestamp' if the web server provides this information,
and to 'checksum' if it does not.
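The defaulting rules above can be summarized as a small Python function.
The name effective_filter and its boolean parameters are hypothetical
stand-ins for "the server sent a timestamp" and "checksums were forced".

```python
def effective_filter(filter_value, server_has_timestamp, force_checksum=False):
    """Resolve FILTER to the method actually used (sketch of the rules)."""
    # An absent FILTER, 'none', and 'default' all mean the default behaviour.
    if filter_value in (None, "", "none", "default"):
        if server_has_timestamp and not force_checksum:
            return "timestamp"
        return "checksum"
    # Any explicitly named filter is used as given.
    return filter_value
```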

See FILTERS.txt for more information.



FARGS		-- not required
-----

This attribute allows you to pass arguments to your filter.  It is not
required.  Assuming the filters are well-written, having an FARGS
statement when the filter requires no arguments, or having no FARGS
statement when the filter does require arguments, shouldn't be fatal, but
will probably give inaccurate results. 

FARGS has the following syntax:

FARGS=arg

FARGS=arg1,arg2,arg3,arg4
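A Python sketch of turning an FARGS value into an argument list follows.
It assumes arguments are separated by plain commas with no escaping
mechanism, which this file does not state either way; the function name is
hypothetical.

```python
def split_fargs(value):
    """Turn an FARGS value such as 'arg1,arg2' into a list (sketch)."""
    # Assumption: every comma separates arguments; there is no escaping.
    if value is None or value == "":
        return []
    return value.split(",")
```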

See FILTERS.txt for more information.


DISPLAY		-- not required
-------

(Thanks to Bill Dyess for the patch and the good idea.)

DISPLAY gives urlmon the ability to store a string that will be printed
whenever a certain URL has changed.  This is handy when you want to keep
an eye on one part of a frameset, but when that part changes, you want to
be told that the whole frameset has changed, not just one part.

For example, say the following URL is one that has data on it you want to
watch: 

	http://camelot.syr.edu/foo/topoframe.html

And that this URL:

	http://camelot.syr.edu/foo/frameset.html

looks like this:

<HTML><HEAD><TITLE>Watch me</TITLE></HEAD>
<FRAMESET>
	<FRAME SRC="http://camelot.syr.edu/foo/topoframe.html">
	<FRAME SRC="http://camelot.syr.edu/foo/bottomoframe.html">
</FRAMESET>
</HTML>

Then, in your .urlmonrc file, you'd have the following line:

URL=http://camelot.syr.edu/foo/topoframe.html MOD=XXXX DISPLAY=http://camelot.syr.edu/foo/frameset.html

Now, whenever the URL in the URL= line changes, urlmon will print the
string "http://camelot.syr.edu/foo/frameset.html". 

You don't have to set DISPLAY to be a URL.  It could be any string, and
thus could be a simple reminder to do something.
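A minimal Python sketch of choosing what to report for a changed entry.  It
assumes DISPLAY is printed in place of the URL when present; the text only
says the DISPLAY string is printed, so whether urlmon also prints the URL
is not confirmed here.  The function name is hypothetical.

```python
def change_report(entry):
    """Return the string to print when an entry's URL has changed."""
    # Assumption: a non-empty DISPLAY replaces the URL in the report.
    return entry.get("DISPLAY") or entry["URL"]
```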

CODE		-- not required
----

This tag appears on a line by itself.  There should be no other tags on
the same line as it.  It globally affects the execution of urlmon. 

This tag allows you to specify a file that urlmon will read in and execute
as perl code.  It could contain any valid perl code and it MUST end with
the following line: 

1;

(Or anything else that returns true, but put this line in just to be safe.)

The argument to it is a file name.  It can be an absolute or relative path
name, and the file will be searched for in the current directory, the
user's home directory, and the directories listed in @INC (see perl's
documentation for what @INC is).

For information on how and why to use CODE tags, see FILTERS.txt.

ex.

CODE=code.urlmon
CODE=/home/jdimpson/code.urlmon
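The search order described above can be sketched in Python.  The function
name find_code_file is hypothetical, and the inc_dirs parameter is only a
stand-in for perl's @INC; urlmon itself does this lookup in Perl.

```python
import os


def find_code_file(name, inc_dirs=()):
    """Locate a CODE= file using the documented search order (sketch)."""
    # Absolute paths are used as-is.
    if os.path.isabs(name):
        return name if os.path.isfile(name) else None
    # Relative names: current directory, home directory, then @INC-like dirs.
    search = [os.getcwd(), os.path.expanduser("~")] + list(inc_dirs)
    for directory in search:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Returning None rather than raising leaves it to the caller to decide how
loudly a missing CODE file should be reported.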
