hglogstat - create statistics about WWW-access to Hyper-G servers
hglogstat parameters
For a short description of the parameters try hglogstat -h. For more details see below.
hglogstat builds access statistics from the logfiles produced by the WWW gateway by collecting various kinds of information (see Modes). The user selects one of three outputs. The first is an HTML document containing the results, which is immediately inserted into a specified Hyper-G collection together with some graphical representations. The second is a file with detailed information about requested objects, searches and failed searches. The third is a set of overall statistics, based on information gathered in either of the previous modes. (The first two actions may be combined in a single run; the third needs a separate run.)
hglogstat may execute in three different modes, the first two of which can be combined into a single run.
In this mode, information is collected from the logfiles and presented in the following categories:
In this mode, all requested objects, searches and failed searches (each with its number of occurrences) are written to a file, where other tools can process them further. The script collstat, for example, uses these files to produce statistics about single collections instead of the whole server.
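The Python sketch below illustrates the idea behind this mode. It tallies each requested object, search and failed search, then writes the counts to a plain-text file. The record shape, the kind names ("object", "search", "failed_search") and the tab-separated output layout are assumptions for illustration. They are not hglogstat's actual file format, which collstat depends on.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (kind, item), with kind one of "object",
# "search" or "failed_search". The gateway's real logfile format differs.
Record = Tuple[str, str]


def count_occurrences(records: Iterable[Record]) -> Dict[str, Counter]:
    """Tally how often each item occurs, separately for each kind."""
    counts: Dict[str, Counter] = {}
    for kind, item in records:
        counts.setdefault(kind, Counter())[item] += 1
    return counts


def format_occurrences(counts: Dict[str, Counter]) -> List[str]:
    """One tab-separated line per item (kind, count, item), most frequent first."""
    lines = []
    for kind in sorted(counts):
        for item, n in counts[kind].most_common():
            lines.append(f"{kind}\t{n}\t{item}")
    return lines


def write_occurrences(counts: Dict[str, Counter], path: str) -> None:
    """Write the formatted lines to `path` for later processing by other tools."""
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(line + "\n" for line in format_occurrences(counts))
```

A plain line-per-item text format keeps the file easy to read from other scripts, in the spirit of how collstat reuses these files.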
When hglogstat executes in the first mode, it writes the number of sessions to the file sessions.log and the number of requests to the file requests.log (both in the same directory as the script). From this information, daily and monthly summaries can be generated in this mode.
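As a rough illustration of how daily and monthly summaries can be derived from such counters, here is a Python sketch. It assumes a hypothetical line format of `YYYY-MM-DD <count>` for sessions.log and requests.log, with possibly several lines per day (for example, one per run). The real files written by hglogstat may be laid out differently.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical line format for sessions.log / requests.log:
#   YYYY-MM-DD <count>
# The real files written by hglogstat may look different.
KEY_LENGTH = {"day": 10, "month": 7}  # date prefix used as the grouping key


def summarize(lines: Iterable[str], period: str = "month") -> Dict[str, int]:
    """Sum the counts per day ('YYYY-MM-DD') or per month ('YYYY-MM')."""
    key_len = KEY_LENGTH[period]
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        date, count = line.split()
        totals[date[:key_len]] += int(count)
    return dict(totals)
```

Grouping by a prefix of an ISO-style date means the same function yields both the daily and the monthly view.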
As mentioned above, unwanted items may be excluded from the summaries by adding their titles to the file hglogstat.rc, which must be located in the same directory as hglogstat.
The items in this file may be divided into several categories, each headed by a line identifying the type of the items that follow. Currently, requested objects, entry pages and user agents may be skipped; the corresponding heading lines are _SKIP_OBJECTS_, _SKIP_ENTRIES_ and _SKIP_AGENTS_.
Lines starting with # are considered to be comments.
Note, however, that the items listed in hglogstat.rc are excluded from the top-n lists only; they still count as requested objects or entry pages!
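The distinction between hiding an item and not counting it can be sketched in Python as follows. The function name `top_n` and its signature are hypothetical. The point is that skipped items are filtered out of the ranking after the grand total has already been computed over all items.

```python
from collections import Counter
from typing import List, Set, Tuple


def top_n(counts: Counter, skip: Set[str], n: int) -> Tuple[List[Tuple[str, int]], int]:
    """Return the n most frequent items not in `skip`, plus the grand total.

    Skipped items are hidden from the ranking only; their hits are still
    part of the total, as described for hglogstat.rc.
    """
    total = sum(counts.values())  # computed before any filtering
    ranked = [(item, c) for item, c in counts.most_common() if item not in skip]
    return ranked[:n], total
```

For example, with hits `{"home.gif": 50, "/index.html": 30, "/news.html": 10}` and `home.gif` skipped, the top-2 list contains only the two HTML pages, while the total remains 90.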
An example of an hglogstat.rc file:
# unwanted objects
_SKIP_OBJECTS_
options.gif
home.gif
search.gif
info.gif
coll_open.gif
coll_clos.gif

# unwanted entry pages
_SKIP_ENTRIES_
/
identify.gif
options.gif
search.gif
help.gif
home.gif
coll_open.gif
coll_clos.gif
text.gif
info.gif
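A minimal Python sketch of reading a file in this format is shown below. It assumes one entry per line, which also allows user agent strings containing spaces. Entries that appear before any heading are simply ignored. This is an illustration of the file layout, not hglogstat's own Perl parser.

```python
from typing import Dict, Iterable, Optional, Set

HEADINGS = {"_SKIP_OBJECTS_", "_SKIP_ENTRIES_", "_SKIP_AGENTS_"}


def parse_rc(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Group the entries of an hglogstat.rc-style file under their headings."""
    skips: Dict[str, Set[str]] = {h: set() for h in HEADINGS}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line in HEADINGS:
            current = line  # following entries belong to this category
        elif current is not None:
            skips[current].add(line)
    return skips
```

Usage: `parse_rc(open("hglogstat.rc"))` returns a set of titles per category, ready to be passed as the skip set when building the top-n lists.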
hglogstat is a Perl script that uses features introduced in Perl 5, so the first prerequisite is a Perl 5 installation on your system.
The graphics are produced by Gnuplot, which the script calls, so Gnuplot has to be installed, too. Since Gnuplot does not produce GIF output (at least my version, 3.5 pre 3.6, does not), the script calls ppmtogif to convert the plots to GIF, so ppmtogif should be installed as well.
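The Gnuplot-plus-ppmtogif pipeline can be sketched in Python. This sketch assumes Gnuplot's `pbm` terminal in colour mode as the PPM source. It also assumes a hypothetical two-column data file and file names, and the original script may drive Gnuplot differently. Gnuplot reads the commands from standard input, and ppmtogif writes the GIF to standard output.

```python
import subprocess


def gnuplot_script(data_file: str, ppm_file: str, title: str) -> str:
    """Gnuplot commands that render a two-column data file as a colour PPM image."""
    return "\n".join([
        "set terminal pbm color",  # PPM output, since GIF is not available
        f'set output "{ppm_file}"',
        f'plot "{data_file}" using 1:2 with lines title "{title}"',
        "",
    ])


def render_gif(data_file: str, ppm_file: str, gif_file: str, title: str) -> None:
    """Run gnuplot, then convert its PPM output to GIF with ppmtogif."""
    script = gnuplot_script(data_file, ppm_file, title)
    # gnuplot executes the commands piped to its standard input.
    subprocess.run(["gnuplot"], input=script, text=True, check=True)
    with open(gif_file, "wb") as gif:
        # ppmtogif reads the PPM file and writes GIF data to stdout.
        subprocess.run(["ppmtogif", ppm_file], stdout=gif, check=True)
```

Keeping the script text separate from the process calls makes it easy to inspect or adjust the generated Gnuplot commands without running either program.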
Finally, the HTML document is inserted into the database by hginstext. Once this is installed on your system as well, nothing can keep you from working with hglogstat.
Of course, there are some minor bugs, but none of them is really serious.
These requests sometimes start a new session and sometimes do not. In the logfile, however, they are simply recorded as POST requests. As a consequence, the exact number of sessions cannot be determined, and the result diverges slightly from the one obtained by analyzing the dbserver's logfiles.
In practice, the deviation is a few hundred sessions per month, which is less than 0.5% and can usually be neglected.
Eliminating this bug requires a change to the logfile format, which will happen soon anyway.
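The uncertainty described above can be bracketed in a Python sketch. It assumes a hypothetical record shape `(method, starts_session)`. Every POST request is treated as ambiguous, so the lower bound assumes no POST opens a session and the upper bound assumes every POST does. The second helper expresses a deviation as a fraction of a reference count, such as one taken from the dbserver's logfiles.

```python
from typing import Iterable, Tuple


def session_bounds(requests: Iterable[Tuple[str, bool]]) -> Tuple[int, int]:
    """Lower and upper bound on the session count.

    Each request is (method, starts_session). For non-POST requests the flag
    says whether the request is known to open a session; for POST requests
    the flag is ignored, because they may or may not open one.
    """
    certain = ambiguous = 0
    for method, starts_session in requests:
        if method == "POST":
            ambiguous += 1
        elif starts_session:
            certain += 1
    return certain, certain + ambiguous


def relative_deviation(estimate: int, reference: int) -> float:
    """Absolute deviation from a reference count, as a fraction of the reference."""
    return abs(estimate - reference) / reference
```

For instance, a deviation of 300 sessions against a hypothetical reference of 100,000 gives `relative_deviation(99_700, 100_000)` of about 0.003, i.e. 0.3%, well within the 0.5% mentioned above.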
Gnuplot is a great graphics tool, but sometimes it behaves a bit strangely.
I place two plots on one screen, and although they start at the same x-position, the second plot is shifted one unit to the right, but only on some architectures. There is a simple remedy (forcing an invisible plot at (0,0)), but this causes faulty behaviour on other architectures. So far, I have not found an elegant solution.
Alfons Schmid (aschmid@iicm.tu-graz.ac.at) - April 2, 1996