Serving and Searching Structured Files

   Gn has the capability to serve a single large file consisting of a
   number of sections so that each section appears to the client as a
   separate file with its own title. This is a generalization of the
   "mailfile" feature available on the Minnesota server. Using this
   feature requires two additional fields in the menu file, called
   "Separator" and "Section". These are regular expressions, as in grep.
   The first matches the lines that separate the parts of the large
   file, and the second matches the lines that supply the titles of the
   menu items. Thus for a mail file one would use the lines
   

     Separator=^From
     Section=^Subject:

   
   
   The first line, which should end with a literal space character,
   means that sections (in this case mail messages) are separated by
   lines starting with From followed by a space. The ^ matches the start
   of a line, and the space is necessary because some lines begin with
   From followed by a colon.
   
   Here's another example. This document consists of sections whose
   heading lines are written all in caps. Since I want to make a menu
   with each section as a separate item, I use the following entry in my
   menu file

     Name=Installation/Maintenance Guide Sections
     Path=1m/docs/Install
     Separator=^[A-Z][A-Z" ]*$
     Section=^
     Type=1

   
   
   The Separator is ^[A-Z][A-Z" ]*$. This matches any line starting with
   a letter from A to Z (i.e. caps) followed by any number of characters
   which are between A and Z or equal to space or the quotation mark, and
   then the end of the line. This describes the section headings of this
   document. I need the initial [A-Z] so blank lines won't be matched.
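
   The effect of the leading [A-Z] can be checked with a short Python
   sketch. This is only an illustration: Python's re module stands in
   for gn's own grep-like matcher, the name "loose" and the sample lines
   are made up, and the two caps headings are hypothetical.

```python
import re

# The Separator expression from the menu entry, tried with Python's re
# module (assumption: gn's matcher treats this expression the same way).
heading = re.compile(r'^[A-Z][A-Z" ]*$')

# The same expression without the leading [A-Z], for comparison.
loose = re.compile(r'^[A-Z" ]*$')

samples = [
    "SERVING AND SEARCHING STRUCTURED FILES",  # hypothetical caps heading
    'THE "MAILFILE" FEATURE',                  # caps, quotes and spaces
    "",                                        # blank line
    "Gn has the capability to serve",          # ordinary text line
]

strict_hits = [bool(heading.search(s)) for s in samples]
loose_hits = [bool(loose.search(s)) for s in samples]

print(strict_hits)
print(loose_hits)
```

   In this sketch the blank line is matched only by the second
   expression, which is why the first one begins with [A-Z].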
   
   When the separator field is matched a new section is started which
   will have its own menu item. The title of the menu item is determined
   by the Section regular expression. In fact the section is searched,
   starting with the separator line, for a match for this second regular
   expression. When a match is found, everything on the line _after_ the
   matching pattern is taken as the title. Thus for mail everything after
   the word "Subject:" becomes the title. In the example of this
   document, the expression ^ matches the beginning of the separator line
   so that whole line becomes the menu title. To see this in use gopher
   to hopf.math.nwu.edu and look in the documentation directory for this
   document.
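
   The splitting and title rules described above can be sketched in
   Python as follows. This is not gn's code: the function name
   split_sections and the sample mail lines are made up, Python's re
   module stands in for gn's matcher, and trimming white space from the
   title is an assumption.

```python
import re

def split_sections(lines, separator, section):
    """Return a list of (title, section_lines) pairs.

    A new section starts at every line matching `separator`. The title
    is whatever follows the first match of `section`, searching from the
    separator line onward. Lines before the first separator are dropped.
    """
    sep_re = re.compile(separator)
    sec_re = re.compile(section)

    sections = []
    current = None
    for line in lines:
        if sep_re.search(line):
            current = [line]          # separator line opens a new section
            sections.append(current)
        elif current is not None:
            current.append(line)

    result = []
    for body in sections:
        title = None
        for line in body:
            m = sec_re.search(line)
            if m:
                # Everything after the matched pattern becomes the title
                # (stripping white space is an assumption).
                title = line[m.end():].strip()
                break
        result.append((title, body))
    return result

# Hypothetical mail file contents.
mail = [
    "From alice Mon Jan  1 00:00:00 1993",
    "From: alice",
    "Subject: Hello",
    "",
    "Body one",
    "From bob Tue Jan  2 00:00:00 1993",
    "Subject: Re: Hello",
    "",
    "Body two",
]

menu = split_sections(mail, r"^From ", r"^Subject:")
titles = [title for title, _ in menu]
print(titles)
```

   Note that the line "From: alice" does not start a new section,
   because the separator requires a space after From.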
   
   Another example of how this might be used is for a directory. If a
   file consists of entries like

     Name: Franks, John
     Address:  Department of Mathematics, Northwestern University
     Phone: 708-491-5548
     etc., etc.

   
   
   then Separator=^Name: and Section=^Name: would give a menu with an
   item

     1.  Franks, John

   
   
   which, when selected, would give the multiline record with my
   name, address, etc.
   
   There is a slight variation on this which is sometimes very useful. If
   the Section= regular expression starts with $ then the $ is skipped
   and the remainder is used for the line AFTER the line matching the
   Separator rather than the line matching the separator itself. Thus if
   my address/phone book above had the information for each individual
   separated by a blank line then I could use

     Separator=^$
     Section=$^Name:

   
   
   This means the separator is a blank line and to get the Section (i.e.
   title) go to the next line then match '^Name: ' and use everything
   after it as the title. The lines

     Separator=^$
     Section=$^

   will use a blank line as the separator and the entire line after the
   blank line as the title. Of course you will have to remember to put a
   leading blank line in your phone book or the first entry will not be
   shown.
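
   The leading $ rule can be sketched in Python like this. It is an
   illustration under assumptions, not gn's implementation: the function
   section_titles is made up, Python's re module stands in for gn's
   matcher, the second phone book entry is hypothetical, and the search
   for a title is assumed to stop at the next separator.

```python
import re

def section_titles(lines, separator, section):
    """Return one title per separator line.

    If `section` starts with "$", the "$" is dropped and matching starts
    on the line after the separator instead of on the separator itself.
    """
    sep_re = re.compile(separator)
    use_next = section.startswith("$")
    sec_re = re.compile(section[1:] if use_next else section)

    titles = []
    for i, line in enumerate(lines):
        if not sep_re.search(line):
            continue
        start = i + 1 if use_next else i
        title = None
        for j in range(start, len(lines)):
            # Assumption: do not look past the start of the next section.
            if j > i and sep_re.search(lines[j]):
                break
            m = sec_re.search(lines[j])
            if m:
                title = lines[j][m.end():].strip()
                break
        titles.append(title)
    return titles

book = [
    "",                        # leading blank line
    "Name: Franks, John",
    "Phone: 708-491-5548",
    "",
    "Name: Doe, Jane",         # hypothetical second entry
    "Phone: 555-0100",         # hypothetical number
]

by_name = section_titles(book, r"^$", r"$^Name:")
whole_line = section_titles(book, r"^$", r"$^")
no_leading_blank = section_titles(book[1:], r"^$", r"$^Name:")

print(by_name)
print(whole_line)
print(no_leading_blank)
```

   The last call drops the leading blank line, and in this sketch the
   first entry then disappears from the result.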
   
   In the example above the "1m" at the beginning of the Path field
   indicates that this is a structured file. It is Type 1 because to
   clients it will look like a directory. If we add an additional menu
   entry like

     Name=Search Installation Guide
     Path=7m/docs/Install
     Type=7

   
   
   which is Type 7 and has a path beginning with "7m" the client will
   prompt the user for a search term which can be a regular expression.
   The _gn_ server will return a menu with only those sections containing
   a match for the regular expression. Thus for the directory example if
   the user searched for Northwestern she would get only those directory
   entries containing that word.
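
   A minimal Python sketch of this kind of search follows. It assumes
   the sections have already been split and titled; the function
   search_sections and the second directory entry are made up, and
   Python's re module stands in for gn's matcher. The lower-casing
   follows the note at the end of this section.

```python
import re

def search_sections(sections, term):
    """Return the titles of the sections containing a match for `term`.

    Both the search term and each line are lower-cased first, which
    makes the search case insensitive. (In Python, lower-casing a
    pattern would also change escapes such as \\W, so this sketch is
    only suitable for simple terms.)
    """
    pattern = re.compile(term.lower())
    return [
        title
        for title, body in sections
        if any(pattern.search(line.lower()) for line in body)
    ]

directory = [
    ("Franks, John", [
        "Name: Franks, John",
        "Address:  Department of Mathematics, Northwestern University",
        "Phone: 708-491-5548",
    ]),
    ("Doe, Jane", [                      # hypothetical entry
        "Name: Doe, Jane",
        "Address:  Example College",
        "Phone: 555-0100",
    ]),
]

hits = search_sections(directory, "Northwestern")
regex_hits = search_sections(directory, "NORTH.*univ")
misses = search_sections(directory, "Chicago")

print(hits)
print(regex_hits)
print(misses)
```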
   
   Here's how this works. When mkcache is run with a menu file containing
   the "1m" entry above it produces the regular .cache file but also
   produces another file (in this case called Install..cache) which is a
   cache file for the sections of the file Install specified in this menu
   item. The lines in this cache file contain the menu titles obtained
   from the file by matching regular expressions and contain a selector
   which designates a range of bytes corresponding to a section of the
   document. Gn knows how to serve a single section of a document when
   given one of these byte range selectors.
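
   The idea of a byte range index can be sketched in Python as below.
   The real format of the cache file and of the selectors is not
   described here, so everything in the sketch is an assumption: the
   functions build_index and read_range, the (title, start, end) layout,
   and the sample file with its hypothetical second entry.

```python
import os
import re
import tempfile

def build_index(path, separator, section):
    """Return (title, start, end) tuples, one per section of the file.

    start and end are byte offsets; end is exclusive.
    """
    sep_re = re.compile(separator.encode())
    sec_re = re.compile(section.encode())
    index = []
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            line = raw.rstrip(b"\r\n")
            if sep_re.search(line):
                if index:
                    index[-1][2] = offset     # close the previous section
                index.append([None, offset, None])
            if index and index[-1][0] is None:
                m = sec_re.search(line)
                if m:
                    index[-1][0] = line[m.end():].strip().decode()
            offset += len(raw)
    if index:
        index[-1][2] = offset                 # last section ends at EOF
    return [tuple(entry) for entry in index]

def read_range(path, start, end):
    """Return the bytes of one section, given its byte range."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)

# Build a small sample file (contents partly hypothetical).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Name: Franks, John\nPhone: 708-491-5548\n")
    tmp.write(b"Name: Doe, Jane\nPhone: 555-0100\n")
    path = tmp.name

index = build_index(path, r"^Name:", r"^Name:")
title, start, end = index[1]
second = read_range(path, start, end)
os.remove(path)

print(index)
print(second.decode())
```

   With an index like this, serving one section needs only a seek and a
   read, with no regular expression matching at request time.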
   
   Since the file Install..cache was made when the item with path
   1m/docs/Install was encountered, it is not necessary to remake it when
   the item with path 7m/docs/Install is reached. We signal this by
   omitting the Separator and Section fields from this menu item. If
   these fields were in both items the cache file Install..cache would be
   made twice and the one done last would take effect if there was a
   difference in the regular expressions given. Of course if the regular
   expressions are omitted from both then the cache file will not be made
   and attempts to access either item will result in an error
   (cryptically reported as "Access denied"). For this reason, whenever
   an item of type 1m or 7m with no regular expressions is encountered by
   mkcache, a warning message is printed.
   
   It is easy to effectively use two different separator regular
   expressions or two different section expressions for the same file.
   You might for example want to have a mail file with menu by subject
   and another menu by author. To do this you must make a UNIX link (see
   the man page ln(1)) to give the mail file an additional name and use
   the two different names in the menu file Path entries. This is
   necessary so the cache files created will have different names.
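
   The reason a second name helps can be shown with a small Python
   sketch. It assumes, from the single example Install..cache above,
   that the cache file name is the file name followed by "..cache"; the
   function cache_name and the file names mail and mail-by-author are
   made up. The call os.link has the same effect as ln(1) without
   options.

```python
import os
import tempfile

def cache_name(path):
    # Assumption: the cache file is named after the file itself, as in
    # the example Install..cache for the file Install.
    return os.path.basename(path) + "..cache"

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "mail")            # hypothetical name
alias = os.path.join(workdir, "mail-by-author")     # hypothetical name

with open(original, "w") as f:
    f.write("From alice Mon Jan  1 00:00:00 1993\nSubject: Hello\n")

# Give the same file a second name, like: ln mail mail-by-author
os.link(original, alias)

same_file = os.path.samefile(original, alias)
names = (cache_name(original), cache_name(alias))

print(same_file)
print(names)
```

   Both names refer to one file on disk, but under the naming assumption
   each gets its own cache file, so each can carry its own pair of
   regular expressions.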
   
   The two regular expressions for the separator and the menu titles are
   not put into the selector string, so the client cannot change them.
   
   Note: All regular expressions given as search terms and all lines in
   which a match is sought are converted to lower case before the
   matching is attempted. This has the (desirable) effect of making all
   searches case insensitive. By contrast the regular expressions used to
   define separators and menu lines are case sensitive. Regular
   expressions which can be used for the separator and section strings
   are essentially the same as those allowed by grep with the addition of
   the special character ~ which matches word boundaries. To match
   special characters (including ^ ~ [ ] ( ) * . \ and $) literally,
   they must be escaped with a \.
   