Serving and Searching Structured Files

Gn can serve a single large file consisting of a number of sections so that each section appears to the client as a separate file with its own title. This is a generalization of the "mailfile" feature available on the Minnesota server. Using this feature requires two additional fields in the menu file, called "Separator" and "Section". These are regular expressions, as in grep. The first matches the lines which separate the parts of the large file, and the second matches the lines which supply the menu titles of the sections. Thus for a mail file one would use the lines

    Separator=^From 
    Section=^Subject:

The first line, which should end with a literal space character (not the word "space"), means that sections (in this case mail messages) are separated by lines starting with From and a space. The ^ matches the start of a line, and the space is necessary because some lines begin with From and a colon.

Here's another example. This document consists of sections whose heading lines are written all in caps. Since I want to make a menu with each section a separate item, I use the following entry in my menu file:

    Name=Installation/Maintenance Guide Sections
    Path=1m/docs/Install
    Separator=^[A-Z][A-Z "]*$
    Section=^
    Type=1

The Separator is ^[A-Z][A-Z "]*$. This matches any line starting with a letter from A to Z (i.e. caps) followed by any number of characters which are between A and Z or equal to space or the quotation mark, and then the end of the line. This describes the section headings of this document. I need the initial [A-Z] so blank lines won't be matched.

When the separator field is matched a new section is started which will have its own menu item. The title of the menu item is determined by the Section regular expression. In fact the section is searched, starting with the separator line, for a match for this second regular expression.
When a match is found, everything on the line _after_ the matching pattern is taken as the title. Thus for mail everything after the word "Subject:" becomes the title. In the example of this document, the expression ^ matches the beginning of the separator line, so that whole line becomes the menu title. To see this in use, gopher to hopf.math.nwu.edu and look in the documentation directory for this document.

Another example of how this might be used is for a directory. If a file consists of entries like

    Name: Franks, John
    Address: Department of Mathematics, Northwestern University
    Phone: 708-491-5548
    etc., etc.

then Separator=^Name: and Section=^Name: would give a menu with an item

    1. Franks, John

which when selected would give the multiline record with my name, address, etc.

There is a slight variation on this which is sometimes very useful. If the Section= regular expression starts with $ then the $ is skipped and the remainder is matched against the line AFTER the line matching the Separator rather than the line matching the separator itself. Thus if my address/phone book above had the information for each individual separated by a blank line then I could use

    Separator=^$
    Section=$^Name:

This means the separator is a blank line and to get the Section (i.e. title) go to the next line, then match '^Name: ' and use everything after it as the title. The lines

    Separator=^$
    Section=$^

will use a blank line as the separator and the entire line after the blank line as the title. Of course you will have to remember to put a leading blank line in your phone book or the first entry will not be shown.

In the example above the "1m" at the beginning of the Path field indicates that this is a structured file. It is Type 1 because to clients it will look like a directory.
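
The way the Separator and Section expressions cut a file into titled sections can be sketched in a few lines of Python. This is only an illustration, not the code gn or mkcache actually uses. The function name split_sections and the second phone book entry are made up for the example, Python's re syntax is assumed to be close enough to grep's for these patterns, and for the leading $ form it is assumed that the title search simply starts on the line after the separator.

```python
import re

def split_sections(text, separator, section):
    """Return a list of (title, body) pairs, one per separator line.

    Hypothetical helper: it mimics the behaviour described in the text,
    it is not taken from the gn sources.
    """
    sep_re = re.compile(separator)
    # A leading "$" in the Section field is skipped; it means the title is
    # looked for starting on the line AFTER the separator line (assumption).
    skip_separator_line = section.startswith("$")
    sec_re = re.compile(section[1:] if skip_separator_line else section)

    lines = text.splitlines()
    starts = [i for i, line in enumerate(lines) if sep_re.search(line)]
    sections = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        first = start + 1 if skip_separator_line else start
        title = None
        for line in lines[first:end]:
            match = sec_re.search(line)
            if match:
                # Everything after the matched pattern becomes the title.
                title = line[match.end():].strip()
                break
        sections.append((title, "\n".join(lines[start:end])))
    return sections

# A phone book with a leading blank line; the second entry is invented.
PHONE_BOOK = (
    "\n"
    "Name: Franks, John\n"
    "Phone: 708-491-5548\n"
    "\n"
    "Name: Doe, Jane\n"
    "Phone: 555-0100\n"
)

titles = [title for title, _ in split_sections(PHONE_BOOK, "^$", "$^Name: ")]
```

Text before the first separator line belongs to no section in this sketch, which is consistent with the advice above about putting a leading blank line in the phone book.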
If we add an additional menu entry like

    Name=Search Installation Guide
    Path=7m/docs/Install
    Type=7

which is Type 7 and has a path beginning with "7m", the client will prompt the user for a search term, which can be a regular expression. The _gn_ server will return a menu with only those sections containing a match for the regular expression. Thus for the directory example, if the user searched for Northwestern she would get only those directory entries containing that word.

Here's how this works. When mkcache is run with a menu file containing the "1m" entry above it produces the regular .cache file but also produces another file (in this case called Install..cache) which is a cache file for the sections of the file Install specified in this menu item. The lines in this cache file contain the menu titles obtained from the file by matching the regular expressions, and a selector which designates the range of bytes corresponding to a section of the document. Gn knows how to serve a single section of a document when given one of these byte range selectors.

Since the file Install..cache was made when the item with path 1m/docs/Install was encountered, it is not necessary to remake it when the item with path 7m/docs/Install is reached. We signal this by omitting the Separator and Section fields from this menu item. If these fields were in both items the cache file Install..cache would be made twice and the one done last would take effect if there was a difference in the regular expressions given. Of course if the regular expressions are omitted from both then the cache file will not be made and attempts to access either item will result in an error (cryptically reported as "Access denied"). For this reason, whenever an item of type 1m or 7m with no regular expressions is encountered by mkcache, a warning message is printed.

It is easy to effectively use two different separator regular expressions or two different section expressions for the same file.
You might, for example, want to have a mail file with one menu by subject and another menu by author. To do this you must make a UNIX link (see the man page ln(1)) to give the mail file an additional name and use the two different names in the menu file Path entries. This is necessary so the cache files created will have different names.

The two regular expressions for the separator and the menu titles are not put into the selector string. Thus they are not available to the client to change.

Note: All regular expressions given as search terms and all lines in which a match is sought are converted to lower case before the matching is attempted. This has the (desirable) effect of making all searches case insensitive. By contrast the regular expressions used to define separators and menu lines are case sensitive.

Regular expressions which can be used for the separator and section strings are essentially the same as those allowed by grep, with the addition of the special character ~ which matches word boundaries. To give special characters (including ^ ~ [ ] ( ) * . \ and $) their ordinary literal meaning they must be escaped with a \.
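
The byte range idea behind the section cache, and the lower-casing applied to searches, can be illustrated with the following Python sketch. It is a rough model under stated assumptions, not a description of gn's cache file format or its matching code. The names index_sections and search_sections and the second directory entry are invented for the example, and Python's re module stands in for gn's own regular expressions (it has no ~ word boundary character).

```python
import re

def index_sections(data: bytes, separator: str):
    """Return (start, end) byte ranges, one per section of data.

    The separator expression is applied as given, so it is case sensitive.
    """
    sep_re = re.compile(separator.encode())
    offsets, pos = [], 0
    for line in data.splitlines(keepends=True):
        if sep_re.search(line.rstrip(b"\r\n")):
            offsets.append(pos)  # a section starts at its separator line
        pos += len(line)
    # Each section runs up to the start of the next one, or to end of file.
    return list(zip(offsets, offsets[1:] + [len(data)]))

def search_sections(data: bytes, ranges, term: str):
    """Return the byte ranges of the sections containing a match for term.

    Both the search term and the text are lower-cased first, which makes
    the search case insensitive.
    """
    term_re = re.compile(term.lower().encode())
    hits = []
    for start, end in ranges:
        section = data[start:end].lower()
        if any(term_re.search(line) for line in section.splitlines()):
            hits.append((start, end))
    return hits

# A small directory file; the second entry is invented for illustration.
DIRECTORY = (
    b"Name: Franks, John\n"
    b"Address: Department of Mathematics, Northwestern University\n"
    b"Name: Doe, Jane\n"
    b"Address: Example Street\n"
)

ranges = index_sections(DIRECTORY, "^Name:")
northwestern_hits = search_sections(DIRECTORY, ranges, "Northwestern")
```

Serving one section then amounts to sending the slice DIRECTORY[start:end] for the selected range, and the search returns only the ranges whose text contains the lower-cased term.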