-----------------NOTE -----------------

This is a design spec; fileParser is 
vapourware at this point, existing only
in my mind. 

Because it is such a complex and vital
part of the future of Olympus, it must
be designed in detail before
development begins.

This document outlines the future
design of fileParser. Development will
begin the week of July 24, 2000.

---------------------------------------

fileParser
==========

A new class is being written in Olympus to make dealing with configuration files
easy, standardized and robust: fileParser.

fileParser takes two files: an input file and a template file. It
applies the template file to the input file to produce a third file which it
indexes in RAM. The fileParser object can then be made to return individual items,
groups of items, iterate over items, change item values, add new items, 
delete existing items and [copy|cut]/paste items from one location to another.
Eventually there will also be an undo/redo capability built in. Once a program
is finished using all these great features, it can then make fileParser write
out a final plain-text version that incorporates all the changes.

fileParser is being created to address the following needs:

    o A means to parse files that is robust and can be reused.
        - It's a class that depends on nothing but itself. Any improvements
          to it affect all who use it.
    o A way to write a parser that doesn't break very easily.
        - Tag-based data representation is extremely robust: if your target
          changes, your program will most probably not break. In the worst
          case, you will need to generate a new template file.
    o A way to share parsing efforts between plugins/projects.
        - A common parsing mechanism makes sharing file parsing efforts 
          trivial: just load a file with the appropriate template file.
    o A means to edit parsed files
        - This can be a tricky endeavour, especially when it is someone
          else's file format and it needs to be 100% right.
    o A way to standardize what happens to config files after an edit
        - Comments _must_ be kept, order _must_ be preserved. fileParser will
          enforce a common set of policies.
    o A way to parse and use files that are so large we really don't want to load
      the whole thing into main memory.
        - Think sendmail, or a large Apache installation.
    o A way to be lazy when you want to write something but first need to parse
      a config file.
        - People seem to produce more when they are encouraged to be lazy. Go figure.


fileParser Public Interface
===========================

There are three sets of functions in the fileParser interface: general methods, scope
methods, and instance methods. General methods are used to do things such as parse, 
write and get state information. Instance methods are the methods that allow editing,
fetching, ordering, etc. of individual data items. Scope methods are the means to move
throughout the file in a structured manner.

fileParser generates a structured tree of the target file in memory by parsing it
according to definitions in the template file. Since it is often impossible to
know ahead of time how many branches and leaves this tree will have or how they 
will be arranged, it is equally impossible to know before run time which
set of data a given edit/access method will act upon. This is the purpose of scope
methods: they allow one to set the scope within which an instance method is to be
applied. Scope methods allow navigation of the tree and other functions that affect
the set of data instance methods act upon.

A fileParser object has an internal cursor that marks its current position in the 
parsed tree. Scope methods move this cursor. The current cursor position can be 
bookmarked so that going back to it later on is trivial. This also allows for
better performance. For instance, a bookmark could be made at the head of a branch
that is expected to be quite long. Then, instead of making its way back up the branch
or starting from the top of the tree, the program can have the fileParser object
jump directly to the bookmarked position.
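A minimal sketch of the cursor and bookmark idea, assuming a simple pointer-based tree. The spec does not name any bookmark methods, so `bookmark()` and `jumpTo()`, along with `Node` and `Cursor`, are hypothetical; only the `firstChild()` return convention is taken from the Scope Methods section.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tree node; the spec does not fix fileParser's internal layout.
struct Node {
    std::string type;
    std::vector<Node*> children;
};

// The cursor is a pointer to the node in scope; a bookmark is a saved copy
// of that pointer, so returning to it costs O(1) instead of a tree walk.
class Cursor {
public:
    explicit Cursor(Node* top) : cur_(top) {}

    Node* current() const { return cur_; }

    // Descend one level, as firstChild() does: 1 if there were children, else 0.
    int firstChild() {
        if (cur_ == nullptr || cur_->children.empty()) return 0;
        cur_ = cur_->children.front();
        return 1;
    }

    // Remember the current position and return an id for it (hypothetical).
    int bookmark() {
        marks_.push_back(cur_);
        return static_cast<int>(marks_.size()) - 1;
    }

    // Jump straight back to a remembered position (hypothetical).
    void jumpTo(int id) {
        assert(id >= 0 && id < static_cast<int>(marks_.size()));
        cur_ = marks_[id];
    }

private:
    Node* cur_;
    std::vector<Node*> marks_;
};
```

Because a bookmark here is a raw pointer, it would dangle if its node were later cut or deleted, so a real implementation would probably need to invalidate or update bookmarks on such edits.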

General Methods
---------------
fileParser(char* targetFile, char* templateFile) 
    Constructor; sets target and template file paths
    
char* targetFile() - Returns the path to the target file.
char* setTargetFile(char*) - Set the path to the target file.

char* templateFile() - Returns the path to the template file.
char* setTemplateFile(char*) - Set the path to the template file.

int parse() - Attempts to parse the target file. Returns 1 on success,
    0 on failure. Upon success, it sets the cursor to the "top" of the parsed tree.
    Any instance methods called at this point will act on the top-level items.

int errorno() - Returns the current error code; the error codes are an enum in fileParser.h.
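The general methods can be sketched as a small class skeleton. This is a stub built on assumptions, not the planned implementation: the error names `ERR_NONE`, `ERR_TEMPLATE_UNREADABLE` and `ERR_TARGET_UNREADABLE` are hypothetical, `parse()` only checks that both files can be opened, the setters are omitted, and paths are held in `std::string` with `const char*` accessors to sidestep ownership of the returned pointers.

```cpp
#include <cassert>
#include <fstream>
#include <string>

class fileParser {
public:
    // Hypothetical error codes; the real enum would be declared in fileParser.h.
    enum Error { ERR_NONE = 0, ERR_TEMPLATE_UNREADABLE, ERR_TARGET_UNREADABLE };

    fileParser(const char* targetFile, const char* templateFile)
        : target_(targetFile), template_(templateFile) {}

    const char* targetFile() const { return target_.c_str(); }
    const char* templateFile() const { return template_.c_str(); }

    // 1 on success, 0 on failure; the reason is kept for errorno().
    int parse() {
        std::ifstream tmpl(template_);
        if (!tmpl) { error_ = ERR_TEMPLATE_UNREADABLE; return 0; }
        std::ifstream target(target_);
        if (!target) { error_ = ERR_TARGET_UNREADABLE; return 0; }
        // Template-driven parsing would run here, then move the cursor to the top.
        error_ = ERR_NONE;
        return 1;
    }

    int errorno() const { return error_; }

private:
    std::string target_;
    std::string template_;
    int error_ = ERR_NONE;
};
```

A caller would check the return value of `parse()` and consult `errorno()` only when it returns 0.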


Instance Methods
----------------
void getValue(char*& var) - Sets var to the value associated with the current element,
or to NULL if there is either no value or no applicable current element. var is taken by
reference so that the caller's pointer can be reassigned. If var pointed to anything before
getValue() was called, that reference is lost, which makes it easy to leak memory if you
are not careful.

void getValue(int index, char*& var) - Similar to getValue(char*& var). This variation is
used when accessing an element marked as MULTI in the template file. index is the zero-based
index of the value to access.
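The ownership issue above can be made concrete with a sketch of the two `getValue()` variants. It assumes that each call hands back a freshly allocated copy that the caller must `delete[]`, and it takes `var` as `char*&` so that the callee can actually reassign the caller's pointer (a plain `char*` parameter is a copy, so assigning to it is invisible to the caller). `Element`, `dupString()` and the explicit element argument, which stands in for the cursor's current element, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical element: a MULTI element holds several values, others hold one.
struct Element {
    std::string type;
    bool multi = false;
    std::vector<std::string> values;
};

// Copy a std::string into a new C string; the caller owns the result.
static char* dupString(const std::string& s) {
    char* out = new char[s.size() + 1];
    std::memcpy(out, s.c_str(), s.size() + 1);
    return out;
}

// Whatever var pointed to before is NOT freed here: that is the leak the
// spec warns about if the caller still owned it.
void getValue(const Element* e, char*& var) {
    var = (e && !e->values.empty()) ? dupString(e->values[0]) : nullptr;
}

// MULTI variant: index is zero-based; out-of-range yields NULL.
void getValue(const Element* e, int index, char*& var) {
    const bool ok = e && index >= 0 && index < static_cast<int>(e->values.size());
    var = ok ? dupString(e->values[static_cast<std::size_t>(index)]) : nullptr;
}
```

Under this convention every non-NULL result must be released with `delete[]` before `var` is reused.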

const char* getType() - Returns the type of the element currently in scope.

void newElement(char* type, char* value) - Creates a new element of the given type (as defined
in the template file) with the given value, immediately after the current element in scope.

void newElement(int index, char* var) - Creates a new element in an item declared as MULTI
in the template file. This does nothing if used on a non-MULTI item. The element
will be of the type defined in the template file for this MULTI item.
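A sketch of the two `newElement()` overloads, assuming the siblings at the cursor's level are held in a `std::vector` and that a MULTI item stores its values in order. The spec does not say what `index` means for the MULTI overload; here it is taken as the insertion position, with `index == size` meaning append. `Element`, the explicit `siblings`/`cur` arguments and the returned position are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Element {
    std::string type;
    bool multi = false;
    std::vector<std::string> values;   // exactly one entry unless multi
};

// newElement(type, value): insert a sibling immediately after position cur
// and return the new element's position.
std::size_t newElement(std::vector<Element>& siblings, std::size_t cur,
                       const std::string& type, const std::string& value) {
    assert(cur < siblings.size());
    Element e;
    e.type = type;
    e.values.push_back(value);
    siblings.insert(siblings.begin() + static_cast<std::ptrdiff_t>(cur + 1), e);
    return cur + 1;
}

// newElement(index, value) on a MULTI item: insert a value at index.
// Silently does nothing on a non-MULTI item or an out-of-range index.
void newElement(Element& item, int index, const std::string& value) {
    if (!item.multi) return;
    if (index < 0 || index > static_cast<int>(item.values.size())) return;
    item.values.insert(item.values.begin() + index, value);
}
```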


                    --- Clipboard Methods ---
int cut() - Cuts the current item for later pasting. It returns the index number of the
item in the clipboard (which can hold multiple items). Since only a pointer is stored and a
few ints are modified, this is a fairly efficient method.

int copy() - Makes a copy of the current item for later pasting. Returns the index number
of the item in the clipboard. Since only a pointer is stored and a few ints are modified,
this is a fairly efficient method.

void paste(int index) - Pastes the item at index in the clipboard as the next item
after the one currently in scope.

void pasteAll() - Pastes all the items in the clipboard to the current location.

void clearClipboard() - Clears all items in the clipboard.

void clearClipboard(int index) - Clears the item at index from the clipboard.
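The clipboard can be sketched as a list of shared pointers to items, so that `cut()` and `copy()` store only a pointer, as described above. Two choices here are assumptions rather than part of the spec: `paste()` inserts a new copy of the held item, so one clipboard entry can be pasted more than once, and the current scope and position are passed in explicitly. `Item`, `Scope` and `Clipboard` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical item; real items would also carry children and comments.
struct Item {
    std::string type;
    std::string value;
};
using ItemPtr = std::shared_ptr<Item>;
using Scope = std::vector<ItemPtr>;   // the siblings at the cursor's level

class Clipboard {
public:
    // cut(): unlink the current item and keep only a pointer to it.
    int cut(Scope& scope, std::size_t cur) {
        assert(cur < scope.size());
        slots_.push_back(scope[cur]);
        scope.erase(scope.begin() + static_cast<std::ptrdiff_t>(cur));
        return static_cast<int>(slots_.size()) - 1;
    }

    // copy(): keep a pointer to the current item; the item stays in place.
    int copy(const Scope& scope, std::size_t cur) {
        assert(cur < scope.size());
        slots_.push_back(scope[cur]);
        return static_cast<int>(slots_.size()) - 1;
    }

    // paste(index): insert a fresh copy of entry index after position cur.
    void paste(Scope& scope, std::size_t cur, int index) const {
        assert(index >= 0 && index < static_cast<int>(slots_.size()));
        assert(cur < scope.size());
        ItemPtr fresh = std::make_shared<Item>(*slots_[index]);
        scope.insert(scope.begin() + static_cast<std::ptrdiff_t>(cur + 1), fresh);
    }

    void clear() { slots_.clear(); }
    std::size_t size() const { return slots_.size(); }

private:
    std::vector<ItemPtr> slots_;
};
```

Copying on paste rather than on `copy()` keeps `copy()` cheap, at the cost of deciding at paste time how deep the copy must go.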


Scope Methods
-------------
int firstChild() - Sets the cursor to the first child of the element currently in scope.
Returns 1 if there were children, 0 if there weren't any.

int nextItem() - Sets the cursor to the next sibling element, or to NULL if there isn't one.
Returns 1 on success, 0 on failure (no next item).

void parent() - Sets the cursor to the parent element of the element currently in scope.
If it is a top level item, the cursor does not move.

int count() - Returns the number of elements in the current scope.

int count(char* type) - Returns the number of elements of the given type in the current scope.

int countSub() - Returns the number of sub-elements of the current item. Will always return
1 unless the element was tagged as MULTI in the template file.
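A sketch of the scope methods over a pointer-based tree, showing the usual iteration pattern: `firstChild()`, then `nextItem()` until it returns 0, then `parent()`. It deviates from the text in one deliberate way: when `nextItem()` fails, the cursor stays on the last sibling instead of becoming NULL, since otherwise `parent()` would have no position to climb from. `count()` is read as "elements at the cursor's level"; `Node` and `Cursor` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node; top-level items have no parent.
struct Node {
    std::string type;
    Node* parent = nullptr;
    std::vector<Node*> children;
};

class Cursor {
public:
    // As after parse(), the cursor starts on the first top-level item.
    explicit Cursor(std::vector<Node*> top)
        : top_(std::move(top)), cur_(top_.empty() ? nullptr : top_.front()) {}

    Node* current() const { return cur_; }

    int firstChild() {
        if (cur_ == nullptr || cur_->children.empty()) return 0;
        cur_ = cur_->children.front();
        return 1;
    }

    // Assumption: on failure the cursor stays put rather than becoming NULL.
    int nextItem() {
        const std::vector<Node*>& sib = siblings();
        for (std::size_t i = 0; i + 1 < sib.size(); ++i) {
            if (sib[i] == cur_) {
                cur_ = sib[i + 1];
                return 1;
            }
        }
        return 0;
    }

    // A top-level item has no parent, so the cursor does not move.
    void parent() {
        if (cur_ != nullptr && cur_->parent != nullptr) cur_ = cur_->parent;
    }

    // Number of elements at the cursor's level, optionally of one type.
    int count() const { return static_cast<int>(siblings().size()); }
    int count(const std::string& type) const {
        int n = 0;
        for (const Node* s : siblings())
            if (s->type == type) ++n;
        return n;
    }

private:
    const std::vector<Node*>& siblings() const {
        return (cur_ != nullptr && cur_->parent != nullptr) ? cur_->parent->children : top_;
    }

    std::vector<Node*> top_;
    Node* cur_;
};
```

With this cursor, visiting one branch reads `if (c.firstChild()) { do { /* use c.current() */ } while (c.nextItem()); c.parent(); }`.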


Template File Syntax
====================

To parse a target file, fileParser requires a template file. A template file is a
plain text file that looks a lot like XML. A template file consists of two parts:
type definitions (enclosed by <defs> </defs>) and the actual file description
(enclosed by <file> </file>). Each tag is described below:

-> Definition tag: <defs> </defs>   
   Usage: Between this pair of tags all the types are defined. A definition block
          cannot be nested in another block; it must be top level.

-> File tag: <file [comments="COMMENTCHARS"]> </file>
   Usage: Between the <file></file> tags appear lists of element tags. These element
          tags are used by fileParser to figure out what it should be looking for.
          As with the Definition tag, File tags must be top level and cannot be
          nested inside another set of tags.

          COMMENTCHARS is a regular expression that defines the comment syntax.
          If an item in a block matches this regular expression, it becomes a
          comment attached to the current Element.
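A sketch of how COMMENTCHARS might be applied, using `std::regex`. It assumes each item is examined as a single line of text and that a search (rather than a whole-line match) is used, so the pattern itself should be anchored, for example `^\s*#` for shell-style comments. `Element` and `attachIfComment()` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical element that collects the comments attached to it.
struct Element {
    std::string name;
    std::vector<std::string> comments;
};

// If item matches the COMMENTCHARS regex, attach it to current as a
// comment and return true; otherwise leave it for normal parsing.
bool attachIfComment(const std::regex& commentChars, const std::string& item,
                     Element& current) {
    if (!std::regex_search(item, commentChars)) return false;
    current.comments.push_back(item);
    return true;
}
```

Keeping the comment text verbatim on the element is what lets a later write-out preserve it, in line with the "comments must be kept" policy above.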
          
-> Type tag: <type name="NAME" begin="STARTTOKEN" end="ENDTOKEN" [value="VALUE"]> 
                [<element>s] 
             [</type>]
   Usage: A Type defines a token that fileParser will use when parsing the target file.
          NAME is the unique identifier for this type. STARTTOKEN and ENDTOKEN are the
          start and end markers within the target file for this Type. The optional VALUE
          statement defines what to return as the value of this item. STARTTOKEN,
          ENDTOKEN and VALUE are all regular expressions.

          Optionally, elements can be attached to a Type (creating a conglomerate type)
          by placing <element> tags between the opening <type> and closing </type> tags.
          
          Types can only be defined in a <defs></defs> block. They are meaningless 
          anywhere else.
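The begin/end/value triple of a Type can be sketched with `std::regex`: find STARTTOKEN, find the next ENDTOKEN after it, then search the text in between for VALUE. The rule that VALUE's first capture group (or the whole match, if it has no group) becomes the item's value is an assumption, as are the names `TypeDef` and `extractValue()`. Nested blocks and conglomerate types are not handled.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical compiled form of <type name=... begin=... end=... value=...>.
struct TypeDef {
    std::string name;
    std::regex begin;
    std::regex end;
    std::regex value;
};

// Find the first block delimited by begin/end and extract its value into out.
// Returns false if any of the three expressions fails to match.
bool extractValue(const TypeDef& t, const std::string& text, std::string& out) {
    std::smatch b;
    if (!std::regex_search(text, b, t.begin)) return false;
    const std::string rest = b.suffix().str();   // text after STARTTOKEN

    std::smatch e;
    if (!std::regex_search(rest, e, t.end)) return false;
    const std::string body = e.prefix().str();   // text before ENDTOKEN

    std::smatch v;
    if (!std::regex_search(body, v, t.value)) return false;
    // Assumption: capture group 1 is the value if VALUE has one.
    out = (v.size() > 1) ? v[1].str() : v[0].str();
    return true;
}
```

For example, a type with begin `<VirtualHost`, end `</VirtualHost>` and value `ServerName\s+(\S+)` yields `example.org` from a block containing the line `ServerName example.org`.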

-> Element Tag: <element name="NAME" type="TYPE" [multi] [anywhere]>
   Usage: The Element tag defines the placement of an instance of a Type within its
          container (either a Type or a File tag). TYPE is the exact name of the Type
          to use. If TYPE is not defined in the template file, this element will be
          ignored. NAME is used to reference this element if it is found successfully
          in the file and must be unique within its block.

          The optional "anywhere" flag denotes that the element can appear anywhere
          within a given block and is not position dependent. Without it, the
          element's position in the list of elements matters.

          The optional "multi" flag denotes that several instances of this Type can
          be expected within this block. Without "multi", an element can hold one
          and only one value. Using "multi" in conjunction with "anywhere" collects
          all instances of TYPE within the current block. Without "anywhere", it
          only collects items of TYPE until the next non-anywhere element matches.
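The collection rule for "multi" elements can be sketched over a block reduced to a list of Type names. The reduction, the function name `collectMulti()` and the `nextFixedType` parameter (the Type of the next non-anywhere element, or an empty string if there is none) are assumptions for illustration; items of other types between the start and the stop point are simply skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: each parsed item in a block is reduced to its Type name.
// Returns the positions of the items a multi element of `type` collects.
std::vector<std::size_t> collectMulti(const std::vector<std::string>& block,
                                      const std::string& type,
                                      bool anywhere,
                                      std::size_t start,
                                      const std::string& nextFixedType) {
    std::vector<std::size_t> hits;
    if (anywhere) {
        // multi + anywhere: every instance of the type in the whole block.
        for (std::size_t i = 0; i < block.size(); ++i)
            if (block[i] == type) hits.push_back(i);
        return hits;
    }
    // multi alone: collect from start until the next non-anywhere element matches.
    for (std::size_t i = start; i < block.size(); ++i) {
        if (!nextFixedType.empty() && block[i] == nextFixedType) break;
        if (block[i] == type) hits.push_back(i);
    }
    return hits;
}
```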

-> Include Tag: <include file="FILENAME">
   Usage: To include another template file in this one, use the Include tag. This
          makes writing template files nice and modular where appropriate.
          
-> Comments: <!-- -->
   Usage: To provide comments, enclose with <!-- and -->