codemadness.org

       README - xmlparser - XML parser
 (HTM) git clone git://git.codemadness.org/xmlparser
       ---
       README (2303B)
       ---
            1 XML parser
            2 ----------
            3 
            4 A small XML parser.
            5 
            6 
            7 Dependencies
            8 ------------
            9 
           10 - C compiler (ANSI).
           11 
           12 
           13 Features
           14 --------
           15 
           16 - Relatively small parser.
           17 - Simple API using callback functions.
           18 - Fast.
           19 - Portable.
           20 - No dynamic memory allocation.
           21 
           22 
           23 Supports
           24 --------
           25 
           26 - Tags in short-form (<img src="lolcat.jpg" title="Meow" />).
           27 - Tag attributes.
           28 - Short attributes without an explicitly set value (<input type="checkbox" checked />).
           29 - Comments.
           30 - CDATA sections.
           31 - Helper function (xml_entitytostr) to convert XML 1.0 / HTML 2.0 named entities
           32   and numeric entities to UTF-8.
           33 - Reading XML from a file descriptor, string buffer or any custom reader:
           34   see XMLParser.getnext or the GETNEXT() macro.
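
The numeric-entity conversion mentioned above can be illustrated with a small
standalone sketch. This is not the code of xml_entitytostr and does not use
xml.h: the function names codepoint_to_utf8 and numeric_entity_to_utf8 are
hypothetical, and named entities are left out. It only shows the general idea
of turning "&#65;" or "&#xe9;" into UTF-8 bytes without dynamic allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of xml.h.
 * Encode one code point as UTF-8 into out (room for at least 5 bytes).
 * Returns the byte count (excluding the NUL), or 0 for surrogates and
 * values above U+10FFFF. */
static size_t
codepoint_to_utf8(unsigned long cp, char *out)
{
	size_t n;

	assert(out != NULL);
	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
		n = 0;
	} else if (cp < 0x80) {
		out[0] = (char)cp;
		n = 1;
	} else if (cp < 0x800) {
		out[0] = (char)(0xC0 | (cp >> 6));
		out[1] = (char)(0x80 | (cp & 0x3F));
		n = 2;
	} else if (cp < 0x10000) {
		out[0] = (char)(0xE0 | (cp >> 12));
		out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (char)(0x80 | (cp & 0x3F));
		n = 3;
	} else {
		out[0] = (char)(0xF0 | (cp >> 18));
		out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (char)(0x80 | (cp & 0x3F));
		n = 4;
	}
	out[n] = '\0';
	return n;
}

/* Hypothetical helper, not part of xml.h.
 * Convert a numeric entity such as "&#65;" or "&#xe9;" to UTF-8.
 * Returns the byte count, or 0 if e is not a valid numeric entity. */
static size_t
numeric_entity_to_utf8(const char *e, char *out)
{
	unsigned long cp = 0, d;
	unsigned int base = 10;
	size_t n;
	int c;

	assert(e != NULL && out != NULL);
	out[0] = '\0';
	if (e[0] != '&' || e[1] != '#')
		return 0;
	e += 2;
	if (*e == 'x') {
		base = 16;
		e++;
	}
	for (n = 0; e[n] != ';'; n++) {
		c = (unsigned char)e[n];
		if (c >= '0' && c <= '9')
			d = (unsigned long)(c - '0');
		else if (base == 16 && c >= 'a' && c <= 'f')
			d = (unsigned long)(c - 'a' + 10);
		else if (base == 16 && c >= 'A' && c <= 'F')
			d = (unsigned long)(c - 'A' + 10);
		else
			return 0; /* bad digit or missing ';' */
		cp = cp * base + d;
		if (cp > 0x10FFFF)
			return 0; /* out of Unicode range, also avoids overflow */
	}
	if (n == 0)
		return 0; /* "&#;" or "&#x;" */
	return codepoint_to_utf8(cp, out);
}
```

A caller would pass a 5-byte buffer and treat a return value of 0 as "not a
numeric entity", for example to fall back to copying the raw text.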
           35 
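A custom reader for a string buffer could look roughly like the sketch below.
It assumes that the reader is a function taking no arguments and returning the
next byte or EOF; check xml.h for the actual type of XMLParser.getnext and the
GETNEXT() macro before using it. The names reader_set and reader_getnext are
hypothetical and not part of the parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h> /* EOF */

/* Hypothetical reader state, not part of xml.h. Kept in static storage so
 * the reader function itself needs no arguments. */
static const char *reader_buf = "";
static size_t reader_len;
static size_t reader_off;

/* Point the reader at a buffer of len bytes and rewind it. */
static void
reader_set(const char *s, size_t len)
{
	assert(s != NULL);
	reader_buf = s;
	reader_len = len;
	reader_off = 0;
}

/* Return the next byte as an unsigned char value, or EOF when the buffer
 * is exhausted. Repeated calls after the end keep returning EOF. */
static int
reader_getnext(void)
{
	if (reader_off >= reader_len)
		return EOF;
	return (unsigned char)reader_buf[reader_off++];
}
```

Returning the byte as an unsigned char value keeps bytes >= 0x80 (as found in
UTF-8 data) from being confused with EOF on platforms where char is signed.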
           36 
           37 Design choices and scope
           38 ------------------------
           39 
           40 - Compliance: it is not a fully compliant XML parser, but it supports reading
           41   XML data for many practical use-cases, for example:
           42   - RSS reader (sfeed).
           43   - HTML to plain-text converter (webdump).
           44   - HTML extractor for websites (idiotbox/tscrape/frontends).
           45 - The XML data is not checked for errors, so the parser continues on
           46   malformed input. However, it should not crash or hang.
           47 - Performance: data is buffered even if a handler is not set. To make parsing
           48   faster, change this code in xml.c.
           49 - Internally fixed-size buffers are used. Callbacks like XMLParser.xmldata are
           50   called multiple times for the same tag if the data size is bigger than the
           51   internal buffer size (sizeof(XMLParser.data)). To differentiate between new
           52   calls for data the xml*start and xml*end handlers can be used.
           53 - It does not handle XML white-space rules for tag data. The raw value,
           54   including white-space, is passed. This is useful in some cases, like for
           55   parsing HTML <pre> tags.
           56 - The XML specification has no limits on tag and attribute names. For the
           57   sake of simplicity and sanity this XML parser takes some liberties: tag and
           58   attribute names are truncated if they are excessively long.
           59 - Security: entity expansions are not handled, since expanding them can
           60   enable the "billion laughs" attack.
           61 - DOCTYPE, ATTLIST and DTD declarations are ignored.
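
Because data callbacks may be called several times for one piece of tag data,
a program typically collects the chunks between the start and end handlers.
The sketch below shows one way to do that with a fixed-size buffer and
truncation, in the same spirit as the parser itself. It is standalone and does
not use xml.h: struct textbuf and the textbuf_* functions are hypothetical,
and the 64-byte size is an arbitrary choice. The real handlers would call
these with the arguments they receive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator, not part of xml.h. */
struct textbuf {
	char data[64];  /* fixed size, no dynamic allocation */
	size_t len;     /* bytes stored, excluding the NUL */
	int truncated;  /* set when input did not fit */
};

/* Call from the "start" handler: begin a new value. */
static void
textbuf_start(struct textbuf *t)
{
	assert(t != NULL);
	t->len = 0;
	t->truncated = 0;
	t->data[0] = '\0';
}

/* Call from the data handler, possibly many times per value.
 * Bytes that do not fit are dropped and the truncated flag is set. */
static void
textbuf_append(struct textbuf *t, const char *s, size_t n)
{
	size_t room;

	assert(t != NULL && s != NULL);
	room = sizeof(t->data) - 1 - t->len;
	if (n > room) {
		n = room;
		t->truncated = 1;
	}
	memcpy(t->data + t->len, s, n);
	t->len += n;
	t->data[t->len] = '\0';
}

/* Call from the "end" handler: the value is complete and NUL-terminated. */
static const char *
textbuf_end(const struct textbuf *t)
{
	assert(t != NULL);
	return t->data;
}
```

Silent truncation keeps memory use bounded on hostile input; a program that
needs the full value could instead process each chunk as it arrives.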
           62 
           63 
           64 Files used
           65 ----------
           66 
           67 xml.c and xml.h
           68 
           69 
           70 Interface / API
           71 ---------------
           72 
           73 The API should be trivial: see xml.c, xml.h and the examples below.
           74 
           75 
           76 Examples
           77 --------
           78 
           79 See skeleton.c for a base program to start quickly.
           80 
           81 
           82 License
           83 -------
           84 
           85 ISC, see LICENSE file.