Subj : Committing file changes
To   : David Noon
From : Coridon Henshaw
Date : Mon Aug 21 2000 05:09 pm

On Sunday August 20 2000 at 08:00, David Noon wrote to Coridon Henshaw:

DN> Since 4-to-256 bytes does not constitute a typical Usenet article,
DN> those would not be your logical syncpoints. You should be physically
DN> writing the data to disk at your syncpoints and only at your
DN> syncpoints.

I break up articles into 251-byte chunks and write the chunks as a linked list. Since the database will reuse article chunks which have been freed as a result of article expiry, the article linked lists need not be sequential. As such, when the DB engine writes an article, it writes 251 bytes, reads and updates the five-byte control structure, then seeks to the next block. This process continues until the entire article is written. (A sketch of this write loop appears at the end of this message.)

It's not really possible to break up these writes without giving up the linked-list structure, and with it either the ability to rapidly grow the database or the ability of the DB to reuse existing space as articles are expired.

CH>> My database format and engine implementations are robust enough to
CH>> cope with applications dying unexpectedly without finishing write
CH>> operations; they're not robust enough to handle boot-up CHKDSK
CH>> removing 80Kb of data from the end of a 100Kb file.

DN> So you do have a syncpoint architecture, then?

While I appreciate your comments, what you suggest is vast overkill for my application. The NewsDB engine isn't sophisticated enough to support syncpoints or rollback. Think along the lines of the Squish MSGAPI rather than DB2: NewsDB is basically Squish-for-Usenet. My intention is to produce a lightweight multiuser offline news system so that small groups of users (1-25) can read Usenet offline without needing to install a full news server. As a lightweight alternative to a local news server, NewsDB doesn't need the overhead of a fully-fledged SQL engine.

NewsDB is decidedly not intended for mission-critical environments; surviving Anything and Everything isn't part of the design requirements. Rather, my intention is to contain common errors to the extent that they can be repaired by automated repair tools.

DN> This seems to me to be the type of activity you really want to
DN> perform. One of your problems is that your input stream is not
DN> persistent, as it would be a socket connected to an NNTP server [if
DN> I read your design correctly, and assume you are coding from the
DN> ground up]. This means that you need to be able to restart a failed
DN> instance of the application, resuming from its most recent
DN> successful syncpoint. The usual method to deal with this is to use a
DN> log or journal file that keeps track of "in flight" transactions;
DN> the journal is where your I/O remains unbuffered. If your NNTP
DN> server allows you to re-fetch articles -- and most do -- you can
DN> keep your journal in RAM or on a RAMDISK; this prevents performance
DN> hits for doing short I/O's.

Just to clarify things: NewsDB isn't a single application. It's an RFC-based message base format similar in purpose to Squish and JAM. I'm writing an access library (NewsDBLib) to work with the NewsDB format, and I'm also writing two applications (an importer and a reader) which use NewsDBLib. At the moment, none of these applications download news; the importer reads SOUP packets from disk, just so I can avoid messing with NNTP. Reading from disk also gives me the flexibility to import other packet formats, such as UUCP news and FTN PKT, at a later date.
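As promised above, here's a minimal sketch in C of the chunked write loop. The internal layout of the five control bytes isn't spelled out in this message, so the split into a four-byte little-endian next-block number plus a one-byte used-length field is purely an illustrative assumption, as are the function name and the idea that the free-space allocator has already left a chain of blocks linked together; the real NewsDBLib code will differ.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define BLOCK_SIZE 256   /* one block on disk                       */
    #define DATA_SIZE  251   /* article payload per block               */
    /* Assumed control layout in bytes 251..255 of each block:
       4-byte next-block number (little-endian) + 1-byte used count.    */

    /* Write one article by walking a pre-allocated chain of blocks. */
    static int write_article(FILE *db, uint32_t first_block,
                             const char *text, size_t len)
    {
        uint32_t blk = first_block;

        while (len > 0) {
            size_t n = len < DATA_SIZE ? len : DATA_SIZE;
            unsigned char buf[BLOCK_SIZE];

            /* Read the existing block so we see the control bytes,
               which carry the link left by the free-space allocator. */
            if (fseek(db, (long)blk * BLOCK_SIZE, SEEK_SET) != 0 ||
                fread(buf, 1, BLOCK_SIZE, db) != BLOCK_SIZE)
                return -1;

            memcpy(buf, text, n);                   /* 251-byte payload */
            buf[DATA_SIZE + 4] = (unsigned char)n;  /* update used count */

            /* Write the whole 256-byte block back in place. */
            if (fseek(db, (long)blk * BLOCK_SIZE, SEEK_SET) != 0 ||
                fwrite(buf, 1, BLOCK_SIZE, db) != BLOCK_SIZE)
                return -1;

            /* Follow the linked list: blocks need not be sequential. */
            blk  = (uint32_t)buf[DATA_SIZE]
                 | (uint32_t)buf[DATA_SIZE + 1] << 8
                 | (uint32_t)buf[DATA_SIZE + 2] << 16
                 | (uint32_t)buf[DATA_SIZE + 3] << 24;
            text += n;
            len  -= n;
        }
        return 0;
    }

The read-modify-write of each full block is what makes these writes hard to batch: the next-block number lives inside the block itself, so the engine has to touch each block in chain order, which is why breaking up the writes would mean giving up the linked-list structure.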
--- GoldED/2 3.0.1
 * Origin: Life sucks and then you croak. (1:250/820)