Subj : Committing file changes
To   : Coridon Henshaw
From : David Noon
Date : Sun Aug 20 2000 01:00 am

Hi Coridon,

Replying to a message of Coridon Henshaw to David Noon:

CH> I'm building an open-source databasing offline Usenet news system,
CH> basically along the lines of standard Fidonet message tossers and
CH> readers, except designed from the ground up for Usenet news. As I
CH> intend the system to be portable, I'd like to keep the number of
CH> platform-specific API calls to an absolute minimum.

That poses some difficulties. A combination of safety, performance and
platform-independence is a big ask. I would tend to compromise that
last one before I compromised the first two.

DN>> Firstly, you should not be doing buffered I/O if your updates
DN>> must be committed immediately, so you should not use fopen() and
DN>> fwrite() without a setbuf() call to suppress buffer allocation.
DN>> Better yet, you should consider using open() and write() instead,
DN>> and use the UNIX-like unbuffered I/O routines.

CH> I'm concerned that disabling buffering entirely is going to hurt
CH> performance very badly as my application does lots of short (4 to
CH> 256 byte) IO calls. Relying on the disk cache to handle this kind
CH> of load seems a bit wasteful.

Since 4 to 256 bytes does not constitute a typical Usenet article,
those would not be your logical syncpoints. You should be physically
writing the data to disk at your syncpoints, and only at your
syncpoints.

DN>> Moreover, if your data resources are critically important then
DN>> you should be handling any traps that occur in your program and
DN>> cleaning up the critical data resources in an orderly manner.
DN>> This is far and away the most professional approach to the
DN>> situation. About the only things you can't handle are kernel
DN>> level traps and power outages.

CH> The problem I ran into was that the kernel trapped (for reasons
CH> unrelated to this project) a few hours after I wrote an article
CH> into the article database. Since the database was still open (I
CH> leave the article reader running 24x7), the file system structures
CH> were inconsistent enough that CHKDSK truncated the database well
CH> before its proper end point. As you say, catching exceptions
CH> wouldn't help much here.

The flip side is that kernel traps are far less frequent than
application traps, especially during development of the application.
If your data integrity is critical, you should not only be handling
any exceptions that arise, but also rolling back to your most recent
syncpoint when an error does occur.

CH> My database format and engine implementations are robust enough to
CH> cope with applications dying unexpectedly without finishing write
CH> operations; they're not robust enough to handle boot-up CHKDSK
CH> removing 80Kb of data from the end of a 100Kb file.

So you do have a syncpoint architecture, then?

DN>> In your situation, I would have used the second facility before
DN>> considering any intermediate commits.

CH> It's not intermediate commits I need: what I need is some way to
CH> flush out write operations made to files which might be open for
CH> days or weeks at a time.

That's what an intermediate commit is.
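As a rough sketch of the kind of thing I mean -- the function name is
only illustrative, and I am assuming POSIX-style fileno() and fsync()
here (on OS/2 I believe the analogous call is DosResetBuffer()):

    #include <stdio.h>
    #include <unistd.h>          /* fileno(), fsync() -- POSIX       */

    /* Write one complete article with ordinary buffered stdio calls,
       then force it out to disk at the syncpoint.  The short 4 to
       256 byte writes stay cheap because they only fill the stdio
       buffer; the physical write happens once per unit of work.     */
    int store_article(FILE *db, const char *text, size_t len)
    {
        if (fwrite(text, 1, len, db) != len)
            return -1;           /* write failed: roll back           */
        if (fflush(db) != 0)
            return -1;           /* push the stdio buffer to the OS   */
        if (fsync(fileno(db)) != 0)
            return -1;           /* force the OS to hit the platter   */
        return 0;                /* syncpoint reached                 */
    }

Buffering and committing are not mutually exclusive: you buffer
between syncpoints and pay for the physical I/O only when a logical
unit of work is complete.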
The way industrial-strength database management systems work [since at
least the days of IMS/360, over 30 years ago] is that an application
would have defined within it points in its execution where a logical
unit of work was complete and the state of the data on disk should be
synchronized with the state of the data in memory; this is how the
term "syncpoint" arose, and the processing between syncpoints became
known as a transaction. The process of writing the changes in data to
disk became known as committing the changes. The SQL statement that
performs this operation under DB2, Oracle, Sybase and other RDBMSs is
COMMIT. These RDBMSs also have another statement, coded as ROLLBACK,
which backs out a partially complete unit of work when an error
condition has arisen.

The upshot is that the content of the database on disk can be assured
to conform to the data model the application is supposed to support.
It does not mean that every byte of input has been captured; it means,
instead, that the data structures on disk are consistent with some
design. This seems to me to be the type of activity you really want to
perform.

One of your problems is that your input stream is not persistent, as
it would be a socket connected to an NNTP server [if I read your
design correctly, and assume you are coding from the ground up]. This
means that you need to be able to restart a failed instance of the
application, resuming from its most recent successful syncpoint. The
usual method of dealing with this is to use a log or journal file that
keeps track of "in flight" transactions; the journal is where your I/O
remains unbuffered. If your NNTP server allows you to re-fetch
articles -- and most do -- you can keep your journal in RAM or on a
RAMDISK; this prevents performance hits for doing short I/Os. A rough
sketch of such a journal is appended below my signature.

This design and implementation seem like a lot of work, and I suppose
they are. But some old timers were doing this on machines with only
128KiB of RAM when I was in high school, so a modern PC should handle
it easily.

To save yourself a lot of coding, you might care to use a commercial
DBMS; a copy of DB2 UDB Personal Developer Edition can be had free for
the download, or on CD for the price of the medium and shipping. Start
at:

    http://www.software.ibm.com/data/db2/

and follow the links to the download areas, or ask Indelible Blue
about CD prices.

Using a multi-platform commercial product will provide you with
platform independence, as well as safety. It is the simplest and most
robust approach unless you are prepared either to do a lot of coding
or to compromise on the safety of your application.

Regards

Dave

--- FleetStreet 1.25.1
 * Origin: My other computer is an IBM S/390 (2:257/609.5)
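P.S. The journalling sketch, assuming POSIX write(), fsync() and
ftruncate(); the journal_begin()/journal_commit() names and the record
format are purely illustrative:

    #include <stdio.h>           /* snprintf()                        */
    #include <unistd.h>          /* write(), fsync(), ftruncate(),
                                    lseek()                           */
    #include <sys/types.h>       /* off_t                             */

    /* One record per in-flight article.  The journal is a plain file
       descriptor with no stdio buffering, so the record reaches the
       operating system at once; the article database itself can stay
       buffered between syncpoints.                                   */
    int journal_begin(int jfd, const char *group, long artno)
    {
        char rec[256];
        int  len = snprintf(rec, sizeof rec, "FETCH %s %ld\n",
                            group, artno);
        if (len < 0 || write(jfd, rec, (size_t) len) != len)
            return -1;
        return fsync(jfd);       /* the journal record must hit disk  */
    }

    /* Called once the database itself has been committed at the
       syncpoint: the in-flight record is now obsolete, so empty the
       journal.                                                       */
    int journal_commit(int jfd)
    {
        if (ftruncate(jfd, 0) != 0)
            return -1;
        return lseek(jfd, 0, SEEK_SET) == (off_t) -1 ? -1 : 0;
    }

    /* At start-up, a non-empty journal means the previous instance
       died between journal_begin() and journal_commit(); the articles
       it names are simply re-fetched from the NNTP server and
       re-applied.                                                    */

With the journal on a RAMDISK the fsync() costs next to nothing, and
if the whole machine goes down the articles can be re-fetched from the
server anyway.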