Subj : Re: huge dictionary
To   : comp.databases,comp.programming
From : moi
Date : Tue Oct 11 2005 01:01 am

Branimir Maksimovic wrote:
[removed crossposting to clc/clc++]

> "moi" wrote in message
>>
>> Mmap() *can* be elegant, but gets ugly in the
>> presence of writes/appends to the same file (as in your
>> case).
>
> Not necessarily: ftruncate then mmap additional page(s)
> (or lseek then write zeros, to avoid fragmentation of the file).
> But it is better to just preallocate a large enough file
> (write zero bytes, not ftruncate, because of fragmentation)
> and expand it if that turns out not to be enough.

I think we agree. That is what I meant by mmap() losing its elegance:
instead of manually writing+reading, you still have to manually keep
control of the mmap()ed area, remapping it if necessary.

>> Also, there is *no performance benefit* over write().
>
> A disk write is a disk write, but with mmap you don't
> have to manually read/write between the file and the app's memory.

I agree. There are some extra memcpy()s to/from user buffers involved
in the read/write case. Given the delays caused by read/write (either
way), I ignored these. (It won't hurt to burn some CPU while the disk
is spinning.)

>> Mmap just maps a piece of your diskfile into memory,
>> but underneath just as much disk I/O is needed.
>
> See the difference: no need for memory-to-file buffering
> and data conversion in application code, therefore mmap
> is the most natural way to implement a persistent data structure.

Again, I agree; see above. IMO the 'active area' in the file is rather
small, basically row-at-a-time, but scattered.

>> In the ultimate case (unclustered read access) you end up with one
>> read per record-fetch.
>
> Same as with read. Of course the caching strategies can differ,
> so either mmap or read can be faster depending on the situation.
> In my experience mmap is better when dealing with large random-access
> files (large swap file).
> For pure sequential reading, read() should be better.

That is probably because it is hard to beat the system's LRU buffering
(plus, maybe, read-ahead). Also: double buffering costs memory (and
CPU for the copying).

>> Writing/appending is always faster (since multiple records can fit
>> into one disk block).
>> Using a DB library does not change this; the library still has to
>> do the reading/writing on your behalf, and you end up with 10 ms of
>> read time per block. It *can* save you some development effort.
>
> Of course, except that everything goes through the db interface and
> the application itself is limited by it. In this case that is not a
> problem, since the db eliminates the need for a hash table
> implementation; one just uses the db interface instead.
>
> Greetings, Bane.

Yes, it's a tradeoff. I don't know bdb's hash implementation, but I can
imagine the hash table + overflow chains sitting on disk, too. That
*could* cause a typical row-fetch to take two or more page-fetches.
(Isam/btree would probably be worse.)
See the Google design (in-core hashtable + on-disk records).

AvK
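
P.S. To make the "manually keeping control of the mmap()ed area" point
concrete, here is a rough, untested sketch (POSIX assumed; the helper
names fill_zeros/map_file/grow_map are mine, purely for illustration):
preallocate by writing zeros, map the file, and when the file has to
grow, extend it and remap. Note that every pointer into the old mapping
becomes stale after grow_map(); that bookkeeping is where the elegance
goes.

#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

/* Append zero bytes until the file covers [from, to)
 * (write()ing zeros rather than ftruncate(), to avoid a sparse,
 * fragmented file).  Returns 0 on success, -1 on error. */
static int fill_zeros(int fd, off_t from, off_t to)
{
    char zeros[4096];
    memset(zeros, 0, sizeof zeros);
    if (lseek(fd, from, SEEK_SET) == (off_t)-1)
        return -1;
    while (from < to) {
        size_t chunk = (size_t)(to - from) < sizeof zeros
                     ? (size_t)(to - from) : sizeof zeros;
        if (write(fd, zeros, chunk) != (ssize_t)chunk)
            return -1;
        from += chunk;
    }
    return 0;
}

/* Map the first 'size' bytes of the (already preallocated) file. */
static void *map_file(int fd, size_t size)
{
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Growing means: extend the file with zeros, drop the old mapping,
 * map again.  All pointers into the old mapping are now invalid. */
static void *grow_map(int fd, void *old, size_t oldsize, size_t newsize)
{
    if (fill_zeros(fd, (off_t)oldsize, (off_t)newsize) == -1)
        return MAP_FAILED;
    if (munmap(old, oldsize) == -1)
        return MAP_FAILED;
    return mmap(NULL, newsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

(Linux has mremap(), which saves the munmap/mmap pair, but plain POSIX
does not, so the sketch does it the portable way.)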
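
P.P.S. And an equally rough sketch of the "in-core hashtable + on-disk
records" layout I mentioned: the hash table (key -> file offset/length),
including its overflow chains, lives entirely in memory; only the
records sit in an append-only file, so a lookup costs at most one
pread() and never an extra page-fetch for the bucket itself. Again
untested, fixed table size, and the table is not persisted; the names
put/get/NBUCKET are made up here.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NBUCKET 65536

struct entry {
    char key[32];        /* in-core copy of the key          */
    off_t off;           /* where the record starts on disk  */
    size_t len;          /* record length                    */
    struct entry *next;  /* overflow chain, also in core     */
};

static struct entry *table[NBUCKET];

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKET;
}

/* Append the record to the data file, remember where it went. */
static int put(int fd, const char *key, const void *rec, size_t len)
{
    struct entry *e;
    unsigned h;
    off_t off = lseek(fd, 0, SEEK_END);

    if (off == (off_t)-1 || write(fd, rec, len) != (ssize_t)len)
        return -1;
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    snprintf(e->key, sizeof e->key, "%s", key);  /* keys truncated at 31 chars */
    e->off = off;
    e->len = len;
    h = hash(key);
    e->next = table[h];
    table[h] = e;
    return 0;
}

/* At most one disk read per record-fetch. */
static ssize_t get(int fd, const char *key, void *buf, size_t bufsize)
{
    struct entry *e;
    for (e = table[hash(key)]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            size_t n = e->len < bufsize ? e->len : bufsize;
            return pread(fd, buf, n, e->off);
        }
    }
    return -1;  /* not found */
}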