Subj : Re: huge dictionary
To   : comp.databases,comp.programming
From : moi
Date : Tue Oct 11 2005 01:01 am

Branimir Maksimovic wrote:
[removed crossposting to clc/clc++]

> "moi" wrote in message
>>
>> Mmap() *can* be elegant, but gets ugly in the
>> presence of writes/appends to the same file (as in your
>> case).
>
> Not necessarily: ftruncate then mmap additional page(s)
> (or lseek then write zeros, to avoid fragmentation of the file).
> But it is better to just preallocate a large enough file
> (write zero bytes, not ftruncate, because of fragmentation)
> and expand it if that turns out not to be enough.

I think we agree. That is what I meant by mmap() losing its elegance:
instead of manually writing+reading, you still have to manually keep
control of the mmap()ed area, remapping it if necessary.

>> Also, there is *no performance benefit* over write().
>
> A disk write is a disk write, but with mmap you don't
> have to manually read/write between the file and the app's memory.

I agree. There are some extra memcpy()s to/from user buffers involved
in the read/write case. Given the delays caused by read/write (either
way), I ignored these. (It won't hurt to burn some CPU while the disk
is spinning.)

>> Mmap just maps a piece of your diskfile into memory,
>> but underneath just as much disk I/O is needed.
>
> See the difference: no need for memory-to-file buffering
> and data conversion in application code, therefore mmap
> is the most natural way to implement a persistent data structure.

Again, I agree; see above. IMO the 'active area' in the file is rather
small, basically row-at-a-time, but scattered.

>> In the ultimate case (unclustered read access) you end up with one
>> read per record-fetch.
>
> Same as with read. Of course the caching strategies can differ,
> so either mmap or read can be faster depending on the situation.
> In my experience mmap is better when dealing with large random-access
> files (large swap file).
> For pure sequential reading, read() should be better.

That is probably because it is hard to beat the system's LRU buffering
(plus, maybe, read-ahead). Also: double buffering costs memory (and
CPU for the copying).

>> Writing/appending is always faster (since multiple records can fit
>> into one disk block).
>> Using a DB library does not change this; the library still has to
>> do the reading/writing on your behalf, and you end up with 10 ms of
>> read time per block. It *can* save you some development effort.
>
> Of course, except that everything goes through the db interface and
> the application itself is limited by it. In this case that is not a
> problem, since the db eliminates the need for a hash table
> implementation; one just uses the db interface instead.
>
> Greetings, Bane.

Yes, it's a tradeoff. I don't know bdb's hash implementation, but I can
imagine the hash table + overflow chains sitting on disk, too. That
*could* cause a typical row-fetch to take two or more page-fetches.
(Isam/btree would probably be worse.)
See the Google design (in-core hashtable + on-disk records).

AvK
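
P.S. To make the "manually keeping control of the mmap()ed area" point
concrete, here is a rough, untested sketch (POSIX assumed; the helper
names fill_zeros/map_file/grow_map are mine, purely for illustration):
preallocate by writing zeros, map the file, and when the file has to
grow, extend it and remap. Note that every pointer into the old mapping
becomes stale after grow_map(); that bookkeeping is where the elegance
goes.

#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

/* Append zero bytes until the file covers [from, to)
 * (write()ing zeros rather than ftruncate(), to avoid a sparse,
 * fragmented file).  Returns 0 on success, -1 on error. */
static int fill_zeros(int fd, off_t from, off_t to)
{
    char zeros[4096];
    memset(zeros, 0, sizeof zeros);
    if (lseek(fd, from, SEEK_SET) == (off_t)-1)
        return -1;
    while (from < to) {
        size_t chunk = (size_t)(to - from) < sizeof zeros
                     ? (size_t)(to - from) : sizeof zeros;
        if (write(fd, zeros, chunk) != (ssize_t)chunk)
            return -1;
        from += chunk;
    }
    return 0;
}

/* Map the first 'size' bytes of the (already preallocated) file. */
static void *map_file(int fd, size_t size)
{
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Growing means: extend the file with zeros, drop the old mapping,
 * map again.  All pointers into the old mapping are now invalid. */
static void *grow_map(int fd, void *old, size_t oldsize, size_t newsize)
{
    if (fill_zeros(fd, (off_t)oldsize, (off_t)newsize) == -1)
        return MAP_FAILED;
    if (munmap(old, oldsize) == -1)
        return MAP_FAILED;
    return mmap(NULL, newsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

(Linux has mremap(), which saves the munmap/mmap pair, but plain POSIX
does not, so the sketch does it the portable way.)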
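
P.P.S. And an equally rough sketch of the "in-core hashtable + on-disk
records" layout I mentioned: the hash table (key -> file offset/length),
including its overflow chains, lives entirely in memory; only the
records sit in an append-only file, so a lookup costs at most one
pread() and never an extra page-fetch for the bucket itself. Again
untested, fixed table size, and the table is not persisted; the names
put/get/NBUCKET are made up here.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NBUCKET 65536

struct entry {
    char key[32];        /* in-core copy of the key          */
    off_t off;           /* where the record starts on disk  */
    size_t len;          /* record length                    */
    struct entry *next;  /* overflow chain, also in core     */
};

static struct entry *table[NBUCKET];

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKET;
}

/* Append the record to the data file, remember where it went. */
static int put(int fd, const char *key, const void *rec, size_t len)
{
    struct entry *e;
    unsigned h;
    off_t off = lseek(fd, 0, SEEK_END);

    if (off == (off_t)-1 || write(fd, rec, len) != (ssize_t)len)
        return -1;
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    snprintf(e->key, sizeof e->key, "%s", key);  /* keys truncated at 31 chars */
    e->off = off;
    e->len = len;
    h = hash(key);
    e->next = table[h];
    table[h] = e;
    return 0;
}

/* At most one disk read per record-fetch. */
static ssize_t get(int fd, const char *key, void *buf, size_t bufsize)
{
    struct entry *e;
    for (e = table[hash(key)]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            size_t n = e->len < bufsize ? e->len : bufsize;
            return pread(fd, buf, n, e->off);
        }
    }
    return -1;  /* not found */
}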