Subj : Block based virtual harddisk library
To   : comp.programming
From : M.Barren
Date : Thu Sep 15 2005 09:38 am

Hi,

I'm trying to implement a library that provides an API to access a
file as if it were a hard disk with a fixed capacity of 281474976710656
blocks (2^48, the range of a 48-bit uint) of a specific size (e.g.
100 B to 4 KB). Access to the virtual disk is block based (not byte
based).
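
To put the numbers above in perspective, here is a small arithmetic
sketch. The names BLOCK_COUNT, MAX_BLOCK_ADDR and virtual_capacity_bytes
are made up for illustration and are not part of any existing library.

```python
# Capacity arithmetic for the virtual disk (illustrative names only).
BLOCK_COUNT = 2 ** 48              # 281474976710656 addressable blocks
MAX_BLOCK_ADDR = BLOCK_COUNT - 1   # largest value a 48-bit uint can hold


def virtual_capacity_bytes(block_size: int) -> int:
    """Total virtual capacity in bytes for a given block size."""
    if block_size <= 0:
        raise ValueError("block size must be positive")
    return BLOCK_COUNT * block_size
```

With 4 KB (4096-byte) blocks this comes to 2^60 bytes, i.e. 1 EiB, which
is why only the non-zero blocks can realistically be stored.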

Since the size of this virtual disk is far beyond the physical limit of
my disk, I can only afford to save the useful data (i.e. non-zero
blocks) to an actual file (the container) on disk.
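
A minimal sketch of the "useful data" test, assuming a block counts as
useless exactly when every byte in it is zero. The function name
is_zero_block is hypothetical.

```python
def is_zero_block(block: bytes) -> bool:
    """Return True when every byte of the block is zero.

    Assumption: an all-zero block carries no useful data and therefore
    does not need to be stored in the container file.
    """
    # any() is False only if all bytes are 0 (or the block is empty).
    return not any(block)
```

A write of a block for which is_zero_block() returns True would then be
treated as "remove this block from the container" rather than stored.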

**The only thing that I'm particularly trying to achieve is to make it
as storage efficient as possible. At this point, fragmentation (of
consecutive blocks in the virtual disk) is not a concern.**

What I've come up with so far is:

Keep a table of pointers that map (non-zero) blocks of the virtual disk
to blocks within the normal file. Every time the container is opened by
the library, the table is loaded into a hashtable (or a trie structure)
for fast search. On each access to a block on the virtual disk, the
hashtable is searched and, if a match is found, the address of the
corresponding block in the container is known. Each entry in the table
takes up 12 bytes (6 for the block address on the virtual disk, 6 for
the block address in the container). Since hashtable and trie
structures carry extra data for each entry, memory can become a problem
when there is a large number of blocks (1 million+).
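
The table layout described above could be sketched roughly like this.
The 12-byte entry (6 + 6) is from the description; the big-endian byte
order, the use of a plain dict as the hashtable, and all function names
are assumptions made only for this example.

```python
from typing import Dict, Tuple

ADDR_SIZE = 6                 # 48-bit block address
ENTRY_SIZE = 2 * ADDR_SIZE    # virtual address + container address


def pack_entry(virtual_addr: int, container_addr: int) -> bytes:
    """Encode one table entry as 12 bytes (big-endian is an assumption)."""
    # to_bytes raises OverflowError if a value does not fit in 48 bits.
    return (virtual_addr.to_bytes(ADDR_SIZE, "big")
            + container_addr.to_bytes(ADDR_SIZE, "big"))


def unpack_entry(raw: bytes) -> Tuple[int, int]:
    """Decode one 12-byte entry into (virtual_addr, container_addr)."""
    if len(raw) != ENTRY_SIZE:
        raise ValueError("entry must be exactly 12 bytes")
    return (int.from_bytes(raw[:ADDR_SIZE], "big"),
            int.from_bytes(raw[ADDR_SIZE:], "big"))


def load_table(raw_table: bytes) -> Dict[int, int]:
    """Load the on-disk table into a dict keyed by virtual block address."""
    if len(raw_table) % ENTRY_SIZE != 0:
        raise ValueError("table length is not a multiple of the entry size")
    table = {}
    for offset in range(0, len(raw_table), ENTRY_SIZE):
        virtual_addr, container_addr = unpack_entry(
            raw_table[offset:offset + ENTRY_SIZE])
        table[virtual_addr] = container_addr
    return table


def table_bytes_on_disk(entry_count: int) -> int:
    """Raw size of the table itself, without any in-memory overhead."""
    return entry_count * ENTRY_SIZE
```

For 1 million entries the raw table is only 12,000,000 bytes; the
concern is the per-entry overhead that the in-memory structure adds on
top of that. A lookup that finds no entry means the block is all zeros.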

Suppose 1000 blocks have been written to the container file, and the
first block is then overwritten with zeros, which qualifies that block
as useless. It needs to be removed, but since it resides at the
beginning of the file, we cannot just shift all the blocks after it to
fill its space. Hence, another table is needed to keep pointers to
useless (all-zero) blocks. On each new block allocation in the
container file, the all-zero blocks are reused before the container
file is extended. Each entry in this table takes up 6 bytes.
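
The reuse-before-extend rule could look roughly like the sketch below.
The class name ContainerAllocator and its methods are hypothetical, and
the free table is kept as a plain in-memory list here; persisting it as
6-byte entries is left out.

```python
class ContainerAllocator:
    """Hands out container block addresses, reusing freed blocks first."""

    def __init__(self, block_count=0, free_blocks=()):
        self.block_count = block_count        # blocks currently in the file
        self.free_blocks = list(free_blocks)  # addresses of all-zero blocks

    def allocate(self) -> int:
        """Return a container block address for a new non-zero block."""
        if self.free_blocks:
            # Reuse a useless (all-zero) block before growing the file.
            return self.free_blocks.pop()
        # No free block available: extend the container by one block.
        addr = self.block_count
        self.block_count += 1
        return addr

    def release(self, addr: int) -> None:
        """Mark a container block as useless so it can be reused later."""
        if not 0 <= addr < self.block_count:
            raise ValueError("address is outside the container")
        self.free_blocks.append(addr)
```

In the scenario above: with 1000 blocks in the container, release(0)
followed by allocate() returns 0 again, and only the next allocate()
grows the file to 1001 blocks.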

*I definitely need some help/advice on the memory problem that the
address table will cause.

If you have any ideas that you think I should know about before
starting off, please let me know. Maybe you can direct me to some
similar implementations or papers that would help me understand the
problem better.

Michael
(Excuse my verbose method of writing. English is my 2nd language, so I
am not yet able to compress my sentences to a sufficient level)
