Subj : Block based virtual harddisk library
To : comp.programming
From : M.Barren
Date : Thu Sep 15 2005 09:38 am

Hi,

I'm trying to implement a library that provides an API to access a file as if it were a hard disk with a fixed capacity of 281474976710656 blocks (2^48, so any block address fits in a 48-bit unsigned integer), each of a specific size (e.g. 100 B to 4 KB). Access to the virtual disk is block based (not byte based). Since the size of this virtual disk is far beyond the physical limit of my disk, I can only afford to save the useful (i.e. non-zero) data to an actual file (the container) on disk.

**The only thing that I'm particularly trying to achieve is to make it as storage efficient as possible. At this point, fragmentation (of consecutive blocks in the virtual disk) is not a concern.**

What I've come up with so far is this:

Keep a table of pointers that map (non-zero) blocks of the virtual disk to blocks within the normal file. Every time the container is opened by the library, the table is loaded into a hashtable (or a trie structure) for fast search. On each access to a block on the virtual disk, the hashtable is searched, and if a match is found, the address of the corresponding block in the container is known. Each entry in the table takes up 12 bytes (6 for the block address on the virtual disk, 6 for the block address in the container). Since hashtable and trie structures hold extra data for each entry, memory can become a problem with a large number of blocks (1 million+).

Suppose 1000 blocks have been written to the container file, and the first of them is then overwritten with zeros, which qualifies that block as useless. It needs to be removed, but since it sits at the beginning of the file, we cannot just shift all the blocks that follow it to fill its space. Hence, another table is needed to keep pointers to the useless (all-zero) blocks. On each new block allocation in the container file, the all-zero blocks are reused before the container file is extended.
Each entry in this second table will take up 6 bytes.

I definitely need some help/advice on the memory problem that the address table will cause. If you have any ideas that you think I should know about before starting, please let me know. Maybe you can direct me to some similar implementations or papers that would help me understand the problem better.

Michael

(Excuse my verbose way of writing. English is my 2nd language, so I am not yet able to compress my sentences to a sufficient level.)
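
P.S. To make the scheme above more concrete, here is a rough sketch of it in Python. It is only an illustration under several assumptions, not a finished design: the names (VirtualDisk, read_block, write_block, block_map, free_slots) are made up for this sketch, the block size of 512 is picked arbitrarily, and both tables live in plain in-memory structures (a dict and a list). Storing the tables in the container as 12-byte and 6-byte entries is not shown.

```python
import io

BLOCK_SIZE = 512        # assumed block size, chosen only for this sketch
MAX_BLOCKS = 1 << 48    # 281474976710656 addressable virtual blocks


class VirtualDisk:
    """Sparse block store: only non-zero blocks occupy space in the container."""

    def __init__(self, container, block_size=BLOCK_SIZE):
        self.container = container    # any seekable binary file object
        self.block_size = block_size
        self.block_map = {}           # virtual block address -> container slot
        self.free_slots = []          # container slots that can be reused
        self.slot_count = 0           # slots allocated in the container so far

    def _check(self, vaddr):
        if not 0 <= vaddr < MAX_BLOCKS:
            raise ValueError("block address out of range")

    def read_block(self, vaddr):
        self._check(vaddr)
        slot = self.block_map.get(vaddr)
        if slot is None:
            # Unmapped blocks are never stored; they read back as zeros.
            return bytes(self.block_size)
        self.container.seek(slot * self.block_size)
        return self.container.read(self.block_size)

    def write_block(self, vaddr, data):
        self._check(vaddr)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        slot = self.block_map.get(vaddr)
        if not any(data):
            # All-zero block: drop the mapping and keep the slot for reuse.
            if slot is not None:
                del self.block_map[vaddr]
                self.free_slots.append(slot)
            return
        if slot is None:
            if self.free_slots:
                # Reuse a freed slot before growing the container.
                slot = self.free_slots.pop()
            else:
                slot = self.slot_count
                self.slot_count += 1
            self.block_map[vaddr] = slot
        self.container.seek(slot * self.block_size)
        self.container.write(data)


# Usage: an in-memory container stands in for the real file here.
disk = VirtualDisk(io.BytesIO())
disk.write_block(MAX_BLOCKS - 1, b"\x01" * BLOCK_SIZE)  # last virtual block
disk.write_block(7, b"\x02" * BLOCK_SIZE)
disk.write_block(7, bytes(BLOCK_SIZE))                  # zeroing frees its slot
disk.write_block(42, b"\x03" * BLOCK_SIZE)              # lands in the freed slot
```

In this sketch the container only ever grows when the free list is empty, so it never shrinks. The dict is where the per-entry memory overhead I am worried about would show up.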