Subj : Re: Block based virtual harddisk library
To   : comp.programming
From : M.Barren
Date : Sun Sep 18 2005 01:59 am

Ed Prochak wrote:

> This seems closely related to the virtual memory problem of operating
> systems. Maybe you can get some ideas from searching that topic?

Yes, I think you're right. I've read one paper on that so far, but I
will continue to research in that area.

> Meanwhile, consider the extreme case: ALL of your physical space is
> allocated. You need enough entries in your structure to handle that and
> you "free" structure is empty. If you have enough memory to handle that
> case, ... hmmm  You are promising access to 256TERA Blocks! yikes that
> is BIG!
>
>
> Are you really sure you need to keep ALL the information in memory (in
> a hash table or trie)?? can you keep at least part of it on the disc?
> Block chain pointers, parts of the hash table itself, stored out on the
> disc can help.

Well, after a few hours of inefficient thinking, I decided to go for a
Trie structure as follows:

Virtual Block pointer:
Length: 6Bytes
 - - - - - -
|a|b|c|d|e|f|
 - - - - - -

Trie Entry:

For bytes of Virtual Block integer from a-e:
Length:[1] +  [6]    +    [6]     =  13 Bytes
        - ---------- ------------
       |x|Child Node|Sibling Node|
        - ---------- ------------

For last byte of Virtual Block integer (f):
Length:[1]    +      [6]         +     [6]      =  13 Bytes
        - ----------------------- ------------
       |f|Container Block Pointer|Sibling Node|
        - ----------------------- ------------

The only trivial problem with this design is the fact that entries
which represent a-e bytes from the pointer will use up to almost 1 Tera
Entries and that will stop us from being able to allocate all 281 Tera
blocks and instead 280 Tera blocks. In that magnitude, this will not be
a problem at all.
With a trie structure completely written along with block data in the
container file, there will be no need to load ANY data to memory.
However, **the current problem** I'm working/thinking about now is how
to manage the Trie structure in the file. If all blocked are used, this
trie will take up 3666TB. I definitly can NOT JUST write that at the
end of data blocks in series. My idea so far is to split up the
structure and fit them in the same blocks where we store data. That
way, we can treat them the same way as data blocks. The blocks will be
chained by pointers together. At this stage, I'm trying to find a way
to avoid having to follow the pointers all the way from the the block
where the head trie entry resides! So I definitly appriciate any help
on this area.

> In fact this solves your free block problem. All you need in memory is
> the location of the first free physical disc block. Since these blocks
> are free, you can store freelist pointers in each one, chaining them
> together. When the API frees a block, you insert it on the head of the
> free list. When the API allocates a block, you return the current head
> of the list and set the new head to the next one on the list. So you do
> NOT need to keep the list of ALL free blocks in memory.

Interesting. Thank you for mentioning it. Quite a basic technique which
i'd completely forgotten about! However after a bit of more thought to
it, I came up with this: Whenever a block is freed, we will move the
last block in the container to fill its space. This method will
definitly cause performace loss, but as I stated before, my main goal
is to keep the container file as storage efficient as possible. Using
this method, we won't have any free blocks to worry about. I'm 90% sure
I'll go with this, but I'm still open for ideas in that 10%.

> I still have no easy solution  for the allocated block list. with
> 6+6bytes per block ID, you need 3072TERABytes, if all Blocks are
> allocated. That doesn't leave much room in memory for program code. 8^)
>
> Let us know how you are coming along. I'd love to hear more about this.
> 
> HTH,
>  ed

.