Subj : Re: store collections of strings in tdb or gdbm
To   : comp.programming
From : jburgy
Date : Sun Oct 09 2005 07:55 am

Lasse Kliemann wrote:
> Yes. However, as I see, the ordering of the records in the table is
> important. For instance, if I need to access the second string in the
> second record, I have to look for the second entry in the table which
> has as its key the record_id 1. I do not know if tdb always keeps the
> ordering (it does not look like it, and it also cannot give me all
> records with the same key, see below), so maybe a field describing which
> string is stored there would be appropriate:
>
>  record_id | string_type | string_id | string_text
>  -------------------------------------------------
>  0         | 0           | 0         | 'foo'
>  0         | 1           | 1         | 'bar'
>  0         | 2           | 2         | 'baz'
>  1         | 0           | 3         | 'qux'
>  1         | 1           | 4         | 'quux'
>  1         | 2           | 5         | 'quuux'
>
> In this setting, I would have to iterate over all entries with a given
> record_id to find the one with the desired string_type.
>

I'm not convinced that this is necessary: you can obtain string_type
by subtracting the first string_id in the record from the current
string's string_id.
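
A minimal sketch of that subtraction in C, assuming the string_ids
within one record are consecutive. The names `struct row`, `rows_table`
and `string_type_of` are made up for illustration, and the array is only
an in-memory stand-in for the table you posted, not anything from tdb:

```c
#include <stddef.h>

/* One row of the quoted table, without the string_type column. */
struct row {
    int record_id;
    int string_id;
    const char *text;
};

/* In-memory stand-in for the table (illustration only). */
static const struct row rows_table[] = {
    {0, 0, "foo"}, {0, 1, "bar"},  {0, 2, "baz"},
    {1, 3, "qux"}, {1, 4, "quux"}, {1, 5, "quuux"},
};

#define N_ROWS (sizeof rows_table / sizeof rows_table[0])

/* Recover string_type for rows_table[i]: its string_id minus the
   smallest string_id found in the same record. */
int string_type_of(size_t i)
{
    int first = rows_table[i].string_id;
    for (size_t j = 0; j < N_ROWS; j++) {
        if (rows_table[j].record_id == rows_table[i].record_id &&
            rows_table[j].string_id < first)
            first = rows_table[j].string_id;
    }
    return rows_table[i].string_id - first;
}
```

For the row holding 'quux' (index 4) this gives 4 - 3 = 1. The linear
scan is only there to keep the sketch short.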

> Ok, and now I see that this is not possible with tdb (at least not
> efficiently; I would have to traverse the whole database to accomplish
> that). Obviously, keys better are unique in a tdb. Now, I could make
> the key up of record_id and string_type. But then I am back at my
> encoding-approach, because in practice, the record_id is a stralloc,
> which means it is a string of arbitrary characters. I could do the
> following more simple encoding, however:
>
> string_type\0record_id
>
> Because, string_type contains no special characters. So I know that
> everything up to the first null character is the string_type, and after
> that first null character, everything else is the record_id. (tdb uses
> the same struct for keys as for the records themselves, which I posted
> last time.)
>

Aaargh, no, stop it with the esoteric encoding already! What you need
in that case is two tables (as an aside, this is one common way to
store matrices in C: an array of pointers to the rows + a great big
array of all the entries, the rest is pointer arithmetic):

record_id | first_string_id
---------------------------
0         | 0
1         | 3

string_id | string_text
-----------------------
0         | 'foo'
1         | 'bar'
2         | 'baz'
3         | 'qux'
4         | 'quux'
5         | 'quuux'
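
To make the matrix aside concrete, here is a rough sketch of that
layout in C. The function names `matrix_alloc` and `matrix_free` are
invented for this post, and the sketch assumes rows and cols are both
greater than zero:

```c
#include <stdlib.h>

/* Allocate a rows x cols matrix as one big block of entries plus an
   array of row pointers into that block.
   Returns NULL on failure or if either dimension is zero. */
double **matrix_alloc(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;

    double **row = malloc(rows * sizeof *row);
    double *data = calloc(rows * cols, sizeof *data);
    if (row == NULL || data == NULL) {
        free(row);
        free(data);
        return NULL;
    }
    for (size_t i = 0; i < rows; i++)
        row[i] = data + i * cols;   /* row i starts at entry i * cols */
    return row;
}

/* Release a matrix obtained from matrix_alloc. */
void matrix_free(double **m)
{
    if (m != NULL) {
        free(m[0]);   /* the single block of entries */
        free(m);      /* the array of row pointers */
    }
}
```

The row-pointer array plays the role of the first table (where does
each row start?) and the block of entries plays the role of the second.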

Now let's take your example again: second string in the record with
record_id 1.

* first you look up the first_string_id for said record in the first
table: 3
* then you look up the string with string_id 3 + 1 = 4 in the second
table: 'quux'

Et voilà! Note also that you can leave gaps in the string_ids between
records if you might need to insert more strings later on.
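
The two-step lookup can be sketched in C roughly as follows. The arrays
are in-memory stand-ins for the two tables (in tdb or gdbm each would
be a keyed store of its own), `lookup_string` is a made-up name, and the
bounds check assumes the string_ids are dense, i.e. no gaps:

```c
#include <stddef.h>

/* Stand-in for the first table, indexed by record_id. */
static const int first_string_id[] = { 0, 3 };

/* Stand-in for the second table, indexed by string_id. */
static const char *const string_text[] = {
    "foo", "bar", "baz", "qux", "quux", "quuux"
};

#define N_RECORDS (sizeof first_string_id / sizeof first_string_id[0])
#define N_STRINGS (sizeof string_text / sizeof string_text[0])

/* Return string number `offset` (0-based) of record `record_id`,
   or NULL if either index is out of range. */
const char *lookup_string(size_t record_id, size_t offset)
{
    if (record_id >= N_RECORDS)
        return NULL;

    /* Step 1: find where this record starts in the second table.
       With dense ids it ends where the next record begins. */
    size_t begin = (size_t)first_string_id[record_id];
    size_t end = (record_id + 1 < N_RECORDS)
               ? (size_t)first_string_id[record_id + 1]
               : N_STRINGS;
    if (offset >= end - begin)
        return NULL;

    /* Step 2: fetch the string at first_string_id + offset. */
    return string_text[begin + offset];
}
```

Calling lookup_string(1, 1) walks through exactly the two steps above
and returns "quux". If you do leave gaps, the end-of-record check would
need a stored length or an end marker instead.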

> I guessed so. Ok.
>
>
> Thanks for your advice so far!
> Lasse

You're welcome. I only just noticed that you study in Kiel. Good luck
with this project, then. What are you trying to achieve?

Jan
