Subj : Re: store collections of strings in tdb or gdbm
To   : comp.programming
From : jburgy
Date : Sun Oct 09 2005 07:55 am

Lasse Kliemann wrote:
> Yes. However, as I see, the ordering of the records in the table is
> important. For instance, if I need to access the second string in the
> second record, I have to look for the second entry in the table which
> has as its key the record_id 1. I do not know if tdb always keeps the
> ordering (it does not look like it, and it also cannot give me all
> records with the same key, see below), so maybe a field describing which
> string is stored there would be appropriate:
>
> record_id | string_type | string_id | string_text
> -------------------------------------------------
>         0 |           0 |         0 | 'foo'
>         0 |           1 |         1 | 'bar'
>         0 |           2 |         2 | 'baz'
>         1 |           0 |         3 | 'qux'
>         1 |           1 |         4 | 'quux'
>         1 |           2 |         5 | 'quuux'
>
> In this setting, I would have to iterate over all entries with a given
> record_id to find the one with the desired string_type.
>

I'm not convinced that this is necessary: you can obtain string_type by subtracting the first string_id in the record from the current string's string_id.

> Ok, and now I see that this is not possible with tdb (at least not
> efficiently; I would have to traverse the whole database to accomplish
> that). Obviously, keys better are unique in a tdb. Now, I could make
> the key up of record_id and string_type. But then I am back at my
> encoding-approach, because in practice, the record_id is a stralloc,
> which means it is a string of arbitrary characters. I could do the
> following more simple encoding, however:
>
> string_type\0record_id
>
> Because, string_type contains no special characters. So I know that
> everything up to the first null character is the string_type, and after
> that first null character, everything else is the record_id. (tdb uses
> the same struct for keys as for the records themselves, which I posted
> last time.)
>

Aaargh, no, stop it with the esoteric encoding already!
What you need is two tables in that case (as an aside, this is one common way to store matrices in C: an array of pointers to the rows plus one big array of all the entries; the rest is pointer arithmetic):

record_id | first_string_id
---------------------------
        0 |               0
        1 |               3

string_id | string_text
-----------------------
        0 | 'foo'
        1 | 'bar'
        2 | 'baz'
        3 | 'qux'
        4 | 'quux'
        5 | 'quuux'

Now let's take your example again: the second string in the record with record_id 1.

* First you look up the first_string_id for said record in the first table: 3
* Then you look up the string in the second table with string_id 3 + 1: 'quux'

Et voila! Note also that you can leave gaps in the string_ids if you might need to insert more strings later on.

> I guessed so. Ok.
>
>
> Thanks for your advice so far!
> Lasse

You're welcome. I only just noticed that you study in Kiel. I wish you good luck with this project. What are you trying to achieve?

Jan
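
P.S. Here is a rough sketch in C of the two-table lookup described above. It is only an illustration under some assumptions: two plain in-memory arrays stand in for the two tables (a real version would fetch each row from tdb or gdbm instead), the string_ids have no gaps, and the helper name lookup_string is made up for this example.

```c
#include <stddef.h>

/* Table 1: record_id -> first_string_id (the array index is the record_id). */
static const size_t first_string_id[] = { 0, 3 };

/* Table 2: string_id -> string_text (the array index is the string_id). */
static const char *const string_text[] = {
    "foo", "bar", "baz", "qux", "quux", "quuux"
};

#define N_RECORDS (sizeof first_string_id / sizeof first_string_id[0])
#define N_STRINGS (sizeof string_text / sizeof string_text[0])

/*
 * Hypothetical helper: return the string at position string_type
 * (counted from 0) within the record record_id, or NULL if either
 * index is out of range.
 */
static const char *lookup_string(size_t record_id, size_t string_type)
{
    size_t string_id, end;

    if (record_id >= N_RECORDS)
        return NULL;

    /* Step 1: look up where this record's strings begin. */
    string_id = first_string_id[record_id] + string_type;

    /* A record ends where the next one begins (or at the end of table 2). */
    end = (record_id + 1 < N_RECORDS) ? first_string_id[record_id + 1]
                                      : N_STRINGS;
    if (string_id >= end)
        return NULL;

    /* Step 2: fetch the string with the computed string_id. */
    return string_text[string_id];
}
```

With the tables from the example, lookup_string(1, 1) gives "quux", and asking for a fourth string in either record gives NULL. Both lookups go through a unique key, so nothing has to be traversed.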