Subj : Re: store collections of strings in tdb or gdbm
To   : comp.programming
From : Lasse Kliemann
Date : Sun Oct 09 2005 02:40 am

jburgy wrote:
> Lasse Kliemann wrote:
>> my data is organized as follows:
>>
>> - There are several records.
>> - Each record consists of a fixed number of strings.
[snip]
>> - Get the decimal representation of all the lengths of the strings in
>>   the record.
>> - Write these into dptr, as null-terminated strings.
>> - Then write all the strings into dptr.
[snip]
>> - How elegant is my approach?
>
> It sounds just plain wrong. I know nothing about tdb so feel free to
> diss my comment but the way you would normally do this in a relational
> database is like this
>
> record_id | string_id | string_text
> -----------------------------------
>         0 |         0 | 'foo'
>         0 |         1 | 'bar'
>         0 |         2 | 'baz'
>         1 |         3 | 'qux'
>         1 |         4 | 'quux'
>         1 |         5 | 'quuux'
>
> Hopefully you get the picture. If tdb is as trivial as the struct you
> showed leads me to believe, you may need two tables to achieve this:
> one stores the strings (with one string per record, no worries, these
> records have nothing to do with yours), the other the correlation
> between string_id and record_id. Something like
>
> char *string_text[] = { "foo", "bar", "baz", "qux", "quux", "quuux",
> ... };
> int record_id[] = { 0, 0, 0, 1, 1, 1, .... };
>
> You catch my drift?

Yes. However, as I see it, the ordering of the records in the table is
important. For instance, if I need to access the second string in the
second record, I have to look for the second entry in the table which
has as its key the record_id 1.
I do not know if tdb always keeps the ordering (it does not look like
it, and it also cannot give me all records with the same key, see
below), so maybe a field describing which string is stored there would
be appropriate:

record_id | string_type | string_id | string_text
-------------------------------------------------
        0 |           0 |         0 | 'foo'
        0 |           1 |         1 | 'bar'
        0 |           2 |         2 | 'baz'
        1 |           0 |         3 | 'qux'
        1 |           1 |         4 | 'quux'
        1 |           2 |         5 | 'quuux'

In this setting, I would have to iterate over all entries with a given
record_id to find the one with the desired string_type. Ok, and now I
see that this is not possible with tdb (at least not efficiently; I
would have to traverse the whole database to accomplish that).
Obviously, keys had better be unique in a tdb.

Now, I could make up the key from record_id and string_type. But then I
am back at my encoding approach, because in practice, the record_id is
a stralloc, which means it is a string of arbitrary characters. I could
use the following simpler encoding, however:

    string_type\0record_id

This works because string_type contains no special characters. So I
know that everything up to the first null character is the string_type,
and everything after that first null character is the record_id.

(tdb uses the same struct for keys as for the records themselves, which
I posted last time.)

>> - Is the conversion into the decimal representation necessary, or can I
>>   write the lengths of the strings as unsigned ints into dptr? Would
>>   this be portable?
>
> If you insist on going with your idea: shoving unsigned ints into a
> char* is not portable, google endianness.

I guessed so. Ok.

Thanks for your advice so far!

Lasse
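
A minimal sketch of the string_type\0record_id key encoding described
above, in plain C. The struct blob type and the function names make_key
and split_key are hypothetical stand-ins for illustration; blob only
mimics a pointer-plus-length pair with a dptr member and is not the real
tdb type, and no tdb calls are made. It assumes string_type is an
ordinary C string with no embedded null characters, while record_id may
contain arbitrary bytes and therefore carries an explicit length.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a key/record struct (pointer + length).
   This is NOT the actual tdb type. */
struct blob {
    char *dptr;
    size_t dsize;
};

/* Build the key "string_type\0record_id".
   string_type: C string without embedded nulls (assumption).
   record_id:   arbitrary bytes of length id_len.
   Returns 0 on success, -1 if allocation fails.
   The caller frees key->dptr. */
static int make_key(struct blob *key, const char *string_type,
                    const char *record_id, size_t id_len)
{
    size_t type_len = strlen(string_type);

    key->dsize = type_len + 1 + id_len;
    key->dptr = malloc(key->dsize);
    if (key->dptr == NULL)
        return -1;

    /* Copy string_type including its terminating null, which doubles
       as the separator. */
    memcpy(key->dptr, string_type, type_len + 1);
    /* Everything after the separator is the raw record_id. */
    memcpy(key->dptr + type_len + 1, record_id, id_len);
    return 0;
}

/* Split a key again: everything up to the first null is string_type,
   the remainder is record_id. The output pointers refer into
   key->dptr. Returns 0 on success, -1 if no separator is found. */
static int split_key(const struct blob *key, const char **string_type,
                     const char **record_id, size_t *id_len)
{
    const char *sep = memchr(key->dptr, '\0', key->dsize);

    if (sep == NULL)
        return -1;

    *string_type = key->dptr;
    *record_id = sep + 1;
    *id_len = key->dsize - (size_t)(sep + 1 - key->dptr);
    return 0;
}
```

Because only the first null character acts as the separator, a
record_id that itself contains null bytes still round-trips. The
resulting dptr and length would then be passed as the key to whatever
store function the database provides.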