Subj : Re: store collections of strings in tdb or gdbm To : comp.programming From : Lasse Kliemann Date : Sun Oct 09 2005 10:05 pm jburgy wrote: > Lasse Kliemann wrote: >> Ok, and now I see that this is not possible with tdb (at least not >> efficiently; I would have to traverse the whole database to accomplish >> that). Obviously, keys better are unique in a tdb. Now, I could make >> the key up of record_id and string_type. But then I am back at my >> encoding-approach, because in practice, the record_id is a stralloc, >> which means it is a string of arbitrary characters. I could do the >> following more simple encoding, however: >> >> string_type\0record_id >> >> Because, string_type contains no special characters. So I know that >> everything up to the first null character is the string_type, and after >> that first null character, everything else is the record_id. (tdb uses >> the same struct for keys as for the records themselved, which I posted >> last time.) >> > > Aaargh, no, stop it with the esoteric encoding already! What you need > is to tables in that case (as an aside, this is how C stores matrices: > an array of pointer to the rows + a great big array of all the entries, > the rest is pointer arithmetic): > > record_id | first_string_id > --------------------------- > 0 | 0 > 1 | 3 > > string_id | string_text > ----------------------- > 0 | 'foo' > 1 | 'bar' > 2 | 'baz' > 3 | 'qux' > 4 | 'quux' > 5 | 'quuux' > > Now let's take your example again: second key in record with record_id > 1. > > * first you look up the first_string_id for said record in the first > table: 3 > * then you look up the string in the second table with string_id 3 + 1: > 'quux' I see now. What I have to do is 'key arithmetic' (analogous to the pointer arithmetic of C's matrices). > Et voila! Note also that you can leave gaps in the string_id's if you > might need to insert more strings later on. That brings me to another question: What if the number of strings in each record is not fixed, and even no upper bound is known? How would you go about that? (Allow me to mention that my encoding approach already covers this case.) > You're welcome. Ich merke nur jetzt, dass Du in Kiel studierst. Ich > wuensche Dir dann viel Glueck mit diesem Projekt. Was versuchst zu > erreichen? Es geht um ein System zur Integritätsprüfung von heruntergeladenen Dateien. URLs der Dateien werden zusammen mit Prüfsummen gespeichert. Benutzer des Systems können die Prüfsummentabellen untereinander austauschen. Jeder Datensatz wird signiert von demjenigen, der ihn hinzufügt. Unstimmigkeiten werden automatisch aufgespürt und gemeldet. So kann das ganze auch als sehr einfache Heuristik zum Aufspüren von Manipulationen (durch einen Eindringling auf den jeweiligen Servern) angesehen werden. Zusätzliche Informationen können auch eingetragen werden, etwa wenn jemand ein Audit der Quellen einer Software gemacht hat. .