Subj : talking to myself
To   : Maurice Kinal
From : mark lewis
Date : Sun Feb 13 2005 09:10 pm

 ml>> they aren't supposed to be meaningful... they're just
 ml>> a serial number assigned to a message, for deity's sake...

 MK> Well then they can safely be ignored and stripped.  Right?
 MK> Why waste the bytes and processing time for absolutely no good
 MK> reason to man or machine?

no, they are to identify that specific message within a 3 year time period... 
the address position of the MSGID is also significant ;)
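
for reference, a minimal sketch of how a MSGID line could be put together, assuming the usual FTS-0009 layout of "MSGID: origaddr serialno" with the serial written as eight hex digits... the function name is made up for illustration and the leading ^A kludge marker is left to the caller:

```python
def format_msgid(orig_addr: str, serial: int) -> str:
    """Build the text of a MSGID kludge: origin address, then 8 hex digits."""
    if not 0 <= serial <= 0xFFFFFFFF:
        raise ValueError("serial must fit in 32 bits")
    # address first, serial second: the position of the address matters
    return f"MSGID: {orig_addr} {serial:08x}"


print(format_msgid("1:3634/12", 42))  # MSGID: 1:3634/12 0000002a
```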

 ml>> you're still looking too deeply... why go thru all that
 ml>> when you can seed a counter at 0 (zero) and increment
 ml>> until it hits 2147483647 and then roll it and start over
 ml>> at 0 (zero) again...

 MK> Yeah.  I thought of that but then reconsidered.  A complete
 MK> waste of time seeing it is totally meaningless anyhow,
 MK> networkingly-speaking.

nah, not really... i know of at least one individual who designed and wrote a 
MSGID "server" for his stuff... granted, only his homebrew software uses it... 
his server is a simple daemon that sits waiting for a request for a new serial 
number... it spits one out, increments it and waits for another request... at 
some point, it stores the current number in a small datafile on the drive...
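
the counter itself can be sketched in a few lines... this is only an illustration of the "seed at 0, increment, roll at 2147483647" idea, not that individual's actual server; the class and method names are assumptions, and the request handling and the datafile are left out:

```python
MAX_SERIAL = 2147483647  # last value handed out before rolling back to 0


class SerialCounter:
    """Hands out serials 0..MAX_SERIAL in order, then starts over at 0."""

    def __init__(self, start: int = 0) -> None:
        self._next = start

    def next_serial(self) -> int:
        serial = self._next
        # roll the counter once the maximum has been handed out
        self._next = 0 if serial == MAX_SERIAL else serial + 1
        return serial
```

a daemon would just call next_serial() once per request, whatever transport it happens to use.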

 ml>> based on 365 days in a year, three years is 1095 days...
 ml>> dividing, that gives us 1961172 messages per day...

 MK> Right.
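
the division quoted above is easy to check... a quick sketch, using the same 365 day year and the 2147483647 ceiling from the counter discussion (the per-second figure is an extra, derived the same way):

```python
MAX_SERIAL = 2147483647
DAYS = 3 * 365  # 1095 days in the three year window

per_day = MAX_SERIAL // DAYS          # whole messages per day
per_second = per_day // (24 * 60 * 60)  # whole messages per second

print(per_day)     # 1961172
print(per_second)  # 22
```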

 ml>> surely that's enough and the method easy enough???

 MK> Where is the joy in that?  ;-)

hehe, ya know? i think part of the problem with the MSGID spec is that the 
author put in the notation about "leaving it to the implementor to figure out 
how to generate the serial number" ;) many folk took that as a challenge and 
tried all kinds of ways of generating serial numbers... some even went down the 
wrong trail and used CRC32's of something without even thinking that there's a 
limited number of CRC32s /and/ that there is a very real possibility of 
creating a duplicate from two very different sources ;)
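
to put a rough number on the CRC32 problem: a sketch of the standard birthday approximation, assuming CRC32 values behave like uniformly random 32 bit numbers (an approximation, real inputs may do better or worse):

```python
import math

CRC32_SPACE = 2 ** 32  # number of distinct CRC32 values


def collision_probability(n: int, space: int = CRC32_SPACE) -> float:
    """Approximate chance that n random values contain at least one duplicate."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))


# roughly 77,000 messages is already a coin flip
print(round(collision_probability(77163), 2))  # 0.5
```

that is far fewer than the two billion or so serials a plain counter gives you before it repeats.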

 ml>> the "problem" is storing the serial counter...

 MK> Right.  Again we're creating extra variables to the equation.
 MK> Personally I'd like to keep everything restricted to what
 MK> absolutely HAS to happen and extract any additional
 MK> information from variables one MUST have.

nothing has been said about when to store the memory contents of the serial 
counter to hard media... that doesn't have to be done every time, TTBOMK... 
sendmail and others do this very thing... have you thought to look in their 
code and see what they are doing? O:)
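
one way to avoid writing on every message, sketched under assumptions: reserve a block of serials, write the END of the block to disk first, and hand out numbers from memory until the block is used up... after a crash the counter restarts at the saved value, so some serials get skipped but none get reused. this is NOT taken from sendmail's code; BLOCK, the class name and the file layout are all hypothetical:

```python
import os

MAX_SERIAL = 2147483647
BLOCK = 1000  # hypothetical: serials reserved per disk write


class PersistentCounter:
    def __init__(self, path: str) -> None:
        self.path = path
        self._next = self._load()
        self._limit = self._next  # forces a reservation on first use

    def _load(self) -> int:
        try:
            with open(self.path) as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def _reserve(self) -> None:
        # save the end of the new block BEFORE handing anything out
        self._limit = min(self._next + BLOCK, MAX_SERIAL + 1)
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(self._limit % (MAX_SERIAL + 1)))
        os.replace(tmp, self.path)  # swap the new file in place of the old

    def next_serial(self) -> int:
        if self._next >= self._limit:
            self._reserve()
        serial = self._next
        self._next += 1
        if self._next > MAX_SERIAL:  # roll over and force a fresh block
            self._next = 0
            self._limit = 0
        return serial
```

with BLOCK at 1000 that is one disk write per thousand messages, and at nearly two million serials a day to burn, losing a partial block on a crash costs nothing.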

 ml>> another "problem" is that even with non-duped MSGID serial
 ml>> numbers, it is possible for some software to see a
 ml>> false-dupe...

 MK> Right again.  IDs are no assurance of catching true dupes,
 MK> unique or otherwise.

that's solely because the spec wasn't made mandatory, plus the "rubbish" 
about "leaving it to the implementor" and such... if the spec had been made 
mandatory and the method of generating the number had been hammered down as 
well as what, exactly, is meant to be the "address of the originating machine", 
then we'd not be having this particular problem (or discussion, for that 
matter) O:)

 ml>> because they aren't always looking at the MSGID but only at the
 ml>> header of the message... some of them will go a tad further by
 ml>> looking at the header plus some (maybe 30 or 40) additional bytes to
 ml>> try to see if the message body is different...

 MK> How about quoting?  Excessive quoting could potentially cause
 MK> a false dupe especially when quotes precede the reply.

possible... let's also not forget that some dupe detection routines are a CRC32 
on the header and possibly some of the first XX bytes of the message body... we 
already know, and i mentioned it above, that CRC32s are limited and are able to 
be duplicated with very different input ;)
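
a sketch of that kind of dupe check and how quoting can trip it... the header layout, the 40 byte cutoff and the function name are all assumptions for illustration, not any particular tosser's routine:

```python
import zlib


def dupe_key(header: bytes, body: bytes, body_bytes: int = 40) -> int:
    """CRC32 over the header plus only the first body_bytes of the body."""
    return zlib.crc32(header + body[:body_bytes])


# hypothetical header; two replies that open with the same quoted line
header = b"From: mark lewis|To: Maurice Kinal|Subj: talking to myself"
quote = b" ml>> they aren't supposed to be meaningful... they're just\r"
first = quote + b"first reply"
second = quote + b"a completely different reply"

# the quote alone is longer than 40 bytes, so the replies never get looked at
print(dupe_key(header, first) == dupe_key(header, second))  # True
```

and that is a false dupe without any CRC32 collision at all; the collisions come on top of it.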

 ml>> for some reason, i'm also thinking that the message headers that
 ml>> contain seconds are stuck with a 2 second granularity in the same
 ml>> vein that billy's file systems have been from the beginning...
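
for what the file system side of that looks like: the FAT/DOS time word has only 5 bits for seconds, so it stores seconds divided by two... a small sketch of that packing (whether any FTN message header actually uses this layout is the unconfirmed part; the function names are made up):

```python
def pack_dos_time(hour: int, minute: int, second: int) -> int:
    """Pack a time into the 16 bit DOS layout: hhhhh mmmmmm sssss."""
    return (hour << 11) | (minute << 5) | (second // 2)


def unpack_dos_time(value: int):
    """Unpack to (hour, minute, second); odd seconds are gone for good."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, (value & 0x1F) * 2


print(unpack_dos_time(pack_dos_time(21, 10, 37)))  # (21, 10, 36)
```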

 MK> Could be.  Also what is the "correct" time at any given
 MK> moment, especially on a fast machine?

hehe, i guess that would depend on the definition of "it", ROTFL!!

 ml>> however, i've not the time nor inclination to go rooting thru the
 ml>> archives to confirm this "memory"...

 MK> Heh, heh.  I don't blame you one bit for that.

oh, i could probably go right to it... it's probably right in the project 
directory with the code for my FTN message tool that takes raw ASCII text files 
and generates messages from them... one of the beta cycles actually took 3 
years... one reason was impending burnout... another was a very strong desire 
to see realworld usage of the implemented MSGID routines that i posted to you 
the other day... that code hasn't been touched in many years and is still in 
use every day, on my system... granted, though, that tool only posts up to 50 
messages a day... in testing, though, it has posted a 2Meg message (JAM format, 
largest fidonet nodelist) as well as posting many smaller messages per 
second... IIRC, on one test machine, with "empty" message bodies, it approached 
some 200 messages a second... on another, much faster machine, the speed for 
the same test clocked upwards of 500 or so messages per second... granted, they 
were all empty bodies but the header stuff and such all took time to create and 
stuff into the message base format... i guess i should also mention that 
originally, the tool was a fire-once affair where you had to do a "for %f in 
(*.txt) do postit some.parameters %f" to post batches of messages... a quick addition 
was to implement a batch affair where you'd list the parameters in a @bulk file 
and specify that on the command line (ie: postit @elist1) and it'd draw all the 
parameters from there... it still has to load and process the message body text 
file from the disk, though...

should i mention that the above tool is written in pascal? i have no idea what 
it'd take to "port" it to perl but i am confident that it'd be quite a bit 
slower on the same boxes ;)

)\/(ark

 
* Origin: (1:3634/12)
