Subj : talking to myself
To   : Maurice Kinal
From : mark lewis
Date : Sun Feb 20 2005 04:42 pm

 RT>> with his "gate" a while back, that I know of that is.

 MK> That was one situation I as thinking of. Got under everyone's
 MK> radar despite all the sysops in that echo noticing it. So
 MK> much for MSGID eh? Everyone of those messages was a true dupe
 MK> that not one tosser caught.

MSGID isn't the magic bullet that some seem to want to think it is... some of your comments appear to be saying that it should/could be and that it isn't and thus should be thrown in the bitbucket... while i do /tend/ to agree, i also tend more to not agree...

detecting dupes in fidonet is not magic nor is it tied to one thing... for various reasons, fidonet can't even use md5 checksums on the message body to determine if a message is a duplicate... message headers, existing control lines, origin and seenby and path lines can all be stripped or otherwise modified or corrupted...

the only real way to tell a dupe would be by enforcing some sort of message body formatting and md5'ing the message body, much the same way that PGP can be used to sign a message to show if it has been modified since sending... ie: when the body is generated and the message saved, md5 the body and store that in a control line that travels with the message. i still don't recall if there are message processors out there that alter the message body (ie: by replacing CRLF with LF)...

even then, you then have the problem of crossposted messages... is it a dupe because it is exactly the same message in more than one area? i don't think so...

detecting duplicates in fidonet is a tricky science, to say the least... checking the header info and message control lines (including the origin line) is about the only way... still, this can fail due to the way some systems have been retrofitted for fidonet messaging...
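going back to the idea of md5'ing the body when the message is saved, here is a rough python sketch of how that might look... none of this is an existing fidonet standard: the kludge name BODYMD5 is made up for illustration, and the normalisation rules (treat CRLF, LF and CR alike, drop trailing whitespace and trailing blank lines) and the utf-8 encoding are assumptions of mine... the dupe check is keyed on area plus digest so that a crosspost of the same body into a second area is not flagged...

```python
import hashlib


def normalize_body(body: str) -> bytes:
    """Reduce a message body to a canonical form before hashing.

    Assumption: line-ending rewrites and trailing whitespace are the only
    changes a message processor makes to the body in transit.
    """
    # Treat CRLF, lone LF and lone CR as the same line break.
    text = body.replace("\r\n", "\r").replace("\n", "\r")
    lines = [line.rstrip() for line in text.split("\r")]
    # Drop trailing blank lines so a stray final line break does not matter.
    while lines and lines[-1] == "":
        lines.pop()
    return "\r".join(lines).encode("utf-8")


def body_digest(body: str) -> str:
    """MD5 hex digest of the normalized body."""
    return hashlib.md5(normalize_body(body)).hexdigest()


def digest_kludge(body: str) -> str:
    """Build a control line carrying the digest.

    BODYMD5 is a hypothetical kludge name, not a real fidonet control line.
    """
    return "\x01BODYMD5: " + body_digest(body)


def seen_before(seen: set, area: str, body: str) -> bool:
    """Return True if this body was already seen in this area.

    Keyed on (area, digest) so a crosspost into another area is not a dupe.
    """
    key = (area.upper(), body_digest(body))
    if key in seen:
        return True
    seen.add(key)
    return False
```

a tosser using something like this would call seen_before() once per incoming message with the AREA tag and the body text... it still does nothing for messages whose body really was altered in transit, which is the whole reason the body formatting would have to be enforced first...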
wildcat, pcboard and wwiv systems are the first three that come to mind as having shoehorned retrofits for participation in fidonet... quite simply, their message bases were not designed with fidonet in mind... actually, not just fidonet, but without any sort of thought to control lines within messages at all... it is long past the time when this stuff can truly be fixed and enforced... all we can do now is to play the game and hope for the best...

that said, there are things that can be done to try to ensure that messages generated by your software do make it past the various and sundry dupe checking schemes out there... one of the first and easiest is to implement MSGID and ensure that it is the first control line after the message header...

this may or may not help with very braindead dupe checking that looks at the header only, with no regard for the message body at all... that kind of system was developed with a myopic view of users creating messages, and not with the thought that an automated process like text file posting or offline mail doors may post more than one message per second... most of the software that did that braindead method of dupe checking has been tossed or upgraded for something that does the same but also takes into consideration the first 20+ bytes of the message body...

there is still the problem of dupe checkers that use a CRC16 or CRC32 method of storing an "ID" of a message based on the header and 20+ bytes of the message... the problem comes from the simple fact that there are a limited number of CRC16 and CRC32 results (65,536 and roughly 4.29 billion respectively) and that it is fairly trivial to find more than one dataset that generates the same CRC16 or CRC32 value...

that takes us to the question of how to build a dataset of messages and what to use as the duplicate trigger...
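as a quick illustration of the limited-results problem with CRC16, here is a small python sketch that builds a CRC16 "ID" from header fields plus the first 20 bytes of the body and then hunts for two different messages with the same ID... the field layout, the sample messages and the function names are all made up for the example, and binascii.crc_hqx (CRC-CCITT) is only a stand-in, since real dupe checkers may use a different CRC16 variant... with more than 65,536 distinct messages a collision is guaranteed, and in practice one tends to show up far sooner...

```python
import binascii


def crc16_key(from_name: str, to_name: str, subject: str, body: str,
              prefix_len: int = 20) -> int:
    """CRC16 over the header fields plus the first prefix_len chars of body.

    Hypothetical key layout; fields are joined with NUL so they cannot
    run into each other.
    """
    data = "\0".join((from_name, to_name, subject, body[:prefix_len]))
    return binascii.crc_hqx(data.encode("utf-8"), 0)


def first_collision(limit: int = 70000):
    """Generate distinct automated postings until two share a CRC16 key.

    Returns (earlier_message, later_message, key), or None if no collision
    is found within limit messages. With limit above 65536 distinct inputs
    a collision must occur.
    """
    seen = {}
    for n in range(limit):
        # Every message differs in its subject, so none of these are dupes.
        msg = ("Auto Poster", "All", f"file announce {n}",
               f"new file number {n} has arrived")
        key = crc16_key(*msg)
        if key in seen:
            return seen[key], msg, key
        seen[key] = msg
    return None


if __name__ == "__main__":
    hit = first_collision()
    if hit is not None:
        earlier, later, key = hit
        print(f"CRC16 {key:04X} shared by:")
        print("  ", earlier[2])
        print("  ", later[2])
```

a dupe checker that stored only the CRC16 would throw the second of those two messages away as a dupe even though it is a different message... CRC32 pushes the problem further out but does not remove it...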
remembering that many things are done in binary in fidonet because of limited storage space as well as for speed of processing, we have to ask what method would ultimately be the best for quick processing, small storage, and generating truly unique IDs for the local duplicate detection system?

the first thing i can think of is to record the header info and the entire MSGID... the question is, then, how to record the header info? would one use the actual fields or would one run the header fields thru a formula like md5 or something else?? i can see possibly a two fold method involving recording the actual header data as well as running it thru md5 or some such and recording the MSGID if it exists... that would likely be the most thorough method but it wouldn't be the smallest data record per message...

there's also the question of speed... how much time are you willing to spend rummaging thru a duplicate dataset looking for a match before deciding if a message is a duplicate or not? considering your high desire for speed, i can see small datasets (one per message area a la squish?) to ease the search time...

interesting problem, this is... i'm already visualising multiple dupe dataset files based on the AREA line (whether or not the area is carried locally, since passthru areas get processed too), or one large or even multiple large datafiles containing AREA grouped datasets of header and MSGID data...

)\/(ark

 * Origin: (1:3634/12)