Subj : talking to myself
To : mark lewis
From : Maurice Kinal
Date : Sun Feb 20 2005 03:05 pm

Hey mark!

Feb 20 16:42 05, mark lewis wrote to Maurice Kinal:

 ml> MSGID isn't the magic bullet that some seem to want to think it is...
 ml> some of your comments appear to be saying that it should/could be and
 ml> that it isn't and thus should be thrown in the bitbucket...

Yes and no. What I am trying to say, or what I think I am saying, is that without rhyme or reason, ALL that it is good for won't make any difference, whether it is thrown into the bitbucket or not. Without any meaningful logic it is a complete waste of bytes and of the processing it causes. With logic it has potential as a viable accounting flag/tag/whatever. I still have doubts about its dupechecking abilities but at least it has some potential. Currently I have doubts about any real usefulness to man or machine.

 ml> while i do /tend/ to agree, i also tend more to not agree...

Sounds reasonable.

 ml> even then, you then have the problem of crossposted messages... is it
 ml> a dupe because it is exactly the same message in more than one area?
 ml> i don't think so...

Nor do I. As long as it is accountable in the area it shows up in then I can't see a problem with it, even if it shows up in other areas. However, having said that, I'd think there may be a better way to archive crossposted messages, where one message carries more than one area that it is "posted" to. A single message could fly more than one area tag. Thus some redundancy could effectively be eliminated. No?

 ml> detecting duplicates in fidonet is a tricky science,

I would agree with that assessment.

 ml> messaging... wildcat, pcboard and wwiv systems are the first three
 ml> that come to mind as having shoehorned retrofits for participation in
 ml> fidonet... quite simply, their message bases were not designed with
 ml> fidonet in mind... actually, not just fidonet but more without any
 ml> sort of thought to control lines within messages...

Right.
Having a trimmed down archiving system where all stored messages only contain what is absolutely needed to successfully be deemed a "message" - say "To", "From", "Date" - and then tack on whatever else is required depending on the target, would greatly reduce the amount of information any archived base or area needs to know. For instance a dynamic cgi script could take this information and "convert" it to html display to the end user without affecting the archive in any meaningful way, and that exact same archive could be employed to construct outbound Fido compliant pkts.

 ml> it is long past the time when this stuff can truely be fixed and
 ml> enforced...

Probably but that doesn't mean we can't discuss, and/or employ, any of this "stuff" to our advantage. Chances are by doing that we may all find ourselves complying out of choice as opposed to enforcement ... or so the theory goes.

 ml> all we can do now is to play the game and hope for the
 ml> best...

That is one way.

 ml> that takes us to the question of how to build a dataset of messages
 ml> and what to use as the duplicate trigger...

Right.

 ml> things are done in binary in fidonet because of limited storage space
 ml> as well as for speed of processing, we have to ask what method would
 ml> ultimately be the best for quick processing, small storage, and
 ml> generating truely unique IDs for the local duplicate detection
 ml> system?

That is a toughy for sure. Again I would think a standard method of generation of MSGID would be of great assistance to all. It isn't foolproof (is anything?) but it would help.

 ml> i can see possibly a two fold method involving recording the actual
 ml> header data as well as running it thru md5 or some such and recording
 ml> the MSGID if it exists...

Possibly. It sounds like it has potential.

 ml> speed... how much time are you willing to spend rummaging thru a
 ml> duplicate dataset looking for a match before deciding if a message is
 ml> a duplicate or not?

Heh, heh.
It depends on how big a problem dupes really are. If there are not many REAL dupes then I would say zero "rummaging", but if I were Rusty and seeing hundreds of REAL dupes then I'd really wish my uplink was doing better quality control. But then that of course brings up the question of whether the uplink is filtering out messages that aren't really dupes but only MSGID dupes. I've seen those and have seriously wondered if the few I do manage to see aren't representative of a far greater and unseen problem regarding the whole MSGID situation as it stands today.

 ml> considering your high desire for speed, i can see
 ml> small datasets (one per message area al la squish?) to ease the
 ml> search time...

Possibly. I have been pondering what I wish to do locally for myself all the way around, not just Fido.

 ml> interesting problem, this is... i'm already visualising multiple dupe
 ml> dataset files based on the AREA line, locally carried areas
 ml> notwithstanding due to the processing of passthru areas, or one large
 ml> or even multiple large datafiles containing AREA grouped datasets of
 ml> header and MSGID data...

Interesting to ponder.

Life is good,
Maurice

--- Msged/LNX 6.1.2
 * Origin: Coffin Point - Ladysmith, BC Canada (1:153/401.1)
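
The two-fold method and the per-area datasets discussed above could be sketched roughly as follows. This is only one possible reading of the idea, not how any existing tosser does it: the class name DupeChecker, the choice of header fields, and the area tags and MSGID values in the usage lines are all hypothetical. A message counts as a dupe only when both its header digest and its MSGID (if any) have already been seen in the same area, so a crosspost into a second area is not flagged, and neither is a different message that merely reuses a MSGID.

```python
import hashlib
from collections import defaultdict


def header_digest(to, frm, date, subject):
    """MD5 over the header fields (assumed set: To, From, Date, Subj)."""
    # NUL separator so that field boundaries cannot blur together.
    joined = "\x00".join((to, frm, date, subject))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


class DupeChecker:
    """Hypothetical in-memory dupe store, one small dataset per AREA tag."""

    def __init__(self):
        # area tag -> set of (header digest, MSGID or None)
        self._seen = defaultdict(set)

    def is_dupe(self, area, to, frm, date, subject, msgid=None):
        """Return True if this header + MSGID pair was already seen in area."""
        key = (header_digest(to, frm, date, subject), msgid)
        seen = self._seen[area]
        if key in seen:
            return True
        seen.add(key)
        return False


# Usage with made-up area tags and MSGID values:
checker = DupeChecker()
hdr = ("mark lewis", "Maurice Kinal", "Sun Feb 20 2005 03:05 pm", "talking to myself")
first = checker.is_dupe("AREA_A", *hdr, msgid="1:153/401.1 00000001")
again = checker.is_dupe("AREA_A", *hdr, msgid="1:153/401.1 00000001")
crosspost = checker.is_dupe("AREA_B", *hdr, msgid="1:153/401.1 00000001")
```

Here first and crosspost come back False and again comes back True. Keeping one set per area keeps each lookup small, at the cost of holding a dataset for every area processed, passthru areas included.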