Subj : talking to myself
To   : mark lewis
From : Maurice Kinal
Date : Sun Feb 20 2005 03:05 pm

Hey mark!

Feb 20 16:42 05, mark lewis wrote to Maurice Kinal:

 ml> MSGID isn't the magic bullet that some seem to want to think it is... 
 ml> some of your comments appear to be saying that it should/could be and 
 ml> that it isn't and thus should be thrown in the bitbucket...

Yes and no.  What I am trying to say, or what I think I am saying, is that 
without rhyme or reason behind it, it won't make any difference whether MSGID 
is thrown into the bitbucket or not.  Without any meaningful logic it is a 
complete waste of bytes and of the processing it causes.  With logic it has 
potential as a viable accounting flag/tag/whatever.  I still have doubts about 
its dupechecking abilities but at least it has some potential.  Currently I 
have doubts about any real usefulness to man or machine.

 ml> while i do /tend/ to agree, i also tend more to not agree... 

Sounds reasonable.

 ml> even then, you then have the problem of crossposted messages... is it 
 ml> a dupe because it is exactly the same message in more than one area? 
 ml> i don't think so...

Nor do I.  As long as it is accountable in the area it shows up in then I can't 
see a problem with it, even if it shows up in other areas.  However, having 
said that, I'd think there may be a better way <patent pending> to archive 
crossposted messages, where one stored copy carries every area that message is 
"posted" to.  A single message could fly more than one area tag.  Thus some 
redundancy could effectively be eliminated.  No?
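
A rough sketch of that idea, in Python.  Everything named here (StoredMessage, 
Archive, post, in_area) is made up for illustration, and treating two posts as 
"the same message" when To, From, Date and body all match is only an 
assumption, not anything an existing tosser does.

```python
from dataclasses import dataclass, field


@dataclass
class StoredMessage:
    """One archived copy of a message plus every area tag it flies."""
    to: str
    sender: str
    date: str
    body: str
    areas: set = field(default_factory=set)


class Archive:
    """Hypothetical archive that stores a crossposted message only once."""

    def __init__(self):
        self._by_key = {}

    def __len__(self):
        return len(self._by_key)

    def post(self, area, to, sender, date, body):
        # Assumption: identical To/From/Date/body means "the same message".
        key = (to, sender, date, body)
        msg = self._by_key.get(key)
        if msg is None:
            msg = StoredMessage(to, sender, date, body)
            self._by_key[key] = msg
        msg.areas.add(area)  # crosspost: just add another area tag
        return msg

    def in_area(self, area):
        # The message stays accountable in each area it was posted to.
        return [m for m in self._by_key.values() if area in m.areas]
```

Posting the same text to two areas leaves one stored copy with two tags, and 
each area still lists it as its own.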

 ml> detecting duplicates in fidonet is a tricky science,

I would agree with that assessment.

 ml> messaging... wildcat, pcboard and wwiv systems are the first three 
 ml> that come to mind as having shoehorned retrofits for participation in 
 ml> fidonet... quite simply, their message bases were not designed with 
 ml> fidonet in mind... actually, not just fidonet but more without any 
 ml> sort of thought to control lines within messages...

Right.  Having a trimmed down archiving system where all stored messages only 
contain what is absolutely needed to successfully be deemed a "message" - say 
"To", "From", "Date" - and then tack on whatever else is required depending on 
the target, would greatly reduce the amount of information any archived base or 
area needs to know.  For instance a dynamic cgi script could take this 
information and "convert" it to html for display to the end user without 
affecting the archive in any meaningful way, and that exact same archive could 
be employed to construct outbound Fido compliant pkts.
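
To make that concrete, here is a minimal Python sketch of one archive record 
feeding two different targets.  The function names, the dict layout, and the 
idea of a separate "body" field are all assumptions for illustration.  The 
second function builds only the text portion of an echomail message (an AREA: 
line and a ^A MSGID kludge, CR terminated); it does not attempt the binary 
packed-message header a real pkt needs.

```python
import html


def to_html(msg):
    """Render an archived message as an HTML fragment for a cgi script."""
    rows = "".join(
        f"<dt>{label}</dt><dd>{html.escape(msg[key])}</dd>"
        for label, key in (("To", "to"), ("From", "from"), ("Date", "date"))
    )
    return f"<dl>{rows}</dl><pre>{html.escape(msg['body'])}</pre>"


def to_echomail_text(msg, area, origin_addr, serial):
    """Build echomail message text from the same record.

    The AREA tag and MSGID are tacked on here, at export time, rather than
    being stored in the archive.  Simplified: no tear line, origin line,
    SEEN-BY or PATH, and no packet header.
    """
    lines = [
        f"AREA:{area}",
        f"\x01MSGID: {origin_addr} {serial:08x}",
        msg["body"].replace("\n", "\r"),
    ]
    return "\r".join(lines) + "\r"
```

Neither function writes back to the record, so the archive stays the single 
trimmed-down source and each target adds only what it needs.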

 ml> it is long past the time when this stuff can truely be fixed and 
 ml> enforced...

Probably but that doesn't mean we can't discuss, and/or employ, any of this 
"stuff" to our advantage.  Chances are by doing that we may all find ourselves 
complying out of choice as opposed to enforcement ... or so the theory goes.

 ml> all we can do now is to play the game and hope for the 
 ml> best...

That is one way.

 ml> that takes us to the question of how to build a dataset of messages 
 ml> and what to use as the duplicate trigger...

Right.

 ml> things are done in binary in fidonet because of limited storage space 
 ml> as well as for speed of processing, we have to ask what method would 
 ml> ultimately be the best for quick processing, small storage, and 
 ml> generating truely unique IDs for the local duplicate detection 
 ml> system?

That is a toughy for sure.  Again I would think a standard method of generating 
MSGID would be of great assistance to all.  It isn't foolproof (is anything?) 
but it would help.

 ml> i can see possibly a two fold method involving recording the actual 
 ml> header data as well as running it thru md5 or some such and recording 
 ml> the MSGID if it exists...

Possibly.  It sounds like it has potential.
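
One way the two fold idea might look, sketched in Python.  The class name, the 
choice of header fields, and the three result labels are assumptions made for 
this example.  In particular, treating a MSGID match with a different header 
as "msgid-only" rather than as a dupe is a design choice of the sketch, meant 
to avoid tossing messages that merely share a MSGID.

```python
import hashlib


def header_digest(to, sender, subject, date):
    """md5 over the header fields, NUL separated so fields can't run together."""
    joined = "\x00".join((to, sender, subject, date))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


class DupeChecker:
    """Hypothetical two fold check: header digest first, MSGID second."""

    def __init__(self):
        self._headers = set()
        self._msgids = set()

    def check(self, to, sender, subject, date, msgid=None):
        digest = header_digest(to, sender, subject, date)
        header_seen = digest in self._headers
        msgid_seen = msgid is not None and msgid in self._msgids

        # Record what we saw, whatever the verdict turns out to be.
        self._headers.add(digest)
        if msgid is not None:
            self._msgids.add(msgid)

        if header_seen:
            return "dupe"        # same header data seen before
        if msgid_seen:
            return "msgid-only"  # MSGID reused on a different message
        return "new"
```

A message with no MSGID at all is still checked, since the header digest does 
not depend on it.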

 ml> speed... how much time are you willing to spend rummaging thru a 
 ml> duplicate dataset looking for a match before deciding if a message is 
 ml> a duplicate or not?

Heh, heh.  It depends on how big a problem dupes really are.  If there are not 
many REAL dupes then I would say zero "rummaging", but if I were Rusty and 
seeing hundreds of REAL dupes then I'd really wish my uplink was doing better 
quality control.  But then that of course brings up the question of whether or 
not the uplink is filtering out messages that aren't really dupes but are 
instead MSGID dupes.  I've seen those and have seriously wondered if the few I 
do manage to see aren't representative of a far greater and unseen problem 
regarding the whole MSGID situation as it stands today.

 ml> considering your high desire for speed, i can see 
 ml> small datasets (one per message area al la squish?) to ease the 
 ml> search time...

Possibly.  I have been pondering what I wish to do locally for myself all the 
way around, not just Fido.
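
For what it is worth, a small dataset per area could be sketched like this in 
Python.  The class name, the max_per_area cap and the drop-the-oldest rule are 
assumptions for illustration; this is not a description of how squish or any 
other tosser actually stores its dupe data.

```python
from collections import defaultdict


class PerAreaDupeBase:
    """Hypothetical dupe store: one small, capped key set per message area."""

    def __init__(self, max_per_area=1000):
        # area tag -> dict used as an insertion-ordered set of keys
        self._areas = defaultdict(dict)
        self._max = max_per_area

    def seen_before(self, area, key):
        """Return True if key was already recorded in this area."""
        keys = self._areas[area]
        if key in keys:
            return True
        keys[key] = None
        if len(keys) > self._max:
            # Keep each dataset small by dropping the oldest key.
            del keys[next(iter(keys))]
        return False
```

Because every lookup touches only one area's keys, search time stays short, 
and the same key showing up in a second area (a crosspost) is not flagged.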

 ml> interesting problem, this is... i'm already visualising multiple dupe 
 ml> dataset files based on the AREA line, locally carried areas 
 ml> notwithstanding due to the processing of passthru areas, or one large 
 ml> or even multiple large datafiles containing AREA grouped datasets of 
 ml> header and MSGID data...

Interesting to ponder.

Life is good,
Maurice

--- Msged/LNX 6.1.2
 * Origin: Coffin Point - Ladysmith, BC Canada (1:153/401.1)
