Subj : Double postings
To   : Joe Martin
From : mark lewis
Date : Sun Sep 15 2019 11:26 am

 On 2019 Sep 15 08:31:20, you wrote to me:

 JM> -> MSGID is the main way but older software doesn't generate MSGID so
 JM> -> other methods need to be used...

 JM> My mailer/tosser uses a combined approach. If the message contains a
 JM> MSGID then use its value, otherwise CRC the header and message body
 JM> including control lines but never the SEEN-BY/PATH lines (considering
 JM> they change all the time).

this is good but for one small thing... there is a package that is known to
be reformatting messages in transit which is going to throw the message body
CRC out the door... there is no estimate on when this bug will be fixed as
the developer is apparently quite busy with RL outside of FTNs...

 JM> The tosser never duplicates an MSGID either as it maintains a file
 JM> with the last used value seeded upon creation by the current
 JM> date/time. This prevents issues should that file get deleted.

sounds similar to what my MSGID code does... i've shared that information
with several folks... not sure if you were one of those or not... i still
have the original 1994 (i think) post that described it, too :)

 JM> To provide speed and limit disk space, I also have an expiration
 JM> mechanism (user configurable) that will purge CRC entries after a given
 JM> amount of time (ie: 2 weeks but not more than 30 days). So while it's
 JM> efficient catching dupes in that time period, if someone does a rescan
 JM> and dumps everything back into the echo a month later, it won't catch
 JM> them. It's a trade off, but back in the day when we had 40mb drives and
 JM> 8088/80286 processors, it was extremely important.

yeah and that's gonna likely be a problem since the spec states three
years... in this day and time, retaining three years' worth of dupe
detection data should be a small drop in the bucket of available drive space
and processing power needed to perform a lookup...

 JM> -> instead of CRC... the problem then comes from those systems that
 JM> -> mistakenly reformat the messages as they process them and write the
 JM> -> reformatted messages to new PKTs... now the message body is

 JM> Yeah this is and always will be an issue.

not if the message body is not CRC'd ;) i really like (IIRC) the d'bridge
method of taking the header and first 40 bytes (i think) of the message body
to get those few initial control lines and using that... that'll take care
of the different dates as well as the MSGID but i would also grab the MSGID
if it exists and store it in the database as well... basically i'm thinking
of at least two or three fields in each record...

 JM> -> is apparent on systems that only get, for example, one posting of an
 JM> -> echos rules each month and only accept new postings of those rules

 JM> It would seem to me, (me mind you) that if you're moderating an echo,
 JM> your software "should" be able to generate a MSGID to prevent this issue
 JM> entirely. But hey...

that depends on the software used... some text file posting tools are really
old and do not have any concept of MSGID... i'm thinking of the old Harvey's
Robot in at least one case...

 JM> -> what i would do would be to ask other tosser devs what they use in
 JM> -> their code...
 JM> ->
 JM> -> listed in no particular order:
 JM> ->
 JM> -> tobias burchhardt - fastecho
 JM> -> rob swindell - sbbsecho
 JM> -> nick andre - d'bridge
 JM> -> vince coen - mbse's tosser
 JM> -> kim heino - bbbs' tosser
 JM> -> wilfred van velzen - fmail
 JM> -> james coyle - mystic

 JM> Thanks Mark...

you're welcome... i hope that you've also seen the other two posts about HPT
and intermail which should also be added to the above list...

)\/(ark

Once men turned their thinking over to machines in the hope that this would
set them free. But that only permitted other men with machines to enslave
them.
.... You may never know who's right but you always know who is in charge!
---
 * Origin: (1:3634/12.73)
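
The dupe check discussed above (use the MSGID when there is one, otherwise fall back to a CRC of the header plus the first 40 bytes of the body, and keep entries for three years) could be sketched roughly as follows. This is only an illustration under stated assumptions, not how d'bridge or any of the tossers named above actually does it. The names `DupeDB`, `extract_msgid` and `header_crc` are hypothetical, the header is taken to be a plain tuple of strings, and the body is assumed to use CR line endings with control lines starting with a 0x01 byte.

```python
import zlib

# Three-year retention, the figure quoted in the post above.
RETENTION_SECONDS = 3 * 365 * 24 * 60 * 60


def extract_msgid(body):
    """Return the value of the MSGID control line, or None if there is none.

    Assumes CR-separated lines and control lines prefixed with 0x01.
    """
    for line in body.split("\r"):
        if line.startswith("\x01MSGID:"):
            return line[len("\x01MSGID:"):].strip()
    return None


def header_crc(header, body, prefix_len=40):
    """CRC-32 over the header fields plus the first prefix_len body bytes.

    Only a short prefix of the body is used, so text that gets reformatted
    further down in the message does not change the result.
    """
    data = "\0".join(header).encode("latin-1", "replace")
    data += body.encode("latin-1", "replace")[:prefix_len]
    return zlib.crc32(data)


class DupeDB:
    """In-memory dupe table (hypothetical); a real tosser would keep it on disk."""

    def __init__(self, retention=RETENTION_SECONDS):
        self.retention = retention
        self.by_msgid = {}  # MSGID value -> time first seen
        self.by_crc = {}    # header/prefix CRC -> time first seen

    def purge(self, now):
        """Drop entries first seen more than `retention` seconds before `now`."""
        cutoff = now - self.retention
        for table in (self.by_msgid, self.by_crc):
            for key in [k for k, seen in table.items() if seen < cutoff]:
                del table[key]

    def is_dupe(self, header, body, now):
        """Return True if the message was seen before, and record it either way."""
        self.purge(now)
        msgid = extract_msgid(body)
        crc = header_crc(header, body)
        if msgid is not None:
            # MSGID is the primary key whenever the message carries one.
            seen = msgid in self.by_msgid
            self.by_msgid.setdefault(msgid, now)
        else:
            # Older software without MSGID: fall back to the CRC.
            seen = crc in self.by_crc
        # Store the CRC in every case so each record keeps both fields.
        self.by_crc.setdefault(crc, now)
        return seen
```

With this layout a message that was re-wrapped in transit is still caught as long as its MSGID survives, and a message without MSGID is still caught as long as its header and first 40 body bytes are unchanged. The retention period is a constructor argument, so a shorter window like the two-week one described in the quoted text only needs a different value.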