Subj : ALLFIX dupes?
To : Tommi Koivula
From : mark lewis
Date : Sun Jun 14 2015 10:42:20

 14 Jun 15 06:41, you wrote to Nicholas Boel:

 NB>> Does HPT default to dupechecking by MsgID and some kind of hash check?
 NB>> I don't have anything specific specified, so it's using the default
 NB>> (which I thought was MsgIDwithHashCheck or something in those lines -
 NB>> I didn't look it up so I'm probably on the right track but it's
 NB>> probably the wrong config option). Then in my area definitions, I use:

 NB>> -dupecheck move -dupehistory 365 -tooold 365 -sbkeepall

 TK> It is almost the same here, only "-TooOld 365" missing from my conf.

i don't use -tooold but i do have dupehistory set to 1100 to cover a full three years of dupes as per the FTS specs...

  "[...] The serial number may be any eight character hexadecimal
  number, as long as it is unique - no two messages from a given
  system may have the same serial number within a three years. [...]"

365.25 * 3 = 1,095.75 so i pick up an extra 4.25 days :shrug: ;)

 TK> "EchoAreaDefaults -SBkeepAll -dupeCheck move -dupeHistory 31 -b JAM"

i also keep three years of messages just because i can... here's the defaults line for one of my feeds...

areafixAutoCreateDefaults -d "Automatically added area" -b jam -a 1:3634/12.73 -g Z -p 1100 -dupeCheck move -dupeHistory 1100 -sbkeepall

 NB>> Is there something I may be missing?

 TK> I'm not that famimiar with hpt, so I don't know about its dupe detection
 TK> mechanism.

i found this in some old docs...

[quote]
DupeBaseType
  Syntax:  dupeBaseType
  Example: dupeBaseType HashDupesWMsgId

  TextDupes        stores from, to, subj & msgid as text lines.
  HashDupes        stores src32 of from + to + subj + msgid.
  HashDupesWMsgId  same as HashDupes, but stores also msgid as text.
  CommonDupeBase   stores hashes of from + to + subj + areatag + msgid
                   in one file (hpt_base.dpa)

  Default is HashDupesWMsgId.
  This statement cannot be repeated.
[/quote]

which seems accurate when looking at this from huskylib/fidoconf/fidoconf.h

[quote]
typedef enum typeDupeCheck {
    hashDupes,        /*Base bild from crc32*/
    hashDupesWmsgid,  /*Base bild from crc32+MSGID*/
    textDupes,        /*Base bild from FromName+ToName+Subj+MSGID*/
    commonDupeBase    /*Common base for all areas bild from crc32*/
} e_typeDupeCheck;
[/quote]

one improvement i can see would possibly be to also use the timestamp in the calculations... especially for those systems that don't put MSGID but do use complete time stamps including the seconds... but even that may not be good enough since some can post numerous messages in one second...

i'm trying to remember the other ways that dupe detection is done... this one is "good enough" but there are others... one takes the entire message header plus the following 40 bytes, IIRC... that way it gets most all of the control lines as well as possibly some of the message body text...

another strips the message body of CR and LF and i think white space and does a crc on that to be used along with the crc on the header... the message body stops at the tear line (not inclusive) if one exists or the origin line if there is no tear line... for netmails, the message body stops at the tear line (not inclusive), or the origin line if one exists or the first path control line...

it has been a while and i can't find all of my notes... i've dug this out of some old source code, though...

)\/(ark

.... That darned Tom, this is ALL his fault for inventing this beast anyhow.
---
 * Origin: (1:3634/12.73)
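
The hash-based checks discussed in this message can be sketched roughly as follows. This is only an illustration in Python, not hpt's C code: the function names, the dict-based message layout, the NUL field separator and the in-memory set are all assumptions made up for the example, and the real dupe base file formats are not modelled. It combines a crc32 over from + to + subj + msgid (in the spirit of HashDupesWMsgId) with a second crc32 over the whitespace-stripped body up to the tear line, so that messages without a MSGID still get a usable key.

```python
import re
import zlib


def header_crc(msg):
    """crc32 over from + to + subj + msgid (hypothetical field names)."""
    fields = (msg.get("from", ""), msg.get("to", ""),
              msg.get("subj", ""), msg.get("msgid", ""))
    # The NUL separator is an assumption; it keeps "ab"+"c" distinct from "a"+"bc".
    return zlib.crc32("\x00".join(fields).encode("utf-8"))


def body_crc(text):
    """crc32 of the body with CR, LF and other white space removed.

    The body ends at the tear line (not inclusive), or at the origin
    line when no tear line comes before it.
    """
    kept = []
    for line in re.split(r"\r\n|\r|\n", text):
        if line == "---" or line.startswith("--- "):  # tear line
            break
        if line.startswith(" * Origin: "):            # origin line
            break
        kept.append(line)
    squeezed = re.sub(r"\s+", "", "".join(kept))
    return zlib.crc32(squeezed.encode("utf-8"))


def dupe_key(msg):
    """Header crc, msgid as text, and body crc together form the key."""
    return (header_crc(msg), msg.get("msgid", ""), body_crc(msg.get("body", "")))


class DupeBase:
    """In-memory stand-in for a per-area dupe base (no expiry, no disk file)."""

    def __init__(self):
        self.seen = set()

    def is_dupe(self, msg):
        key = dupe_key(msg)
        if key in self.seen:
            return True
        self.seen.add(key)
        return False
```

A real tosser would also age entries out after the configured dupe history (1100 days in the config above) and would need to handle the netmail case, where the body may end at the first path control line instead; neither is shown here.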