[HN Gopher] Forensic linguists use grammar, syntax and vocabular...
       ___________________________________________________________________
        
       Forensic linguists use grammar, syntax and vocabulary to help crack
       cold cases
        
       Author : gcoleman
       Score  : 47 points
       Date   : 2024-12-02 08:53 UTC (4 days ago)
        
 (HTM) web link (www.thedial.world)
 (TXT) w3m dump (www.thedial.world)
        
       | lqet wrote:
       | > Spellings such as "wilfully" for "willfully" and "clew" for
       | "clue" pointed to someone from the Chicago area, for example.
       | Eventually, the linguistic evidence was strong enough to issue a
       | search warrant for the home of a reclusive mathematician named
       | Theodore Kaczynski, raised in Chicago but living in rural Montana
       | 
       | One thing Kaczynski's brother noticed as particularly
       | idiosyncratic was the consistent use of the phrase "you can't eat
       | your cake and have it too", which is usually phrased as "you
       | can't have your cake and eat it too".
        
         | wizzwizz4 wrote:
         | > Indeed, this used to be the most common form of the
         | expression until the 1930s-1940s, when it was overtaken by the
         | have-eat variant.
         | 
         | --
         | https://en.wikipedia.org/w/index.php?title=You_can%27t_have_...
        
           | yunruse wrote:
           | I wonder if this has some sort of preference for ablaut
           | reduplication [0]? I don't have the vowel phonics off by
           | hand, but "have your cake and eat it" seems to flow a little
           | more smoothly than "eat your cake and have it".
           | 
           | [0] https://en.wikipedia.org/wiki/Reduplication#English
        
             | lcnPylGDnU4H9OF wrote:
             | It sounds like reduplication is about individual words
             | being repeated rather than a phrase.
             | 
             | > In linguistics, reduplication is a morphological process
             | in which the root or stem of a word, part of that or the
             | whole word is repeated exactly or with a slight change.
             | 
             | The ablaut section also seems to suggest that "eat" should
             | come before "have" anyway.
             | 
             | > In ablaut reduplications, the first vowel is almost
             | always a high vowel or front vowel (typically I as in hit)
             | and the reduplicated vowel is a low vowel or back vowel
             | (typically ae as in cat or a as in top).
             | 
             | I suspect you feel that it flows more smoothly because it's
             | more familiar. You have to stop a brain process that's
             | become a bit automatic to say that phrase and instead say
             | it slightly differently.
        
         | xandrius wrote:
         | It does make more sense to me though: of course I can have my
         | cake and then eat it. But I can't eat it and still have it
         | afterwards.
         | 
         | So the order implies the temporality of the actions.
        
           | seanhunter wrote:
           | ^ unabomber posts on hackernews.
           | 
           | Joking aside, the key word which is sometimes implied rather
           | than included is "too". The order isn't important. The saying
           | is that both things can't simultaneously be true.
        
             | jonathanlb wrote:
             | > The order isn't important.
             | 
             | I would argue that it is, given the semantic shift that "to
             | have" has undergone.
             | 
             | "To have" historically has had a more tangible sense of
             | holding or owning something in a lasting, physical way. For
             | example, in medieval and early modern English documents,
             | "to have" frequently referred to holding physical property
             | or goods in a manner implying true, ongoing possession. For
             | instance, the formula "to have and to hold," found in
             | English property grants and other legal charters dating
             | back to at least the 13th century, specifies that the
             | grantee possesses the land not just in theory, but in
             | continuing, tangible stewardship. This phrase does not
             | simply mean ownership on paper--it affirms the right to
             | keep and maintain the property indefinitely.
             | 
             | Today, "to have" is more abstract, and implies enjoying a
             | condition or availability. In modern English we often use
             | "to have" for intangible states, experiences, or
             | conditions, rather than strictly physical possession. We
             | say we "have time" or "have a headache," meaning we
             | experience or hold a certain condition, not that we own a
             | concrete object. Saying "I have an idea" frames "idea" as
             | something you possess, but it's more about the existence of
             | that thought rather than controlling a physical thing. We
             | "have a meeting," which implies an event scheduled for us
             | to attend, not an object we keep. Over time, "to have"
             | evolved to mark various states--emotional, temporal,
             | conceptual--thus shedding some of its older, property-
             | focused sense and becoming a flexible verb denoting
             | conditions or availability.
             | 
             | So, because we interpret "to have" as less tied to tangible
             | possession, the original logic--that once you eat the cake,
             | you cannot still "have it"--doesn't strike the ear as
             | sharply when we switch the word order.
             | 
             | I would suggest adding a time marker like "then" (e.g.,
             | "You cannot eat your cake and then have it, too.")
             | emphasizes the sequence and delineates that the action of
             | eating precedes the attempt at possession.
        
       | sgarland wrote:
       | I wonder if distros like Tails [0] are going to start shipping
       | lightweight LLMs specifically to reword messages. Though I'm also
       | not sure how low of a spec you can go and still run an LLM decent
       | enough to not be excruciatingly slow.
       | 
       | [0]: https://tails.net/
        
         | seanhunter wrote:
         | You don't need a special distro to run a local llm and
         | something like ollama running llama3 8b would be just fine to
         | reword a message on a normal laptop. Inference (ie actually
         | using a model) is much much less compute intensive than
         | training or finetuning.
         | 
         | I strongly suggest people try ollama - it takes a few minutes
         | to set up, download a local model and you're up and running.
         | https://ollama.com/
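A minimal sketch of the rewording idea above, assuming an Ollama server
is already running on its default local port (11434) and that a model
tagged "llama3" has been pulled. The prompt wording and the function
names build_request and reword are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port and a model
# tagged "llama3" has already been pulled (`ollama pull llama3`).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(message: str, model: str = "llama3") -> dict:
    """Build the JSON payload asking the model to paraphrase `message`."""
    prompt = (
        "Rewrite the following message in plain, neutral English. "
        "Keep the meaning, but change the wording, spelling habits and "
        "sentence structure. Reply with the rewritten message only.\n\n"
        + message
    )
    # stream=False asks for one JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}


def reword(message: str, model: str = "llama3") -> str:
    """Send the message to the local Ollama server and return the rewrite."""
    payload = json.dumps(build_request(message, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming reply carries the generated text in "response".
        return json.loads(resp.read())["response"]
```

With the server running, reword("You can't eat your cake and have it
too.") returns the model's paraphrase. Nothing leaves the machine, which
is the point of running the model locally.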
        
       | khafra wrote:
       | Relatedly, if you have a decently sized writing sample on the
       | Internet, LLMs can do this to you at scale:
       | https://www.lesswrong.com/posts/doPbyzPgKdjedohud/the-case-f...
        
       | blakesterz wrote:
       | This type of work was used to find Satoshi at least once. This is
       | the one I remember:
       | 
       | https://likeinamirror.wordpress.com/2013/12/01/satoshi-nakam...
        
       | Temporary_31337 wrote:
       | Cool XX century story but mass use of LLM generated text will
       | obfuscate any such individual differences. My kids already use
       | LLMs extensively to verify and sometimes generate homework
       | completely.
        
       | alanh wrote:
       | Here's your clew (sic) that the article may not be so reliable:
       | It credits the FBI's linguistic analysis for locating the
       | Unabomber, when in reality, it was _his own brother_ who said
       | "hey, it sounds like Ted."
        
         | giarc wrote:
         | I don't totally agree with you but I see where you are coming
         | from. The article states "Eventually, the linguistic evidence
         | was strong enough to issue a search warrant..." which could be
         | referring to the brother pointing out the similar writing style
         | or the FBI's assessment pointing to someone from Chicago. It's
         | not clear.
        
       | ChiMan wrote:
       | I think we can expect sophisticated criminals to start using AI
       | to rewrite their correspondence.
        
         | erehweb wrote:
         | Perhaps similarly to how ransom notes would be written with
         | letters from newspapers.
        
         | runamuck wrote:
         | I think so. Along those lines, did the underworld learn the
         | lesson from Bernardo Provenzano (who got caught due to reliance
         | on a Caesar cipher) and step up their encryption?
        
         | alganet wrote:
         | That is by itself a linguistic choice that can be analyzed.
         | 
         | "Hey, that guy only communicates using cutouts from magazines,
         | what a strange choice"
         | 
         | LLMs introduce all kinds of linguistic choices, and you can
         | focus on those choices.
        
       | hnbad wrote:
       | Popular crime shows like CSI (and arguably detective novels even
       | before that) have created a false impression that forensics are
       | an exact science that can perfectly narrow down the list of
       | suspects to the exact person who did it.
       | 
       | But of course in reality it is much more complex. All forms of
       | forensic evidence are vulnerable to noise: sure, that one
       | interesting artifact may be evidence but it also may not be, and
       | the inverse may be true for that perfectly ordinary thing
       | everyone missed. A linguistic quirk _can_ be a piece of evidence
       | but it can also be accidental (e.g. nowadays it might also result
       | from bad predictive typing or autocomplete, or even more
       | recently, as others have pointed out, LLMs).
       | 
       | So all of these are in effect just probabilistic filters. And
       | filters are only useful when your sample size includes your
       | target (i.e. if the actual perpetrator is a suspect and you have
       | adequate data about them). And even then they may not only
       | produce false positives but also false negatives and these may
       | interact the more filters you attempt to combine.
       | 
       | Forensic linguistics can be useful when you have a small set of
       | suspects that you absolutely know includes the actual
       | perpetrator. But otherwise it can send you on a wild goose
       | chase or hurt the innocent.
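The "probabilistic filter" point can be made concrete with Bayes' rule.
This is a sketch under stated assumptions: exactly one true author is in
the pool, everyone is equally likely beforehand, and the hit rate (0.9)
and false-alarm rate (0.01) are made-up numbers for illustration only.

```python
def posterior_match(pool_size: int, hit_rate: float,
                    false_alarm_rate: float) -> float:
    """Probability that a flagged person is the author, by Bayes' rule.

    Assumes exactly one true author in a pool of `pool_size` people,
    each equally likely a priori.
    """
    prior = 1.0 / pool_size
    true_pos = hit_rate * prior                    # author, and flagged
    false_pos = false_alarm_rate * (1.0 - prior)   # innocent, but flagged
    return true_pos / (true_pos + false_pos)


# Hypothetical rates; only the pool size changes between runs.
for pool in (5, 100, 1_000_000):
    p = posterior_match(pool, hit_rate=0.9, false_alarm_rate=0.01)
    print(f"pool of {pool:>9,}: P(author | flagged) = {p:.4f}")
```

With these numbers the same linguistic "match" is strong evidence among
5 suspects (about 0.96), a coin flip among 100 (about 0.48), and close
to worthless against a million candidates (about 0.0001). If the real
author is not in the pool at all, every flag is a false positive.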
        
         | psunavy03 wrote:
         | Trial lawyers have written about the "CSI Effect," where the
         | existence of such shows produces jurors who now expect trials
         | to contain the types of flashy scientific evidence they see on
         | TV, and become less likely to convict even obviously-guilty
         | people when this type of evidence can't be realistically
         | produced.
        
       | z3t4 wrote:
       | Does anyone have any recommendation on software that can analyze
       | text messages and then tell if two users are the same person?
       | 
       | The use case is for competitive gaming where a player can get a
       | major advantage by using several accounts. So the software can be
       | used for screening, to detect accounts that are suspiciously
       | alike.
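For the screening use case, one common stylometric starting point is to
compare how often each account uses function words. The sketch below is
not a recommendation of a specific product; the word list and the
function names are assumptions chosen for illustration, and real tools
use far more features and far more text.

```python
import math
import re
from collections import Counter

# A small, hand-picked list of English function words (an assumption;
# real stylometry uses hundreds of features).
FUNCTION_WORDS = [
    "the", "a", "an", "and", "but", "or", "of", "to", "in", "on", "for",
    "with", "that", "this", "it", "is", "was", "i", "you", "not", "so",
]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def style_similarity(messages_a: list[str], messages_b: list[str]) -> float:
    """Compare two accounts by the function-word profile of their messages."""
    return cosine_similarity(profile(" ".join(messages_a)),
                             profile(" ".join(messages_b)))
```

A score near 1.0 would only flag a pair of accounts for a human to look
at. Short chat messages give noisy profiles, so a high score on its own
says little about whether two accounts share an owner.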
        
       | tgv wrote:
       | > According to forensic linguists, we all use language in a
       | uniquely identifiable way that can be as incriminating as a
       | fingerprint.
       | 
       | That's a bold and unproven statement, made worse because we can't
       | really see that fingerprint.
        
         | ret54 wrote:
         | there was this that unmasked alt HN users' identities 2 years back
         | using stylometric analysis from a previous comment dump
         | 
         | AFAIU the more people know of it the better expectations are
         | set about real account privacy
         | 
         | https://news.ycombinator.com/item?id=33755016
        
         | crote wrote:
         | It sounds like a fairly accurate statement to me, considering
         | that there isn't a solid scientifically-based foundation behind
         | fingerprint matching either. They aren't quite as unique as
         | we've often been led to believe, and matching them is highly
         | subjective with the same expert often interpreting the same
         | comparison differently when provided with a different story for
         | context.
         | 
         | Fingerprint matching of course isn't completely _useless_ , but
         | it's not as solid as you'd hope either.
        
           | tgv wrote:
           | But when two sets of fingerprints are different, you can be
           | fairly sure they're from different people. But when the
           | percentage of some features is 20% in one text, and 30% in
           | another, you still can't conclude anything. I write in
           | different registers in contexts such as personal emails,
           | professional emails to a large group, professional emails to
           | a direct colleague, a quick post on the internet, an 'app' to
           | a friend in another country, a text message on a phone, etc.
           | I even write them in different languages. It's hard to
           | imagine there's a well-defined, properly grounded model that
           | can unite those yet distinguish them from written output by
           | other people.
           | 
           | And now LLMs are going to add more noise to these features...
        
       | TruffleLabs wrote:
       | Reminds me of
       | 
       | "Eats, shoots, and leaves"
       | 
       | Or is it
       | 
       | "Eats shoots and leaves"?
       | 
       | ;)
       | 
       | Book "Eats, Shoots & Leaves: The Zero Tolerance Approach to
       | Punctuation"
       | 
       | https://en.wikipedia.org/wiki/Eats,_Shoots_&_Leaves
        
         | Suppafly wrote:
         | or the hilarious "help your uncle jack off a horse" vs "help
         | your uncle, Jack, off a horse" example. although it does also
         | involve caps.
        
           | seanhunter wrote:
           | Or the famous:
           | 
           | Time to eat grandma
           | 
           | Vs
           | 
           | Time to eat, grandma
        
       | llm_nerd wrote:
       | A fun related tool someone posted on here once-
       | 
       | https://news.ycombinator.com/item?id=33755016
       | 
       | Tool seems to be dead, but link to it for the related discussion.
        
       | karaterobot wrote:
       | Yes, a comma can solve a crime. You see, the panda didn't
       | actually shoot anyone, it was just _eating_ shoots and leaves.
        
       | vzaliva wrote:
       | The article touches on the impact of AI in this field but doesn't
       | mention a potential issue: the possibility of AI being used to
       | rewrite text in a way that makes it unrecognisable and impossible
       | to trace.
        
       | sidewndr46 wrote:
       | nice, between this and bite mark science no criminal is going to
       | be able to escape punishment
        
       ___________________________________________________________________
       (page generated 2024-12-06 23:01 UTC)