[HN Gopher] Forensic linguists use grammar, syntax and vocabular...
___________________________________________________________________
Forensic linguists use grammar, syntax and vocabulary to help crack
cold cases
Author : gcoleman
Score : 47 points
Date : 2024-12-02 08:53 UTC (4 days ago)
(HTM) web link (www.thedial.world)
(TXT) w3m dump (www.thedial.world)
| lqet wrote:
| > Spellings such as "wilfully" for "willfully" and "clew" for
| "clue" pointed to someone from the Chicago area, for example.
| Eventually, the linguistic evidence was strong enough to issue a
| search warrant for the home of a reclusive mathematician named
| Theodore Kaczynski, raised in Chicago but living in rural Montana
|
| One thing Kaczynski's brother noticed as particularly
| idiosyncratic was the consistent use of the phrase "you can't eat
| your cake and have it too", which is usually phrased as "you
| can't have your cake and eat it too".
| wizzwizz4 wrote:
| > Indeed, this used to be the most common form of the
| expression until the 1930s-1940s, when it was overtaken by the
| have-eat variant.
|
| --
| https://en.wikipedia.org/w/index.php?title=You_can%27t_have_...
| yunruse wrote:
| I wonder if this reflects some sort of preference for ablaut
| reduplication [0]? I don't know the vowel phonetics offhand,
| but "have your cake and eat it" seems to flow a little more
| smoothly than "eat your cake and have it".
|
| [0] https://en.wikipedia.org/wiki/Reduplication#English
| lcnPylGDnU4H9OF wrote:
| It sounds like reduplication is about individual words
| being repeated rather than a phrase.
|
| > In linguistics, reduplication is a morphological process
| in which the root or stem of a word, part of that or the
| whole word is repeated exactly or with a slight change.
|
| The ablaut section also seems to suggest that "eat" should
| come before "have" anyway.
|
| > In ablaut reduplications, the first vowel is almost
| always a high vowel or front vowel (typically /ɪ/ as in hit)
| and the reduplicated vowel is a low vowel or back vowel
| (typically /æ/ as in cat or /ɒ/ as in top).
|
| I suspect you feel that it flows more smoothly because it's
| more familiar. To say the phrase differently, you have to
| interrupt a brain process that has become somewhat
| automatic.
| xandrius wrote:
| It does make more sense to me though: of course I can have my
| cake and then eat it. But I can't eat it and still have it
| afterwards.
|
| So the order implies the temporality of the actions.
| seanhunter wrote:
| ^ unabomber posts on hackernews.
|
| Joking aside, the key word, which is sometimes implied rather
| than stated, is "too". The order isn't important. The saying
| means both things can't be true at the same time.
| jonathanlb wrote:
| > The order isn't important.
|
| I would argue that it is, given the semantic shift that "to
| have" has undergone.
|
| "To have" historically has had a more tangible sense of
| holding or owning something in a lasting, physical way. For
| example, in medieval and early modern English documents,
| "to have" frequently referred to holding physical property
| or goods in a manner implying true, ongoing possession. For
| instance, the formula "to have and to hold," found in
| English property grants and other legal charters dating
| back to at least the 13th century, specifies that the
| grantee possesses the land not just in theory, but in
| continuing, tangible stewardship. This phrase does not
| simply mean ownership on paper--it affirms the right to
| keep and maintain the property indefinitely.
|
| Today, "to have" is more abstract, and implies enjoying a
| condition or availability. In modern English we often use
| "to have" for intangible states, experiences, or
| conditions, rather than strictly physical possession. We
| say we "have time" or "have a headache," meaning we
| experience or hold a certain condition, not that we own a
| concrete object. Saying "I have an idea" frames "idea" as
| something you possess, but it's more about the existence of
| that thought rather than controlling a physical thing. We
| "have a meeting," which implies an event scheduled for us
| to attend, not an object we keep. Over time, "to have"
| evolved to mark various states--emotional, temporal,
| conceptual--thus shedding some of its older, property-
| focused sense and becoming a flexible verb denoting
| conditions or availability.
|
| So, because we interpret "to have" as less tied to tangible
| possession, the original logic (that once you eat the cake,
| you cannot still "have it") doesn't strike the ear as
| sharply when we switch the word order.
|
| I would suggest that adding a time marker like "then" (e.g.,
| "You cannot eat your cake and then have it, too.")
| emphasizes the sequence and makes clear that the eating
| precedes the attempt at possession.
| sgarland wrote:
| I wonder if distros like Tails [0] are going to start shipping
| lightweight LLMs specifically to reword messages. Though I'm also
| not sure how low a spec you can go and still run an LLM decent
| enough not to be excruciatingly slow.
|
| [0]: https://tails.net/
| seanhunter wrote:
| You don't need a special distro to run a local LLM; something
| like Ollama running Llama 3 8B would be just fine to reword a
| message on a normal laptop. Inference (i.e. actually using a
| model) is much, much less compute-intensive than training or
| fine-tuning.
|
| I strongly suggest people try ollama - it takes a few minutes
| to set up, download a local model and you're up and running.
| https://ollama.com/
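A minimal sketch of the rewording step against a local Ollama server, assuming it is running on its default port (11434) and that a `llama3` model has already been pulled (`ollama pull llama3`). It posts to Ollama's `/api/generate` endpoint with streaming turned off; the prompt wording and the helper names `build_reword_request` and `reword` are illustrative assumptions, not part of Ollama itself.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port and the
# "llama3" model tag has been pulled beforehand.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_reword_request(message: str, model: str = "llama3") -> urllib.request.Request:
    """Build a POST request asking the model to paraphrase `message`."""
    prompt = (
        "Rewrite the following message in plain, neutral English. "
        "Keep the meaning but change the wording and sentence structure:\n\n"
        + message
    )
    # stream=False asks Ollama for one JSON object instead of a stream of chunks.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reword(message: str, model: str = "llama3") -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_reword_request(message, model)) as resp:
        # The non-streaming response carries the generated text in "response".
        return json.loads(resp.read())["response"]
```

With the server up, `reword("you can't eat your cake and have it too")` returns the model's paraphrase as a string. Whether a paraphrase really strips out stylometric signal is a separate question this sketch does not try to answer.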
| khafra wrote:
| Relatedly, if you have a decently sized writing sample on the
| Internet, LLMs can do this to you at scale:
| https://www.lesswrong.com/posts/doPbyzPgKdjedohud/the-case-f...
| blakesterz wrote:
| This type of work has been used in at least one attempt to
| identify Satoshi. This is the one I remember:
|
| https://likeinamirror.wordpress.com/2013/12/01/satoshi-nakam...
| Temporary_31337 wrote:
| Cool 20th-century story, but mass use of LLM-generated text will
| obscure any such individual differences. My kids already use
| LLMs extensively to check their homework and sometimes to
| generate it completely.
| alanh wrote:
| Here's your clew (sic) that the article may not be so reliable:
| It credits the FBI's linguistic analysis for locating the
| Unabomber, when in reality, it was _his own brother_ who said
| "hey, it sounds like Ted."
| giarc wrote:
| I don't totally agree with you but I see where you are coming
| from. The article states "Eventually, the linguistic evidence
| was strong enough to issue a search warrant..." which could be
| referring to the brother pointing out the similar writing style
| or the FBI's assessment pointing to someone from Chicago. It's
| not clear.
| ChiMan wrote:
| I think we can expect sophisticated criminals to start using AI
| to rewrite their correspondence.
| erehweb wrote:
| Perhaps similar to how ransom notes used to be assembled from
| letters cut out of newspapers.
| runamuck wrote:
| I think so. Along those lines, did the underworld learn the
| lesson from Bernardo Provenzano (who was caught partly because
| he relied on a Caesar cipher) and step up its encryption?
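A Caesar cipher shifts every letter by the same fixed amount, so there are only 25 useful keys and anyone can simply try them all. Below is a minimal sketch of the classical letter-shifting version; Provenzano's variant reportedly wrote letters as numbers (A as 4, B as 5, and so on), which is just as easy to reverse. The helper names `caesar` and `brute_force` are illustrative.

```python
from __future__ import annotations

import string

ALPHABET = string.ascii_uppercase


def caesar(text: str, shift: int) -> str:
    """Shift each A-Z letter by `shift` positions; leave other characters alone."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            # Wrap around the 26-letter alphabet; negative shifts decrypt.
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def brute_force(ciphertext: str) -> list[tuple[int, str]]:
    """Try every non-trivial key; a human (or a dictionary check) spots the readable one."""
    return [(k, caesar(ciphertext, -k)) for k in range(1, 26)]
```

For example, `brute_force(caesar("MEET AT DAWN", 3))` returns 25 candidates, and only the one for key 3 reads as English.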
| alganet wrote:
| That is by itself a linguistic choice that can be analyzed.
|
| "Hey, that guy only communicates using cutouts from magazines,
| what a strange choice"
|
| LLMs introduce all kinds of linguistic choices, and you can
| focus on those choices.
| hnbad wrote:
| Popular crime shows like CSI (and arguably detective novels even
| before that) have created a false impression that forensics are
| an exact science that can perfectly narrow down the list of
| suspects to the exact person who did it.
|
| But of course in reality it is much more complex. All forms of
| forensic evidence are vulnerable to noise: sure, that one
| interesting artifact may be evidence but it also may not be, and
| the inverse may be true for that perfectly ordinary thing
| everyone missed. A linguistic quirk _can_ be a piece of evidence
| but it can also be accidental (e.g. nowadays it might also result
| from bad predictive typing or autocomplete, or even more
| recently, as others have pointed out, LLMs).
|
| So all of these are in effect just probabilistic filters. And
| filters are only useful when your sample includes your target
| (i.e. if the actual perpetrator is a suspect and you have
| adequate data about them). And even then they can produce both
| false positives and false negatives, and these errors interact
| the more filters you attempt to combine.
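To put rough numbers on the "probabilistic filter" point, here is a worked sketch with made-up figures: a candidate pool that contains exactly one actual author, and a filter that flags that author 95% of the time while flagging any innocent writer 1% of the time. The helpers `screen` and `combine` are hypothetical, and `combine` assumes the two filters make independent errors, which real linguistic features often won't.

```python
def screen(pool_size: int, sensitivity: float, false_positive_rate: float):
    """Expected outcome of one filter on a pool that contains exactly one true author.

    Returns (expected true hits, expected false hits, P(author | flagged)).
    """
    innocents = pool_size - 1
    true_hits = sensitivity                       # the single author is flagged with this probability
    false_hits = innocents * false_positive_rate  # expected number of innocent writers flagged
    posterior = true_hits / (true_hits + false_hits)
    return true_hits, false_hits, posterior


def combine(sens_a: float, fpr_a: float, sens_b: float, fpr_b: float):
    """Require both filters to match, ASSUMING their errors are independent.

    False positives drop, but so does sensitivity (more false negatives).
    """
    return sens_a * sens_b, fpr_a * fpr_b
```

With a pool of 10,000, `screen(10_000, 0.95, 0.01)` expects about 100 innocent matches, so a flagged person is the author less than 1% of the time. Stacking two such filters raises that to roughly 47%, but only under the independence assumption, and the chance of missing the real author grows from 5% to almost 10%. With a pool of 5 suspects that is known to include the author, a single filter already gives about 96%, which matches the "small set of suspects" case in the comment.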
|
| Forensic linguistics can be useful when you have a small set of
| suspects that you absolutely know includes the actual
| perpetrator. But otherwise they can send you on a wild goose
| chase or hurt the innocent.
| psunavy03 wrote:
| Trial lawyers have written about the "CSI Effect," where the
| existence of such shows produces jurors who now expect trials
| to contain the types of flashy scientific evidence they see on
| TV, and become less likely to convict even obviously-guilty
| people when this type of evidence can't be realistically
| produced.
| z3t4 wrote:
| Does anyone have a recommendation for software that can analyse
| text messages and tell whether two users are the same person?
|
| The use case is competitive gaming, where a player can get a
| major advantage by using several accounts. The software would be
| used for screening, to detect accounts that are suspiciously alike.
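A crude version of this kind of screening can be sketched with standard stylometry ingredients: build a character-trigram frequency profile of each account's messages and compare profiles with cosine similarity. This is an illustration under stated assumptions, not a validated method: the `0.8` threshold and the helper names are arbitrary, short chat messages give noisy profiles, and a high score only means "worth a human look".

```python
from __future__ import annotations

import math
from collections import Counter
from itertools import combinations


def trigram_profile(messages: list[str]) -> Counter:
    """Count overlapping character trigrams across all of an account's messages."""
    counts: Counter = Counter()
    for msg in messages:
        text = " ".join(msg.lower().split())  # normalise case and whitespace
        counts.update(text[i:i + 3] for i in range(len(text) - 2))
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 to 1.0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suspicious_pairs(accounts: dict[str, list[str]], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return account pairs whose profiles are at least `threshold` similar, most similar first."""
    profiles = {name: trigram_profile(msgs) for name, msgs in accounts.items()}
    pairs = []
    for x, y in combinations(sorted(profiles), 2):
        score = cosine(profiles[x], profiles[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Comparing every pair is quadratic in the number of accounts, which is fine for screening a tournament roster but not a whole player base.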
| tgv wrote:
| > According to forensic linguists, we all use language in a
| uniquely identifiable way that can be as incriminating as a
| fingerprint.
|
| That's a bold and unproven statement, made worse because we can't
| really see that fingerprint.
| ret54 wrote:
| There was this project two years back that unmasked HN users'
| alt accounts using stylometric analysis of a previous comment
| dump.
|
| AFAIU, the more people know about it, the more realistic their
| expectations about account privacy will be.
|
| https://news.ycombinator.com/item?id=33755016
| crote wrote:
| It sounds like a fairly accurate statement to me, considering
| that there isn't a solid scientifically-based foundation behind
| fingerprint matching either. They aren't quite as unique as
| we've often been led to believe, and matching them is highly
| subjective, with the same expert often interpreting the same
| comparison differently when provided with a different story for
| context.
|
| Fingerprint matching of course isn't completely _useless_, but
| it's not as solid as you'd hope either.
| tgv wrote:
| But when two sets of fingerprints are different, you can be
| fairly sure they're from different people. But when the
| percentage of some features is 20% in one text, and 30% in
| another, you still can't conclude anything. I write in
| different registers in contexts such as personal emails,
| professional emails to a large group, professional emails to
| a direct colleague, a quick post on the internet, an 'app' to
| a friend in another country, a text message on a phone, etc.
| I even write them in different languages. It's hard to
| imagine there's a well-defined, properly grounded model that
| can unite those yet distinguish them from written output by
| other people.
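The 20%-vs-30% example can be made concrete with a standard two-proportion z-test. In this sketch each text is treated as n independent opportunities to use a feature, which is itself a simplifying assumption, and the text sizes are invented for illustration. With 50 opportunities per text the gap is indistinguishable from chance; with 2,000 it is highly significant. Even then, the test only says the rates differ, not whether the cause is a different author or just a different register.

```python
import math


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """z statistic for H0: both texts use the feature at the same underlying rate."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)  # rate estimate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Here `two_proportion_z(10, 50, 15, 50)` gives z of about -1.15 (p around 0.25), while `two_proportion_z(400, 2000, 600, 2000)` gives z of about -7.3.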
|
| And now LLMs are going to add more noise to these features...
| TruffleLabs wrote:
| Reminds me of
|
| "Eats, shoots, and leaves"
|
| Or is it
|
| "Eats shoots and leaves"?
|
| ;)
|
| Book "Eats, Shoots & Leaves: The Zero Tolerance Approach to
| Punctuation"
|
| https://en.wikipedia.org/wiki/Eats,_Shoots_&_Leaves
| Suppafly wrote:
| or the hilarious "help your uncle jack off a horse" vs "help
| your uncle, Jack, off a horse" example, although that one also
| involves capitalization.
| seanhunter wrote:
| Or the famous:
|
| Time to eat grandma
|
| Vs
|
| Time to eat, grandma
| llm_nerd wrote:
| A fun related tool someone posted on here once:
|
| https://news.ycombinator.com/item?id=33755016
|
| The tool seems to be dead, but the link is still useful for the related discussion.
| karaterobot wrote:
| Yes, a comma can solve a crime. You see, the panda didn't
| actually shoot anyone, it was just _eating_ shoots and leaves.
| vzaliva wrote:
| The article touches on the impact of AI in this field but doesn't
| mention a potential issue: the possibility of AI being used to
| rewrite text in a way that makes it unrecognisable and impossible
| to trace.
| sidewndr46 wrote:
| nice, between this and bite mark science no criminal is going to
| be able to escape punishment
___________________________________________________________________
(page generated 2024-12-06 23:01 UTC)