Post B2gdbFLcd9PUpcSzj6 by munin@infosec.exchange
 (DIR) Post #B2gdbFLcd9PUpcSzj6 by munin@infosec.exchange
       2026-01-26T19:33:08Z
       
       0 likes, 0 repeats
       
       LLMs do not "think"

       The LLM instantiation methodology* correlates patterns in the data that the developers provide to build a database** of linkages between collections of words and phrases*** that appear in that corpus. The way in which this database is used is to inform a probabilistic selector process by seeding it with a set of probabilities**** associated with a given word or phrase; that set of probabilities has pointers to related words or phrases.

       If given words or phrases are consistently found in close proximity in the original data, then those probabilities will be higher. When a query***** is made to this database, a randomization process is used to drop certain parts****** of the query being sent into the lookup process. The remainder is divided into segments† and passed into the database for query.

       So.

       With all this in mind, it should be -screamingly obvious- why this story, of how it's entirely feasible to get an LLM to rederive copyrighted works out of the database that was seeded with those works, happens: https://futurism.com/artificial-intelligence/ai-industry-recall-copyright-books

       * I am deliberately not using the word 'training'. You can train dogs; you can train employees; you can train chimpanzees; what you do to an LLM is not training - it is building a database to feed into another process.

       ** I am deliberately not using the word "model" here, so as to restate the process in plain language absent the jargon these dipshits insist on using to obfuscate their techniques.

       *** "Tokens" is another jargon word here.

       **** "weights" is less objectionable as jargon, given it's used for a number of things with this approximate conceptual shape, but it's fucking annoying to me in this context.

       ***** "prompt" is their fucking bullshit term for a natural-language database query.

       ****** "zero weighting" is jargon for "we drop it on the floor" - this is why I keep referring to people doing "prompt engineering" as playing games instead of doing actual security; if the fucking thing drops random parts of your shit on the ground, then inherently you have no way to enforce a policy that is subject to that process.

       † "tokenized", see ***
       
 (DIR) Post #B2gdbGZ86PoEbnnIQ4 by mkljczk@pl.fediverse.pl
       2026-01-26T19:38:44.493752Z
       
       0 likes, 0 repeats
       
       @munin tfw even fucking Gemini can write like JKR
       
 (DIR) Post #B2giyBFeWfkaYgQhFY by Rom13AncapHypoc
       2026-01-26T20:39:01.556161Z
       
       0 likes, 0 repeats
       
       @munin Faggot.Blocked!