Post AoihbZXIcwFr64SbOy by Dervishpi@mastodon.social
(DIR) Post #AoiXQw67P7ZSLM2JEW by futurebird@sauropods.win
2024-12-04T23:29:41Z
0 likes, 1 repeats
When you have encountered *text* and suspected it was AI-generated, what caused you to feel something was off?
(DIR) Post #AoiXa15uCEvQRjGaZ6 by jmax@mastodon.social
2024-12-04T23:31:18Z
0 likes, 0 repeats
@futurebird Is stupid in ways that aren't typical for humans.
(DIR) Post #AoiXgo8Gmpy2GybvCS by brhfl@digipres.club
2024-12-04T23:32:31Z
1 likes, 0 repeats
@futurebird unfortunately, it's usually the 'tone' (that is, the repetition, the lack of sense, or just like... stringing 'thoughts' together in ways a human wouldn't) that i notice _before_ i start scrutinizing the factual errors.
(DIR) Post #AoiXmBmsGXJH5LAYDo by ryanjyoder@techhub.social
2024-12-04T23:33:16Z
0 likes, 0 repeats
@futurebird When I suspect text is AI-generated, it's often due to subtle inconsistencies or patterns that don't quite match human writing styles. Here are some common giveaways:
1. Overly formal or stilted language
2. Repetitive or overly complex sentence structures
3. Lack of contractions or colloquialisms
4. Inconsistent tone or emotional expression
5. Overuse of buzzwords or trendy phrases
6. Unnatural or forced transitions between ideas
7. Absence of personal anecdotes or experiences
8. Overly precise or formulaic language
These traits can indicate that the text was generated by a machine rather than a human.
(DIR) Post #AoiXphslOTYJUjbKfQ by carrideen@c18.masto.host
2024-12-04T23:34:07Z
1 likes, 1 repeats
@futurebird I'm a literature and philosophy professor, and the eeriest thing to me is reading analytical writing about a text that has no subjective perspective. When you can't feel who is the mind thinking about a text, I get nauseous.
(DIR) Post #AoiXzV9PaGXQu93hOC by futurebird@sauropods.win
2024-12-04T23:35:55Z
1 likes, 0 repeats
@carrideen Just authoritative sentence after authoritative sentence... but NOTHING is said. It's creepy. Uncanny valley stuff.
(DIR) Post #AoiY2wb5zkFAtgeJc0 by AlexKourvo@zirk.us
2024-12-04T23:36:32Z
0 likes, 0 repeats
@futurebird It's the unnecessary details that always tip it off. Way too much information.
(DIR) Post #AoiY5vPSrnrc5Bd8fQ by richpuchalsky@mastodon.social
2024-12-04T23:37:04Z
0 likes, 0 repeats
@futurebird I didn't want to click "other" when I mean "all of the above"
(DIR) Post #AoiYK4ib3RGX0UKIkK by ShredderFeeder@shredderfood.com
2024-12-04T23:39:37Z
0 likes, 0 repeats
@futurebird Most AI sounds like a 12-year-old doing a research paper who has been given a rubric to follow and followed it just a LITTLE too closely.
(DIR) Post #AoiYNK5aMY5LXzczz6 by futurebird@sauropods.win
2024-12-04T23:40:14Z
0 likes, 1 repeats
@jmax Never thought I'd find good ol' human stupidity ... wholesome. If you have run into students using AI in assignments, it's so disappointing. There's no voice. And none of the goofy way that young people write that just isn't exactly... "right." But frankly, not being an English teacher, I like it better. Give me the raw deep thoughts of a 5th grader any day over this stuff.
(DIR) Post #AoiYTmgGyL6Xxbrwhc by mattmcirvin@mathstodon.xyz
2024-12-04T23:41:22Z
0 likes, 0 repeats
@futurebird Often it's just the utter banality-- this studiedly friendly voice expressing a view from nowhere.
(DIR) Post #AoiYWhbaIdtx5U4G1Y by michael_w_busch@mastodon.online
2024-12-04T23:41:54Z
0 likes, 0 repeats
@futurebird All three. Plus the various text generators each having identifiable styles.
(DIR) Post #AoiYbnNgFkXcRrzcDw by platypus@glammr.us
2024-12-04T23:42:44Z
0 likes, 0 repeats
@futurebird when I start to realize that it is essentially going for thoroughly middle of the road, bland, and a particular kind of “balance” feel that makes me sense the machine is trying to stick to some mean ideal
(DIR) Post #AoiYmZYQ082okc9chc by MattMerk@mastodon.social
2024-12-04T23:44:46Z
0 likes, 0 repeats
@futurebird In music parlance, I’d say it lacks swing. It lacks all the “flaws” that imbue text with emotion.
(DIR) Post #AoiYnxKD6hmMlSAUUq by VampiresAndRobots@writing.exchange
2024-12-04T23:45:02Z
0 likes, 1 repeats
@futurebird Other: we were offered a free year of virtual PT through our insurance. My first "visit" was with someone over the phone, but all subsequent consultants were text based. At first it just gave me the exercises and asked me to rate my pain, etc, but then the questions started getting uncanny, asking me to tell them what I liked best, what exercises worked best for what, and none of it was conversational anymore. I stopped answering when I realized I was training the app.
(DIR) Post #AoiYswKwD0xn9Bsw1w by elilla@transmom.love
2024-12-04T23:45:56Z
0 likes, 0 repeats
@futurebird boring bland beigeness
(DIR) Post #AoiYuBlALshBGpkT6O by jmax@mastodon.social
2024-12-04T23:46:03Z
0 likes, 0 repeats
@futurebird - Whatever their flaws, the kids are humans. AI isn't.
(DIR) Post #AoiZ0Ee2XQoVxKIZYe by jrdepriest@infosec.exchange
2024-12-04T23:47:13Z
0 likes, 0 repeats
@futurebird Lots of articles I read seem like a real person wrote maybe a paragraph or some bullet points and asked a genAI, "turn this into an article". It's mostly weird repetition that doesn't reinforce or emphasize anything; it just takes random bits and says the same thing again using slightly different words.
(DIR) Post #AoiZQof5FcrTBnSfz6 by eloquence@social.coop
2024-12-04T23:45:21Z
0 likes, 1 repeats
@futurebird Certainly! Let's delve into the complex and multifaceted reasons to meticulously consider the tapestry of AI-generated text in this digital age. It is important to note, however–
(DIR) Post #AoiZTBszsGKrnW6Ma0 by futurebird@sauropods.win
2024-12-04T23:52:31Z
0 likes, 0 repeats
@eloquence kiiiiiillll meee now
(DIR) Post #AoiZipoxgBSrztfpxY by nblr@chaos.social
2024-12-04T23:55:16Z
0 likes, 0 repeats
@futurebird @eloquence
(DIR) Post #AoiaaYvsgbDnFxWthg by rehana@mastodon.social
2024-12-05T00:05:00Z
0 likes, 0 repeats
@futurebird I should also have checked "other": lots of vagueness, and X but on the other hand Y types of claims instead of asserting one thing.
(DIR) Post #AoibLg7DnWtr90IhPs by falcennial@mastodon.social
2024-12-05T00:13:32Z
0 likes, 0 repeats
@futurebird for me it is:
- lack of self-awareness
- non-existent macro context
- not having anything to do with the query
- self contradiction (not always but almost always)
- non-existent narrative elements
- crimes against common sense
- stylistic chaos
- nonsensically passive language
- vague about important bits, specific about the trivial bits
categorically useless. more "processing power" is not going to solve even one of those defects. LLMs are garbage producing machines.
(DIR) Post #AoicMNy6qxnD6dwK0G by TerryHancock@realsocial.life
2024-12-05T00:24:50Z
0 likes, 0 repeats
@futurebird What really tips me off is the way it seems to meander around the point, with irrelevant asides and diversions and needlessly wordy phrasing. Most of the time, unless they're trying to screw with you, a human will get to the point. Or at least their side points will have a point.
(DIR) Post #AoicOhUfELnkF1Gq6C by krans@mastodon.me.uk
2024-12-05T00:25:17Z
0 likes, 1 repeats
@futurebird Under, “other,” I would include a type of tone-deafness: text in a style that's grammatically correct and almost entirely sensical, but just… has little stylistic relationship to the way a human would idiomatically write in that context. Does that make sense? It's very, “You know it if you see it.”
(DIR) Post #AoiccREUU76jraFxB2 by nerpulus@mastodon.online
2024-12-05T00:27:46Z
0 likes, 0 repeats
@futurebird Grammatical incoherence, like not knowing how something early in the sentence is supposed to resolve with something later in the sentence, or the same sort of thing on a larger level.
(DIR) Post #Aoid0yfhoLd2WRgfGS by ThreeSigma@mastodon.online
2024-12-05T00:32:12Z
0 likes, 0 repeats
@futurebird Mainly it looks trite and obsequious.
(DIR) Post #Aoid5Kn9Vk0yHjFQMC by raganwald@social.bau-ha.us
2024-12-05T00:32:59Z
0 likes, 0 repeats
@futurebird For me, it was a factual error on a subject in which I consider myself an authority: me. I asked it who I was, and it included a list of books I'd written. And it did well; I had written most of the books on the list.
(DIR) Post #AoidVeVWpZcgxfYOSu by skjeggtroll@mastodon.online
2024-12-05T00:37:44Z
0 likes, 0 repeats
@futurebird "Waffling" and "flatness." It reads like someone trying to push out an article or copy on a subject they don't really understand, so they just jot down whatever facts they can find in a loose structure, all with the exact same emphasis on every single piece of information.
(DIR) Post #Aoidihnvsu2Qx1O0y8 by NikT@mastodon.nz
2024-12-05T00:40:05Z
0 likes, 0 repeats
@futurebird I read a lot of user-submitted product reviews in the course of my work. Without fail, the AI ones read "[product] is a [thing]. Its pros are [x]. Its cons are [y]. Overall, it's [z]". Entirely bland and without a smidgeon of personality, even vs. generic "this sucks!" or "this rules!" submissions. At least it's easy to tell, for now, simply by reading enough of them to vibe check.
(DIR) Post #Aoie0LY9MhRv8sPrFo by devkitsune@furry.engineer
2024-12-05T00:43:17Z
0 likes, 0 repeats
@futurebird repetitive in weird ways and also a tendency towards specific text formatting/layout. ChatGPT really loves:

HEADER:
*Block of text*
**Bold Point 1
-sub points/maybe small block of text
**Bold Point 2
-sub points
Etc...

Between the repetition, that layout, and a tendency for sub points to not match very specifically with their bold points, I feel like undoctored ChatGPT text is pretty distinct. Of course, once people start to reformat it, correct it, remove repetition, and maybe slice it up around human-written text, it gets more difficult to definitively spot.
(DIR) Post #Aoie1daNnElnMy4lqC by ijk@mathstodon.xyz
2024-12-05T00:43:19Z
0 likes, 0 repeats
@futurebird It's difficult to articulate precisely what I perceive when I think this, but AI generated prose is soulless; exactly what you would expect an unthinking machine to produce.
(DIR) Post #AoieTGznNNvrPq8km8 by Eliot_L@social.coop
2024-12-05T00:00:39Z
0 likes, 0 repeats
@futurebird I think for me it's mostly been *vibes*? There's something about some text that just *feels* like slop generated by an LLM? Sometimes any of the reasons listed in your poll apply though.
(DIR) Post #AoielnDIKWqsY9HkCO by blogdiva@mastodon.social
2024-12-05T00:51:51Z
0 likes, 0 repeats
@futurebird all of the above but also:
1. lack of style: boring, textbook-like grammar and orthography, or articles written like high school exercises: thesis statement, a few points, conclusion.
2. good writers make good use of crescendo and cliffhangers, especially in reporting of unfolding events. it's rare in plagiarized gloms of content, no matter if spewed by a database or slapped together by a human.
3. writers without any socials but published by one pub alone are absolutely suss.
(DIR) Post #AoietwTNmsmbaHhDt2 by merling@sciences.social
2024-12-05T00:53:20Z
0 likes, 0 repeats
@futurebird I like how all the comments on your post each have their own unique point of view on how gen AI text appears to them, as well as their own way of phrasing it. There's a meta to it: it serves to prove your point by underlining the richness of human writing, and the emptiness of gen AI production... (I'm really baffled that many people whose job is writing (e.g. in academia) turn so easily to gen AI, and it's refreshing to see many others resisting it!)
(DIR) Post #Aoif24yg3DHeHIDKdc by negative12dollarbill@techhub.social
2024-12-05T00:54:50Z
0 likes, 0 repeats
@futurebird I came across an entire site which appeared to be about guitars (I had some technical questions about Kurt Cobain) and reading the content made me doubt my own sanity for a minute or so. It was so repetitive and actual-content-free that I wondered if I was having some kind of episode. It appeared the whole site was slop. I looked for some way to report it but that doesn't seem to be a thing.
(DIR) Post #AoifOI77tJEPOFl1U0 by yonder@spacey.space
2024-12-05T00:58:50Z
0 likes, 0 repeats
@futurebird @carrideen I used to consider a rule about abstract nouns. i.e. deduct marks for every abstract noun. Maybe I'd give the writer some for free. I wouldn't want to be cruel. But they'd have to consider if it's really necessary. I never actually did that... but now...
(DIR) Post #AoifPWfpFC94psZo7U by jgrg@mstdn.science
2024-12-05T00:59:01Z
0 likes, 0 repeats
@futurebird @carrideen Yes. I would say that the sentences flow, are idiomatic and easy to read, but don't say anything.
(DIR) Post #AoifWzXKKZSCNbA3ea by TindrasGrove@infosec.exchange
2024-12-05T01:00:24Z
0 likes, 0 repeats
@futurebird other: vague and hand-wavy while sounding authoritative. If I’ve gotten a couple paragraphs in and there’s no point anywhere on the horizon, I assume it’s either generated or older SEO garbage.
(DIR) Post #Aoify3guzYAF5GYPZI by mcc@mastodon.social
2024-12-05T01:05:18Z
0 likes, 0 repeats
@futurebird Text that feels like it was written by a corporation. Sentences that are longer than they need to be, but not in the normal way that people write run-on sentences. Like someone was writing a sentence that needed to be understood by a six-year-old. I don't know if I'm describing this well, but these three sentences are trying to describe a single thing.
(DIR) Post #Aoig2rH0aYVO1CivYW by epicdemiologist@wandering.shop
2024-12-05T01:06:00Z
0 likes, 0 repeats
@futurebird A complete lack of grammatical connection between phrases. My late mother would have offered a piece of chalk and a section of the blackboard and said, "Diagram that sentence."
(DIR) Post #AoigG4NNBVEQQ704gK by moira@mastodon.murkworks.net
2024-12-05T01:08:33Z
0 likes, 0 repeats
@futurebird I have actually encountered the "switches seamlessly from meaning 1 of a word to unrelated meaning 2 of a word" in the wild. I don't remember what I was looking for but it was amazing.
(DIR) Post #Aoigtf80TUerhNgy4O by billyjoebowers@mastodon.online
2024-12-05T01:15:41Z
0 likes, 0 repeats
@futurebird Just doesn't sound like anything a real adult would say. Sounds like a child giving a book report.
(DIR) Post #Aoigy194YbwL8sGdDU by wronglang@bayes.club
2024-12-05T01:16:29Z
1 likes, 0 repeats
@futurebird there's a pervasive vanilla self-absorbed never-left-middle-school writing style that's hard to describe
(DIR) Post #Aoih7gQ8SyVJlFrDTU by dpnash@c.im
2024-12-05T01:18:14Z
1 likes, 0 repeats
@futurebird I picked “other”, but…really, I’ve been clued in by all three options at some time or another, plus more. Biggest tell for me these days is a sort of forced blandness, like corporate market-speak with the buzzwords toned down a bit. It’s really obvious in LLM-generated product reviews, which always seem to be three or four “pros” and a similar number of “cons” with little if any discernment of, or context for, which ones really matter.
(DIR) Post #AoihbZXIcwFr64SbOy by Dervishpi@mastodon.social
2024-12-05T01:23:38Z
0 likes, 0 repeats
@futurebird I just read an article with a paragraph-long caption on an image. It repeated the same info in the body text. And then it put the same text in a sidebar. The reason I read the whole article is because I was trying to make sense of the technospeak making up that caption.
(DIR) Post #AoiiWvsAFjrEsXZTcW by etherdiver@ravenation.club
2024-12-05T01:33:58Z
0 likes, 0 repeats
@futurebird all the AI "writing" I read sounds just like an ignorant high schooler's essay on the topic. Facts cribbed from Wikipedia and maybe a BuzzFeed article, thesaurus word salad, shallow and repetitive with zero character.
(DIR) Post #AoijPKRt5y14joDcS8 by matt@proud.social
2024-12-05T01:43:50Z
0 likes, 0 repeats
@futurebird Low density of substance or inappropriately casual tonality.
(DIR) Post #Aoijc9GhyURtpM6Ptw by nazokiyoubinbou@urusai.social
2024-12-05T01:46:09Z
0 likes, 0 repeats
@futurebird I won't lie. I utilize an LLM for personal reasons (like doing stories and stuff with characters for fun, not anything anyone else will ever see. None of it gets saved or written down.) There are definitely a lot of things that they tend to produce that become telltale signs. The biggest one is something called "ministrations", where they keep repeating certain common phrases (probably the most common you see in everything is when a character experiences "shivers down their spine" like 50 times in a short period of time, lol.) Of course much of that would be harder to detect in the sort of text you're talking about, but I guess the closest would be "repetitive on macro level."
(DIR) Post #AoikK1WM8p8sXYLJvE by qwynnyx@mastodon.social
2024-12-05T01:54:04Z
0 likes, 0 repeats
@futurebird I actually don't know the English equivalent for German "Duktus" (it's not ductus, I looked that up at least), but it's something like "phrasing & flow". The choice of words and their flow is, at least for the few times I had to read long paragraphs of ChatGPT, very revealing, as it has its "own" pseudo-scientific "style" that can imo be recognized when you come across it. I don't know whether it can be prompted to write in a different style. And otherwise factual errors play into it, but not solely.
(DIR) Post #Aoimgp5SjfsYXvlMg4 by aprilfollies@mastodon.online
2024-12-05T02:02:38Z
0 likes, 1 repeats
@falcennial @futurebird "- not having anything to do with the query" I think I "caught" (decided not to give credit for) a majority of LLM-generated answers with the simple feedback, "This does not answer the question that was asked." I don't trust myself to be an LLM detector when it comes to grading, and the automated detectors have a wretched number of false positives. So I look for "what's WRONG with this answer" in a specific way I can point to.
(DIR) Post #Aoimx009vKbN1vznA8 by minxdragon@wandering.shop
2024-12-05T02:23:27Z
0 likes, 0 repeats
@futurebird @timnitGebru All of the above, plus sweeping generalizations and sentences that are functionally meaningless but sound fancy.
(DIR) Post #AoinOGP538hliceYdc by pencilears@mastodon.eternalaugust.com
2024-12-05T02:28:26Z
0 likes, 0 repeats
@futurebird a human writing a thing will vary their tempo and the pace of their text, but will overall tell a coherent story. (Even if they're a bad writer.) LLMs pull out little chunks of text that are ok, but don't flow within the larger piece. They'll also put in details that don't go anywhere or mean anything, but mostly the giveaway is if trying to make sense of the text makes my eyes cross.
(DIR) Post #AoinllsCDEibyqTsno by foolishowl@social.coop
2024-12-05T01:48:48Z
0 likes, 0 repeats
@futurebird It's something about the tone.
(DIR) Post #AoipPkg6Av3doMS0Ia by RnDanger@infosec.exchange
2024-12-05T02:51:08Z
0 likes, 0 repeats
@futurebird I voted for both macro level responses, and other. Other: Sometimes I think the content doesn't seem leveled appropriately for the intended audience. Complex ideas are glossed over while simple things that should be taken for granted will have unnecessary explanations.
(DIR) Post #AoiqZE5UIhBnaFPLY8 by Thad@brontosin.space
2024-12-05T03:04:03Z
0 likes, 0 repeats
@futurebird A how-to guide that has ten different sections, each with its own subheading, explaining the concepts and history of the topic before getting to the how-to part.
(DIR) Post #AoiqeMEhQpVYIuAC3c by MrLee@aus.social
2024-12-05T03:04:58Z
0 likes, 0 repeats
@futurebird It had no soul.
(DIR) Post #AoirmJn2VgwizKaMD2 by joannaholman@aus.social
2024-12-05T03:17:37Z
1 likes, 0 repeats
@futurebird there's also that it tends to use vocab and grammatical constructions that are technically correct but clunky. It also has a tendency to use generalisations, vague statements and over-explanation that a human who knows their audience probably wouldn't.
(DIR) Post #AoixtJsr9vhR201ncG by MedeaVanamonde@chaosfem.tw
2024-12-05T04:25:58Z
0 likes, 0 repeats
@futurebird @jmax they will be reduced to grunts, meows and whinnies come June
(DIR) Post #Aoj4omrmfr8h9PJQCO by 1HommeAzerty@mamot.fr
2024-12-05T05:43:45Z
0 likes, 0 repeats
@futurebird No typos, fluent style, 0% copy/paste detected. All this is very unusual for the kind of students I have.
(DIR) Post #Aoj6G4Lxrn559r0BBw by trollkatt@mastodon.online
2024-12-05T05:59:52Z
0 likes, 0 repeats
@futurebird too well written, especially for a non-English speaker. Unnecessarily "embellished". Also, lacking character. Knowing someone, I also know how they express themselves. Receiving a text from them in a completely different "voice" is uncanny. (And completely unnecessary IMO)
(DIR) Post #AojFQIDE5p8mclreim by cavyherd@wandering.shop
2024-12-05T01:00:47Z
0 likes, 0 repeats
@VampiresAndRobots @futurebird Oh ew. We've got "free PT" in our "health plan." I haven't tried it, but wonder if that's what it is? Ew ew ew....
(DIR) Post #AojFQIo5skdCT776Bc by VampiresAndRobots@writing.exchange
2024-12-05T04:08:48Z
0 likes, 0 repeats
@cavyherd @futurebird I feel like at this point, all "Free virtual ____" is either shitty AI or shitty AI training.
(DIR) Post #AojFQJiSVBiZHvU72e by cavyherd@wandering.shop
2024-12-05T04:37:11Z
0 likes, 1 repeats
@VampiresAndRobots @futurebird I've heard it posited that all those captchas "click all instances of motorcycles" are secretly AI training, & I find this disturbingly plausible.
(DIR) Post #AojO5P1cT1jBGBY6T2 by rhamphorhynchus@mastodon.nz
2024-12-05T09:19:37Z
0 likes, 0 repeats
@futurebird I can always spot LLM alt text, it's when you see "the background of the image features a textured grey wall". Clearly it doesn't know what part of the image is worth looking at. The word "features" is a red flag all by itself when the right word was "has". Real advertising flim-flam.
(DIR) Post #AojSkxq0L4nMv3BpMe by vanderZwan@vis.social
2024-12-05T10:11:59Z
0 likes, 0 repeats
@futurebird I selected all of the options. The "other" choice is mostly that the text makes no sense on a meta-level, where I interpret macro-level as "the interpretation of the text as a whole", and meta-level as "the context within which the text exists".
(DIR) Post #AojTPWho7XbidvhIG0 by nazokiyoubinbou@urusai.social
2024-12-05T07:45:53Z
0 likes, 0 repeats
@cavyherd @VampiresAndRobots @futurebird 99.999% sure it is. Synthetic training ultimately results in a breakdown of a model in a generation or two. They need non-synthetic input. And what better way than to mass deploy a system where users identify the outputs so they don't have to. Except it's still pretty synthetic. We're being fed some pretty poorly made results and we don't understand our instructions. (Should I click this one? Will it count against me if I click it?) And of course they're low resolution too, which makes it that much harder for us to even determine what they are even supposed to be. So I guess the good news is we're giving it at least some bad data. 😁
(DIR) Post #AojTPXnA57U60pD5F2 by nazokiyoubinbou@urusai.social
2024-12-05T08:01:38Z
0 likes, 1 repeats
@cavyherd @VampiresAndRobots @futurebird This gives me an idea for a new type of image compression, and very possibly the worst one imaginable: store images as a seed, prompt, and model hash. You could make a huge image something like a kilobyte. And impossible to 100% reproduce. (The ultimate low quality JPEG, lol.) Plus it would require draining a lake somewhere and cutting power to several houses to show the image. But, all in the name of science or something.
(DIR) Post #AojWPlj8ZShF5s7txI by jeana@triangletoot.party
2024-12-05T10:52:46Z
0 likes, 0 repeats
@futurebird other: fluff, for lack of a better word. LLM text seems to do a lot of empty phrasing (not dissimilar from SEO spam "blogs", really), and I usually see that before I get to the other things you mention.
(DIR) Post #AojYhBtajIqp5MS3YO by david_chisnall@infosec.exchange
2024-12-05T11:10:32Z
1 likes, 1 repeats
@nazokiyoubinbou @cavyherd @VampiresAndRobots @futurebird There are quite a few papers using various ML approaches for compression (because they basically are lossy compression systems).

The earliest one that I'm aware of is the hyphenation algorithm in TeX, from the '70s. Trying to encode hyphenation rules is really hard. For example, English and American differ because English hyphenates on stem boundaries (which requires knowing where the word came from) whereas American hyphenates on syllable boundaries. Other languages have different rules.

Rather than trying to write these rules, TeX takes a corpus of correctly hyphenated words and builds a short Markov chain that predicts the breaking probability between two-letter chunks. This is small (and, importantly, is a fixed size irrespective of the corpus size). Then they run it on each of the words in the corpus and keep a list of the words for which it gives the wrong answer. As I recall, with a corpus of ten thousand English words, the list of outliers was around 70.

This kind of approach is used more broadly to transform lossy compression into lossless: store the lossily compressed data and then store the delta. The approach is nice because usually there's far less information in the delta between the lossily compressed version and the source than there is in the source, and the approach can be made fairly agnostic to the lossy compression algorithm. This makes it very useful for ML-based approaches where the behaviour of the underlying lossy compression is variable. You can train a neural network of a fixed size on some input and then make it predict the input and record the deltas. Fabrice Bellard wrote some fun tools for doing this.

A lot of non-ML compression algorithms now use dictionaries. Zstd lets you process some representative data to create a dictionary that you then record separately and can then use for different things that are likely to be common to a load of sequences (for example, if you're exchanging a load of small JSON files with the same schema, a pre-shared dictionary that learns the names of all of the keys will likely save a lot of space, and gracefully fall back to a larger file size if there are new keys). Using a pre-shared neural network trained on representative data is the same idea.
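To make the lossy-predictor-plus-stored-delta idea above concrete, here is a minimal sketch in Python (standard library only). The previous-byte predictor is a made-up stand-in for the trained model the post describes, and the toy data is invented for illustration; the point is only that storing the residuals alongside a lossy prediction lets you reconstruct the source exactly:

    import zlib

    # Toy source data: a gently varying byte sequence.
    data = bytes(((i * 7) // 5) & 0xFF for i in range(4096))

    def predict(prev: int) -> int:
        # Stand-in for a learned model: guess the next byte equals the previous one.
        return prev

    # Encode: keep only the delta between each byte and its prediction.
    residuals = bytearray()
    prev = 0
    for b in data:
        residuals.append((b - predict(prev)) & 0xFF)
        prev = b

    # Decode: rerun the predictor and add the stored delta back in.
    restored = bytearray()
    prev = 0
    for r in residuals:
        b = (predict(prev) + r) & 0xFF
        restored.append(b)
        prev = b

    assert bytes(restored) == data  # exact round trip despite a lossy predictor

    # Out of curiosity, compare how the raw data and the residual stream compress.
    print("raw:      ", len(zlib.compress(data)), "bytes")
    print("residuals:", len(zlib.compress(bytes(residuals))), "bytes")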
(DIR) Post #AojYmersRFuKODFyRk by futurebird@sauropods.win
2024-12-05T11:19:31Z
0 likes, 0 repeats
@david_chisnall @nazokiyoubinbou @cavyherd @VampiresAndRobots This was fascinating. Thanks for taking the time to break it down.
(DIR) Post #AojZwQOfCGeSMFoTj6 by Torstein@mastodon.social
2024-12-05T11:32:28Z
0 likes, 0 repeats
@futurebird I picked "other". LLM output tends to be more "boilerplatey" than what normal humans produce. Stock phrases, cookie cutter structure; it reads as SEO-vetted filler. Attended a conference this summer, and it was fairly obvious who had used LLMs to polish their talks, because ultimately they all sounded alike (even if the topics differed), as if they all had copied from the same article on "how to write a conference talk".
(DIR) Post #AojfFBvNjD2qkIsv5s by futurebird@sauropods.win
2024-12-05T12:31:53Z
0 likes, 0 repeats
@charlotte @vikxin @nazokiyoubinbou @cavyherd @VampiresAndRobots If we have enough processing power and memory to waste training LLMs, then… why do we need compression? OK, I can kind of answer my own question: something about transferring data. But IDK, it just seems like it's solving an old problem in a way that implies we ought not to have that problem anymore.
(DIR) Post #AojovBs5DPzmKZRYbg by marymessall@mendeddrum.org
2024-12-05T14:20:15Z
0 likes, 0 repeats
@futurebird I feel like I notice when the choice about which details to include seems wrong. In student writing, I get suspicious when there are indicators that someone either did a lot of background research not required by the assignment, or used AI. When I am looking for advice about how to fix a hair dye disaster, finding a web page which digresses into the history of hair dyeing while ostensibly helping me deal with an emergency makes me lose trust.
(DIR) Post #AojsL5vCG6r7GlwYTI by dickon@splodge.fluff.org
2024-12-05T14:58:36Z
0 likes, 0 repeats
@futurebird Other. Weird, flat phrasing is a dead giveaway, as is receiving far too many Americanisms from something purportedly British.
(DIR) Post #AojsMT9fMZw3cGMoKG by JoshuaACNewman@xeno.glyphpress.com
2024-12-05T14:58:48Z
0 likes, 0 repeats
@futurebird What gets me is that it avoids saying anything. There's no narrative arc, no irony, no sense of humor, and it doesn't make an argument. It just says polite things that someone might say, surrounding a paragraph that is, at best, vague; most likely unilluminating on the subject; and at worst, wrong in a truly dumb way, like the article is about a synonym of the topic.
(DIR) Post #AojsVD8KIguWX3uOUC by VVitchtoria@mastodon.social
2024-12-05T15:00:16Z
0 likes, 0 repeats
@futurebird other: there's an emptiness to the language and it isn't cohesive. Even if it's correct, it feels wrong or off. Like a different "voice" is writing each sentence or parts of sentences.
(DIR) Post #AojuWpqa35nqnaVhi4 by asakiyume@wandering.shop
2024-12-05T15:03:58Z
0 likes, 0 repeats
@futurebird All of them, but also, for nonspecialist information websites, a kind of same-y pattern that doesn't take any account of the subject matter. It sounds the same whether it's a recipe for barley bread, information on a dinosaur, or information on cat ear problems.
(DIR) Post #AojvNkda9HuzzOf9Sy by asakiyume@wandering.shop
2024-12-05T15:19:11Z
0 likes, 0 repeats
@futurebird Here's an example of what I mean. It starts with a jarring gee-whiz! intro ("I get excited about food. Do you? I hope so!") then has these sentences: "The Bible is filled with references to food, not the least of which is barley. This fascinating grain is among the earliest known and most nourishing grains ever to be cultivated." --that "not the least of which" is a big tell for me. The whole thing is incoherent: https://thebiblicalnutritionist.com/barley-in-the-bible/
(DIR) Post #AojvOBZxyq4LY0B94y by asakiyume@wandering.shop
2024-12-05T15:21:02Z
0 likes, 0 repeats
@futurebird (That one has a woman who puts her name on it, but I bet she was AI assisted. I'm an editor, and I've edited people who used AI to generate their text.)
(DIR) Post #Aojx0g3pCNi52i3xYm by murtaugh@mastodon.social
2024-12-05T15:50:56Z
0 likes, 0 repeats
@futurebird AI text often has a very strong “Webster's dictionary defines…" vibe to it
(DIR) Post #Aojxbj296mCiyfwYLY by Rhube@wandering.shop
2024-12-05T15:44:56Z
0 likes, 0 repeats
@futurebird repetitive and really badly written. Usually much, much longer than it needed to be.
(DIR) Post #AojzIYqWxb6DmwV3fU by pbinkley@code4lib.social
2024-12-05T16:16:36Z
0 likes, 0 repeats
@futurebird the confident purposelessness - the "I don't know why I'm telling you this, but here's everything I know about X" tone of it
(DIR) Post #AokLZfclDAL0ixcezQ by nazokiyoubinbou@urusai.social
2024-12-05T20:26:12Z
0 likes, 0 repeats
@futurebird @charlotte @vikxin @cavyherd @VampiresAndRobots Considering that getting the same output as intended requires having the same model, the transfer-of-data part gets a bit weaker, since it would likely mean first transferring a model that probably would weigh in at a few gigabytes. Though I guess it would have to still be kind of small and run on CPU only with small context blocks to produce a reliable enough output. When I was helping someone work out a formula for something, we did a lot of testing back and forth of perplexity (which was essentially already a measure of how accurately it reproduced the expected data) and we discovered my hardware (AMD) was producing different results from their hardware (nVidia). Not by much, but again, that would be lossiness.
(DIR) Post #ApTFp4E4iNqv0PRywC by cavyherd@wandering.shop
2024-12-05T15:43:07Z
1 likes, 0 repeats
@mekkaokereke @nazokiyoubinbou @VampiresAndRobots @futurebird > Danger: everyone measures the success of sketch artists by convictions. Not by accuracy.I could happily have gone all year without that thought in my brain. I mean, it's obvious when you think about it, but still 😬
(DIR) Post #ApTFpD5Bmn5mTSMVpw by Tattooed_Mummy@beige.party
2024-12-08T12:05:45Z
0 likes, 0 repeats
@futurebird I've tried it, and you ALWAYS need to tweak it to make it sound human. It has no human feel. Like AI art, it's soulless. It's hard to define, but it doesn't sound "right". And yes, there are often repetitions.
(DIR) Post #ApTFpEEnUYN83XrhS4 by futurebird@sauropods.win
2024-12-08T12:13:29Z
1 likes, 0 repeats
@Tattooed_Mummy Once I tried to get GPT to write one of the reports I have to write about students. I was very tired, and gave it a bulleted list of what should go in the report and asked for two paragraphs. Then I read the paragraphs, and they were so bad, and made me so angry at how terrible they were, that suddenly I had the energy to write the thing myself. Somehow seeing my task "done" in that way reignited my ability to write. I should ask it to write a summary of one of my book ideas.
(DIR) Post #ApTFpL6twJDPN5DzcG by vt52@ioc.exchange
2024-12-08T18:27:41Z
1 likes, 0 repeats
@futurebird it's similar to one of my favorite tricks to fight indecision... you have two seemingly equal options and can't decide which to choose. flip a coin for it and see how you feel. sometimes you really just needed something to break the tie, and sometimes you'll find that you disagree with the result (even if you can't say why)... either way, you'll have solved the problem @Tattooed_Mummy
(DIR) Post #ApTFpLYYHVKikqA4iO by futurebird@sauropods.win
2024-12-08T12:14:52Z
0 likes, 0 repeats
@Tattooed_Mummy This is by far the most "useful" LLMs have ever been to me. They can provide an example that makes what *really* ought to be said snap into sharp clarity. Is that worth all the wasted energy? Seems dubious.
(DIR) Post #ApTFpegLAlqw4Ux3zc by devkitsune@furry.engineer
2024-12-05T11:57:00Z
1 likes, 0 repeats
@dhobern @futurebird I largely agree with this, but my original post is not making a statement about the practicality or morality of AI. (I think humans are very capable of writing text that doesn't say anything of merit without AI assistance, and frequently AI doesn't actually save any time or effort compared to just throwing some nonsense down on the page.) I think there is a larger discussion to be had about the amount of low quality text that people are expected to write to fill quotas. But my original post is again not discussing that; it's just saying it isn't always easy to spot AI text when humans can also write soulless nonsense.