Post B11eiMWS3YNtD0RvUG by smn@l3ib.org
(DIR) Post #B11cplZh6wdJ1fMwPA by futurebird@sauropods.win
2025-12-08T02:54:56Z
0 likes, 0 repeats
This guy generally does interesting work, but he's used an LLM to analyze the trends in a "creation science" journal over time, and I just don't think LLMs are effective for this kind of statistical task. Or have I missed something and they can count now? Thought I'd ask before leaving a comment about the possible issue. https://www.youtube.com/watch?v=RmHT-wAUYI0
(DIR) Post #B11d19vOuzUNjQxFFw by futurebird@sauropods.win
2025-12-08T02:56:59Z
0 likes, 0 repeats
I mean LLMs are based on statistics, and they will produce results that look like frequency charts. But these charts only attempt to approximate the expected content. They aren't based on counting articles that meet any set of criteria. It's... nonsense, and not even people who pride themselves on spotting nonsense seem to understand this.
(DIR) Post #B11d6KFfwxqTCytxHk by futurebird@sauropods.win
2025-12-08T02:57:56Z
0 likes, 0 repeats
The companies offering these products seem to be delighted that people are confused and using them to do things they simply aren't really doing.
(DIR) Post #B11dBqNdvNQeIjG9BY by castanea_jo@ni.hil.ist
2025-12-08T02:58:53Z
0 likes, 0 repeats
@futurebird there are definitely automated/ML/NLP systems that can count and produce trend data, but LLMs are not one of them. they can only produce something that sounds confident it has done so
(DIR) Post #B11deTkj8HZGWKi968 by futurebird@sauropods.win
2025-12-08T03:04:05Z
0 likes, 0 repeats
@castanea_jo LLMs really just shouldn't output that kind of data.
(DIR) Post #B11didpxa53KxH8y4e by castanea_jo@ni.hil.ist
2025-12-08T03:04:49Z
0 likes, 0 repeats
@futurebird definitionally they can only be precise and not accurate, so I agree heh
(DIR) Post #B11dpBAWZF984Ksbr6 by futurebird@sauropods.win
2025-12-08T03:05:54Z
0 likes, 0 repeats
@alienghic "I suspect in the hands of someone who knows what they're doing it might be possible to extract interesting insights from how the model is grouping terms."This is totally possible. But I don't think this is what that would look like?
(DIR) Post #B11eCt3bYd3kqYwq9o by Moss@beige.party
2025-12-08T03:10:17Z
0 likes, 0 repeats
@futurebird He even starts with a disqualifying lie: “I analyzed”. No you (he) didn’t—he entered a parcel of text into a statistics-based machine that is not itself capable of statistical analysis.
(DIR) Post #B11eaLOSMzgFOHt1Wa by futurebird@sauropods.win
2025-12-08T03:14:33Z
0 likes, 0 repeats
@Moss Damn thing will sit there and tell you that's what it's doing. But it can't count! It still can't count. I feel like I'm going crazy. Am I the only person who cares that the machine can't even count?
(DIR) Post #B11eiMWS3YNtD0RvUG by smn@l3ib.org
2025-12-08T03:15:57Z
0 likes, 0 repeats
@futurebird LLMs not only can't count, they can't even stop themselves from lying about being able to count, because they don't know what counting is; they're just trained to shit out text that resembles what text looks like when someone has counted something. https://community.openai.com/t/serious-issues-with-accuracy-repeated-character-count-failures/1254636 This is just the first practical example I could find, but there are other examples like it. You cannot, for instance, get LLMs to reliably play a game of chess without making an illegal move, because they can't learn rules.
(DIR) Post #B11eontdC50LqRMZVo by superflippy@mastodon.xyz
2025-12-08T03:17:08Z
0 likes, 0 repeats
@futurebird Scientists are using AI to extract data from very large, noisy datasets. I do not know enough about the technology to know exactly what they’re doing, but I can tell you they’re not just feeding their data into ChatGPT. So in one sense, it’s possible to get information about a large set of articles using AI, but the accuracy of the results would really depend on how he’s doing it.
(DIR) Post #B11f5k88RuUx5UDytc by Smoljaguar@spacey.space
2025-12-08T03:20:11Z
0 likes, 0 repeats
@futurebird if you're talking about using LLMs as a classifier for arbitrary text, I've seen yougov do it for some polls where they ask people about what they've read in the news recently and the LLM classifies what topics were mentioned; this ability is advertised here: https://yougov.com/business/products/ai-qualitative-explorer Also I've seen data science articles from the economist using basically the same idea on larger corpuses of text. I think empirically the best LLMs today are very good at modelling humans so this is ~fine?
(DIR) Post #B11fE0ru6JKXJhZlJo by futurebird@sauropods.win
2025-12-08T03:21:43Z
0 likes, 0 repeats
@Smoljaguar Wouldn't you need to ask it about each article individually and track the results? Not just give it a stack of articles and ask "how many of the articles mentioned X"?
(DIR) Post #B11fisjEPlet34ErWy by grimacing@luzeed.org
2025-12-08T03:25:40Z
0 likes, 0 repeats
Seems like the more people pride themselves on spotting nonsense, the more they seem to be advocating this shit these days. People have entered into this weird phase of mass AI hysteria and only those that don't use it at all are sitting here like... Am I crazy or are the hordes of AI enthusiasts crazy? It's gotta be one or the other.
(DIR) Post #B11fitlkXtGcHAQO5w by futurebird@sauropods.win
2025-12-08T03:26:56Z
0 likes, 0 repeats
@grimacing I don't think this guy is an enthusiast, he's just using a tool in a way that seems reasonable and that seems to give the results he wants, without knowing what those results really represent.
(DIR) Post #B11hSefN86tmi5p8D2 by Smoljaguar@spacey.space
2025-12-08T03:46:44Z
0 likes, 0 repeats
@futurebird yeah, that's what the correct thing to do would be, but it is still plausible that it could do the second; it's just more likely to make a mistake (though I think a task of this difficulty is pretty doable for current models with huge contexts (1M tokens), unlike older/cheaper models, which had severe quality drop-offs after maybe 10k tokens).
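A minimal sketch of that first, per-article approach in Python (the client library, model name, and prompt here are illustrative assumptions, not anything from the video):

    # ask the LLM one bounded yes/no question per article; do the counting in code
    from openai import OpenAI

    client = OpenAI()  # assumes an OpenAI-compatible API, key in OPENAI_API_KEY

    def mentions_topic(article_text: str, topic: str) -> bool:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{
                "role": "user",
                "content": (f"Does the following article mention {topic}? "
                            f"Answer strictly YES or NO.\n\n{article_text}"),
            }],
        )
        return resp.choices[0].message.content.strip().upper().startswith("YES")

    articles = ["..."]  # stand-in for the journal's article texts
    count = sum(mentions_topic(a, "flood geology") for a in articles)

The tally comes from ordinary code and each yes/no judgement can be spot-checked by hand; only the single-document classification step is delegated to the model.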
(DIR) Post #B11jMk2YR8S7Y1yWFk by futurebird@sauropods.win
2025-12-08T04:08:07Z
0 likes, 0 repeats
@Smoljaguar If it says there are 67 articles that mention topic X, but you don't know if that number is correct, then it's just a guess based on context and the bulk of text (and LLMs are also bad at following commands such as "consider only these sources" ...), so what is the point of stating the number? Maybe you could ask whether a topic is mentioned "frequently" or "infrequently", but beyond that I think it's deceptive and useless.
(DIR) Post #B11m1K0wJ0CObvhJPE by grimacing@luzeed.org
2025-12-08T03:30:22Z
0 likes, 0 repeats
That's not even the point of what I said at all, but nm.
(DIR) Post #B11m1KpzFD23AFa4yO by futurebird@sauropods.win
2025-12-08T04:37:48Z
0 likes, 0 repeats
@grimacing Sorry I thought you were referencing the original post.
(DIR) Post #B126zCOZl7ShRKaADo by david_chisnall@infosec.exchange
2025-12-08T08:32:43Z
0 likes, 0 repeats
@futurebird @castanea_jo Part of the problem with LLMs is that there’s no usable way of encoding the boundaries of what they are useful for in the models. It’s easy to say they shouldn’t output that kind of data, but for the model it’s just taking some text and producing a picture. The system has no classification that says ‘this is a picture of data and so I shouldn’t output it’. A graph with nonsense data is no different from an ant with the wrong number of legs: it’s just the most plausible output based on the training set and the prompt. If you can solve this, you can also solve the ‘hallucination’ problem.
(DIR) Post #B12JbFf3GRlg1SCT5M by IngaLovinde@embracing.space
2025-12-08T10:54:04Z
0 likes, 0 repeats
@futurebird LLMs are based on statistics, and there is a way (at least in theory) to use that for getting some other statistics out of some source text. Asking ChatGPT "give me the statistics" is not that way! It's like, just because LLMs are running on computers, and computers are good at multiplying 20-digit numbers, doesn't mean that you can ask ChatGPT to multiply two 20-digit numbers and expect to get a correct answer; you'd need to use computers in a different way for that.
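To make the multiplication analogy concrete, a tiny Python illustration (nothing from the video, just this post's example): Python integers are arbitrary-precision, so asking the computer directly is exact every time.

    # exact 20-digit multiplication, straight from the interpreter
    a = 12345678901234567890
    b = 98765432109876543210
    print(a * b)  # 1219326311370217952237463801111263526900

An LLM prompted with the same question is only predicting plausible digit sequences, which is exactly the difference being pointed at.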
(DIR) Post #B12wtCRDsmUlS2HU6y by crumbleneedy@aus.social
2025-12-08T18:14:15Z
0 likes, 0 repeats
@futurebird i must've missed the memo about creationism being a thing again. basic sql and regex skills would be the go here. but sure let's flood the zone with lazy delusional bullshit. these people.
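In the spirit of that suggestion, a minimal regex sketch in Python (the directory layout and search phrase are made-up assumptions):

    # deterministic counting: same files in, same number out, every run
    import re
    from pathlib import Path

    pattern = re.compile(r"\bflood geology\b", re.IGNORECASE)
    hits = sum(
        1
        for path in Path("articles").glob("*.txt")  # one plain-text file per article
        if pattern.search(path.read_text(encoding="utf-8"))
    )
    print(hits)

Unlike an LLM's answer, this number is auditable: you can list exactly which files matched and why.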
(DIR) Post #B12y4htTN5mY9HXtJY by futurebird@sauropods.win
2025-12-08T18:27:35Z
0 likes, 0 repeats
@Virginicus @Smoljaguar I wonder if there is an API for any of the free models. Although I hate interacting with cloud APIs
(DIR) Post #B12yPnIqSHfr0ST6oq by Smoljaguar@spacey.space
2025-12-08T18:31:27Z
0 likes, 0 repeats
@futurebird @Virginicus I think openrouter might serve some models for free?
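If so, a minimal sketch of how that would look (OpenRouter exposes an OpenAI-compatible endpoint; the model id below is a placeholder to look up in their catalog, not a recommendation):

    # point the standard OpenAI client at OpenRouter's endpoint
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_KEY",  # an OpenRouter key, not an OpenAI one
    )
    resp = client.chat.completions.create(
        model="some-vendor/some-model:free",  # placeholder; ":free" variants exist
        messages=[{"role": "user",
                   "content": "Does this article mention ants? Answer YES or NO.\n..."}],
    )
    print(resp.choices[0].message.content)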
(DIR) Post #B13CqWonFfJd3b3PhQ by dahukanna@mastodon.social
2025-12-08T21:13:07Z
0 likes, 1 repeats
@futurebird @Moss “ But it can't count! It still can't count. I feel like I'm going crazy. Am I the only person who cares that the machine can't even count?” -I also feel deep incredulity towards this corporate-grade “confabulation”.
(DIR) Post #B13ET0r080q3AY0956 by gildilinie@beige.party
2025-12-08T21:31:16Z
0 likes, 0 repeats
@futurebird @Moss I keep saying this too
(DIR) Post #B13SGPw2xoLMYFFXY8 by juliangonggrijp@ieji.de
2025-12-09T00:05:51Z
0 likes, 0 repeats
@futurebird @Moss It cannot really describe what it's doing, either, because it doesn't know that about itself. All it can really do is make up a story about what it's doing. It's not even a guess; that already attributes too much intention to what is actually happening. The machine is just putting word after word, based on what it was rewarded for during training. It's a bluff story. In that sense, the counting is not really different. It can do neither. All it can do is put word after word in a way that looks plausible.
(DIR) Post #B13SrPrQChhN2CjEpM by futurebird@sauropods.win
2025-12-09T00:12:35Z
0 likes, 0 repeats
@danilo Is that what the guy in the video is doing?
(DIR) Post #B13TDjPGDJWM1eCI52 by graydon@canada.masto.host
2025-12-09T00:16:35Z
0 likes, 0 repeats
@futurebird I take it as strong evidence for the "intelligence doesn't exist" position; there's a bunch of capabilities, but general intelligence is not a supportable concept. The capabilities are specific and have gaps. (E.g., people are terrible at probability.) It becomes "you get what you reward" and hardly anyone is trying to approximate correctness; they're trying not to be late. Making it easier to not be late is attractive without correctness. "Get it right" is a minority goal. @Moss
(DIR) Post #B13XBLUQZ1rh5RjRJI by futurebird@sauropods.win
2025-12-09T01:01:00Z
0 likes, 0 repeats
@danilo OK but he's saying things about it counting articles (frequency) and when I used the same tool it could not do that accurately. It couldn't even follow a command to restrict the dataset. It does not sound like he used some kind of API to make this kind of task possible.
(DIR) Post #B14IzZkAXQVNcSY3BQ by david_chisnall@infosec.exchange
2025-12-09T08:26:43Z
0 likes, 2 repeats
@dahukanna @futurebird @Moss It’s a shame that it lists summarisation as something LLMs are good at, when all of the studies that measure this show the opposite. LLMs are good at turning text into less text, but summarisation is the process of extracting the key points from text. LLMs will extract things that are shaped in the same way as a statistically large number of key points in the training set but they don’t understand either the text of the document or your context for requesting a summary and so are very likely to discard the thing that you think is most important. They also have a habit of inverting the meaning of sentences when shrinking them.
(DIR) Post #B14Izb9NJCGHyvLh6u by futurebird@sauropods.win
2025-12-09T09:56:43Z
0 likes, 0 repeats
@david_chisnall @dahukanna @Moss Why do I have to write the software guide for Google and Sora?
(DIR) Post #B14Jhy9r7yF653jQIK by futurebird@sauropods.win
2025-12-09T10:04:43Z
0 likes, 0 repeats
@david_chisnall @dahukanna @Moss Likewise, the second question is what the guy in the video at the start of the post *thought* he was doing. But by introducing counting articles into the task, it became something else.
(DIR) Post #B14LDgorEBExWhbCpk by futurebird@sauropods.win
2025-12-09T10:21:42Z
0 likes, 0 repeats
@david_chisnall @dahukanna @Moss I'm not an AI prohibitionist or "hater", but I keep finding the effective use case is much, much, much narrower than the UI we've been given for these tools would suggest. And a lot of people really seem to find it "easier" than searching the web, which, given the current state of the web, isn't saying very much. Has web search been broken to push everyone to the chatbots? (adjusting my tin foil cap here)
(DIR) Post #B14LjfoOVc8LfVdKvA by david_chisnall@infosec.exchange
2025-12-09T10:27:27Z
0 likes, 0 repeats
@futurebird @dahukanna @Moss "Has web search been broken to push everyone to the chatbots?" You don't need much of a tinfoil hat. Google's search metric for measuring internal success for about the past decade has been the exact opposite of what users want. I spend almost zero time on a good search engine: I type the search term, it finds a site that contains the thing I want, I go there. But Google gets money for each ad they show, so they want you to spend more time on the search page. Keeping you on the search page has been the metric. Answers in summaries from search have been part of this; the AI summaries are just the latest part. It's not (I think) deliberate that most search results are now slop, that's just an effect of the incentives: it's very cheap to create things that look like useful web pages with LLMs and you get ad revenue every time someone goes to your slop page, so there's an incentive to completely flood the web with this nonsense. This isn't new, except in scale. People were trying to fill Google with low-effort content ('best ten X' lists, for example) to get a cut of the ad revenue for ages.
(DIR) Post #B14LkT7ZB7aIb9yq6i by futurebird@sauropods.win
2025-12-09T10:27:33Z
0 likes, 0 repeats
@david_chisnall @dahukanna @Moss Imagine inventing electricity and you just give people a live wire to play with. "people are killing themselves" "but look, some of them use the wire carefully to power cool and useful machines. why are you a hater?"
(DIR) Post #B14Q49LCpTY0rO04TA by f4grx@chaos.social
2025-12-09T11:15:57Z
0 likes, 0 repeats
@futurebird @Moss no, I care. I care that computers were known for being reliable at calculation, which resulted in useful software tools that could be trusted (modulo bugs written by humans). But computers were always RIGHT. Now LLMs have introduced software which is NOT reliable and will lie to you undetectably as a feature. But the output looks cool and that's all that seems to matter. It blows my mind that ANYONE finds this basic fact acceptable!!
(DIR) Post #B14xIu9Atpq6ZG13iq by mikemol@pony.social
2025-12-09T17:28:53Z
0 likes, 0 repeats
@futurebird I mean, they can count to a degree, kinda, but what they really do here is get trained on what functions solve what problems, and then get handed a python environment they can write code into to solve the problem as they understand it. The LLM proper is only a very small piece of what comprises an "AI" now; it's more the orchestrator for specialized subcomponents and callouts.
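A minimal sketch of that orchestration pattern (the tool-call dict is hard-coded here to show the division of labour; in a real system the model would emit it as structured output):

    # the model only selects a tool and its arguments; plain code does the counting
    import re

    def count_mentions(texts: list[str], phrase: str) -> int:
        pat = re.compile(re.escape(phrase), re.IGNORECASE)
        return sum(1 for t in texts if pat.search(t))

    TOOLS = {"count_mentions": count_mentions}

    # stand-in for what the LLM would return as a structured tool call
    tool_call = {"name": "count_mentions",
                 "args": {"texts": ["ants are great", "rocks are fine"],
                          "phrase": "ants"}}

    result = TOOLS[tool_call["name"]](**tool_call["args"])
    print(result)  # 1, computed deterministically rather than guessed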
(DIR) Post #B153tiiIyG5dXzO0O0 by Landa@graz.social
2025-12-09T18:42:15Z
0 likes, 0 repeats
@futurebird if you try a tool and it works badly, you ditch it. But if you tinker with the thing and it kind of works sometimes, you'll get invested*, believing that with just a bit more effort you'll get it to work. Making as many people as possible chase that illusion seems to be those companies' current gameplan. 😐 *) not everyone of course @david_chisnall @dahukanna @Moss