Post AUNeqiPPUFysBaeZ1M by simon@fedi.simonwillison.net
(DIR) Post #AUNaZJozFYAPBp48rQ by simon@fedi.simonwillison.net
2023-04-06T15:32:06Z
0 likes, 1 repeats
I agree that confabulation/hallucination/lying is a huge problem with LLMs like ChatGPT, Bard etc.
But I think a lot of people are underestimating how difficult it is to establish "truth" around most topics.
High quality news publications have journalists, editors and fact checkers with robust editorial processes... and errors still frequently slip through.
Expecting an LLM to perfectly automate that fact checking process just doesn't seem realistic to me
(DIR) Post #AUNalfB5flOgtuI61Q by simon@fedi.simonwillison.net
2023-04-06T15:32:41Z
0 likes, 0 repeats
What does feel realistic is training these models to be MUCH better at providing useful indications as to their confidence levels.
The impact of these problems could be greatly reduced if we could counteract the incredibly convincing way that these confabulations are presented somehow.
I also think there's a lot of room for improvement here in terms of the way the UI is presented, independent of the models themselves
(DIR) Post #AUNazT5LQhQBRyBauO by billjanssen@writing.exchange
2023-04-06T15:35:09Z
0 likes, 0 repeats
@simon Indeed. Eyewitness accounts are notoriously unreliable -- witnesses frequently hallucinate.
The key for AI systems is to do some consistency checking, I think. Run the conclusions through a physics model, for instance. But that requires understanding, not just translation. #AI
(DIR) Post #AUNbDRC1GJK3dRy9Xk by matt@toot.cafe
2023-04-06T15:36:44Z
0 likes, 0 repeats
@simon What I disagree with are people who say that LLMs are _designed_ to produce bullshit; I recall someone saying that, particularly in an educational context, the one lesson we need to be teaching is to not use LLMs for that reason. I'm not so sure we should write them off like that. It's a new tool; we should look for the good things it can be used for as well as acknowledging the bad.
(DIR) Post #AUNbddOEXK2qzcAq36 by marcoshuerta@vmst.io
2023-04-06T15:37:52Z
0 likes, 0 repeats
@simon The lack of tying the information back to the source seems a bigger problem. With Google results, for better or worse, I can scrutinize the site the info comes from. LLMs invent references if asked and otherwise don’t surface where/how in their vast training data they have stitched together their answers.
(DIR) Post #AUNbqyRhYvCy3r9mls by simon@fedi.simonwillison.net
2023-04-06T15:39:37Z
0 likes, 0 repeats
@matt I'm struggling with this: on the one hand, based on my own experience I think LLMs are one of the most powerful tools for self-learning I've ever encountered - but I don't know how to teach people to use them productively for that thanks to the hallucination problem
(DIR) Post #AUNc2U7W8bJDU1CkRE by John@socks.masto.host
2023-04-06T15:39:48Z
0 likes, 0 repeats
@simon It would be nice to have a confidence level come out of the black box, but I'm not sure how we do it with a box that doesn't know what truth is.
There is no truth, there is only probability with respect to a training set.
(DIR) Post #AUNcDbxRL3EYyZnRWi by simon@fedi.simonwillison.net
2023-04-06T15:41:10Z
0 likes, 0 repeats
@marcoshuerta that can be addressed to an extent using the trick Bing and Bard implement, where the language model can run searches against a real search engine and provide citation links back to those sources (Bing does this much better than Bard at the moment)
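(A hypothetical sketch of that "search, then cite" pattern - search() and complete() below are placeholder stand-ins, not Bing's or Bard's actual implementation:)

    # Retrieve sources first, then ask the model to answer citing only them.
    def search(query):
        # Placeholder: a real implementation would call a search engine API.
        return [{"url": "https://example.com/page", "snippet": "Example snippet."}]

    def complete(prompt):
        # Placeholder: a real implementation would call an LLM completion API.
        return "Example answer citing [1]."

    def answer_with_citations(question):
        results = search(question)
        sources = "\n".join(
            f"[{i + 1}] {r['url']}\n{r['snippet']}" for i, r in enumerate(results)
        )
        prompt = (
            "Answer using ONLY the numbered sources below, citing them like [1].\n\n"
            f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
        )
        return complete(prompt)

The model can still slip a hallucination in among the cited facts (as Simon notes later in the thread), but every claim at least ends up next to a checkable link.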
(DIR) Post #AUNcOfGFOgIEPGrdRY by dys_morphia@sfba.social
2023-04-06T15:41:15Z
0 likes, 0 repeats
@simon Yes, it’s hard for humans, too. Even when I’m writing about a strictly technical topic where truth should be objective and easy to verify, there is still a huge amount of judgement. Sometimes I have to generalize to introduce a new topic, or give just part of the explanation that’s appropriate for a certain audience because the full truth would only cause confusion. An expert reading the thing written for a novice might object that it’s inaccurate. So many judgement calls
(DIR) Post #AUNcOgN1GzIvqZ2Ydc by dys_morphia@sfba.social
2023-04-06T15:47:32Z
0 likes, 0 repeats
@simon at one job (as a technical writer) I had to write a “truthiness standard” because I opened documentation authoring up to the whole company and quickly learned my internalized judgement calls about determining the accuracy of technical sources were in no way intuitive to most people. And again, this is technical information where (theoretically) it is possible to verify objective facts. If we get into more fuzzy areas where humans don’t agree? So much harder.
(DIR) Post #AUNcb7RcRC69I67cLg by carlton@fosstodon.org
2023-04-06T15:41:35Z
0 likes, 0 repeats
@simon they’re entirely *generative* no? (They construct any answer as they go, is that right?) That doesn’t seem even lined up for truth, which on most accounts has required at least some element of “checking to see”. 🤔
(DIR) Post #AUNcb9Z8YDIfrzeeLQ by carlton@fosstodon.org
2023-04-06T15:44:36Z
0 likes, 0 repeats
@simon i.e. without that extra step, what would a confidence assessment look like?
(DIR) Post #AUNclrczaEvBQEZ6XY by simon@fedi.simonwillison.net
2023-04-06T15:42:44Z
0 likes, 0 repeats
@carlton it's weird how good they are at "truth" though - and how much they've improved. GPT-4 makes things up far less frequently than GPT-3 in my experience.
Turns out statistics can get you a really long way!
(DIR) Post #AUNcyqASK9PaeqPBK4 by simon@fedi.simonwillison.net
2023-04-06T15:44:16Z
0 likes, 0 repeats
@dys_morphia I just helped a journalist fact check a story and it really helped emphasize to me how subtle this stuff is, and how much care you have to take even over individual words - "much more likely" vs "more likely" for example
(DIR) Post #AUNdFK7yhWgUD76dLk by PrincexOfCups@mastodon.xyz
2023-04-06T15:49:11Z
0 likes, 0 repeats
@simon this isn't the whole story.
Some types of fact checking are hard. But ChatGPT's results include things such as made up URLs and citations.
It is much easier to establish the *provenance* of isolated facts like this than to establish the accuracy of much wider statements.
Further, checking that accuracy has correctly been established is not much easier than establishing accuracy in the first place. Validating that provenance claims are correct is even easier than making them correct.
(DIR) Post #AUNdR7RPh9ZuL4oydE by sophieschmieg@infosec.exchange
2023-04-06T15:49:36Z
0 likes, 0 repeats
@simon I'm not sure how possible this is. They try to imitate texts written by experts, and they do so quite well, but *accuracy* isn't really their objective function, *style* is.
(DIR) Post #AUNdRHJH0H0awj5cKe by sophieschmieg@infosec.exchange
2023-04-06T15:51:27Z
0 likes, 0 repeats
@simon we have done a ton of machine learning to train for accuracy, and it can do quite well. It just doesn't present the answer in the style of a confident expert, which is the thing you need to go viral.
(DIR) Post #AUNdcxiTg3MFeYRy8O by jonn@social.doma.dev
2023-04-06T15:54:01Z
0 likes, 0 repeats
@simon but can we also please reflect upon the fact that banning technology on the grounds that some people fail to understand the disclaimer that LLMs sometimes generate false information, while not banning, I don't know, Facebook or YouTube, which do not provide a warning that much of the content there is spreading misinformation, is beyond inconsistent.
But also the notion of "banning" technology without public hearing is a bit cringe to say the least. Kinda anti-democratic and censor-y.
I really appreciate how moderate your response is to the ongoing fearmongering, I'll keep being more direct because freedoms are at stake.
(DIR) Post #AUNdqS0oLkl9NVSPmC by basler@mastodon.social
2023-04-06T15:57:59Z
0 likes, 0 repeats
@simon ... Wholeheartedly agree! Even citations of some of the data points they used to formulate their response would be 👨🏻🍳😘👌
(DIR) Post #AUNeIzGwHblkdnbm0u by simon@fedi.simonwillison.net
2023-04-06T16:03:45Z
0 likes, 0 repeats
@dys_morphia that's fascinating! I'd love to read a document like that
(DIR) Post #AUNeUVkNslZKXssN2O by simon@fedi.simonwillison.net
2023-04-06T16:05:01Z
0 likes, 0 repeats
@PrincexOfCups yeah, training it NOT to invent citations and URLs does feel to me like it should be an easier problem - I would hope that techniques along the lines of RLHF should be able to discourage it from doing that
(DIR) Post #AUNeqiPPUFysBaeZ1M by simon@fedi.simonwillison.net
2023-04-06T16:09:10Z
0 likes, 0 repeats
@jon completely agree: this is a disaster, and there's a lot of room for urgently needed improvement.
I am seeing evidence of that work taking place: I have tried plenty of prompts that confabulate wildly on GPT-3.5 but return correct information on GPT-4
(DIR) Post #AUNf1UtI5SnRRhlrKy by simon@fedi.simonwillison.net
2023-04-06T16:10:37Z
0 likes, 0 repeats
@basler Bing and the ChatGPT alpha of "browsing" mode both do a much better job of that - they can run searches for information to use in their answers, then show the pages that they used
(DIR) Post #AUNfD4iTEGlRk4EM9g by dvogel@mastodon.social
2023-04-06T16:11:43Z
0 likes, 0 repeats
@simon Since we're discussing a potential change in mode without changing the goal of having reliable information, the degree of difference in quality is what matters. The difference between the NYT and ChatGPT is the same as the difference between the NYT and The Register. Notable here because many fewer people would describe The Register as having a goal of providing reliable information, even though they rely on the same editorial mode as the NYT.
(DIR) Post #AUNfUJF0TnNRiFfQ92 by brianonbarrington@mastodon.world
2023-04-06T16:18:33Z
0 likes, 0 repeats
@simon LLMs are perfect digital postmodernism, where reality is in the eye of the algorithm rather than in the empiricism of the physical world.
(DIR) Post #AUNfgGG1OlThvyqC6S by simon@fedi.simonwillison.net
2023-04-06T16:18:42Z
0 likes, 0 repeats
@dvogel really that's a fundamental challenge here: we've learned to associate quality of writing with quality of fact-checking, and language models can produce immaculate writing about completely fake information
(DIR) Post #AUNg8KubgbouTjDAw4 by basler@mastodon.social
2023-04-06T16:30:36Z
0 likes, 0 repeats
@simon ... Should be the default experience?
(DIR) Post #AUNgMiSmv5ZvZl3YPo by joelanman@hachyderm.io
2023-04-06T16:37:01Z
0 likes, 0 repeats
@simon Also how they're marketed. This is a big flaw if you need something to be accurate. So they should be marketed as what they are
(DIR) Post #AUNgYKbBTQ6d2IWQSm by PrincexOfCups@mastodon.xyz
2023-04-06T16:37:02Z
0 likes, 0 repeats
@simon perhaps a LLM which still makes stuff up wildly but which always has accurate citations to back its claims is just too much of an infohazard?
(DIR) Post #AUNi8ffEjL4ukSJFku by simon@fedi.simonwillison.net
2023-04-06T16:56:53Z
0 likes, 0 repeats
@joelanman there's a footer on every page of ChatGPT that says "ChatGPT may produce inaccurate information about people, places, or facts"
Increasingly clear that no-one reads that though! Or at least understands the full implications of it.
Breaking people away from 50+ years of science fiction ideas about what AI is turns out to be very difficult
(DIR) Post #AUNiL8c6vEsg0bnAhc by simon@fedi.simonwillison.net
2023-04-06T16:57:27Z
0 likes, 0 repeats
@PrincexOfCups I've caught both Bing and Bard using cited data but also still slipping in a hallucination among the facts
(DIR) Post #AUNiXaOfS0pveTRKvg by dbreunig@note.computer
2023-04-06T16:44:07Z
0 likes, 0 repeats
@simon The problem is that so much of the construction of the model rests on frequency of occurrence = trust. That, and ChatGPT (in particular) presents itself as intelligence. It's right in the name (AI). If it pretends to be human enough to pass the average user's personal Turing Test and it's sold and positioned as intelligence, there's going to be a problem.
(DIR) Post #AUNiXbTJSE98zAcYoC by simon@fedi.simonwillison.net
2023-04-06T16:58:18Z
0 likes, 0 repeats
@dbreunig I really wish we could break away from the term AI but I very much doubt that's possible at this point, at least for society in general
(DIR) Post #AUNj6AteFhXshQQLse by joelanman@hachyderm.io
2023-04-06T17:07:32Z
0 likes, 0 repeats
@simon True but that's literally the small print :) what does the marketing both formal and informal say? That it's intelligent, it knows things, you can learn things from it, better than search engines and so on
(DIR) Post #AUNjJnfLQ7BihlEMOu by carlton@fosstodon.org
2023-04-06T17:09:11Z
0 likes, 0 repeats
@simon absolutely! They’re amazing frankly. I’m curious how we embed “I’m this likely to be right” though. Stats all the way down?
(DIR) Post #AUNkuQrVGdL8eGwupM by simon@fedi.simonwillison.net
2023-04-06T17:26:21Z
0 likes, 0 repeats
@joelanman most people don't learn about these tools from the official marketing though: they hear about it through word of mouth, or breathless press reporting, or hype on social media.
Hard to know how to fix that!
(DIR) Post #AUNl89cMB5sDDjPKro by lmorchard@hackers.town
2023-04-06T17:28:04Z
0 likes, 0 repeats
@simon Yeah, I just keep thinking what a mistake it is to try deploying these tools for anything other than creative lubrication. Like really fancy Oblique Strategies decks.
(DIR) Post #AUNlLGpafaWtMCqsro by joelanman@hachyderm.io
2023-04-06T17:30:24Z
0 likes, 0 repeats
@simon Yeah that's what I meant by informal, I'm sure OpenAI do not discourage/correct it, at the least
(DIR) Post #AUNljkkR5iNL5o7CaG by enkiv2@eldritch.cafe
2023-04-06T17:31:55Z
0 likes, 0 repeats
@simon I don't think anybody expects an LLM to automatically fact-check; the typical complaint is that, if somebody stupidly replaces a human-who-fact-checks with an LLM that doesn't, fact checking is no longer done (and "we're using an LLM" is not a good excuse for stopping).
LLMs also aren't capable of performing minimal checks of internal consistency, which makes them strictly worse than expert systems. Like, GIGO applies to both LLMs and expert systems, but LLMs produce new garbage by design while expert systems only produce new garbage as the result of a bug.
(DIR) Post #AUNlyQ4kEVjTK1bkwK by simon@fedi.simonwillison.net
2023-04-06T17:32:51Z
0 likes, 0 repeats
@joelanman every formal statement I've seen from them does warn about this, but it's honestly like shouting into a storm at this point
(DIR) Post #AUNm9qbVwOcsmgNvJQ by sayrer@mastodon.social
2023-04-06T17:32:10Z
0 likes, 0 repeats
@lmorchard @simon They're really good at any syntactic task. I use it to write Rust macros sometimes. I don't use the exact code most of the time, but it demonstrates the tricky parts.
(DIR) Post #AUNm9sndmHVxas4dUW by simon@fedi.simonwillison.net
2023-04-06T17:34:25Z
0 likes, 0 repeats
@sayrer @lmorchard that's what makes them so beguiling: I keep on learning new things that they're good at... and new things that they are terrible at too
(DIR) Post #AUNmYpkcFZINNTYZ3Q by sayrer@mastodon.social
2023-04-06T17:46:20Z
0 likes, 0 repeats
@simon @lmorchard the other thing that's really interesting about them is that the interface keeps context. So I have stuff like “Do the same task, but use a nested loop to reduce repetition” in there. The voice cylinders have a hard time distinguishing whether a phrase is a new query or a continuation.
(DIR) Post #AUNmmsI1bixTJ1sWum by pettter@mastodon.acc.umu.se
2023-04-06T17:47:27Z
0 likes, 0 repeats
@simon that would require you to have a fairly great NLP system to detect asserted facts and so on, because LLMs aren't more or less "sure" about any specific long sections of text, generally. They sample from a distribution of probable continuations of some specific input text.
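(A toy illustration of that point, with made-up numbers: the model only ever scores candidate next tokens, so there is no built-in notion of how "sure" it is about a whole asserted fact:)

    # Toy example with invented logits - a real LM produces these over its
    # whole vocabulary at every step, then samples one token at a time.
    import math

    def softmax(logits):
        exps = [math.exp(x) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    # Imaginary next-token candidates after "The capital of Australia is"
    candidates = ["Canberra", "Sydney", "Melbourne"]
    logits = [4.1, 3.8, 1.2]  # fabricated values, for illustration only

    for token, p in zip(candidates, softmax(logits)):
        print(f"{token}: {p:.2f}")
    # Nothing here attaches a confidence to the *sentence* being true;
    # turning token probabilities into a fact-level confidence is extra work.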
(DIR) Post #AUNn5U5rnAv3lV47BA by joelanman@hachyderm.io
2023-04-06T17:52:17Z
0 likes, 0 repeats
@simon I've seen a lot of things like this which don't really touch on the limitations: https://www.forbes.com/sites/alexkonrad/2023/02/03/exclusive-openai-sam-altman-chatgpt-agi-google-search/?sh=47dcda796a63
(DIR) Post #AUNnInJBfAm77idtyq by misc@mastodon.social
2023-04-06T17:52:49Z
0 likes, 0 repeats
@simon I think this touches on a deeper problem. In some ways, I'd argue, an LLM that confabulates is preferable to one that falsely ordains a singular truth (dangerous) or compulsively hedges (annoying, not useful).
(DIR) Post #AUNnVrz1arWrO2RkeG by eax@ioc.exchange
2023-04-06T17:53:41Z
0 likes, 0 repeats
@simon the answer to your inquiry is in the seminal paper by J. Searle: The Chinese Room
Reference: https://rintintin.colorado.edu/~vancecd/phil201/Searle.pdf
(DIR) Post #AUNwGEEG13h7KoBiqm by NIH_LLAMAS@mastodon.social
2023-04-06T19:34:48Z
0 likes, 0 repeats
@simon Especially with so many fact checks needing to evaluate propaganda or myths which have evolved to sound true, play word games on us, etc.
(DIR) Post #AUO0g5iSglToHJu5ho by SilverMoose@universeodon.com
2023-04-06T20:24:17Z
0 likes, 0 repeats
@simon Confidence level indication is not sufficient. Most people either have no clue how to use such indications, or choose not to expend the substantial effort needed to do so.
How about some hard constraints:
NEVER make up a citation.
NEVER make up a quote.
NEVER state anything as fact without being able to provide an accurate citation.
NEVER mis-attribute any quote or statement of fact.
Absent appropriate hard constraints, systems like ChatGPT produce little more than verbal vomit -- masquerading as filet mignon.
(DIR) Post #AUO9H7qGXyGXDfiy5A by RevPancakes@hachyderm.io
2023-04-06T21:59:45Z
0 likes, 0 repeats
@simon Well I think that’s just it: we don’t usually treat information as immediately true/false; rather, we have more or less confidence based on the **processes** in place. E.g. I’d still rather take my coding problems to Stack Overflow, because votes and multiple (independent but cross-referencing) answers allow me to exercise a kind of judgement you can’t with a take-it-or-leave-it bot output
(DIR) Post #AUOEidemrqwlTtIHRo by gvelez17@mas.to
2023-04-06T22:56:39Z
0 likes, 0 repeats
@simon fwiw I asked GPT-4 how to do it and got a detailed answer involving GNNs - I'm interested in working on this with serious folk (or not so serious is ok too...)
(DIR) Post #AUOGC0GC0YJjw6M86y by Rycaut@mastodon.social
2023-04-06T23:11:01Z
0 likes, 0 repeats
@simon the problem is deeper - most questions/searches people make do NOT have a single answer. Yet search engines and people all over want there to be ONE “answer”. Google in particular is really bad here, and I think it is related to LLMs as well - there is an assumption that there should (and could) be a single answer to any query and not, in fact, lots of answers that vary by context, with many questions having no easy-to-summarize answer at all even with a lot of context
(DIR) Post #AUOSgRBPflAPZcOnJY by nottrobin@union.place
2023-04-07T01:38:08Z
0 likes, 0 repeats
@simon I feel this is a bit of a false equivalence. Firstly, I think that LLMs' fact checking is currently significantly worse than human fact checking, even though it fails sometimes. But also, I think the standard to which we hold technological automated solutions should be much higher. And this isn't a technophobic argument.
(DIR) Post #AUOSgSG3fyTcuJa1C4 by nottrobin@union.place
2023-04-07T01:38:08Z
0 likes, 0 repeats
@simon When we automate a solution, the amount of reliance we then place on that automation increases exponentially. People know humans are flawed; people expect computers to be reliable.
And I do think it's important to be cautious in claiming that technology can solve problems that it maybe just can't yet. There are tons of examples of things that "just worked" in the past that are now much less reliable because of complex software stacks etc.
Driverless cars also come to mind...
(DIR) Post #AUOVH3sdTUbtvLCvCa by simon@fedi.simonwillison.net
2023-04-07T02:07:36Z
0 likes, 0 repeats
@nottrobin I think LLM fact checking isn't just significantly worse than human fact checking, it's almost entirely non-existent!
I'm mainly arguing here against the idea that it should be an easy fix for LLMs to always tell the truth - much as I wish that were the case
(DIR) Post #AUOggi8E3aFBUrvsjw by piccolbo@toot.community
2023-04-07T04:09:39Z
0 likes, 0 repeats
@simon It looks like it may be possible: https://arxiv.org/pdf/2212.03827.pdf (via Sam Bowman). In short, it's possible to extract a truth prob. out of the internal state of an LM by fitting a model to be consistent wrt negation: f(LMstate(A)) = 1 - f(LMstate(not A)). The LM appears to be implicitly estimating the truth prob of a sentence even if it considers false continuations to be more likely. If confirmed, it may allow the unsupervised creation of a truth-labeled training set.
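(A rough sketch of that consistency-fitting idea - this is not the paper's code; the hidden states and dimensions below are random placeholders standing in for real LM activations:)

    # Fit a small probe so that p(A) and p(not A) sum to ~1, following the
    # contrast-consistency idea in the linked paper (sketch only).
    import torch

    n, d = 256, 512                    # hypothetical number of statement pairs, hidden size
    h_pos = torch.randn(n, d)          # stand-in LM hidden states for statements "A"
    h_neg = torch.randn(n, d)          # stand-in LM hidden states for their negations "not A"

    probe = torch.nn.Linear(d, 1)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

    for _ in range(200):
        p_pos = torch.sigmoid(probe(h_pos))
        p_neg = torch.sigmoid(probe(h_neg))
        consistency = ((p_pos - (1 - p_neg)) ** 2).mean()       # f(A) should equal 1 - f(not A)
        confidence = (torch.minimum(p_pos, p_neg) ** 2).mean()  # avoid the trivial p = 0.5 answer
        loss = consistency + confidence
        opt.zero_grad()
        loss.backward()
        opt.step()

    # After training, the probe's output acts as an unsupervised "truth
    # probability" for each statement - no truth labels are used anywhere.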
(DIR) Post #AUOhKTuKMSNexsObRY by glyph@mastodon.social
2023-04-07T04:22:35Z
0 likes, 0 repeats
@simon @matt when I say that LLMs are designed to emit bullshit, I don't mean it pejoratively. I realize that it might sound that way because I do, also, deeply dislike them. But how else would you describe the training criteria for the outputs they produce? "Plausible sounding, grammatically correct English prose" is inherently bullshit. It's not fiction, because it is not evaluated for its falsity one way or another, only its relevance. Beyond "text", what else would you call it?
(DIR) Post #AUOwHRkDRh9IhRg2nQ by alaric@mastodon.social
2023-04-07T07:10:08Z
0 likes, 0 repeats
@simon that's the problem of unreliable source data, but the particular problem with LLMs is that they will extrapolate widely from source data to answer a question rather than say "I don't know", and the nature of the LLM architecture means they don't really know they're doing it because their memory is made of trends rather than data points. We need a different (possibly not based on a neural network) architecture to work with data points instead of trends.
(DIR) Post #AUPOs7FrNz5aaEkhG4 by ErikJonker@mastodon.social
2023-04-07T12:30:24Z
0 likes, 0 repeats
@simon fair point - errors will remain, partly because of errors in the training set. Bringing the level of hallucinations down would be nice, however
(DIR) Post #AUPP4YyW9FvS20ZkPI by IndrekIbrus@mastodon.social
2023-04-07T12:31:01Z
0 likes, 0 repeats
@simon More of the semantic web is needed to eliminate this. But this takes time.
(DIR) Post #AUPPlQPfrDOJveiS7U by mazdam@c18.masto.host
2023-04-07T12:40:23Z
0 likes, 0 repeats
@simon I think the concern is not the possibility of error coming from the black box, but the likelihood that all the people, institutions, cultures of truth seeking would be displaced
(DIR) Post #AUPQBoWepV3mF4Z88e by afamiglietti79@mastodon.social
2023-04-07T12:45:07Z
0 likes, 0 repeats
@simon I agree but would go one further. It's not just a technical fix, we've got to educate people out of the mindset that it's possible to type a question in a box and get back "truth" in a neat package, easy to read, and less than 300 words...
(DIR) Post #AUPRiMB3LisNpbXeJU by erikavaris@mas.to
2023-04-07T13:02:12Z
0 likes, 0 repeats
@simon this reminds me of another convo where someone was saying that what we really needed from a speech synth model was the ability to modify its emotional quality on a granular level, so you could build up gradually to angry or sad or whatever from neutral, and it was like: yeah, you want to *direct* the acting. Like you do with a human. Both acting and directing are hard tasks for humans.
(DIR) Post #AUPSX0f56Ud3s1jy3k by simon@fedi.simonwillison.net
2023-04-07T13:11:35Z
0 likes, 0 repeats
@mazdam I very much share that concern - it's one of the reasons I think it's essential we help people understand the actual capabilities of LLMs
(DIR) Post #AUPSiltTpsJ1Xn5GDI by afamiglietti79@mastodon.social
2023-04-07T12:51:10Z
0 likes, 0 repeats
@simon on the technical side though, I can't help but wonder, if we want accuracy and accountability, why have an LLM generate novel language at all? Why not do an embedding search of sources and just return excerpts (though maybe LLM summary could be useful here)
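(A minimal sketch of that embedding-search-and-excerpt idea - embed() below is a deliberately fake stand-in for whatever embedding model you would actually use:)

    # Return verbatim excerpts from known sources instead of generated text.
    # embed() is a placeholder; a real system would call an embedding model.
    import numpy as np

    def embed(text):
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.normal(size=384)
        return v / np.linalg.norm(v)

    documents = [
        "Excerpt one, from a known source...",
        "Excerpt two, from a known source...",
    ]
    doc_vectors = np.stack([embed(d) for d in documents])

    def top_excerpts(query, k=1):
        q = embed(query)
        scores = doc_vectors @ q          # cosine similarity (vectors are unit length)
        return [documents[i] for i in np.argsort(-scores)[:k]]

    print(top_excerpts("some question"))
    # Nothing is invented: every result is an excerpt with a known source,
    # at the cost of losing the LLM's ability to synthesize a new answer.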
(DIR) Post #AUPSimajF4tthvJncm by simon@fedi.simonwillison.net
2023-04-07T13:13:06Z
0 likes, 0 repeats
@afamiglietti79 there are all kinds of ways I'm finding LLMs useful that are dependent on their ability to generate completely new output - here's a recent example https://til.simonwillison.net/gpt3/gpt4-api-design
(DIR) Post #AUPSy4YMk6INl7VIYa by joshuafoust@appdot.net
2023-04-07T13:16:21Z
0 likes, 0 repeats
@simon isn’t this an argument AGAINST deploying them in every app, search engine, and communication tool? Why is this being rushed to market when its makers have no way to even guess how accurate it is with its output?
(DIR) Post #AUPTmd03p8JyDkhdvk by Riedl@sigmoid.social
2023-04-07T13:25:14Z
0 likes, 0 repeats
@simon There is nothing inherently wrong with failing to achieve the high bar of “truth” _if_ ChatGPT, Bing, and Bard are used appropriately.
The problem is that people have always ascribed perfection, logicalness, and superiority to AI systems. It’s innate in corp marketing (new Bing must be better than old Bing, right?)
Explainable AI can help to “calibrate trust”. We have preliminary evidence this can be done: https://arxiv.org/abs/2204.07693
(DIR) Post #AUPU0WoVcZGZ5XvabI by afamiglietti79@mastodon.social
2023-04-07T13:26:48Z
0 likes, 0 repeats
@simon oh, I agree, there are uses for LLMs! Lots of them! I just wonder if the search engine like function people are eager to use them for (and encouraged to use them for) is in fact their best use
(DIR) Post #AUPUbbxtQnnXmP02uO by mapto@qoto.org
2023-04-07T13:34:26Z
0 likes, 0 repeats
@simon @carlton but statistics cannot give you intent. And without intent, it is just chatter
(DIR) Post #AUPXqMuNzdmZHy5V1E by simon@fedi.simonwillison.net
2023-04-07T14:10:56Z
0 likes, 0 repeats
@afamiglietti79 it's definitely not! I think that's the most important (and hardest) lesson for people to learn: this thing is a terrible search engine for many things - but identifying which things it's good at searching for and which it's useless at is highly unintuitive