Post AUFj9lAymlxRF7fUbA by simon@fedi.simonwillison.net
 (DIR) Post #AUFN7i5D7I3Y4yaTUe by simon@fedi.simonwillison.net
       2023-04-02T16:23:29Z
       
       0 likes, 1 repeat
       
       Think of language models like ChatGPT as a “calculator for words”

       I wrote about why using ChatGPT as a search engine replacement isn't the right framing if you want to get the most out of these powerful new tools

       https://simonwillison.net/2023/Apr/2/calculator-for-words/
       
 (DIR) Post #AUFNeyzXGwJc4mnZZ2 by maxtappenden@me.dm
       2023-04-02T16:29:12Z
       
       0 likes, 0 repeats
       
       @simon I don’t like that analogy at all. A calculator shows its work and always produces the same answer for the same input. ChatGPT does neither of these things.
       
 (DIR) Post #AUFNsRqBK5yXJZqOPY by rmcomplexity@mastodon.social
       2023-04-02T16:32:13Z
       
       0 likes, 0 repeats
       
       @simon I like this idea because it moves away from the framing that LLMs work almost as well as another human. But I'm not comfortable trusting GPT's summarization of something. The way I see it, even if I ask another person to give me a summary, I will follow that with many questions. IME those follow-ups never go well. Also, I'm never sure if the summary leaves out stuff important to ME but not statistically critical.
       
 (DIR) Post #AUFOfz5cUc3iclgaKu by mergesort@macaw.social
       2023-04-02T16:40:51Z
       
       0 likes, 0 repeats
       
       @simon Something I started thinking about last night: ChatGPT (or equivalent) generally isn't much better than Google when one search result will suffice, but when I have to string together multiple searches to get to my desired output, LLMs really excel. This can be me brainstorming fun things to do on a vacation, or even programming when I'm bouncing between half a dozen StackOverflow pages to patch together a script. The key is not having to open 10 tabs and synthesize the results myself.
       
 (DIR) Post #AUFOrtM7dXlYJf5BWC by simon@fedi.simonwillison.net
       2023-04-02T16:41:53Z
       
       0 likes, 0 repeats
       
       @rmcomplexity Yeah, summaries are effectively a form of hallucination: the act of deciding what to omit and what to include is inherently lossy, so there's always a risk that it will make decisions that are either clearly wrong or that you disagree with
       
 (DIR) Post #AUFP2fXDMbYKdVPTY8 by nelson@tech.lgbt
       2023-04-02T16:41:55Z
       
       0 likes, 0 repeats
       
       @simon I think this is a very insightful analogy except for one problem: ChatGPT makes stuff up. You get at this at the end ("work to build an accurate mental model of how they work"). One nice thing about a numerical calculator is that it's very easy to develop a mental model: it doesn't fabricate stuff.
       
 (DIR) Post #AUFPDlVvTPJsMk6A3k by simon@fedi.simonwillison.net
       2023-04-02T16:43:10Z
       
       0 likes, 0 repeats
       
       @nelson Erk, I write about hallucination so much that I thought I'd mentioned it in there but it looks like I haven't - adding a sentence or two about that now.
       
 (DIR) Post #AUFPPsKBydAGxpNtXE by simon@fedi.simonwillison.net
       2023-04-02T16:45:42Z
       
       0 likes, 0 repeats
       
       @nelson Added "Language models are also famous for “hallucinating”—for inventing new facts that fit the sentence structure despite having no basis in the underlying data." and "It’s important to note that there is still a risk of hallucination here, even when you feed it the facts you want it to use. I’ve caught both Bing and Bard adding made-up things in the middle of text that should have been entirely derived from their search results!"
       
 (DIR) Post #AUFPbNiHcD5TV18hZg by simon@fedi.simonwillison.net
       2023-04-02T16:47:44Z
       
       0 likes, 0 repeats
       
       @mergesort I'm really looking forward to the point when an LLM can conduct multi-stage research for me: run a search, then another search, then read several articles, then summarize the results

       It's /almost/ there - I've managed to get ChatGPT "browse" mode to attempt this - but it's not reliable enough yet, and the token context length limit really holds back how effective it can be

       GPT-4's 32,000 token limit - when they make that available - could be really interesting here
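
       A minimal sketch of the multi-stage loop being described, in Python; search_web(), fetch_article(), and complete() are hypothetical stand-ins for a search API, a page fetcher, and an LLM completion call, not ChatGPT's actual "browse" mode:

           def research(question, complete, search_web, fetch_article, max_steps=3):
               """Run repeated search/read cycles, then summarize from the notes."""
               notes = []
               query = question
               for _ in range(max_steps):
                   # Pull text from the top few results for the current query.
                   for title, url in search_web(query)[:3]:
                       notes.append(fetch_article(url))
                   # Ask the model for a refined follow-up query, or DONE.
                   query = complete(
                       "Question: " + question
                       + "\nNotes so far:\n" + "\n".join(notes)
                       + "\nSuggest one follow-up search query, or reply DONE."
                   )
                   if query.strip() == "DONE":
                       break
               # Final step: summarize strictly from the gathered notes.
               return complete("Using only these notes, answer: " + question
                               + "\n" + "\n".join(notes))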
       
 (DIR) Post #AUFPpqAcbq8voNHz8q by xek@hachyderm.io
       2023-04-02T16:52:15Z
       
       0 likes, 0 repeats
       
       @simon It is unfortunate that nobody under the age of 50 has ever seen, much less used, a slide rule, because they are an even more apt analogy: you have to be careful, maintain a surprising amount of context beyond the device itself, and use your best judgment to get good results. (My watch has a circular slide rule on its bezel, which I use constantly for SWAGs. I rather like it, but it's pretty niche, and priced accordingly.)
       
 (DIR) Post #AUFQjeufeOqj79RcHo by natedog@fosstodon.org
       2023-04-02T17:03:53Z
       
       0 likes, 0 repeats
       
       @simon I really like "This is reflected in their name: a “language model” implies that they are tools for working with language. That’s what they’ve been trained to do, and it’s language manipulation where they truly excel."

       I totally agree! Which is why I think this is a perfect tool for developers!
       
 (DIR) Post #AUFQzfZOyYrLmk89Uu by nelson@tech.lgbt
       2023-04-02T17:06:10Z
       
       0 likes, 0 repeats
       
       @simon thanks! I have been enjoying watching how you engage with ChatGPT and have been doing similar things myself recently. It's a remarkable tool.
       
 (DIR) Post #AUFUGP97amiBdIVegi by lilianedwards@someone.elses.computer
       2023-04-02T17:43:33Z
       
       0 likes, 0 repeats
       
       @simon thx. I do not know why this search framing has, wrongly, taken off
       
 (DIR) Post #AUFj9lAymlxRF7fUbA by simon@fedi.simonwillison.net
       2023-04-02T20:30:16Z
       
       0 likes, 0 repeats
       
       @maxtappenden yeah the non-repeatability thing is definitely a big flaw in the analogy - I added a note about that to the end of the post

       Not showing working is another flaw in the analogy
       
 (DIR) Post #AUFnYnD4lDxKbgqC3s by eliocamp@mastodon.social
       2023-04-02T21:17:25Z
       
       0 likes, 0 repeats
       
       @simon @nelson The "hallucinations" language always feels off to me. LLMs are *always* hallucinating, but sometimes their hallucinations are useful and sometimes they aren't, right?
       
 (DIR) Post #AUHtv88kbxJnvtlwdE by simon@fedi.simonwillison.net
       2023-04-03T21:39:44Z
       
       0 likes, 0 repeats
       
       @andrew @maxtappenden I got the impression from somewhere that the really big LLMs such as GPT-4 aren't deterministic even with a seed due to the way they get executed across multiple GPUs - but I can't remember where I got that impression and I don't have any confidence in it.
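
       One concrete mechanism behind that impression, for what it's worth: floating-point addition is not associative, so splitting a sum across GPUs (which changes the reduction order) can perturb logits by tiny amounts - occasionally enough to flip which token gets sampled. A minimal Python illustration:

           import numpy as np

           a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)
           print((a + b) + c)  # 1.0
           print(a + (b + c))  # 0.0 - the 1.0 is swallowed when added to -1e8 first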
       
 (DIR) Post #AhwIODidQGH9ynUzuS by andrew@social.aylett.co.uk
       2023-04-03T19:01:55.221297Z
       
       0 likes, 0 repeats
       
       The model is immutable at the point of use, so (use of an RNG or timing data notwithstanding) the output should be entirely deterministic? Not letting the caller pick a random seed seems to me to be a design choice.

       Controlling all sources of entropy is hard if no-one has deliberately decided to try to make a process repeatable, but that doesn’t make the LLM non-deterministic.

       (Back when I worked on a compiler, we carefully didn’t depend on any randomness – and on a single thread, the same input with the same compiler build should always give the same output. But change anything and all bets were off)
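
       A minimal sketch of that point, assuming the usual temperature-sampling setup: once the model's logits are fixed, the sampled token is a pure function of (logits, temperature, seed), so pinning the seed repeats the "randomness". The sample_token() helper here is hypothetical, not any particular API:

           import numpy as np

           def sample_token(logits, temperature, seed):
               rng = np.random.default_rng(seed)      # the only source of entropy
               scaled = np.asarray(logits, dtype=np.float64) / temperature
               probs = np.exp(scaled - scaled.max())  # numerically stable softmax
               probs /= probs.sum()
               return rng.choice(len(probs), p=probs)

           logits = [2.0, 1.0, 0.5]
           # Same seed in, same token out - every time.
           print(sample_token(logits, 0.8, seed=42) ==
                 sample_token(logits, 0.8, seed=42))  # True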