Post AUFj9lAymlxRF7fUbA by simon@fedi.simonwillison.net
 (DIR) Post #AUFN7i5D7I3Y4yaTUe by simon@fedi.simonwillison.net
       2023-04-02T16:23:29Z
       
       0 likes, 1 repeat
       
       Think of language models like ChatGPT as a “calculator for words”

       I wrote about why using ChatGPT as a search engine replacement isn't the right framing if you want to get the most out of these powerful new tools

       https://simonwillison.net/2023/Apr/2/calculator-for-words/
       
 (DIR) Post #AUFNeyzXGwJc4mnZZ2 by maxtappenden@me.dm
       2023-04-02T16:29:12Z
       
       0 likes, 0 repeats
       
       @simon I don’t like that analogy at all. A calculator shows its work and always produces the same answer for the same input. ChatGPT does neither of these things.
       
 (DIR) Post #AUFNsRqBK5yXJZqOPY by rmcomplexity@mastodon.social
       2023-04-02T16:32:13Z
       
       0 likes, 0 repeats
       
       @simon I like this idea because it moves away from the framing that LLMs work almost as well as another human. But I'm not comfortable trusting GPT's summarization of something. The way I see it, even if I ask another person to give me a summary, I will follow that with many questions. IME those follow-ups never go well. Also, I'm never sure if the summary leaves out stuff important to ME but not statistically critical.
       
 (DIR) Post #AUFOfz5cUc3iclgaKu by mergesort@macaw.social
       2023-04-02T16:40:51Z
       
       0 likes, 0 repeats
       
       @simon Something I started thinking about last night: ChatGPT (or equivalent) generally isn't much better than Google when one search result will suffice, but when I have to string together multiple searches to get to my desired output, LLMs really excel. This can be me brainstorming fun things to do on a vacation, or even programming when I'm bouncing between half a dozen StackOverflow pages to patch together a script. The key is not having to open 10 tabs and synthesize the results myself.
       
 (DIR) Post #AUFOrtM7dXlYJf5BWC by simon@fedi.simonwillison.net
       2023-04-02T16:41:53Z
       
       0 likes, 0 repeats
       
       @rmcomplexity Yeah, summaries are effectively a form of hallucination: the act of deciding what to omit and what to include is inherently lossy, so there's always a risk that it will make decisions that are either clearly wrong or that you disagree with
       
 (DIR) Post #AUFP2fXDMbYKdVPTY8 by nelson@tech.lgbt
       2023-04-02T16:41:55Z
       
       0 likes, 0 repeats
       
       @simon I think this is a very insightful analogy except for one problem: ChatGPT makes stuff up. You get at this at the end ("work to build an accurate mental model of how they work"). One nice thing about a numerical calculator is that it's very easy to develop a mental model: it doesn't fabricate stuff.
       
 (DIR) Post #AUFPDlVvTPJsMk6A3k by simon@fedi.simonwillison.net
       2023-04-02T16:43:10Z
       
       0 likes, 0 repeats
       
       @nelson Erk, I write about hallucination so much that I thought I'd mentioned it in there but it looks like I haven't - adding a sentence or two about that now.
       
 (DIR) Post #AUFPPsKBydAGxpNtXE by simon@fedi.simonwillison.net
       2023-04-02T16:45:42Z
       
       0 likes, 0 repeats
       
       @nelson Added "Language models are also famous for “hallucinating”—for inventing new facts that fit the sentence structure despite having no basis in the underlying data." and "It’s important to note that there is still a risk of hallucination here, even when you feed it the facts you want it to use. I’ve caught both Bing and Bard adding made-up things in the middle of text that should have been entirely derived from their search results!"
       
 (DIR) Post #AUFPbNiHcD5TV18hZg by simon@fedi.simonwillison.net
       2023-04-02T16:47:44Z
       
       0 likes, 0 repeats
       
       @mergesort I'm really looking forward to the point when an LLM can conduct multi-stage research for me: run a search, then another search, then read several articles, then summarize the results

       It's /almost/ there - I've managed to get ChatGPT "browse" mode to attempt this - but it's not reliable enough yet, and the token context length limit really holds back how effective it can be

       GPT-4's 32,000 token limit - when they make that available - could be really interesting here
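
       A minimal sketch of the multi-stage loop being described, in Python; search_web(), fetch_article(), and complete() are hypothetical stand-ins for a search API, a page fetcher, and an LLM completion call, not ChatGPT's actual "browse" mode:

           def research(question, complete, search_web, fetch_article, max_steps=3):
               """Run repeated search/read cycles, then summarize from the notes."""
               notes = []
               query = question
               for _ in range(max_steps):
                   # Pull text from the top few results for the current query.
                   for title, url in search_web(query)[:3]:
                       notes.append(fetch_article(url))
                   # Ask the model for a refined follow-up query, or DONE.
                   query = complete(
                       "Question: " + question
                       + "\nNotes so far:\n" + "\n".join(notes)
                       + "\nSuggest one follow-up search query, or reply DONE."
                   )
                   if query.strip() == "DONE":
                       break
               # Final step: summarize strictly from the gathered notes.
               return complete("Using only these notes, answer: " + question
                               + "\n" + "\n".join(notes))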
       
 (DIR) Post #AUFPpqAcbq8voNHz8q by xek@hachyderm.io
       2023-04-02T16:52:15Z
       
       0 likes, 0 repeats
       
       @simon It is unfortunate that nobody under the age of 50 has ever seen, much less used, a slide rule, because they are an even more apt analogy: you have to be careful, maintain a surprising amount of context beyond the device itself, and use your best judgment to get good results. (My watch has a circular slide rule on its bezel, which I use constantly for SWAGs. I rather like it, but it's pretty niche, and priced accordingly.)
       
 (DIR) Post #AUFQjeufeOqj79RcHo by natedog@fosstodon.org
       2023-04-02T17:03:53Z
       
       0 likes, 0 repeats
       
       @simon I really like "This is reflected in their name: a “language model” implies that they are tools for working with language. That’s what they’ve been trained to do, and it’s language manipulation where they truly excel."

       I totally agree! Which is why I think this is a perfect tool for developers!
       
 (DIR) Post #AUFQzfZOyYrLmk89Uu by nelson@tech.lgbt
       2023-04-02T17:06:10Z
       
       0 likes, 0 repeats
       
       @simon thanks! I have been enjoying watching how you engage with ChatGPT and have been doing similar things myself recently. It's a remarkable tool.
       
 (DIR) Post #AUFUGP97amiBdIVegi by lilianedwards@someone.elses.computer
       2023-04-02T17:43:33Z
       
       0 likes, 0 repeats
       
       @simon thx. I do not know why this search framing has, wrongly, taken off
       
 (DIR) Post #AUFj9lAymlxRF7fUbA by simon@fedi.simonwillison.net
       2023-04-02T20:30:16Z
       
       0 likes, 0 repeats
       
       @maxtappenden yeah the non-repeatability thing is definitely a big flaw in the analogy - I added a note about that to the end of the post

       Not showing working is another flaw in the analogy
       
 (DIR) Post #AUFnYnD4lDxKbgqC3s by eliocamp@mastodon.social
       2023-04-02T21:17:25Z
       
       0 likes, 0 repeats
       
       @simon @nelson The "hallucinations" language always feels off to me. LLMs are *always* hallucinating, but sometimes their hallucinations are useful and sometimes they aren't, right?
       
 (DIR) Post #AUHtv88kbxJnvtlwdE by simon@fedi.simonwillison.net
       2023-04-03T21:39:44Z
       
       0 likes, 0 repeats
       
       @andrew @maxtappenden I got the impression from somewhere that the really big LLMs such as GPT-4 aren't deterministic even with a seed due to the way they get executed across multiple GPUs - but I can't remember where I got that impression and I don't have any confidence in it.
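
       One concrete mechanism behind that impression, for what it's worth: floating-point addition is not associative, so splitting a sum across GPUs (which changes the reduction order) can perturb logits by tiny amounts - occasionally enough to flip which token gets sampled. A minimal Python illustration:

           import numpy as np

           a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)
           print((a + b) + c)  # 1.0
           print(a + (b + c))  # 0.0 - the 1.0 is swallowed when added to -1e8 first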
       
 (DIR) Post #AhwIODidQGH9ynUzuS by andrew@social.aylett.co.uk
       2023-04-03T19:01:55.221297Z
       
       0 likes, 0 repeats
       
       The model is immutable at the point of use, so (use of an RNG or timing data notwithstanding) the output should be entirely deterministic? Not letting the caller pick a random seed seems to me to be a design choice.

       Controlling all sources of entropy is hard if no-one has deliberately decided to try to make a process repeatable, but that doesn’t make the LLM non-deterministic.

       (Back when I worked on a compiler, we carefully didn’t depend on any randomness – and on a single thread, the same input with the same compiler build should always give the same output. But change anything and all bets were off)
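
       A minimal sketch of that point, assuming the usual temperature-sampling setup: once the model's logits are fixed, the sampled token is a pure function of (logits, temperature, seed), so pinning the seed repeats the "randomness". The sample_token() helper here is hypothetical, not any particular API:

           import numpy as np

           def sample_token(logits, temperature, seed):
               rng = np.random.default_rng(seed)      # the only source of entropy
               scaled = np.asarray(logits, dtype=np.float64) / temperature
               probs = np.exp(scaled - scaled.max())  # numerically stable softmax
               probs /= probs.sum()
               return rng.choice(len(probs), p=probs)

           logits = [2.0, 1.0, 0.5]
           # Same seed in, same token out - every time.
           print(sample_token(logits, 0.8, seed=42) ==
                 sample_token(logits, 0.8, seed=42))  # True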