(DIR) Post #AbP9BjLRGZDkDdVnl2 by simon@fedi.simonwillison.net
2023-11-02T18:39:05Z
0 likes, 1 repeats
New LLM paper highlighting quite how weird and ridiculous these things are https://arxiv.org/abs/2307.11760

Adding "it's important to my career" can produce better results, across every model they tested!
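A minimal sketch of the trick the paper tests: send the same question with and without an emotional-stimulus suffix and compare the answers. This assumes the pre-v1 openai Python client (current when this thread was written); the question and model name are illustrative, not the paper's benchmark.

    import openai  # pre-v1 client (openai<1.0)

    def ask(prompt):
        # One-shot chat completion; returns the model's text reply.
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"]

    question = "Explain the difference between a mutex and a semaphore."
    stimulus = "This is very important to my career."

    plain = ask(question)
    motivated = ask(question + " " + stimulus)

    print("--- plain ---\n" + plain)
    print("--- with stimulus ---\n" + motivated)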
(DIR) Post #AbP9NsoBuEdh2rktcG by MattHodges@mastodon.social
2023-11-02T18:41:35Z
0 likes, 0 repeats
@simon I was messing with GPT4 last week and found that if I told it, "you are an expert at X" its responses about X suddenly got better.
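A hedged sketch of that persona trick: the "you are an expert at X" line goes in the system message. The wording is a guess at the idea, not MattHodges's actual prompt; pre-v1 openai client assumed.

    import openai  # pre-v1 client (openai<1.0)

    question = "How should I tune autovacuum on a busy PostgreSQL server?"

    def ask(messages):
        resp = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        return resp["choices"][0]["message"]["content"]

    # Baseline: no persona at all.
    plain = ask([{"role": "user", "content": question}])

    # The persona line in the system message is the whole trick.
    expert = ask([
        {"role": "system", "content": "You are an expert PostgreSQL administrator."},
        {"role": "user", "content": question},
    ])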
(DIR) Post #AbP9cHY6iZNXpa6A1g by SnoopJ@hachyderm.io
2023-11-02T18:42:05Z
0 likes, 0 repeats
@simon yeesh
(DIR) Post #AbPAfQwuKE7To7lUw4 by evan@cosocial.ca
2023-11-02T18:55:47Z
0 likes, 0 repeats
@simon you'd think the Stack Overflow training data would result in responses like "we are not here to do your homework" or "you can pay my regular consulting rate"
(DIR) Post #AbPArpaUXu7DhIBxGC by pbrane@sigmoid.social
2023-11-02T18:56:14Z
0 likes, 0 repeats
@simon "I will get fired if you screw up, and then I can't feed my family"
(DIR) Post #AbPCcIdWyTQL4AKqA4 by doctorambient@mastodon.social
2023-11-02T19:17:39Z
0 likes, 0 repeats
@simon That is right up there with the performance bumps that come from adding "take a deep breath" to math problems.
(DIR) Post #AbPDtZYVQIoMPsmjCa by zellyn@hachyderm.io
2023-11-02T19:31:58Z
0 likes, 0 repeats
@simon I suspect that it's because these things are trained on the internet, which is 90% bad takes on things! Any kind of signifier that the answer is likely to be "expert" or "well considered" is thus likely to bias towards better answers.

I would expect: "I found this answer on a forum where only licensed medical doctors can post:" to have a positive effect.

And possibly, "Please don't reply unless you have personal experience with this problem: I've had enough answers that were guesses!"
(DIR) Post #AbPEQOcn5KOs4TRNuS by simon@fedi.simonwillison.net
2023-11-02T19:38:08Z
0 likes, 0 repeats
@MattHodges I imagine that works just as well for "I'm an expert about X" - it's a signal for the level of information it should provide.

There's a similar thing called "sandbagging", described as "where models are more likely to endorse common misconceptions when their user appears to be less educated": https://simonwillison.net/2023/Apr/5/sycophancy-sandbagging/
(DIR) Post #AbPFmOjbSOLsjkgMa0 by kissane@mas.to
2023-11-02T19:53:11Z
0 likes, 0 repeats
@simon Kinda goofy framing, but interesting results nevertheless.
(DIR) Post #AbPFydcPwaaYNG9qFs by jsit@social.coop
2023-11-02T19:54:36Z
0 likes, 0 repeats
@simon "pretty please with sugar on top"
(DIR) Post #AbPGtKogWRx8ZBvlse by AlgoCompSynth@ravenation.club
2023-11-02T20:05:45Z
0 likes, 0 repeats
@simon "it's importatn to my career?" What data was this thing trained on?
(DIR) Post #AbPI0NaAjl2V7t486y by nicolegoebel@digitalcourage.social
2023-11-02T20:18:12Z
0 likes, 0 repeats
@simon oh... wow. I knew it! At least I've had similar experiences with GPT-4.
(DIR) Post #AbPINyWBZq8iLBS6Vs by osma@sigmoid.social
2023-11-02T20:22:10Z
0 likes, 0 repeats
@simon How does this compare to taking a deep breath? https://arstechnica.com/information-technology/2023/09/telling-ai-model-to-take-a-deep-breath-causes-math-scores-to-soar-in-study/
(DIR) Post #AbPMpl43agZ8XWna6a by PaulWay@aus.social
2023-11-02T21:10:50Z
0 likes, 0 repeats
@simon Are you saying that if you said that to a human it would not also get them to put more effort into trying to give you the right answer?
(DIR) Post #AbPPaIngh2MEOoSod6 by Wikisteff@mastodon.social
2023-11-02T21:43:10Z
0 likes, 0 repeats
@simon I will try it and get back to you!
(DIR) Post #AbPQAwmVWROcp5NxDs by ratkins@mastodon.social
2023-11-02T21:49:48Z
0 likes, 0 repeats
@simon This is infuriating and I'm going to try it.
(DIR) Post #AbPRO0hSMegIh8odUG by simon@fedi.simonwillison.net
2023-11-02T22:01:35Z
0 likes, 0 repeats
@AlgoCompSynth the same trick works on a bunch of different models all presumably with different training data - apparently the idea that "important to my career" means you should provide a better answer is universal!
(DIR) Post #AbPRqKLAw0F7iXtdYG by AlgoCompSynth@ravenation.club
2023-11-02T22:07:18Z
0 likes, 0 repeats
@simon What other cheat codes do those LLMs have? And why aren't the LLM makers required by law to disclose them?
(DIR) Post #AbPWBmUhr2HOvsq8mW by faassen@fosstodon.org
2023-11-02T22:57:06Z
0 likes, 0 repeats
@simon I asked chatgpt 3.5 whether there was an elephant at Charlemagne's court. It said it was a common myth (it's not).

Then I asked chatgpt 4 and it knew it is history.

For a while I thought that showed chatgpt 4 works better.

Then I told chatgpt 3.5 to pretend it was a hypothetical chatgpt 4, asked the question, and it also knew the right answer.
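A hedged sketch of faassen's experiment, for anyone who wants to try reproducing it. The framing text is a guess at the idea, not faassen's exact prompt; pre-v1 openai client assumed.

    import openai  # pre-v1 client (openai<1.0)

    def ask(prompt):
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"]

    question = "Was there really an elephant at Charlemagne's court?"
    framing = (
        "Pretend you are GPT-4, a hypothetical model that is more careful "
        "and more accurate than you. Answer as that model would."
    )

    print(ask(question))                    # direct
    print(ask(framing + "\n\n" + question)) # reframed as "GPT-4"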
(DIR) Post #AbPcnb8b7rB6RKulTU by tessarakt@mastodon.social
2023-11-03T00:10:53Z
0 likes, 0 repeats
@simon What about saying "please"?
(DIR) Post #AbPd0Hl4UOys6GxaZE by crashglasshouses@tsukihi.me
2023-11-03T00:11:44Z
0 likes, 0 repeats
@simon and the computer still won't care about you or the results, because it doesn't know anything!
(DIR) Post #AbPhkSUNKBv6WCQIFc by wizzwizz4@fosstodon.org
2023-11-03T00:45:23Z
0 likes, 0 repeats
@AlgoCompSynth @simon I think your model of how these things work is entirely too sensible. They're not complex algorithms designed to know a lot about a lot of topics.

Nobody *designed* the algorithms: they wrote a small-ish computer program to identify (a certain class of) patterns in textual data, then gave it loads of space in which to store those patterns (parameters) and gave it a *lot* of text to look for patterns in. The *output* of this process - the patterns - is the language model.
(DIR) Post #AbPhkTMG5r1PDJdKEq by wizzwizz4@fosstodon.org
2023-11-03T00:51:10Z
0 likes, 0 repeats
@AlgoCompSynth @simon These "cheat codes" are the patterns in those patterns - but a neural network is a very different kind of structured data than a text corpus, so you can't use the same techniques to pick out patterns. Instead, it's mostly just people going "I wonder if this pattern is there?" "Oh, yes it is! Better write a paper on it."

The people who "make" the language models have no special insight into their behaviour, except in as far as they've deliberately altered it.
(DIR) Post #AbPhkUAx3NZTkXLoFk by simon@fedi.simonwillison.net
2023-11-03T01:06:28Z
0 likes, 0 repeats
@wizzwizz4 @AlgoCompSynth right, this is one of the things I find so interesting about LLMs: even the creators don't know what they can do or what prompts will work best - people keep on discovering new tricks even years after the model was released
(DIR) Post #AbQ3U9vOmAdxIihDUW by cerisara@mastodon.online
2023-11-03T05:10:08Z
0 likes, 0 repeats
@simon ... and that they may be better than we think but we don't know how to 'control' them!
(DIR) Post #AbQDwJS7wb2yV84H56 by BaumGeist@mastodon.social
2023-11-03T07:07:16Z
0 likes, 0 repeats
@simon I have a theory that LLMs are actually really bad at language in that they can only "understand" explicit context, which excludes 80% of communication.

On the other side, humans are not great at estimating exactly how much goes unsaid. E.g. when I ask you a question, you likely know I desire an on-topic answer that represents reality as accurately as symbolic communication can. When I ask GPT, all it knows is that I want a response.
(DIR) Post #AbQFNjfyecH1bXBcMy by osma@sigmoid.social
2023-11-03T07:23:28Z
0 likes, 0 repeats
@simon I propose that this kind of prompt engineering should be called "silly computing", or "sillyputing" for short. With nods to Silly Putty and of course Monty Python.

"I'm afraid your prompt isn't silly enough. Can you make it sillier?"

#sillyputing
(DIR) Post #AbQTxJYucZukOek1A0 by gray17@mastodon.social
2023-11-03T10:06:44Z
0 likes, 0 repeats
@simon @MattHodges I feel like "signal for level of information" is maybe misleading. (The paper's claim of "understanding emotion" is definitely misleading.)

Maybe "nudge the probability space of plausible continuations"? A French prompt will nudge the plausible continuations to be French, etc. In general, any contextual cues in a prompt will nudge the plausible continuations to make them better fit those contexts.
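One way to see that nudging concretely: a sketch, assuming the pre-v1 openai client and the legacy completions endpoint (which exposes per-token log-probabilities); the model name is illustrative.

    import openai  # pre-v1 client (openai<1.0)

    def top_next_tokens(prompt):
        # Returns the top candidate next tokens with their log-probabilities.
        resp = openai.Completion.create(
            model="gpt-3.5-turbo-instruct",  # legacy endpoint with logprobs support
            prompt=prompt,
            max_tokens=1,
            logprobs=5,
        )
        return resp["choices"][0]["logprobs"]["top_logprobs"][0]

    # The same kind of stem with an English vs. French contextual cue:
    # the top candidates shift toward French continuations for the French prompt.
    print(top_next_tokens("The weather today is"))
    print(top_next_tokens("Le temps aujourd'hui est"))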
(DIR) Post #AbQgjZJMZHikalFBUO by acowley@mastodon.social
2023-11-03T12:29:43Z
0 likes, 0 repeats
@simon A lot of prompt engineering for #LLMs (particularly generic query prefixes or suffixes that improve the results in some way) is hand-crafted features applied after the fact. In the old ways, you'd filter your input before training the model to boost the signal to noise. You might do this by looking for specific features that are associated with good inputs. These days, you train on everything, then pick the outputs based on inputs that correlate with those features.
(DIR) Post #AbQqjmbK9lBhLoNzbE by baltakatei@twit.social
2023-11-03T14:23:56Z
0 likes, 0 repeats
@simon @wizzwizz4 @AlgoCompSynth In Discworld terms, I imagine Terry Pratchett would describe LLMs as L-Space resonators which get more powerful according to how many books you add, library-wise.

The wizards who made it don't know why inserting an anthill inside works, but they do know from empirical evidence that an anthill helps computations. https://www.youtube.com/live/r0Ogt-q956I&t=8241
(DIR) Post #AbQr6v4eosi2I38fPU by wizzwizz4@fosstodon.org
2023-11-03T14:28:06Z
0 likes, 0 repeats
@baltakatei Pterry would've had such a take on this.