Post AWMIIP8ajC6SBYFKIC by simon@fedi.simonwillison.net
 (DIR) Post #AWM7agxS70aDAeYNsW by simon@fedi.simonwillison.net
       2023-06-04T18:17:00Z
       
       0 likes, 0 repeats
       
       I wrote about how it's infuriatingly hard to understand how closed models train on their input https://simonwillison.net/2023/Jun/4/closed-model-training/
       
 (DIR) Post #AWM7nMwvoXC4seM4Bs by simon@fedi.simonwillison.net
       2023-06-04T18:19:20Z
       
       0 likes, 0 repeats
       
       This post ended up pretty different from what I sat down to write - I initially wanted to write about how I'm confident that models DON'T use their input for future training but I don't have enough data to justify that confidence
       
 (DIR) Post #AWM8WDb7QlAQuKXPn6 by ncweaver@thecooltable.wtf
       2023-06-04T18:27:21Z
       
       0 likes, 0 repeats
       
       @simon One thing I worry about if they do have a closed loop without supervision if you could poison the data with lies deliberately...
       
 (DIR) Post #AWM8hka78cCiOGZCzY by StuartGray@mastodonapp.uk
       2023-06-04T18:29:10Z
       
       0 likes, 0 repeats
       
        @simon I can provide some insight into the "Where things get a lot murkier is ChatGPT itself" data use - RLHF. I can't say what model this approach was used for because it was never disclosed and done through 3rd party contractors...

        On task gig sites, there have been several high-paying rounds of tasks that present ~20 conversation exchanges between a user & a bot, and each bot response has to be scored Y/N on 10-15 different metrics. There's also another task type where they present a
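The rating tasks described above map naturally onto how RLHF-style pipelines commonly turn per-response judgments into training signal. A minimal sketch, assuming a format like the one described; the metric names and the equal-weight averaging are invented for illustration, not the actual contractor workflow:

```python
# Hypothetical sketch: collapsing the per-response Y/N metric ratings
# described above into a single scalar reward, as RLHF-style pipelines
# commonly do before training a reward model. Metric names are invented.

def score_response(ratings: dict[str, bool]) -> float:
    """Average the Y/N ratings into a reward in [0, 1]."""
    if not ratings:
        return 0.0
    return sum(ratings.values()) / len(ratings)

# One rated bot response from a ~20-exchange conversation
ratings = {
    "factually_accurate": True,
    "follows_instructions": True,
    "harmless": True,
    "well_formatted": False,
}

reward = score_response(ratings)
print(reward)  # 0.75
```

In practice the aggregation is usually more involved (per-metric weights, or pairwise response comparisons rather than absolute scores), but the shape of the data - conversation, response, human judgments - matches what the gig tasks collect.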
       
 (DIR) Post #AWMA7BLKsMRiaGLS52 by u0421793@functional.cafe
       2023-06-04T18:45:12Z
       
       0 likes, 0 repeats
       
        @simon my guess, no, estimate, no, guess, is that the “further training” is basically a stage after the fine-tuning, which itself is after the training – so a kind of free, user-generated ‘Reinforcement Learning from Human Feedback’ (RLHF) stage.

        I’d doubt it contributes to the actual learning of the model, because after all they’re pretrained (the P in Generative Pretrained Transformer), so what they contain is effectively frozen and preset (which I view as a drawback that must change in the future).

        For specialised domains, though, you probably don’t need as much wide generalisation as a commercial GPT, which contains a phenomenal amount of computer programming know-how (wtf use is that? Oh yes, the machines can do it all so people don’t have to now) and also quite often an entire multiplicity of other human languages (which bulks up the token size of the model)
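The "frozen and preset" point above is worth making concrete: serving a prompt is a read-only forward pass, so nothing a user types changes the weights unless the vendor explicitly feeds conversations back into a later fine-tuning or RLHF round. A toy illustration (the model class is a stand-in, not a real transformer):

```python
# Toy illustration: inference on a deployed model reads the weights but
# never writes them, so user prompts alone cannot "teach" the model.

class FrozenModel:
    def __init__(self, weights):
        self._weights = tuple(weights)  # immutable once training is done

    def generate(self, prompt: str) -> str:
        # A forward pass only reads the weights.
        return f"response({len(self._weights)} params): {prompt}"

model = FrozenModel([0.1, 0.2, 0.3])
before = model._weights
model.generate("does this change anything?")
assert model._weights == before  # weights untouched by inference
```

Any learning from user input therefore has to happen offline, in a separate training run - which is exactly the step the vendors are opaque about.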
       
 (DIR) Post #AWMBv7dN2YUzVu9qhk by simon@fedi.simonwillison.net
       2023-06-04T19:04:56Z
       
       0 likes, 0 repeats
       
        @u0421793 That's pretty much my guess too... but the problem is that it's just a guess, so I'm not confident sharing it with other people without adding all sorts of disclaimers to it.

        I want certainty
       
 (DIR) Post #AWMDDlV5ekmuK8jpyK by jacob@social.jacobian.org
       2023-06-04T19:13:53Z
       
       0 likes, 0 repeats
       
        @simon this feels really damning to me. If these companies WEREN’T training on input, it’d be incredibly easy for them to just clearly and categorically say they aren’t. The lack of a statement to that effect is all that I need to conclude they are (or will in the future). I see absolutely zero reason to trust these companies.
       
 (DIR) Post #AWMDDoUUXR5jb9TtxI by simon@fedi.simonwillison.net
       2023-06-04T19:19:47Z
       
       0 likes, 0 repeats
       
       @jacob that's the exact kind of cynical take that I'd like to be able to argue against... but I can't
       
 (DIR) Post #AWMDPEGT3IYSigMhOK by u0421793@functional.cafe
       2023-06-04T19:22:08Z
       
       0 likes, 0 repeats
       
        @simon I will add, though, that I sort of intuitively feel that it might be possible in there somewhere to sort of teach (in a different way, something more like hypnotic suggestion) a Large Language Model to synthesise new knowledge from what it already has, and in doing so add to the knowledge it has and violate the idea that it is pretrained – who knows, nature finds a way etc
       
 (DIR) Post #AWMESaKTZZO9yVP7CK by glyph@mastodon.social
       2023-06-04T19:33:56Z
       
       0 likes, 0 repeats
       
       @simon @jacob it’s not cynical; they absolutely do, and that’s why tech companies are generally starting to ban it https://gizmodo.com/chatgpt-ai-samsung-employees-leak-data-1850307376
       
 (DIR) Post #AWMGYGqN8G87BgA796 by codingGarden@mstdn.social
       2023-06-04T19:57:07Z
       
       0 likes, 0 repeats
       
        @simon But they actually say they are using customer prompts delivered to their API, even back in the original InstructGPT paper: https://openai.com/research/instruction-following and later on in the ChatGPT paper they say they used data from their playground, from users. I don’t see why they should have stopped now?
       
 (DIR) Post #AWMGkHAHlQ89etCWTg by acdha@code4lib.social
       2023-06-04T19:58:17Z
       
       0 likes, 0 repeats
       
        @simon that’s what has me thinking @jacob is right. It’d be so easy to head off lost sales with a simple statement if they weren’t either using or considering using that data.
       
 (DIR) Post #AWMGtQXGIGPXKLJNVg by simon@fedi.simonwillison.net
       2023-06-04T19:58:20Z
       
       0 likes, 0 repeats
       
        @codingGarden ooh that's a useful quote, thanks!

        That supports my suspicion that user input isn't used directly as raw input to the pre-training step but instead participates in rounds of things like RLHF
       
 (DIR) Post #AWMH566i4TMNReSOdk by simon@fedi.simonwillison.net
       2023-06-04T20:00:56Z
       
       0 likes, 0 repeats
       
        @glyph @jacob something can be both cynical and true at the same time!

        I still don't personally think that (outside of justified security/leak concerns) the rationale that tech companies have for banning usage holds up, but as I said in my post I'm not confident enough to stake my reputation on it - and the vendors are doing nothing to help me get to that state of confidence
       
 (DIR) Post #AWMHEvuOqNxqi7OB8q by simon@fedi.simonwillison.net
       2023-06-04T20:03:07Z
       
       0 likes, 0 repeats
       
        @acdha @jacob that's effectively what they've now done for data submitted through their API - they now have an unambiguous statement that "OpenAI does not use data submitted by customers via our API to train OpenAI models or improve OpenAI’s service offering."

        ChatGPT via their consumer web interface is an entirely different issue though
       
 (DIR) Post #AWMI8KWP2EmmyPa18S by simon@fedi.simonwillison.net
       2023-06-04T20:15:07Z
       
       0 likes, 0 repeats
       
       @codingGarden I added a section to my post based on that - thanks for the lead https://simonwillison.net/2023/Jun/4/closed-model-training/#instructgpt
       
 (DIR) Post #AWMIIP8ajC6SBYFKIC by simon@fedi.simonwillison.net
       2023-06-04T20:16:49Z
       
       0 likes, 0 repeats
       
        I updated that post with some extra notes about clues provided in the InstructGPT data as to how user input might be used to improve the models https://simonwillison.net/2023/Jun/4/closed-model-training/#instructgpt
       
 (DIR) Post #AWMKpiI2jxj33Ag5a4 by codingGarden@mstdn.social
       2023-06-04T20:45:21Z
       
       0 likes, 0 repeats
       
        @simon I am so surprised that you read it like that. For me, it is more like: if they used the scant, shitty playground data back in the day for RLHF stuff, then now the millions of prompts they get per day will be automatically categorized and, with minimal or no human in the loop, fed back into the training data. Why else provide ChatGPT for free, if not to get the prompts people are asking? Would be very uncharacteristic of them imho
       
 (DIR) Post #AWMMNSaNjbSi90uYHw by andrei_chiffa@mastodon.social
       2023-06-04T21:02:32Z
       
       0 likes, 0 repeats
       
        @simon I would be very surprised if they didn't use their input/output threads for creating conversational datasets after some filtering. Currently models are more limited by good data than anything else, and further pretraining models on datasets that are closer to intended applications is definitely a way to achieve it.
       
 (DIR) Post #AWMOMI9nDtUHQ12rD6 by simon@fedi.simonwillison.net
       2023-06-04T21:24:56Z
       
       0 likes, 0 repeats
       
        @codingGarden Operating ChatGPT for free also made them, almost overnight, the most famous organization in the entire AI space by an order of magnitude - prior to ChatGPT most tech decision-makers had only a vague idea of what OpenAI were doing, if they had heard of them at all
       
 (DIR) Post #AWMjmTWjfbzBFU6R6G by DavidObando@hachyderm.io
       2023-06-05T01:24:27Z
       
       0 likes, 0 repeats
       
       @simon this was a great read and is inspiring me to write about IntelliCode (a Visual Studio and VSCode technology), its history, data, methods of extraction, etc.