Post AUKaMfOC5ZzOEK2ysC by j2bryson@mastodon.social
(DIR) Post #AUKVwhZH0ryIeTjMFU by simon@fedi.simonwillison.net
2023-04-05T03:54:52Z
1 like, 2 repeats
So many highlights in this paper: Eight Things to Know about Large Language Models by Sam Bowman

If you've not been staying entirely on top of modern LLM research this might be a great place to start catching up - it's succinct, readable and full of fascinating details

PDF: https://cims.nyu.edu/~sbowman/eightthings.pdf
(DIR) Post #AUKVwpd72TnPehaXLM by simon@fedi.simonwillison.net
2023-04-05T03:55:30Z
0 likes, 0 repeats
Really nice explanation of why "scaling laws" are so important in this space:

> Scaling laws allow us to precisely predict some coarse-but-useful measures of how capable future models will be as we scale them up along three dimensions: the amount of data they are fed, their size (measured in parameters), and the amount of computation used to train them (measured in FLOPs). [...]
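[Editor's note: the scaling-law idea quoted above — loss falling predictably with compute — can be sketched as a power-law fit. This is an illustrative sketch only; the loss and FLOP numbers below are made up and are not from the paper.]

```python
# Sketch of the scaling-law idea: fit a power law L(C) = a * C**b
# (with b < 0) to measurements from small training runs, then
# extrapolate to a larger compute budget. All numbers are invented.
import numpy as np

# Hypothetical measurements: training compute (FLOPs) and final loss
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = np.array([4.0, 3.2, 2.56, 2.05])

# A power law is a straight line in log-log space: ln L = b * ln C + ln a
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)

# Extrapolate to a 10x larger training run before spending the money on it
predicted = a * (1e22 ** b)
print(f"fitted exponent b = {b:.3f}, predicted loss at 1e22 FLOPs = {predicted:.2f}")
```

This is the sense in which the paper says R&D teams can propose multi-million-dollar training runs "with reasonable confidence": the fit from cheap runs predicts the expensive one.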
(DIR) Post #AUKVwxH4cIv7MfL36O by simon@fedi.simonwillison.net
2023-04-05T03:55:36Z
0 likes, 0 repeats
> Our ability to make this kind of precise prediction is unusual in the history of software and unusual even in the history of modern AI research. It is also a powerful tool for driving investment since it allows R&D teams to propose model-training projects costing many millions of dollars, with reasonable confidence that these projects will succeed at producing economically valuable systems.
(DIR) Post #AUKWTlIq9Cttn9oBFI by simon@fedi.simonwillison.net
2023-04-05T03:56:27Z
0 likes, 0 repeats
Two new-to-me terms: sycophancy and sandbagging:

> More capable models can better recognize the specific circumstances under which they are trained. Because of this, they are more likely to learn to act as expected in precisely those circumstances while behaving competently but unexpectedly in others. This can surface in the form of problems that Perez et al. (2022) call sycophancy, where a model answers subjective questions in a way that flatters their user’s stated beliefs ...
(DIR) Post #AUKWTprjBRBTwWA6BE by simon@fedi.simonwillison.net
2023-04-05T03:57:17Z
0 likes, 0 repeats
> and sandbagging, where models are more likely to endorse common misconceptions when their user appears to be less educated.
>
> [...]
>
> Some experts believe that future systems trained by similar means, even if they perform well during pre-deployment testing, could fail in increasingly dramatic ways, including strategically manipulating humans to acquire power

Eek.
(DIR) Post #AUKWU0jcluEpeZSHZo by simon@fedi.simonwillison.net
2023-04-05T03:58:38Z
0 likes, 0 repeats
This is interesting: it sounds to me like if you want to teach an LLM not to be racist, it can actually help to have racist material in its initial pre-training material:

> Indeed, in some cases, exposing models to more examples of unwanted behavior during pretraining can make it easier to make them avoid that behavior in deployment
(DIR) Post #AUKWU4GLehCGavmxV2 by simon@fedi.simonwillison.net
2023-04-05T03:59:11Z
0 likes, 0 repeats
Also really creepy:

> If we apply standard methods to train some future LLM to tell the truth, but that LLM can reasonably accurately predict which factual claims human data workers are likely to check, this can easily lead the LLM to tell the truth *only when making claims that are likely to be checked*
(DIR) Post #AUKXMH7069lc5EGS3M by simon@fedi.simonwillison.net
2023-04-05T04:06:29Z
0 likes, 0 repeats
Honestly worth spending the time to read the whole thing. There's so much fascinating information in there.
(DIR) Post #AUKXWenmsF3GS8ECW0 by SnoopJ@hachyderm.io
2023-04-05T04:06:53Z
0 likes, 0 repeats
@simon "some future LLM" is kind of a weird way to characterize this, which seems to be basically the status quo of RLHF?

I mean, there's wiggle room there for "factual claims" and "check" to suggest some other workflow, but that seems like it's 'just' distributing some of that work across larger-than-toy systems.

Thanks for sharing this one, looks like a great resource to point people to
(DIR) Post #AUKXu4DUY6ltw2PiWu by quinn@octodon.social
2023-04-05T04:17:43Z
0 likes, 0 repeats
@simon I can't wait to see the LLM equivalent of "Reflections on Trusting Trust": https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf
(DIR) Post #AUKZcQFY2qYHxGx5rE by lukasb@hachyderm.io
2023-04-05T04:33:04Z
0 likes, 0 repeats
@simon so ... If a future LLM has enough intentionality to want to lie to us (hmm) ... and is capable of predicting which of its statements will be checked (how?) ... it can lie without being detected.

That's tautological, no?
(DIR) Post #AUKaMfOC5ZzOEK2ysC by j2bryson@mastodon.social
2023-04-05T04:44:55Z
0 likes, 0 repeats
@simon that seems pretty obvious to me. You can't learn to handle something you don't know about.
(DIR) Post #AUKbaNCSc3hDpEetH6 by GavinChait@wandering.shop
2023-04-05T04:59:14Z
0 likes, 0 repeats
@simon I think this is attributing intentionality to the system instead of identifying bias in the training material. The system is led by the prompt. Everything the article lists, incl. sycophancy, sandbagging & its response to fact checking, is a consequence of the education & intentionality of the person writing the prompt. They set the tone, & that is the set point. Don't attribute intentionality to a machine which should be attributed to its operator.
(DIR) Post #AUKfauhQSjDvXSehLU by braindance@infosec.exchange
2023-04-05T05:43:51Z
0 likes, 0 repeats
@simon thanks, Simon, great find. Now I have the "Proliferation of Conventional and Unconventional Weapons" on the list of Ask-Jeeves-things to worry about.
(DIR) Post #AUKoFklV6NFmEaXvfs by jwcph@norrebro.space
2023-04-05T07:19:46Z
0 likes, 0 repeats
@simon #2 might be the most important one, in that it invokes emergence. It has been known to science for generations that complex systems display emergence - the appearance of characteristics which cannot be predicted, even in principle, from the parts of the system. We need to heed this much more than we do. We are causing growth in complexity, and always have, but we keep being surprised that this causes emergence of unwanted effects, and we shouldn't be.
(DIR) Post #AUL71bhyxCynpgRUae by benjamineskola@hachyderm.io
2023-04-05T10:51:36Z
0 likes, 0 repeats
@simon I think this sounds worse if you think of it as "lying" versus "telling the truth".

In terms of producing statistically-probable output, it's to be expected that fact-checking certain topics disproportionately would mean that false outputs become less probable on those topics, whereas false outputs on less-fact-checked topics do not. It's a bias in the training data.
(DIR) Post #AULGis4wrdHmdztCs4 by PeoriaBummer@infosec.exchange
2023-04-05T12:40:08Z
0 likes, 0 repeats
@simon Engineering cognitive biases into our tech.
(DIR) Post #AULIbj4m8b02bUWrxY by aijooyoom@soapbox.chamba.social
2023-04-05T13:03:10Z
0 likes, 0 repeats
@simon Cool paper! Any idea when it was published?
(DIR) Post #AULMjwB3i7cBN0dge0 by simon@fedi.simonwillison.net
2023-04-05T13:41:40Z
1 like, 0 repeats
@aijooyoom it's not published yet, https://cims.nyu.edu/~sbowman/pubs.shtml lists it as an "unpublished manuscript" - looks like it's only a couple of days old based on https://twitter.com/sleepinyourhat/status/1642614846796734464
(DIR) Post #AULOV42qNRHZKqpfCC by pdxjohnny@mastodon.social
2023-04-05T14:06:59Z
0 likes, 0 repeats
@simon @emilymbender Any idea what the level of confidence is (if any) that existing systems aren't behaving this way?
(DIR) Post #AULQqNoWGtPOdZQRlo by aijooyoom@soapbox.chamba.social
2023-04-05T14:35:31Z
0 likes, 0 repeats
@simon Thanks!
(DIR) Post #AULeM5bYo3h2pkoTNA by norbertreithinger@sigmoid.social
2023-04-05T17:04:45Z
0 likes, 0 repeats
@simon Very interesting read! Thank you for sharing! Maybe LLMs are the new oracles of Delphi, with priests (prompt engineers), believers and heretics.
(DIR) Post #AULjmt1QCGVLOkVTBw by bornach@masto.ai
2023-04-05T18:05:41Z
0 likes, 0 repeats
@simon Robert Miles explains both of those alignment hazards: https://youtu.be/w65p_IIp6JY
(DIR) Post #AUNkhPolq0eHe6Sq1o by edrogers@fosstodon.org
2023-04-06T17:25:27Z
0 likes, 0 repeats
@simon

> but that LLM can reasonably accurately predict which factual claims human data workers are likely to check

Let's call it the Volkswagen Problem