[HN Gopher] GPT-4 LLM simulates people well enough to replicate ...
___________________________________________________________________
GPT-4 LLM simulates people well enough to replicate social science
experiments
Author : thoughtpeddler
Score : 203 points
Date : 2024-08-07 21:30 UTC (1 day ago)
(HTM) web link (www.treatmenteffect.app)
(TXT) w3m dump (www.treatmenteffect.app)
| thoughtpeddler wrote:
| Accompanying working paper that demonstrates 85% accuracy of
| GPT-4 in replicating 70 social science experiment results:
| https://docsend.com/view/qeeccuggec56k9hd
| Jensson wrote:
| Do you even get 85% replication rate with humans in social
| science? Doesn't seem right.
|
| Still, it could at least give them hints about where to look, but
| going that way is very dangerous, as it gives LLM operators the
| power to shape social science.
| TeaBrain wrote:
| The study isn't trying to do replication; it seems to have
| tested the rate at which GPT-4 predicts human responses to
| survey studies. After reading the study, the authors really were
| not clear on how they fed the studies whose responses they were
| trying to predict into the LLM. The data they used for training
| was also unclear, as they dedicated only a few lines to it. For
| 18 pages, there was barely any detail on the methods employed. I
| also don't believe the use of the word "replication" makes any
| sense here.
| nullc wrote:
| But do the experiments replicate better in LLMs than in actual
| humans? :D
|
| We should expect LLMs to be pretty good at repeating back to us
| the stories we tell about ourselves.
| pedalpete wrote:
| I wonder if this could be used for testing marketing or UX
| actions?
| vekntksijdhric wrote:
| Same energy as https://mastodon.radio/@wa7iut/112923475679116690
| dantyti wrote:
| why not just link directly?
| https://existentialcomics.com/comic/557
| 42lux wrote:
| Everyone and their mom in advertising has sold brands "GPT
| Persona" tools for target-group simulation, which are basically
| just an API call. Think "chat with your target group" kind of
| stuff.
|
| Hint: They like it because it's biased toward what they want...
| like real marketing studies.
| markovs_gun wrote:
| Yeah, anyone who has spent more than 30 minutes asking ChatGPT
| to write poetry about King Charles rapping with Tupac and other
| goofy stuff has realized that it is essentially trained to
| assume that whatever you're saying to it is true and to not say
| anything negative to you. It can't write stories without happy
| endings, and it can't recognize when you ask it a question that
| contains a false premise. In marketing, I assume that if you ask
| a fake target demographic whether it will like your new product,
| which is pogs but with blockchain technology, it will pretty
| much always say yes.
| cj wrote:
| I've noticed this in article summaries. It does seem to have
| some weird biases.
|
| I've been able to get around that by directly asking it for
| pros/cons or "what are the downsides" or "identify any
| inconsistencies" or "where are the main risks"... etc
|
| There's also a complexity threshold beyond which it performs
| much better if you break a question down into multiple parts.
| You can basically do prompt-based transformations of your own
| input to break down information and analyze it in different
| ways before using all of that information to finally answer a
| higher-level question.
|
| I wish ChatGPT could do this behind the scenes: prompt itself
| "what questions should I ask myself that would help me answer
| this question?" and go through all those steps without exposing
| them to the user. Maybe it can, or already does, but it still
| seems like I get significantly better results when I do it
| manually and walk ChatGPT through the thought process myself.
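|
| For what it's worth, that manual decompose-then-answer loop can
| be scripted. Here is a rough, untested sketch assuming the
| standard openai Python client; the prompts and the example
| question are made up for illustration:
|
|   from openai import OpenAI
|
|   client = OpenAI()
|
|   def ask(prompt: str) -> str:
|       # One round-trip to the model per step of the breakdown.
|       out = client.chat.completions.create(
|           model="gpt-4",
|           messages=[{"role": "user", "content": prompt}])
|       return out.choices[0].message.content
|
|   question = "Should we move this service to a message queue?"
|
|   # 1. Ask the model which sub-questions to answer first.
|   subs = ask("List three short questions I should answer "
|              "before answering this one: " + question)
|
|   # 2. Answer each sub-question separately.
|   notes = ask("Briefly answer each of these:\n" + subs)
|
|   # 3. Feed the intermediate answers back for the final answer.
|   final = ask(question + "\nUse these notes:\n" + notes)
|   print(final)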
| Propelloni wrote:
| If you can do that, and you do it, what do you still need to ask
| the chatbot for? Genuine question, because in my mind that
| breakdown is the heavy lifting, and you will get to a conclusion
| in the process. All the bot can do is agree with you, and what
| purpose does that serve?
| markovs_gun wrote:
| Another interesting case with this is an instance I had
| with Google Assistant's AI summary feature for group chats.
| In the group chat, my mom said that my grandma was in the
| hospital and my sister said she was going to go visit her.
| In the AI summary, my grandma was on vacation and my sister
| was in the hospital. Completely useless.
| IIAOPSW wrote:
| Yes, but with the caveat that in some very specific cases, no.
|
| I spent a good deal of time trying to get it to believe there
| was a concept in the law of "praiseworthy homicide". I even
| had (real) citations to a law textbook. It refused to believe
| me.
|
| Given the massive selling point of ChatGPT to the legal
| profession, and the importance of actually being right, OpenAI
| has certainly reduced the "high trait agreeableness" in favor of
| accuracy in this particular area.
| AnthonBerg wrote:
| There's a way to apply the concept of praiseworthy homicide
| - metaphorically - to your battle with it.
| fragmede wrote:
| here's a story with a sad ending, called sad musical
| farewell.
|
| https://chatgpt.com/share/0d651c67-166f-4cef-
| bc8c-1f4d5747bd...
| Zambyte wrote:
| Apparently counter examples are very unappreciated. I also
| gave a counter example for each of their claims, but my
| comment got flagged immediately.
|
| https://news.ycombinator.com/item?id=41187549
| markovs_gun wrote:
| I should have clarified that I meant that it has trouble
| writing stories with bad endings unless you ask for them
| directly and specifically, which can be burdensome if
| you're trying to get it to write a story about something
| specific that would naturally have a sad ending.
| Terr_ wrote:
| > it is essentially trained to assume that whatever you're
| saying to it is true and to not say anything negative to you.
|
| Oh, it's actually worse than that: A given LLM probably has
| zero concept of "entities", let alone "you are one entity and
| I am another" or "statements can be truths or lies."
|
| There is merely one combined token stream, and dream-
| predicting the next tokens. While that output prediction
| often _resembles_ a conversation we expect between two
| entities with boundaries, that says more about effective
| mimicry than about its internal operation.
| ithkuil wrote:
| I agree there is limited modeling going on, but the smoking gun
| is not the fact that all there is to an LLM is mere "next token
| prediction".
|
| In order to successfully predict the next token the model
| needs to reach a significant level of "understanding" of
| the preceding context and the next token is the "seed" of a
| much longer planned response.
|
| Now, it's true that this "understanding" is not even close
| to what humans would call understanding (hence the quotes)
| and that the model behaviour is heavily biased towards
| productions that "sound convincing" or "sound human".
|
| Nevertheless LLMs perform an astounding amount of
| computation in order to produce that next token and that
| computation happens in a high dimensional space that
| captures a lot of "features" of the world derived from an
| unfathomably large and diverse training set. And there is
| still room for improvement in collecting, cleaning, and/or
| synthesizing an even better training corpus for LLMs.
|
| Whether the current architecture of LLMs will ever be able
| to truly model the world is an open question but I don't
| think the question can be resolved just by pointing out
| that all the model does is produce the next token. That's
| just an effective way researchers found to build a channel
| with the external world (humans and the training set) and
| transform to and from the high-dimensional reasoning space
| and legible text.
| TeaBrain wrote:
| I don't think "replicate" is the appropriate word here.
| valiant55 wrote:
| I'm sure Philip K Dick would disagree.
| __loam wrote:
| Dick would hate these guys lol.
| Xen9 wrote:
| Only until you realize that repli-cant.
| dartos wrote:
| Were those experiments in the training set?
|
| If so, how close was the examination to the record the model was
| trained on?
|
| Some interesting insights there, I think.
| masterofpupp3ts wrote:
| The answers to your questions are in the paper linked in the
| first line of the app
| cpeterso wrote:
| > Accuracy remained high for unpublished studies that could
| not appear in the model's training data (r = 0.90).
| croes wrote:
| Is that the solution to social science's replication problem?
| nomel wrote:
| With the temperature parameter effectively set to 0, it may
| finally be possible!
| xp84 wrote:
| Can someone translate for us non-social-scientists in the
| audience what this means? "3. Treatment. Write a message or
| vignette exactly as it would appear in a survey experiment."
|
| Probably would be sufficient to just give a couple examples of
| what might constitute one of these.
|
| Sorry, I know this is probably basic to someone who is in that
| field.
| X0nic wrote:
| Same for me. I had no idea what was being asked.
| LogicalRisk wrote:
| A treatment might look like
|
| "In the US, XXX are much more likely to be unemployed than are
| YYY. The unemployment rate is defined as the percentage of
| jobless people who have actively sought work in the previous
| four weeks. According to the U.S. Bureau of Labor Statistics,
| the average unemployment rate for XXX in 2016 was five times
| higher than the unemployment rate for YYY"
|
| "How much of this difference do you think is due to
| discrimination?"
|
| In this case you'd fill in XXX and YYY with different values
| and show those treatments to your participants based on your
| treatment assignment scheme.
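|
| To make that concrete, a tiny hypothetical Python sketch of
| filling in the vignette and randomly assigning participants to
| conditions (the group labels here are placeholders, not the real
| study's):
|
|   import random
|
|   TEMPLATE = ("In the US, {a} are much more likely to be "
|               "unemployed than are {b}. [...] the average "
|               "unemployment rate for {a} in 2016 was five "
|               "times higher than the rate for {b}.")
|
|   CONDITIONS = [{"a": "group A", "b": "group B"},
|                 {"a": "group B", "b": "group A"}]
|
|   def assign(participant_id: int) -> dict:
|       # Reproducible random assignment per participant.
|       return random.Random(participant_id).choice(CONDITIONS)
|
|   for pid in range(4):
|       cond = assign(pid)
|       print(pid, TEMPLATE.format(**cond))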
| tylerrobinson wrote:
| A YC company called Roundtable tried to do this.[1]
|
| The comments were not terribly supportive. They've since pivoted
| to a product that does survey data cleaning.
|
| [1] https://news.ycombinator.com/item?id=36865625
| AnthonBerg wrote:
| A social science experiment in and of itself. A fine thread of
| tragedy in the rich tapestry of enterprise.
| janalsncm wrote:
| Products like this make me pretty cynical about VCs' ability to
| evaluate novel technical products. Any ML engineer who spent 5
| minutes understanding it would have rejected the pitch.
| rytill wrote:
| I'm an ML engineer who's spent more than 5 minutes thinking
| about this idea and would not have automatically rejected the
| pitch.
| janalsncm wrote:
| There are so many basic questions raised in the Launch HN
| thread that didn't have good answers. It indicates to me
| that YC didn't raise those questions, which is a red flag.
| dongobread wrote:
| I'm very skeptical of this; the paper they linked is not
| convincing. It says that GPT-4 is correct at predicting the
| direction of the experiment's outcome 69% of the time versus 66%
| of the time for human forecasters. But this is a silly benchmark,
| because people don't trust human forecasters in the first place;
| that's the whole reason the experiment is run. Knowing that GPT-4
| is slightly better at predicting experiments than a human
| guessing doesn't make it a useful substitute for the actual
| experiment.
| fl0id wrote:
| This so much. There was another similar one recently which was
| also BS.
| yas_hmaheshwari wrote:
| Nicely put! Well argued!
|
| I was not able to put my finger on what felt wrong about the
| article -- till I read this.
| authorfly wrote:
| I totally agree. So many people are missing the point here.
|
| Also important is that in Psychology/Sociology, it's the
| counter-intuitive results that get published. But these results
| disproportionately fail to replicate!
|
| Nobody cares if you confirm something obvious, unless it's on
| something divisive (e.g. sexual behavior, politics), or there
| is an agenda (dieting, etc). So people can predict those ones
| more easily than predicting a randomly generated premise. The
| ones that made their way into the prediction set were the ones
| researchers expected to be counter-intuitive (and likely
| P-hacked a significant proportion of them to find that result).
| People know this (there are more positive confirming papers
| than negative/fail-to-replicate).
|
| This means the _counter-intuitive, negatively forecast results
| are the ones that get published_, i.e. the dataset behind the
| 66% human-forecaster figure is disproportionately made up of
| studies that found counter-intuitive results, compared to the
| overall neutral pool of pre-publication studies, because
| scientists and grant winners are incentivised to publish
| counter-intuitive work. I would even suggest the selected
| studies are more tantalizing than average; in most of these
| studies they are key findings, rather than the minutiae of
| comments on methods or re-analysis.
|
| By the way, the 66% result has not held up super well in other
| research; for example, only 58% could predict whether papers
| would replicate later on: https://www.bps.org.uk/research-
| digest/want-know-whether-psy... - Results with random people
| show that they are better than chance for psychology, but on
| average by less than 66% and with massive variance. This figure
| doesn't differ for psychology professors, which should tell you
| the stat reflects the context of the field and its research
| apparatus more than any capability to predict research. What if
| we revisit this GPT-4 paper in 20 years, see which results have
| replicated, and ask people to predict that - will GPT-4 still be
| higher if its data is frozen today? If it is up to date? Will
| people hit 66%, 58%, or 50%?
|
| My point is, predicting the results now is not that useful
| because historically, up to "most" of the results have been
| wrong anyhow. Predicting which results will be true and remain
| true would be more useful. The article tries to dismiss the
| issue of the replication crisis by avoiding it and by using
| pre-registered studies, but such tools are only bandages.
| Studies still get cancelled, or are never proposed after
| internal experimentation; we don't have a "replication
| reputation meter" to measure those (which affect and increase
| false-positive results), and we likely never will with this
| model of science for psychology/sociology statistics. If the
| authors read my comment and disagree, they should collect
| predictions from GPT-4 and humans for replications currently
| underway, wait a few years for the results, and then conduct
| the analysis.
|
| Also, more to the point, as a Psychology grant holder once told
| me, the way to get a grant in Psychology is to:
|
| 1) Acquire a counter-intuitive result first. Quick'n'dirty
| research method like students filling in forms, small sample
| size, not even published, whatever. Just make the story good for
| this one and get some preliminary numbers on some topic by
| casting a big web of many questions (a few will get P < 0.05 by
| chance eventually in most topics anyway at this sample size).
|
| 2) Find an angle whereby said result says something about culture
| or development (e.g. "The Marshmallow experiment shows that
| poverty is already determined by your response to tradeoffs at a
| young age", or better still "The Marshmallow experiment is
| rubbish because it's actually entirely explained by SES as a
| third factor, and wealth disparity in the first place is ergo the
| cause"). Importantly, change the research method to something
| more "proper" and instead apply P-hacking if possible when you
| actually carry out the research. The biggest P-hack is so simple
| and obvious nobody cares: you drop results that contradict or are
| insignificant and just don't report them - carrying out alternate
| analyses, collecting slightly different data, switching from
| online to in-person experiments, whatever you can to get a
| result.
|
| 3) On the premise of further tantalizing results, propose several
| studies which can fund you over 5 years, and apply some of the
| buzzwords of the day. Instead of "Thematic Analysis", it's "AI
| Summative Assessment" of the word-frequency counts, etc. If you
| know the grant judges, avoid contradicting whatever they say, but
| be just outside the dogma enough (usually, culturally) to
| represent movement/progress of "science".
|
| This is how 99% of research works. The grant holder directs the
| other researchers. When directing them to carry out an alternate
| version of the experiment or to change what is being analyzed,
| you motivate them that it's for the good of the future, of
| society, of being at the cutting edge, and of supporting the
| overarching theory (which of course already has "hundreds" of
| supporting studies constructed in the same fashion).
|
| As to sociology/psychology experiments - Do social experiments
| represent language and culture more than people and groups?
| Randomly.
|
| Do they represent what would be counter-intuitive or support
| developing and entrenching models and agendas? Yes.
|
| 90% of social science studies have insufficient data to say
| anything at the P < 0.01 level, which should realistically be
| our goal if we even want to do statistics under the current
| dogma of this field (said kindly, because some large datasets
| are genuine enough and are used for several studies to make up
| the numbers in the 10%). I strongly foresee a revolution in
| psychology/sociology within the next 50 years that redefines
| the field on a new basis.
| equinox12 wrote:
| I think this analysis is misguided.
|
| Even considering a historic bias for counter-intuitive
| results in social science, this has no bearing on the results
| of the paper being discussed. Most of the survey experiments
| that the researchers used in their analyses came from TESS,
| an NSF-funded program that collects well-powered nationally
| representative samples for researchers. A key thing to note
| here is that not every study from TESS gets published. Of
| course, some do, but the researchers find that GPT4 can
| predict the results of both published and unpublished studies
| at a similar rate of accuracy (r = 0.85 for published studies
| and r = 0.90 for unpublished studies). Also, given that the
| majority of these studies 1) were pre-registered (even pre-
| registering sample size), 2) had their data collected through
| TESS (an independent survey vendor), and 3) well-powered +
| nationally-representative, makes it extremely unlikely for
| them to have been p-hacked. Therefore, regardless of what the
| researchers hypothesized, TESS still collected the data and
| the data is of the highest quality within social science.
|
| Moreover, the researchers don't just look at psychology or
| sociology studies; there are studies from other fields, like
| political science and social policy, for example, so your
| critiques of psychology don't apply to all the survey
| experiments.
|
| Lastly, the study also includes a number of large-scale
| behavioral field experiments and finds that GPT4 can
| accurately predict the results of these field experiments,
| even when the dependent variable is a behavioral metric and
| not just a text-based response (e.g., figuring out which text
| messages encourage greater gym attendance). It's hard for me
| to see how your critique works in light of this fact also.
| authorfly wrote:
| Yes, and I am sure you would have said the same about the
| research before the 2011 replication crisis, when it was
| always claimed that scientists like Bem (premonition) and
| Baumeister (ego depletion) could not possibly be faking
| their findings - they contributed so much, their models had
| "theoretical validity", they had hundreds of studies and
| other researchers building on their work! They had big
| samples. Regardless of TESS/NSF, the studies it focuses on
| have been funded (as you mention) and they were simply not
| chosen randomly. People had to apply for grants. They had to
| bring in early, previous, or prototype results to convince
| people to fund them.
|
| What is specific to psychology here applies to most fields
| in the soft sciences with their typical research techniques.
|
| The main point is that prior research shows absolutely no
| difference between field experts and random people in
| predicting the results of studies - pre-registered,
| replications, and others.
|
| GPT-4 achieving the same approximate success rate as any
| person has nothing whatsoever to do with it simulating
| people. I suspect an 8 year old could reliably predict
| psychology replications after 10 years with about the same
| accuracy. It's also key that in prior studies, like the one
| I linked, this same lack of difference occurred even when
| the people involved were provided additional recent
| resources from the field, although with higher prediction
| accuracy.
|
| The meat of the issue is simple - show me a true positive
| study, make the predictions on whether it will replicate,
| and let's see in 10 years, when replication efforts have
| been carried out, whether GPT-4 is any higher than a random
| 10-year-old who has no information on the study. The implied
| claim here is that since GPT-4 can supposedly simulate
| sociology experiments and so more accurately judge the
| results, we can iterate on it and eventually conduct science
| that way or speed up the scientific process. I am telling
| you that the simulation aspect has nothing to do with the
| success of the algorithm, which is not really outperforming
| humans because, to put it simply, humans are bad at using
| any subject-specific or case knowledge to predict the
| replication/success of a specific study (there is no
| difference between lay people and experts) and the entire
| set of published work is naturally biased anyhow. In other
| words, this style may elicit higher test-score results by
| altering the prompt.
|
| The description of GPT-4's role here as simulation is a
| human theoretical construction. We know that people with a
| knowledge advantage are not able to apply it to predicting
| results any more accurately than lay people. That is because
| they are trying to predict a biased dataset. The field of
| sociology as a whole, like most research that involves
| humans (because it is vastly underfunded for large samples),
| struggles to replicate or to conduct science in a reliable,
| repeatable way, and until we resolve that, the GPT-4 claims
| of simulating people are spurious and unrelated at best,
| misleading at worst.
| equinox12 wrote:
| I'm not sure how to respond to your point about Bem and
| Baumeister's work since those cases are the most obvious
| culprits for being vulnerable to scientific
| weakness/malpractice (in particular, because they came
| before the time of open access science, pre-registration,
| and sample sizes calculated from power analyses).
|
| I also don't get your point about TESS. It seems obvious
| that there are many benefits for choosing the repository
| of TESS studies from the authors' perspective. Namely, it
| conveniently allows for a consistent analytic approach
| since many important things are held constant between
| studies such as 1) the studies have the exact same sample
| demographics (which prevents accidental heterogeneity in
| results due to differences in participant demographics)
| and 2) the way in which demographic variables are
| measured is standardized so that the only difference
| between survey datasets is the specific experiment at
| hand (this is crucial because the way in which demographic
| variables are measured can affect the interpretation of
| results). This is apart from the more
| obvious benefits that the TESS studies cover a wide range
| of social science fields (like political science,
| sociology, psychology, communication, etc., allowing for
| the testing of robustness in GPT predictions across
| multiple fields) and all of the studies are well-powered
| nationally representative probability samples.
|
| Re: your point about experts being equal to random people
| in predicting results of studies, that's simply not true.
| The current evidence on this shows that, most of the
| time, experts are better than laypeople when it comes to
| predicting the results of experiments. For example, this
| thorough study (https://www.nber.org/system/files/working
| _papers/w22566/w225...) finds that the average of expert
| predictions outperforms the average of laypeople
| predictions. One thing I will concede here though is
| that, despite social scientists being superior at
| predicting the results of lab-based experiments, there
| seems to be growing evidence that social scientists are
| not particularly better than laypeople at predicting
| domain-relevant societal change in the real world (e.g.,
| clinical psychologists predicting trends in loneliness)
| [https://www.cell.com/trends/cognitive-
| sciences/abstract/S136... ; full-text pdf here: https://w
| ww.researchgate.net/publication/374753713_When_expe...].
| Nonetheless, your point about there being no difference
| in the predictive capabilities of experts vs. laypeople
| (which you raise multiple times) is just not supported by
| any evidence since, especially in the case of the GPT
| study we're discussing, most of the analyses focus on
| predicting survey experiments that are run by social
| science labs.
|
| Also, based on what the paper says, the authors don't seem
| to be suggesting that these are "replications" of the
| original work. Rather, GPT4 is able to simulate the results
| of these experiments like true participants. To fully
| replicate the work, you'd need to do a lot more (in
| particular, you'd want to do 'conceptual replications'
| wherein the underlying causal model is validated but now
| with different stimuli/questions).
|
| Finally, to address the previous discussion about the
| authors finding that GPT4 seems to be comparable to human
| forecasters in predicting the results of social science
| experiments, let's dig deeper into this. In the paper,
| but specifically in the supplemental material, the
| authors note that they "designed the forecasting study
| with the goal of giving forecasters the best possible
| chance to make accurate predictions." The way they do
| this is by showing laypeople the various conditions of
| the experiment and having the participants predict where
| the average response for a given dependent variable would
| be within each of those conditions. This is _very
| different_ from how GPT4 predicts the results of
| experiments in the study. Specifically, they prompt GPT
| to be a respondent and do this iteratively (feeding it
| different demographic info each time). The result of this
| is essentially the same raw data that you would get from
| actually running the experiment. In light of this, it's
| clear that this is a very conservative way of testing how
| much better GPT is than humans at predicting results, and
| they still find comparable performance. All that said,
| what's so nice about GPT being able to predict social
| science results just as well as (or perhaps better than)
| humans? Well, it's much cheaper (and more efficient) to run
| thousands of GPT queries than it is to recruit thousands of
| human participants!
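|
| To illustrate the "prompt GPT to be a respondent and do this
| iteratively" setup, here is a rough, untested sketch assuming
| the standard openai Python client. The persona fields, prompt
| wording, and survey item are made up for illustration, not
| taken from the paper:
|
|   from openai import OpenAI
|
|   client = OpenAI()
|
|   personas = [
|       {"age": 34, "gender": "woman", "region": "Midwest"},
|       {"age": 61, "gender": "man", "region": "South"},
|   ]
|
|   item = ("How much of this difference do you think is due "
|           "to discrimination? Answer with a number from 1 "
|           "(none) to 7 (all of it).")
|
|   answers = []
|   for p in personas:
|       # Condition the model on one demographic profile at a time.
|       system = (f"You are a {p['age']}-year-old {p['gender']} "
|                 f"from the {p['region']} United States taking "
|                 "a survey. Reply with only a number 1-7.")
|       out = client.chat.completions.create(
|           model="gpt-4",
|           messages=[{"role": "system", "content": system},
|                     {"role": "user", "content": item}],
|       )
|       answers.append(out.choices[0].message.content.strip())
|
|   # The collected answers are then analyzed like raw survey
|   # data, one simulated respondent per persona.
|   print(answers)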
| addcn wrote:
| For sure. Great argument
|
| + the experiments may already be in the dataset so it's really
| testing if it remembers pop psychology
| a123b456c wrote:
| Yes. A stronger test would be guessing the results of as-yet-
| unpublished experiments.
| lumb63 wrote:
| Furthermore, there's a replication crisis in social sciences.
| The last thing we need is to accumulate less data and let an
| LLM tell us the "right" answer.
| verdverm wrote:
| You can see this in their results, where certain types of
| studies have a lower prediction rate and higher variability
| katzinsky wrote:
| That's surprisingly low considering it was probably trained on
| many of the papers it's supposed to be replicating.
| itkovian_ wrote:
| Psychohistory
| scudsworth wrote:
| garbage in, eh?
| AdieuToLogic wrote:
| So did ELIZA[0] about sixty (60) years ago.
|
| 0 - https://en.wikipedia.org/wiki/ELIZA
| uptownfunk wrote:
| Is it possible to train an LLM that is minimally biased and that
| could assume various personas for the purpose of the experiments?
| Then I imagine it's just some prompt engineering no?
| nsonha wrote:
| please don't, need I remind you the joke that social science is
| not real science
| visarga wrote:
| Reminds me of:
|
| > Out of One, Many: Using Language Models to Simulate Human
| Samples
|
| > We propose and explore the possibility that language models can
| be studied as effective proxies for specific human subpopulations
| in social science research. Practical and research applications
| of artificial intelligence tools have sometimes been limited by
| problematic biases (such as racism or sexism), which are often
| treated as uniform properties of the models. We show that the
| "algorithmic bias" within one such tool -- the GPT-3 language
| model -- is instead both fine-grained and demographically
| correlated, meaning that proper conditioning will cause it to
| accurately emulate response distributions from a wide variety of
| human subgroups. We term this property "algorithmic fidelity" and
| explore its extent in GPT-3. We create "silicon samples" by
| conditioning the model on thousands of socio-demographic
| backstories from real human participants in multiple large
| surveys conducted in the United States. We then compare the
| silicon and human samples to demonstrate that the information
| contained in GPT-3 goes far beyond surface similarity. It is
| nuanced, multifaceted, and reflects the complex interplay between
| ideas, attitudes, and socio-cultural context that characterize
| human attitudes. We suggest that language models with sufficient
| algorithmic fidelity thus constitute a novel and powerful tool to
| advance understanding of humans and society across a variety of
| disciplines.
|
| https://arxiv.org/abs/2209.06899
| anileated wrote:
| If GPT emulations of social experiments are not correct, policy
| decisions based on them will make them so.
|
| "GPT said people would hate buses, so we halved their number and
| slashed transportation budget... Wow, do our people actually hate
| buses with passion!"
|
| "A year ago GPT said people would not be worried about climate
| change, so we stopped giving it coverage and removed related
| social adverts and initiatives. People really don't give a flying
| duck about climate change it turns out, GPT was so right!"
|
| This is an oversimplification, of course; to say it with more
| nuance, anything socio- and psycho- is a minefield of self-
| fulfilling prophecies that ML seems to be nicely positioned to
| wreak havoc in. (But the small "this is not a replacement for
| human experiment" notice is going to be heeded by all, right?)
|
| As someone wrote once, all you need for machine dictatorship is
| an LLM and a critical number of human accomplices. No need for
| superintelligence or robots.
| crngefest wrote:
| All you need for dictatorship in general is a critical number
| of human accomplices. I don't see how an LLM in the mix would
| make it worse.
|
| IMO mass communication technologies (radio, TV, internet) are
| much more important in building a dictatorship.
| anileated wrote:
| The quote was mostly a flourish (and apparently too open to
| interpretation to be useful).
|
| In any case, it is about hypothetical "machine dictatorship"
| in particular, not human dictatorships you describe. Machine
| dictatorship _traditionally_ invokes an image of "AGI" and
| violent robots forcing or eliminating humans with raw power
| and compute capabilities, and thus with no substantial need
| for accomplices (us vs. them). In contrast, it could be that
| the more realistic and probable danger from ML is in fact
| more insidious and prosaic.
|
| What you say about human dictatorship is trivially true, but
| the quote is not about that.
|
| > I don't see how an LLM in the mix would make it worse
|
| How about a thought experiment.
|
| 1. Take some historical persona you consider well-intentioned
| (for example, Lincoln), throw an LLM in that situation, and
| see if it could make it better
|
| 2. Take a person you consider a badly intentioned dictator
| (maybe that is Hitler), throw an LLM in that situation, and
| see if it could make it worse
|
| Let me know what you find.
| tgv wrote:
| Don't forget the deceptive aura of objectivity that
| machines have. It's easier to issue a command when "the
| machine has decided" or "God has decided" rather than "I
| just made this up".
| actionfromafar wrote:
| Even a pair of dice helps in that regard.
| AnimalMuppet wrote:
| This. The point of the "AI" is that it may make the humans
| more willing to go along with the orders.
| Mordisquitos wrote:
| > "GPT said people would hate buses, so we halved their number
| and slashed transportation budget... Wow, do our people
| actually hate buses with passion!"
|
| You jest, but if you don't mind me going off on a tangent, this
| reminds me how in the summer 2020 post-lockdown-period the
| local authorities of Barcelona decided that to reduce the
| spread of COVID they had to discourage out-of-town people going
| to the city for nightlife... so they halved the number of night
| buses connecting Barcelona with nearby towns. Because, of
| course, making twice the number of people congregate at bus
| stops and making night buses even more crammed was a great way
| to reduce contagion. Also, as everybody knows, people's
| decision whether or not to party in town on a Friday night is
| naturally contingent on the purely rational analysis as to the
| number of available buses to get home afterwards.
| strogonoff wrote:
| Institutions have shown themselves not well-geared for
| coordinating and enacting consistent policy changes and
| avoiding unintended consequences under time pressure.
| Hopefully COVID was a lesson they learned from.
|
| I remember how in Seoul city authorities put yellow tape over
| outdoor sitting areas in public parks, while at the same time
| cafes (many of which are next to parks, highlighting the
| hilarity in real time) were full of people--because another
| policy allowed indoor dining as long as the number of people
| in each party is small and you put on a mask while not eating
| and leave when you are finished (guess how well that all was
| enforced).
| pembrook wrote:
| In actuality though, GPT would likely be correct on the
| democratic will of the people for the things you cited. It's
| literally just the blended average of human knowledge. What's
| more democratic than that?
|
| Meanwhile, it seems the bigger risk for dictatorship is the
| current system where we put a tiny group of elites who
| condescendingly believe they're smarter than the rest of us in
| charge ("you will take the bus with your 3 kids and groceries
| in hand and you will like it").
|
| This is how you get do-nothing social-signaling policies for
| climate change (e.g. straws, bottle caps, grocery bags), which
| make urban elites feel good about themselves but are ironically
| actively harmful towards getting the correct policies enacted
| (e.g. investment in nuclear).
| eru wrote:
| > It's literally just the blended average of human knowledge.
| What's more democratic than that?
|
| No, it's the 'blended average' of the texts it's been fed
| with.
|
| To state the obvious: illiterate people did not get a vote.
| Terminally online people got plenty of votes.
|
| And, GPT is also tuned to be helpful and to not get OpenAI in
| the news for racism etc, which is far from the 'blended
| average' of even the input texts.
| anileated wrote:
| > GPT would likely be correct on the democratic will of the
| people for the things you cited
|
| This is a dangerous line of thought, if you extend it to "why
| bother actually asking people what they want, let's get rid
| of voting and just use unfeeling software that fingers can be
| pointed at whenever things go wrong".
|
| > a tiny group of elites who condescendingly believe they're
| smarter
|
| I suppose I don't disagree, a small group without a working
| democratic election process is how dictatorships work.
|
| > you will take the bus with your 3 kids and groceries in
| hand and you will like it
|
| Bit of a tangent from me, but it looks like you are mixing
| bits of city planner utopia with bits of, I guess, typical
| American suburban reality. In a walkable city planned for
| humans (not cars) the grocery store is just downstairs or
| around the corner, because denser living makes them
| profitable enough. When you can pop down for some eggs, stop
| by the local bakery for fresh bread, and be back home in under 7
| minutes, you don't really _want_ to take a major trip to
| Costco with all your kids to load up the fridge for the week.
| You could still drive there, of course, and I don't think
| those "condescending elites"* frown too much on a fully
| occupied car (especially with kids), but unless you really
| enjoy road trips and parking lots you probably wouldn't.
|
| > do-nothing social signaling policies for climate change
| (eg. Straws, bottle caps, grocery bags)
|
| Reducing use of plastic is not "do-nothing" for me. I'm not
| sure it has much to do with climate change but I don't want
| microplastics to accumulate in my body or the bodies of my kids.
| However, I can agree with you that these are only half-
| measures with good optics.
|
| * Very flattering by the way, I can barely afford a car** but
| if seeing benefits to walkable city planning makes me a bit
| elite I'll take it!
|
| ** If my lack of wealth now makes you think I'm some kind of
| socialist, well, I can only give you my word that I am far
| from it.
| AnimalMuppet wrote:
| > As someone wrote once, all you need for machine dictatorship
| is an LLM and a critical number of human accomplices. No need
| for superintelligence or robots.
|
| If that dictatorship shows up, the real dictator will be a
| human - the one who hacks the AI to control it. (Whether
| hacking from the inside or outside, and whether hacking by
| traditional means, or by feeding it biased training data.)
| lccerina wrote:
| Source: trust us. This is some bullshit science.
| padjo wrote:
| Well that's one way to solve the replication crisis
| benterix wrote:
| So, we finally found the cure for the replication crisis in
| social sciences: just run them on LLMs.
| consp wrote:
| At least they will confirm the experiments they have been
| trained on.
| somedude895 wrote:
| Maybe that will help extend the veneer of science on social
| studies for a few more years before the echo chamber
| implodes.
| raxxorraxor wrote:
| Problem is that many policy decisions are based on bad science
| in the social sciences, because it provides an excuse. The
| validity is completely secondary.
| jtc331 wrote:
| But does it replicate _better_ than really running the experiment
| again?
|
| Joking...but not joking.
| NicoJuicy wrote:
| That's only for known situations.
|
| E.g. try LLMs to find availability hours when you have the start
| and end time of each day.
|
| LLMs don't really understand that you need to use day 1's end
| hour and then the start hour of the next day.
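|
| A tiny deterministic sketch of that calculation, for contrast
| (the dates and hours here are made-up examples):
|
|   from datetime import datetime
|
|   # (start, end) of each day's busy block; availability is the
|   # gap between one day's end and the next day's start.
|   days = [("2024-08-05 09:00", "2024-08-05 17:00"),
|           ("2024-08-06 10:00", "2024-08-06 18:00")]
|
|   fmt = "%Y-%m-%d %H:%M"
|   for (_, end), (start, _) in zip(days, days[1:]):
|       gap = (datetime.strptime(start, fmt)
|              - datetime.strptime(end, fmt))
|       hours = gap.total_seconds() / 3600
|       print(f"available {end} -> {start} ({hours:.1f} h)")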
| boesboes wrote:
| And yet it can't replicate a human support agent. Or even a basic
| search function for that matter ;)
| 1oooqooq wrote:
| this says more about how social science data is manipulated than
| about the usefulness of LLMs
| gitfan86 wrote:
| The good news is that they should be able to replicate real-world
| events to validate whether this is true or not.
|
| Tesla FSD is a good example of this in real life. You can measure
| how closely the car acts like a human based on interventions and
| crashes that were due to non-human behavior, and in the first
| round of the robotaxi fleet, which will have a safety driver, you
| can measure how many people complain that the driver was bad.
| freeone3000 wrote:
| I think it is far, far more likely that it replicates social
| science experiments well enough to simulate people
| pftburger wrote:
| This is gonna end well...
| Piskvorrr wrote:
| Ooooor maybe, testing if the experiments are similar to what was
| in the corpus.
| klyrs wrote:
| Why stop at social science? I say we make a questionnaire, give
| it to the GPT over a broad range of sampling temperatures, and
| collect the resulting score:temperature data. From that dataset,
| we can take people's temperatures over the phone with a short
| panel of questions!
|
| (this is parody)
| jrflowers wrote:
| I love that anyone can just write whatever they want and post it
| online.
|
| GPT-4 can stand in for humans. Charlie Brown is mentioned in the
| Upanishads. The bubonic plague was spread via telegram. Easter
| falls on 9/11 once every other decade.
|
| You can just write shit and hit post and boom, by nature of it
| being online someone will entertain it as true, even if only
| briefly so. Wild stuff!
___________________________________________________________________
(page generated 2024-08-08 23:02 UTC)