[HN Gopher] New study disavows marshmallow test's predictive powers
___________________________________________________________________
New study disavows marshmallow test's predictive powers
Author : npalli
Score : 73 points
Date : 2022-02-21 20:36 UTC (2 hours ago)
(HTM) web link (anderson-review.ucla.edu)
(TXT) w3m dump (anderson-review.ucla.edu)
| karaterobot wrote:
| A good test for whether a psychological or sociological study may
| turn out to be hard to replicate is: does it make a sweeping
| claim about something as complex as human beings? If it's not
| tentative, incremental, wrapped in caveats and conditionals, I
| don't put much weight in it anymore.
| goatlover wrote:
| Pretty much this.
| dang wrote:
| All: if you're going to post here, can you please make sure you're
| not posting a shallow dismissal? Those are the quickest and
| easiest reactions to post, but they're repetitive and boring.
| This site is supposed to be for _interesting_ conversation, and
| that requires new information--not things we've all heard
| before.
|
| Hint: if you're making a strong, large statement--e.g. an
| emphatic claim about an entire category of things--then it's most
| likely a shallow comment.
|
| https://news.ycombinator.com/newsguidelines.html
| awb wrote:
| Here's a 2011 meta-analysis that reports that DRD (delayed reward
| discounting -- basically putting lower importance on delayed
| gratification and instead putting greater importance on immediate
| rewards) is highly associated with addictive personalities:
|
| https://addictions.psych.ucla.edu/wp-content/uploads/sites/1...
|
| > _Conclusions_ These results provide strong evidence of greater
| DRD in individuals exhibiting addictive behavior in general and
| particularly in individuals who meet criteria for an addictive
| disorder.
| erichocean wrote:
| The article, and especially the headline, are extremely
| misleading.
|
| The actual result: measures of self-control either weakly or
| strongly predict positive life outcomes, depending on the measure
| and how much adjusting was done, e.g.
|
| > _[The study] created a new measure of the time each original
| preschooler waited before taking a bite (or getting the reward)
| to adjust for variables such as age, gender and experiment
| conditions._
|
| This study found that the "marshmallow test"--as a single measure
| --is no more or less predictive than a basket of other measures
| of self-control the study tested, or any of those other measures
| of self-control taken alone.
|
| Despite the misleading article and headline, the study itself
| seems well-designed (e.g. pre-registered), but the conclusion in
| the headline is utterly wrong as that is not what the study
| found: self-control matters, can be measured, and those measures
| weakly or strongly predict positive life outcomes.
|
| Here's an accurate headline: Self-control still predicts positive
| life outcomes, Marshmallow Test creator finds.
| antonfire wrote:
| > This study found that the "marshmallow test"--as a single
| measure--is no more or less predictive than a basket of other
| measures of self-control the study tested, or any of those
| other measures of self-control taken alone.
|
| Are we looking at the same study? I don't see where "no more or
| less predictive than a basket" comes from, specifically where
| "no less predictive" comes from.
|
| My reading of the abstract is that the study found that a
| measure based on the "marshmallow test" ("preschool delay of
| gratification", RND in the article body), is not predictive of
| the outcomes they measured (11 capital formation outcomes).
|
| It also found that a basket of measures of self-control
| (collected at various ages, RNSRI/RNCCQ in the article body)
| _is_ predictive of the outcomes, whether you include the
| preschool measurement or not.
|
| So from skimming the study without even reading the article, it
| sounds to me like they found that the preschool measure doesn't
| predict the outcomes they're measuring by itself, and it
| doesn't contribute predictive power when it's used as part of
| an index of self-control measured at a variety of ages.
| oraphalous wrote:
| Headline seems accurate to me. The headline:
|
| New Study Disavows Marshmallow Test's Predictive Powers.
|
| And it does. The marshmallow test as a single measure is
| referred to as RND, a test that measures gratification wait
| times and is applied in preschool. Their hypothesis regarding
| RND:
|
| hyp2: On its own, RND (measured around age four) will have only
| a very small correlation with the measures of mid-life capital
| formation.
|
| And they report a confirmation of this hypothesis.
|
| The other hypothesis refers to RNSRI (rank-normalized
| self-regulatory index), which is 4 different components measured
| at different ages - each component is RND + 86 other measures!
| This is reported as having a "modest" impact on outcomes - not
| "strong" as you say. So your reporting seems the more
| inaccurate to me.
|
| But this is irrelevant anyway with respect to your claim about
| the headline, which is only referring to the paper's disavowal
| of RND.
|
| Further evidence of the headline's accuracy and the paper's
| disavowal of RND is that they also looked at RNCCQ - which is
| RNSRI minus the inclusion of the RND test from each of the four
| components. They found that including RND did not improve RNSRI
| over RNCCQ in terms of their predictive power.
| KerrAvon wrote:
| The original headline isn't inaccurate, though it is clumsily
| worded and neither it nor your proposed headline fully describes
| the results of the study. The study itself says the following
| (quoting verbatim). Note that the second point is more or less
| what the headline says.
|
| - Self-regulation composite (preschool & ages 17-37) predicts
| capital formation at 46.
|
| - Preschool delay of gratification alone does not predict
| capital formation at 46.
|
| - The composite is more predictive partly because it consists
| of many items.
|
| - No evidence of more predictive power for self-regulation
| reported later in life.
| feanaro wrote:
| Yes, but how would you then inanely riff on psychology, which
| all the cool kids are doing nowadays?
| [deleted]
| nefitty wrote:
| Thank you for the clarification. You've thought about this a
| lot. If a friend asked you for advice on how to help their
| struggling teenage son improve his self-control, what do you
| think you would say?
| renewiltord wrote:
| Oh interesting. Self-regulation is correlated with good outcomes
| but the marshmallow test is a poor test of self-regulation. Okay,
| interesting.
|
| What I would enjoy, I think, is taking a monthly test battery and
| uploading that to a central database with other self-researchers
| and then looking at that in a historic sense to derive ideas to
| study. Obviously, since one is post-hoc slicing one will find
| many spurious correlations but perhaps these correlations will
| yield interesting areas to search around. Does anyone know of
| anything like this?
| acchow wrote:
| From the Journal Article: "They included a total of 550 students
| from Stanford's Bing Nursery School, aged about 4 years old
| (ranging from 2 to 6). Many of the participants are children of
| Stanford faculty and staff."
|
| https://www.sciencedirect.com/science/article/pii/S016726811...
|
| How can we conclude anything at all about the general population
| using a sample of Stanford kids?
| gkop wrote:
| This isn't my field, but I think this is par for the course
| unfortunately and a manifestation of a larger issue. Eg
| https://journals.plos.org/plosone/article/file?id=10.1371/jo...
| dahart wrote:
| I like to ask this whenever Dunning-Kruger comes up; DK was a
| sample of Cornell undergrads, and the study was one tenth the
| size of the Stanford study. DK participants were volunteers from
| a psych class who got extra credit. Presumably they needed and
| could use extra credit, which may have excluded the A and the F
| students. It's hard to imagine ways to start with more bias, or
| how we can possibly accept this sample as representative of
| humanity.
| ren_engineer wrote:
| Social sciences are all a sham, now think about the trillions
| of dollars of government spending that are based on that same
| sham science as justification. People wonder why so many
| government programs fail, it's because they are built on a
| rotten foundation
|
| https://en.wikipedia.org/wiki/Replication_crisis
|
| >How can we conclude anything at all about the general
| population using a sample of Stanford kids?
|
| It's a well-known problem that is rarely brought up: WEIRD
| bias. Most social science research participants are college
| students being bribed with extra credit or gift cards.
|
| https://en.wikipedia.org/wiki/Psychology#WEIRD_bias
| brimble wrote:
| Good social science is possible, but is difficult and
| (sometimes) expensive. If you can get the same _personal_
| outcome by doing something cheap and easy instead, of course
| that's what most people are going to do. Fixing that seems
| to be the Big Problem for most of science, for at least the
| last few decades (though, yes, particularly social science).
|
| > People wonder why so many government programs fail
|
| This, though, I'm not so sure about. Do government programs
| fail at a rate greater than those undertaken by other large
| organizations, like corporations or non-profits?
| scotuswroteus wrote:
| David Brooks in shambles
| learn_more wrote:
| Sounds like it is predictive. Just not when:
|
| > Controlling for differences such as household income and
| cognitive abilities ...
|
| So it's a (predictive) IQ test.
|
| Perhaps it disavows the prior assumed basis of "deferred
| gratification", but not the predictive power of the test.
| api wrote:
| Am I overreacting to consider these kinds of psychometric studies
| to be not much better than phrenology?
|
| "Behavioral phrenology" maybe?
| lr4444lr wrote:
| Delayed gratification AFAIK has solid research as a trait
| predictive of many things. Whether a child's ability to do it at
| 4 or 5 is predictive of their adult self is something else,
| though.
| TrinaryWorksToo wrote:
| There could easily be confounders to that though. Like people
| who are wealthy might be able to delay satisfaction better
| than poor, because their needs are more satisfied.
| dahart wrote:
| Indeed, and the article mentions this. "The Watts study
| findings support a common criticism of the marshmallow
| test: that waiting out temptation for a later reward is
| largely a middle or upper class behavior. If you come from
| a place of shortages and broken promises, eating the treat
| in front of you now might be the better bet than trusting
| there will be more later."
| fancifalmanima wrote:
| To say this more explicitly, even the idea of waiting for
| the second marshmallow being the "preferred" behavior is
| somewhat classist.
|
| Sounds more like the test is just testing for an
| adaptation that happens to be well suited to living in an
| upper-middle-class to wealthy environment. If resources
| are scarce, the kid that takes what they can get now
| rather than trusting other people will do better in the
| long run.
| nostrademons wrote:
| A lot of observable phenomena function as positive feedback
| loops, simply because positive feedback loops are usually
| needed to generate effect sizes that become "observable"
| beyond individual variation. It's very likely that being
| able to delay gratification makes you wealthy, which makes
| you better able to delay gratification, which makes you
| wealthier, and so on. And that's why we have discernible
| social classes, where mobility from one to another becomes
| very difficult.
|
| Breaking the feedback loop usually involves doing something
| farsighted, risky, and irrational - for example, risking
| getting fired from your retail job by studying programming
| and applying to software engineering jobs in your downtime,
| or quitting your stable corporate job to found a startup.
| dahart wrote:
| Predictive is synonymous with correlated in a research
| setting, but lay use of that word seems like it runs the risk
| of implying causation. This may be the primary problem with
| the Stanford Marshmallow experiment, right? - that delayed
| gratification is highly correlated with socioeconomic status,
| which is well known to be an excellent predictor of future
| socioeconomic status.
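
The confounding concern raised above can be illustrated with a toy simulation. This is only a sketch on synthetic data, not the study's data or method: the variable names (`ses`, `delay`, `outcome`), the unit effect sizes, and the sample size are all assumptions chosen for illustration. In this setup a shared cause drives both variables, so they correlate even though neither affects the other, and the correlation largely disappears once the shared cause is controlled for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical model: socioeconomic status drives BOTH variables;
# delay has no direct effect on outcome at all.
ses = [rng.gauss(0, 1) for _ in range(n)]
delay = [z + rng.gauss(0, 1) for z in ses]
outcome = [z + rng.gauss(0, 1) for z in ses]

raw_r = pearson(delay, outcome)
controlled_r = partial_corr(delay, outcome, ses)
print(f"raw r = {raw_r:.2f}, controlling for SES r = {controlled_r:.2f}")
```

A sizable raw correlation alongside a near-zero partial correlation is what "predictive, but not once you control for background" looks like in this simplified model; real data would of course be messier.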
| civilized wrote:
| It's an interesting idea. But to me the marshmallow test at
| least had some plausible connection to personality.
|
| But maybe in the days of phrenology, people thought a hooked
| nose* had a plausible connection to personality as well?
|
| Weird to think about.
|
| *Sorry, this is physiognomy not phrenology. The same basic
| point stands though.
| frgtpsswrdlame wrote:
| >But maybe in the days of phrenology, people thought a hooked
| nose had a plausible connection to personality as well?
|
| Phrenology is bumps on the head right? I think hook nose
| would be physiognomy. But yes, the idea was that your
| behavior was due to your brain and your brain was composed of
| many different parts that each controlled different
| propensities or abilities. Then it was just a matter of
| identifying where those propensities lived, in relation to
| the head and then you could feel for the differences from
| person to person across the surface of the head. From the
| naive viewpoint it _is_ plausible, oh, you say the back right
| section of my head, above the ear is a bit larger so the
| self-control portion of my brain is well developed? I've
| always thought so!
|
| Setting aside the marshmallow test, you can easily see how
| scientific theories about this sort of thing, both right and
| wrong, easily integrate.
| well_i_guess wrote:
| I think that the issue is that there is no true metric for
| "highly marketable talents/traits." One generation's genius
| could be another generation's average worker, solely because
| market forces eliminate the competitive advantage of certain
| things. Many, many authors seem to lament the distractibility
| of the current generations yet I would bet you many of the
| most famous people to Gen Z are incredibly attention-fickle.
| Whereas, 20 years ago, focus would probably be an essential
| skill for key performers.
| fancifalmanima wrote:
| Focus is almost surely an essential skill for key
| performers. Even among the most famous Gen Z -- you don't
| think they focus on their social media presence and what
| they do? What is an 8 hour photo shoot if not focusing? A
| lot of work goes into what social media influencers post,
| it's not all done on a whim. There's also plenty of Gen Z
| doing other more traditional work (almost everyone of that
| generation, really). If anything, they've probably had to
| develop coping mechanisms from an extremely early age to
| deal with distraction, compared to prior generations.
| brimble wrote:
| I'm reminded of the SlateStarCodex post that mulls over
| the difference between "real" ADD and just having totally
| ordinary (but pretty great) difficulty focusing on the
| exact same boring crap on a computer screen day, after
| day, after day--especially if, in the latter case, a lot
| of the people these folks are comparing themselves to,
| when deciding that they might have ADD, are _already_ on
| ADD meds (or coke...) for exactly that reason.
|
| If our society needs 1% of the population to be
| accountants (to pick an example) but only 0.1% of the
| population either have incredible focus abilities or
| don't find accounting brain-meltingly dull, then at least
| 90% of accountants are going to feel like they have a lot
| of trouble focusing at work. Once enough start medicating
| (legally or otherwise) it's gonna feel to others like
| they really do have a condition that most don't, but they
| both kinda do (in a practical sense, they _do_ need to
| focus better to keep up with their peers) and kinda don't
| (in that it's sort of our society that's sick, not
| them--they're just acting like _most_ people would, in
| that situation).
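
The "at least 90%" figure in the accountant example follows from simple arithmetic, sketched here. The function name and parameters are hypothetical, and the calculation assumes the best case in which every naturally suited person actually takes the job.

```python
def min_unsuited_share(needed_share, suited_share):
    """Lower bound on the fraction of a profession NOT naturally suited to it.

    needed_share: fraction of the population the profession requires.
    suited_share: fraction of the population naturally suited to the work.
    Assumes every suited person ends up in the profession (best case).
    """
    if needed_share <= 0:
        raise ValueError("needed_share must be positive")
    suited_in_job = min(suited_share, needed_share)
    return 1 - suited_in_job / needed_share


# Numbers from the comment: 1% needed, 0.1% naturally suited.
share = min_unsuited_share(needed_share=0.01, suited_share=0.001)
print(f"{share:.0%}")
```

If fewer of the suited people choose the profession, the unsuited share only goes up, which is why the comment says "at least".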
| jrumbut wrote:
| My poorly informed impression is that the key challenge of
| any data driven investigation is striking the balance
| between how hard something is to measure and how close it
| is to what you really want to know.
|
| The marshmallow test was so appealing because it was
| incredibly easy to perform and seemed like it was pretty
| close to a measure of the kind of self-control and
| discipline that's needed to succeed in a variety of life's
| most important challenges.
| DeusExMachina wrote:
| Given the current replication crisis, I would say, not much.
|
| https://en.wikipedia.org/wiki/Replication_crisis
| gumby wrote:
| Yes, I think it's better, for the reason this article explains:
| people are following up and revisiting the conclusions.
|
| Nutrition studies are more like phrenology in that they are so
| hard to do with so many confounding factors that you can't
| really trust any macro conclusion.
| yboris wrote:
| I think your distrust in nutrition studies might stem from
| the fact there are nefarious entities publishing things.
| Various industries have a financial interest in making it
| look like their product, because it contains substance X, is
| beneficial to people. So they can design the most flimsy
| experiment with no pre-registration, and re-run it numerous
| times until they get the result they want.
|
| Lots of conclusions from nutrition studies (especially
| meta-analyses) are robust _and_ useful to follow.
|
| Consider the _NutritionFacts_ website as a good starting
| point: a non-profit which has no ads, no industry
| "partners", etc - focused on distilling well-designed studies
| to see what everyday people can use from them.
|
| https://nutritionfacts.org/
| gumby wrote:
| I was not even considering the issue of bad actors. Simply
| that longitudinal, multi-variate studies of sufficient
| scale are essentially impossible to conduct.
|
| Even though nutrition is one of the very oldest, and
| perhaps _the_ very oldest, fields of human study, it still
| remains in the "butterfly collecting" phase of development
| as a science. It's very very hard. I'm glad some people
| try.
| brimble wrote:
| I've got some pretty good predictive powers, myself.
|
| I predict that in twenty years, no matter how thoroughly this is
| debunked, I'll still see this treated as true _constantly_, and,
| even when what's under discussion is _taking action_ based on its
| being true, I'll only get eye-rolls and head-pats and plain
| disapproval/loss-of-face for bringing up that it's questionable
| at best, then everyone will go on treating it as true.
| suzzer99 wrote:
| I've never understood the marshmallow test. I'm supposed to sit
| there and stare at a delicious marshmallow for some indeterminate
| amount of time in order to get _one extra marshmallow_? Offer me
| a whole bag and we'll talk.
|
| I've always wondered if this test measures more of the child's
| willingness to please the researcher, and not so much their
| capacity for delayed gratification.
| mansoon wrote:
| This deserves better study.
| jl2718 wrote:
| "Adding the marshmallow test results to the index does virtually
| nothing to the prognosis, the study finds."
|
| This does not mean that the test is not predictive. It means that
| the index (a bunch of measurements) contains statistical
| dependencies. From a practical view, the marshmallow test result
| depends on many cognitive factors unrelated to self-control. The
| child must understand the instructions, remember them for the
| duration of the test, trust the provider, value the second
| marshmallow, and then make a decision. To be of any value, it
| should have been tested against a standard cognitive battery,
| which it almost certainly would have failed to improve upon.
| Cognitive tests have worked extremely well to predict life
| outcomes for decades now if not centuries.
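
The statistical-dependence point above can be made concrete with a toy simulation. This is a sketch under stated assumptions, not a reanalysis: the latent `trait`, the noise levels, and the count of 20 other measures are all invented for illustration, and it does not reproduce the study's separate finding about the preschool measure on its own. It only shows that a measure can correlate with an outcome by itself and still add almost nothing to an index that already averages many similar measures.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(1)  # fixed seed so the run is repeatable
n, k = 5000, 20         # people, and number of other measures (assumed)

# Hypothetical latent trait that every measure observes with noise.
trait = [rng.gauss(0, 1) for _ in range(n)]
outcome = [t + rng.gauss(0, 1) for t in trait]
single = [t + rng.gauss(0, 2) for t in trait]  # one noisy test
others = [[t + rng.gauss(0, 2) for _ in range(k)] for t in trait]

# Index = plain average of measures, with and without the single test.
index_without = [sum(row) / k for row in others]
index_with = [(sum(row) + s) / (k + 1) for row, s in zip(others, single)]

r_single = pearson(single, outcome)
r_without = pearson(index_without, outcome)
r_with = pearson(index_with, outcome)
print(f"single test alone:   r = {r_single:.3f}")
print(f"index without test:  r = {r_without:.3f}")
print(f"index with test:     r = {r_with:.3f}")
```

Averaging many noisy measures cancels much of the noise, which is also one reading of the study's note that the composite is more predictive partly because it consists of many items.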
| ramesh31 wrote:
| Sure. And fifty years' worth of other studies have shown its
| effectiveness. This is meaningless noise in the absence of
| meta-analysis.
| awb wrote:
| Related meta-analysis:
|
| https://addictions.psych.ucla.edu/wp-content/uploads/sites/1...
|
| > _Conclusions_ These results provide strong evidence of
| greater DRD in individuals exhibiting addictive behavior in
| general and particularly in individuals who meet criteria for
| an addictive disorder.
|
| They don't draw conclusions about causative success, just a
| correlation with addiction.
| rilezg wrote:
| I think the 'golden goose award' page from 2015 (linked in the
| article) gives a better overview of the original research than
| the article:
| https://www.goldengooseaward.org/01awardees/marshmallowtest
|
| A small quote: "But this is not a story of fate - of children's
| long-term success being determined by their self-control as four-
| year-olds. It is a story about how children can change: those who
| are "low delayers" can in fact learn to be "high delayers," and
| gain the life benefits that self-control imparts."
|
| So this is more olds than news, but perhaps it is good to be
| reminded that we all have room to grow (or shrink) from who we
| were at age 4. I personally would bet high-delayers can also
| learn to become low-delayers, and I also would bet there are
| times in life when you would be better off eating the marshmallow
| now instead of investing it for another 30 years at 5% because
| the man in the suit told you to.
___________________________________________________________________
(page generated 2022-02-21 23:00 UTC)