[HN Gopher] The Dunning-Kruger effect is autocorrelation
___________________________________________________________________
The Dunning-Kruger effect is autocorrelation
Author : ljosifov
Score : 557 points
Date : 2023-11-25 18:14 UTC (1 day ago)
(HTM) web link (economicsfromthetopdown.com)
(TXT) w3m dump (economicsfromthetopdown.com)
| Jensson wrote:
| Psychologists using their pet theories to explain results, and
| then people taking that explanation as the truth when they
| should really just look at the data, is probably as large a
| problem as the replication crisis.
| glitchc wrote:
| Geez, this is eye-opening. Thank you for sharing this.
| tempestn wrote:
| I don't buy this take, and this rebuttal does a better job than I
| could of explaining why:
| https://andersource.dev/2022/04/19/dk-autocorrelation.html
|
| Basically, this autocorrelation take shows that if performance
| and evaluation of performance were random and independent, you
| would get a graph like the D-K one, and therefore it states that
| the effect is just autocorrelation. But in reality, it would be
| very surprising if performance and evaluation of performance were
| independent. We expect people to be able to accurately rate their
| own ability. And D-K did indeed show a correlation between the
| two, just not as strong of one as we would expect. Rather, they
| showed a consistent bias. That's the interesting result. They
| then posit reasons for this. One could certainly debate those
| reasons. But to say the whole effect is just a statistical
| artifact because random, independent variables would act in a
| similar way ignores the fact that these variables aren't expected
| to be independent.
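|
| To make the autocorrelation claim concrete, here is a minimal
| sketch (assuming Python/numpy; not the blog's actual code) of
| the random-and-independent case it describes:
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   n = 10_000
|   skill = rng.uniform(0, 100, n)      # actual percentile, random
|   perceived = rng.uniform(0, 100, n)  # self-assessment, independent
|
|   # Bin by actual-skill quartile, as in the D-K style of plot
|   quartile = np.digitize(skill, np.percentile(skill, [25, 50, 75]))
|   for q in range(4):
|       m = quartile == q
|       print(f"Q{q+1}: actual {skill[m].mean():5.1f}  "
|             f"perceived {perceived[m].mean():5.1f}")
|
| Perceived hovers near 50 in every quartile while actual climbs
| from 12.5 to 87.5, so the bottom quartile "overestimates" and
| the top "underestimates" purely by construction. The rebuttal's
| point is that the real D-K data does not look like this flat
| line: it shows a correlation plus a consistent bias.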
| Jensson wrote:
| The effect that the worst overestimate their skill was already
| known; that wasn't the main result of Dunning-Kruger. The
| effect that the best underestimate their skill can be chalked
| up to autocorrelation.
| tempestn wrote:
| The best don't tend to overestimate their skill; they
| underestimate it. The D-K results show a consistent bias in
| estimates toward (somewhere near) the mean. Hence an
| overestimate at the bottom and an underestimate at the top.
| Jensson wrote:
| > The best don't tend to overestimate their skill; they
| underestimate
|
| I wrote the wrong word, I fixed it. The best can't
| overestimate their rank, so of course that wasn't what I
| meant.
| anonymouskimmer wrote:
| Dunning-Kruger posits this as a psychological effect, yes?
| For the top half, psychological effects such as imposter
| syndrome could come into play.
|
| Have sociological factors, such as being kind or big-fish-
| little-pond effects, been considered as likely causes of the
| misestimates?
| chiefalchemist wrote:
| I have the same question... why do some get it so wrong?
| Was there a nudge in the process of the study that caused
| some to answer what they did?
|
| Heck, I'm wondering if "Honestly, I can't say" was an
| allowed response, or were they forced to pick a number?
| If so, I'd want to know what happens when you ask
| 100 people to pick a number between 0 and 100. I bet it's
| not evenly distributed. Maybe the beginners give a
| "discounted" version of the distribution?
|
| Even if the autocorrelation explanation is off, there do
| now seem to be flaws in DK, at least from the perspective
| of pure and proper science.
| svnt wrote:
| The author of this assumes the conclusion in order to decide
| how to analyze his data.
|
| He cannot reasonably say both:
|
| > we have a decision to make: what are we going to assume? How
| are we going to quantify our surprise from the results?
|
| > The first option is, as in the case of the state census, to
| assume dependence between X and Y. I.e. to assume that,
| generally, people are capable of self-assessing their
| performance.
|
| > The second option conforms with the Research Methods 101
| rule-of-thumb "always assume independence." Until proven
| otherwise, we should assume people have no ability to self-
| assess their performance.
|
| > It seems to me glaringly obvious that the first option is
| much, much more reasonable than the second.
|
| -- and --
|
| > most notably the claim that the more skilled people are, the
| better they are at self-assessing their performance. This
| result is supported by their plot, but in any case, my issue is
| not with objections to this claim
|
| and then expect to carry any credibility.
|
| The author of this piece both suggests that a key variable is
| fixed and later admits it varies within the same dataset.
|
| I guess at least they admit it, but this lacks basic self-
| consistency.
| Jensson wrote:
| > The author of this piece both suggests that a key variable
| is fixed and later admits it varies within the same dataset.
|
| I don't see how that variable changes. Here is an example of
| how the error variable can be exactly the same for everyone
| and still reproduce the results:
|
| Let's say the overconfidence is always that you feel 50% of
| those better than you are actually worse than you. So
| everyone is equally overconfident; it's just that the top
| won't move their own placings as much as the bottom, since
| there are far fewer people that they can mistake as being
| worse than them. Then apply noise to this and you get the
| graph Dunning-Kruger got.
|
| You could say "But they are better at estimating their
| rank!", but that is just a mathematical artefact, not a
| psychological result. Even if everyone always guessed that
| they are number 1, the better you are, the better your guess
| will be; but in that case it is easy to see that everyone
| overestimates their skill in the same way, rather than the
| better people having a fundamentally different way of
| evaluating themselves.
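|
| A rough sketch of that model (my reading of it, assuming
| Python/numpy; the 10% noise level is an arbitrary choice):
|
|   import numpy as np
|
|   rng = np.random.default_rng(1)
|   n = 10_000
|   p = rng.uniform(0, 1, n)   # true percentile rank
|   # Same bias rule for everyone: believe half of those
|   # above you are actually below you.
|   perceived = p + 0.5 * (1 - p)
|   perceived = np.clip(perceived + rng.normal(0, 0.1, n), 0, 1)
|
|   quartile = np.digitize(p, [0.25, 0.5, 0.75])
|   for q in range(4):
|       m = quartile == q
|       print(f"Q{q+1}: actual {p[m].mean():.2f}  "
|             f"perceived {perceived[m].mean():.2f}")
|
| The bottom quartile's perceived rank jumps by roughly 0.44
| while the top's moves by only about 0.06, even though the
| bias rule is identical for everyone.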
| raincole wrote:
| > Let's say the overconfidence is always that you feel 50%
| of those better than you are actually worse than you. So
| everyone is equally overconfident; it's just that the top
| won't move their own placings as much as the bottom, since
| there are far fewer people that they can mistake as being
| worse than them. Then apply noise to this and you get the
| graph Dunning-Kruger got.
|
| But the data of the original D-K paper shows that the top 25%
| of people _underestimate_ their placings. So this whole
| paragraph, while logically true, has little to do with the
| original D-K effect.
|
| > You could say "But they are better at estimating their
| rank!", but that is just a mathematical artefact, it isn't
| a psychological result. Even if everyone always guessed
| that they are number 1...
|
| If everyone always guessed that they are number 1, it's a
| huge psychological result: it means people are extremely
| irrational when it comes to self-evaluation.
| Jensson wrote:
| > But the data of the original D-K paper shows that the top
| 25% of people underestimate their placings. So this whole
| paragraph, while logically true, has little to do with
| the original D-K effect.
|
| That is what you would expect under my model, due to the
| randomness being limited upwards for the high placings but
| still going downwards. That is the effect the article we
| are talking about refers to when it says
| "autocorrelation".
| svnt wrote:
| Both analyses seem to agree on one finding: people's skill
| at estimating their own ability increases with that skill.
| It can't be a purely mathematical artifact because you
| would see a tapering at either end, or a narrowing
| distribution of errors at the bottom end, not just a
| narrowing toward the top end.
|
| This should be unsurprising for anyone who has become
| sufficiently skilled at something. Beginners can't even
| discern the differences the experts are discussing, and
| frequently make errors in classes they don't even
| understand.
| chiefalchemist wrote:
| Beginners, by definition, are guessing 100%. Some will
| guess high, others low, and the rest in between. But they
| are all guessing. Perhaps there's a cultural bias to
| over-estimate their skill? Perhaps there's a nudge in the
| process of the study that led them to overestimate?
|
| The lede isn't that people over-estimate their skill
| level. The lede is: why would that be, as they have
| nothing else to go on? What is the trigger or triggers?
| And to say the more experienced estimate better? Well,
| duh.
| contravariant wrote:
| I'm utterly confused. The latter statement is just the
| author explaining which parts they didn't discuss in their
| article; it has no bearing whatsoever on the section before
| it.
| svnt wrote:
| It discloses the cognitive dissonance in his position. He
| seems to be saying "skill at assessing ability is
| random and mathematically bounded only" while admitting
| "skill at assessing ability changes with ability."
| atleastoptimal wrote:
| The issue is people have differing personal definitions of
| Dunning Kruger. The generally demonstrated effect in the sample
| of people Dunning and Kruger analyzed was "people tend to
| estimate the percentile of their own skill as closer to the
| average than it really is, with a slight bias towards an above-
| average mean. This leads to overestimation of relative ability
| by those in lower percentiles, and the opposite for those in
| higher percentiles"
|
| However when people cite Dunning Kruger in popular culture they
| mean "below average people think they're above average, and
| above average people assume they're below average", which was
| not shown in the original study, and wouldn't show up in an
| analysis attempting to justify it via a misunderstanding of
| autocorrelation.
|
| The general point in the rebuttal is correct. A completely
| noisy graph of people's estimations of their own ability would
| show a Dunning-Kruger resembling residual graph (x-y vs x).
| However, one wouldn't expect people in the 1st percentile to
| have an equal distribution of perceived skill as people in the
| 50th or 99th percentile. If that were true, it would be worth
| reporting.
| ShamelessC wrote:
| > "below average people think they're above average, and
| above average people assume they're below average"
|
| There's no way to know if you're wrong, but when I see it
| used it seems to be pointing out: "some (not all)
| underqualified people tend to defer to their own beliefs
| rather than the views/statements of experts, even when that
| is demonstrably silly."
|
| ^ Referring to the pop-sci interpretation, not in
| disagreement with the general point.
| staunton wrote:
| Which also has nothing at all to do with this study by
| Dunning and Kruger. So you agree with the general point of
| parent.
| ShamelessC wrote:
| Yes. Just clarifying a small disagreement about the pop-
| sci interpretation of the phrase.
| crazygringo wrote:
| Yup. Assuming the sample sizes are statistically significant,
| the original paper clearly shows:
|
| - On average, people estimate their ability around the 65th
| percentile (actual results) rather than the 50th (simulated
| random results) -- a significant difference
|
| - That people's self-estimation _increases with their actual
| ability_ , but only by a surprisingly small degree (actual
| results show a slight upwards trend, simulated random results
| are flat) -- another significant difference
|
| The author's entire discussion of "autocorrelation" is a red
| herring that has nothing to do with anything. Their randomly-
| generated results do _not_ match what the original paper shows.
|
| None of this really sheds much light on to what degree the
| results can be or have been robustly replicated, of course. But
| there's nothing inherently problematic whatsoever about the way
| it's visualized. (It would be nice to see bars for variance,
| though.)
| ketozhang wrote:
| The autocorrelation is important to show that its
| transformation to the D-K plot will always give you the D-K
| effect for independent variables.
|
| However, the focus on autocorrelation is not very
| illuminating. We can explain the behaviors found quite
| easily:
|
| - If everyone's self-assessment scores are (uniformly)
| random guesses, then the average self-assessment score for
| any quantile is 50%. Then of course those of lower quantile
| (less skilled) are overestimating.
|
| - If self-assessment score and actual score are
| proportionally dependent, then the average of each quantile
| is always at least its quantile value. This is the D-K
| effect, which is weaker as the correlation grows.
|
| - The opposite is true for a disproportional relation.
|
| So, the D-K plot is extremely sensitive to correlations and
| can easily exaggerate the weakest of correlations.
| cortesoft wrote:
| > That people's self-estimation increases with their actual
| ability, but only by a surprisingly small degree (actual
| results show a slight upwards trend, simulated random results
| are flat) -- another significant difference
|
| If everyone thinks they are slightly above average, isn't
| this inevitable? If everyone thinks they are slightly above
| average, people who are slightly above average are going to
| be the most accurate at predicting where they land?
| zuminator wrote:
| Even if "people tend to slightly overrate their own
| ability," was the only takeaway, it would still refute the
| author's conclusion that DK has nothing to do with human
| psychology.
| IanCal wrote:
| Yes but then you'd see a flat line for people's estimates,
| which wasn't the result.
| cycomanic wrote:
| Have you not just summarized the Dunning-Kruger effect in
| other words?
|
| That essentially follows from everyone assuming they are
| slightly above average. That's also the crux of the
| refutation and why the whole autocorrelation point is a red
| herring: even if we all self-assessed completely randomly,
| that would actually confirm the Dunning-Kruger effect is
| real (because if we self-assess randomly, worse performers
| are more likely to overestimate).
|
| We could argue that this is not surprising, but the
| "surprising" bit is that the curves show that better
| performers are actually more skilled at assessing their
| performance, which incidentally was also confirmed by the
| followup studies.
| quetzthecoatl wrote:
| Is it though? Everyone overestimating their ability a bit
| isn't DK effect. It's when people with less knowledge and
| ability vastly over estimate their ability (because they
| don't know how little they know - while others do), and
| the opposite for those who are truly more able and
| knowledgeable (again because they understand how vast the
| topic is and though they know more and are capable more
| than the average person, they also understand how little
| they truly know compared to what they don't know)
| dahart wrote:
| > If everyone thinks they are slightly above average, isn't
| this inevitable? If everyone thinks they are slightly above
| average, people who are slightly above average are going to
| be the most accurate at predicting where they land?
|
| Yes, it's inevitable. And this study only asked Cornell
| undergrads what they think of themselves - people who were
| taught to believe they are above average, and also people
| who got into a selective school and probably all had higher
| than average scores on standardized tests. Is it surprising
| in any way that this group estimated their ability at above
| average?
| somenameforme wrote:
| > "On average, people estimate their ability around the 65th
| percentile (actual results) rather than the 50th (simulated
| random results) -- a significant difference"
|
| This is a different issue than D-K. The D-K hypothesis is
| that self-assessment and actual performance are less
| correlated for weaker than for higher-performing
| individuals. "People think they're better than average" is
| a different (and much less controversial) bias.
|
| ---
|
| [DK-Effect] : I totally know I scored at least a 30% on that
| test, and that's certainly way better than average (it's
| not). [Actually scored 10%]
|
| [No DK-Effect] : I totally know I scored at least a 30% on
| that test, and that's certainly way better than average (it's
| not). [Actually scored 30%]
|
| ---
| kstenerud wrote:
| > The D-K hypothesis is that self assessment and actual
| performance are less correlated for weaker than higher
| performing individuals.
|
| Isn't that what the graph shows? The bottom quartile group
| is guessing almost 50 percentile points higher than their
| actual performance, whereas the top quartile is at most 15
| points off.
|
| They're all guessing somewhere between the 60th and 75th
| percentiles (i.e. "I'm a bit better than average") - with
| some upwards trend, since the high performers seem to at
| least know they have some skill, although not very
| accurately. It's just that for the poor performers, a guess
| of the 60th percentile is way off the mark.
| somenameforme wrote:
| EDIT: Something important for the rest of this post. In
| case it's not clear, the graph is showing your percentile
| ranking within the group - not your actual score.
|
| Nope, because there's an interesting statistical trick in
| play. Imagine you take 100 highly skilled physicists and
| give them some lengthy series of otherwise relatively
| basic physics questions. Everybody is going to rate their
| predicted performance as high. But some people will miss
| some questions simply due to silly mistakes or whatever.
| And those people would end up on the bottom 10% of this
| group, even if the difference between #1 and #100 was
| e.g. 0.5 points. Graph it as D-K did, and you'd show a
| huge Dunning Kruger effect, even when there is obviously
| nothing of the sort.
|
| In fact, the _smaller_ the differences in ability within a
| group, and the greater the relative ease of a task, the
| _bigger_ the Dunning-Kruger effect you'd show. Because
| everybody will rate themselves relatively high, but you
| will always have a bottom 10%, even if they are
| practically identical to the top 10%.
|
| You can see this most clearly in the original paper. They
| carried out 4 experiments. The one that was most
| objective and least subject to confounding variables was
| #2, where they asked people a series of LSAT based logic
| questions, and assessed their predicted vs actual
| results. And there was very little difference. Quoting
| the paper, "Participants did not, however, overestimate
| how many questions they answered correctly, M = 13.3
| (perceived) vs. 12.9 (actual), t < 1. As in Study 1,
| perceptions of ability were positively related to actual
| ability, although in this case, not to a significant
| degree." Yet look at the graph for it, and again it shows
| some seemingly large D-K effect.
|
| And there are even more issues with D-K, especially with
| experiment #1 (the one with the prettiest graph by far),
| but that's outside the scope of this post. I'm happy to
| get into it if you are, though. I find this all just kind
| of shocking and exceptionally interesting! I've
| referenced the D-K effect countless times in the past -
| never again after today!
|
| [1] - https://sci-hub.se/10.1037/0022-3514.77.6.1121
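|
| A toy version of the physicists thought experiment (assumed
| numbers, Python/numpy), showing how percentile ranking
| manufactures a large apparent effect from near-identical
| skill:
|
|   import numpy as np
|
|   rng = np.random.default_rng(2)
|   n = 100
|   score = 98 + rng.normal(0, 0.5, n)  # everyone within ~1 point
|   predicted_pct = np.full(n, 80.0)    # all rate themselves highly
|   # Percentile rank within the group
|   actual_pct = 100 * (np.argsort(np.argsort(score)) + 0.5) / n
|
|   order = np.argsort(actual_pct)
|   for name, idx in [("bottom", order[:25]), ("top", order[-25:])]:
|       print(f"{name} quartile: actual {actual_pct[idx].mean():.1f}, "
|             f"predicted {predicted_pct[idx].mean():.1f}")
|
| The bottom quartile "overestimates" by nearly 70 percentile
| points despite being a fraction of a raw point worse than
| the top quartile.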
| dahart wrote:
| Yes yes yes! I'm in the very same boat, and came to the
| epiphany that the ranking trick here, combined with some
| subjective questions (ability to appreciate humor -
| seriously!?), hides almost everything about actual skill.
| Not only does it amplify mistakes, it also forces the
| participants to know something about their cohort. Having
| to guess your ranking fully explains the less-than-perfect
| correlation. It also undermines all claims about competence
| and incompetence. They're not testing skill; they're only
| testing the ability to randomly guess the skill of others.
|
| What about the slight bias upwards? Well, what _exactly_
| was the question they asked? It's not given in the paper.
| They were polling only Cornell undergrads looking for
| extra credit. What if the question somehow accidentally
| or subtly implied they were asking about the ranking
| against the general population, and then they turned
| around and tested the answers against a small Cornell
| cohort? I just went and looked at the paper again and
| noticed that the descriptions of the ranking question
| changed between the various "studies" with the first one
| comparing to the "average Cornell student" (not their
| experiment cohort!). The others suggest they're asking a
| question about ranking relative to the class in which
| they're receiving extra credit. Curiously study 4 refers
| to the ranking method of study 2 specifically, and not 3.
| The class used in study 4 was a different subject than 2
| & 3. How they asked this question could have an enormous
| influence on the result, and they didn't say what they
| actually asked.
|
| Cornell undergrads are a group of kids that got accepted
| to an elite school and were raised to believe they're
| better than average. Whether or not all people believe
| they're better than average, this group was primed for
| it, and also has at least one piece of actual evidence
| that they really are better than average. If these were
| majority freshman undergrads, they might be especially
| poorly calibrated to the skills of their classmates.
|
| In short, the sample population is definitely biased, and
| the potential for the study to amplify that bias is
| enormous. The paper uses suggestive language and jumps to
| hyperbolic conclusions throughout. I'm really surprised
| that evidence and methodology this weak claims to show
| something about all of humanity and got so much
| attention.
| dahart wrote:
| > The D-K hypothesis is that self assessment and actual
| performance are less correlated for weaker than higher
| performing individuals.
|
| I'm not sure that's an accurate summary. The correlation of
| the perceived ability is effectively the slope of the line,
| and the slope is more or less constant. The paper suggests
| that the _bias_ of the bottom quartile is higher than the
| bias of the upper quartile, not that the correlation is any
| different.
|
| But it's strange that the DK paper makes an example of the
| lower performers, since the bias of the scores appears to
| be constant; it appears the high performers have pretty
| much the same bias as the low performers -- it's a
| straightish line that goes through 65% in the middle rather
| than the expected straight line that goes through 50% in
| the middle. If the 'high performers' had a different bias,
| then the line wouldn't be so straight.
| JamesBarney wrote:
| Yeah, my understanding is:
|
| 1. The slope of self-perceived ability is lower than that
| of actual ability.
|
| 2. The y-intercept depends on the difficulty of the test.
|
| Therefore with an easier test the better test-takers are
| more accurate, and with a very difficult test the worse
| test-takers are more accurate, because of where the lines
| intersect. Meaning DK is an artifact of test difficulty.
|
| This also means that if the test were difficult enough
| you could create a bizarro-DK effect where the better
| test-takers were less accurate.
| dahart wrote:
| For 1, the data is based on guessing, so it's zero
| surprise that self-perceived ability doesn't correlate
| perfectly with actual ability. It would be extremely
| surprising and unbelievable if the slopes were the same,
| right?
|
| For 2, the DK paper shows one thing, but the replication
| attempts have shown this effect doesn't even exist for
| very complex tasks, like being an engineer or lawyer. The
| DK effect doesn't generalize, and doesn't even measure
| exactly what it claims to measure, which is why we don't
| need to speculate about the bizarro-DK reversal effect -
| we already have evidence that it doesn't happen, and we
| already have a big enough problem with people mistakenly
| believing that DK showed an inverse correlation between
| confidence and competence, when they did no such thing.
| dragonwriter wrote:
| > This is a different issue than D-K.
|
| No, it's literally the D-K finding.
|
| > The D-K hypothesis is that self assessment and actual
| performance are less correlated for weaker than higher
| performing individuals
|
| That may have been a _hypothesis_ Dunning and Kruger had at
| some point; it's not the effect they actually identified
| from their research. But I don't think it's even that. It's
| an "effect" people have associated with D-K because they
| heard discussion of the D-K research that got distorted at
| multiple steps from the original work, and then that
| misunderstanding, because it made a nice taunt, replicated
| widely and became popular.
| dahart wrote:
| To be fair, the paper itself uses hyperbolic language
| that completely distorts it's own data. It heavily pushes
| and leads the reader into one possible dramatic
| explanation for their results, while downplaying and
| ignoring a bunch of other less dramatic explanations.
| Using words like "incompetent" are almost completely
| unfounded based on what they actually did. Section
| headings like "competence begets calibration", "it takes
| one to know one", and "the burden of expertise" are
| uncurious platitudes and jumping to conclusions. I'm
| kind-of stunned at the popular longevity of this paper
| given how unscientific is it and how often replication
| results with better methodology have shown conflicting
| results.
| somenameforme wrote:
| This is straight from their paper [1]:
|
| "Perhaps more controversial is the third point, the one
| that is the focus of this article. We argue that when
| people are incompetent in the strategies they adopt to
| achieve success and satisfaction, they suffer a dual
| burden: Not only do they reach erroneous conclusions and
| make unfortunate choices, but their incompetence robs
| them of the ability to realize it."
|
| [1] - https://sci-hub.se/10.1037/0022-3514.77.6.1121
| singingfish wrote:
| > assuming the sample sizes are statistically significant
|
| Nitpick: should read "assuming the sample sizes provide
| sufficient statistical power"
| IAmGraydon wrote:
| So what we have here is some scientists trying to prove that
| the Dunning-Kruger effect doesn't exist and instead they give
| us a perfect example of the Dunning-Kruger effect.
| wyldfire wrote:
| > The irony is that the situation is actually reversed. In
| their seminal paper, Dunning and Kruger are the ones
| broadcasting their (statistical) incompetence by conflating
| autocorrelation for a psychological effect. In this light,
| the paper's title may still be appropriate. It's just that it
| was the authors (not the test subjects) who were 'unskilled
| and unaware of it'.
| t_mann wrote:
| I was surprised by the figure from the original article; imho
| that's the strongest rebuttal: perceived ability grows
| strictly monotonically with actual ability, with no sign of
| the famous non-monotonic U-curve. Yeah, the slope is less
| than one, and it grows a bit faster from the second to the
| third quartile than from the first to the second, but none
| of that changes the fact that people tend to slot themselves
| correctly. The chart is interesting in that it confirms that
| everyone perceives themselves to be slightly above average
| in terms of ability, which of course can't be true in
| practice. But what it also shows is that when they think
| they'll be below or above that (false) baseline, they're
| actually correct about it. So pretty much the exact opposite
| of what the Dunning-Kruger effect claims.
| jampekka wrote:
| The slope will be less than one if there's e.g. any random
| guessing in the test, even if the self-assessment is perfect
| (apart from whether they know if their guess is right or
| wrong, of course) [1].
|
| I think this is the effect that the post is dancing around
| but doesn't seem to really understand (and the way
| "autocorrelation" and independence are discussed is, to be
| charitable, very nonstandard).
|
| [1] https://en.m.wikipedia.org/wiki/Regression_dilution
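|
| A quick numerical sketch of regression dilution (assuming
| Python/numpy; the variances are arbitrary but equal):
|
|   import numpy as np
|
|   rng = np.random.default_rng(3)
|   n = 10_000
|   skill = rng.normal(50, 10, n)
|   test_score = skill + rng.normal(0, 10, n)  # test is a noisy probe
|   estimate = skill                           # perfect self-assessment
|
|   slope = np.polyfit(test_score, estimate, 1)[0]
|   print(f"slope of estimate vs test score: {slope:.2f}")  # ~0.5
|
| Even with perfect self-knowledge of true skill, regressing
| the estimate on the noisy test score gives a slope of about
| 0.5 rather than 1.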
| t_mann wrote:
| I agree, the statistical analysis in the original post
| makes me very uneasy. I think it could be a case where the
| conclusion is correct even though the argument isn't
| necessarily.
|
| And yes, the fact that the slope is less than one is fairly
| uninteresting.
|
| The real problem here is that the Dunning-Kruger effect, as
| it's classically stated, claims that if you asked four
| people to rank themselves in terms of ability, the result
| would be 1-3-2-4, i.e. the people who know a little would
| put themselves above the people who know a lot but aren't
| quite experts. The problem is that the data shows they'd
| actually rank themselves correctly, 1-2-3-4. But such a
| boring finding probably wouldn't have made the authors
| quite as famous, which might be why they tried a bit of
| data mangling and found this really cool story that
| everyone would secretly love to be true.
|
| Which is a shame, because I think the fact that the mean of
| perceived ability is too high (and the variance too low) is
| really interesting too, and perfectly supported by the raw
| data.
| jampekka wrote:
| Yes. The methodology in the original D&K is quite shoddy,
| and vulnerable to e.g. good old regression to the mean,
| and the interpretations are too strong. This is sadly
| very common in psychology (and many other fields I'd
| guess) and even researchers don't care so much if the
| story is juicy enough.
|
| The pop version of the DK effect seems to be something
| like a 4-3-2-1 ranking, which is obviously not supported
| by the data.
| tempestn wrote:
| But they wouldn't. They'd rank themselves something like
| 1,2,2,3. We're not dealing with a population
| collaborating to all rank themselves in order, but rather
| each person individually estimating where their abilities
| lie in the population.
|
| The point is that if you ask someone in the, say, 5th
| percentile of ability what their ability is compared to
| the population, they might say 25th percentile. Ask
| someone at the 25th, and they might say 40th. At the 40th
| they could say 55th. And at the 90th, maybe they'll say
| 80th. So yes, if you order their guesses, they will be in
| roughly the correct order. But, crucially, that doesn't
| mean that they are ranking themselves correctly!
| tempestn wrote:
| > The chart is interesting in that it confirms that everyone
| perceives themselves to be slightly above average in terms of
| ability, which of course can't be true in practice.
|
| No, everyone biases their self-assessments _toward_ a point
| slightly above the mean. That's not the same as saying
| everyone thinks they're slightly above average, nor that
| people's self-assessments have no predictive power
| whatsoever. The lowest performers still think they're below
| average, just not as much as they should. The highest
| performers still think they're considerably above average.
| But they all have a bias toward (slightly above) the middle.
|
| So yes, people are generally correct in the direction that
| they deviate from that median self-assessment, but that just
| shows that people's self-assessments aren't completely
| without basis. Which D-K certainly didn't claim.
| t_mann wrote:
| D-K claim a non-monotonic relationship, which simply isn't
| supported by that data, as you yourself point out: people
| rank themselves correctly (ordinally). I didn't mean to say
| that all self-assessments are the same, if that was the
| misunderstanding. My point is that the self-assessments
| indeed are meaningful, even more so than D-K claim.
| RevEng wrote:
| Check the original paper by D-K. You've only focused on the
| first plot, which has a monotonically increasing trend.
| The later plots show varying degrees of non-monotonicity,
| though sadly they don't include error bars to indicate
| how statistically significant the differences between
| groups are.
| zeroonetwothree wrote:
| But we don't know their true ability, only the results on
| one test. It could be they accurately predicted their
| ability but because of random chance they did better/worse
| than their guess. Then you would get the exact data that is
| observed.
| lokar wrote:
| I thought they were estimating their performance on the
| test relative to others. There was no "real world"
| element.
| raincole wrote:
| > And D-K did indeed show a correlation between the two, just
| not as strong of one as we would expect. Rather, they showed a
| consistent bias. That's the interesting result.
|
| "D-K effect in its original form" vs "D-K effect in pop
| culture" is the biggest D-K effect live example. Of course I
| mean D-K effect in pop culture here.
|
| Interestingly, the "interesting" part of the original result is
| that the correlation between actual performance and perceived
| performance is less than people intuitively think.
|
| But as the "D-K effect in pop culture" spreads, people's
| collective intuition changes. Today if you explained the
| original D-K effect to a random person on the internet, they
| might find it interesting because the correlation is _greater_
| than they thought: they thought the correlation would be
| negative!
| hoosieree wrote:
| D-K effect effect is almost as entertaining as the Butterfly
| effect effect[1].
|
| [1]: Which is the far-away effect attributed to having
| watched the movie The Butterfly Effect.
| expazl wrote:
| > But in reality, it would be very surprising if performance
| and evaluation of performance were independent. We expect
| people to be able to accurately rate their own ability.
|
| This seems to be attacking an irrelevant point in the
| analysis. The argument goes like this: a researcher carries
| out all the studies needed to prove the Dunning-Kruger
| effect, then trips and drops all the results into a vat of
| acid. Ashamed, he quickly generates random numbers for the
| results, and somehow the data still proves the
| Dunning-Kruger effect. Not just that: repeating the same
| exercise again and again with completely random data leads
| to the same result; the effect is always present. So is the
| Dunning-Kruger effect so powerful that it exists in the very
| fabric of the universe, devoid of any human interaction, or
| is something amiss?
|
| In this situation we are forced to look at the test that
| concluded from the data that the Dunning-Kruger effect
| exists, conclude that it's a bad test, and accept that we
| need something different.
|
| You seem to be arguing "oh no, you can't look at random data,
| because we wouldn't expect the experiment to yield random
| data!". But that doesn't work as an argument for why the test
| should still be considered good. If it's supposed to have any
| worth, the test has to be able to come to one of two
| conclusions: the Dunning-Kruger effect exists, or the
| Dunning-Kruger effect doesn't exist. And if the test is set
| up such that it comes out positive for positive experimental
| results or for just random noise, and comes out negative only
| in an extremely unlikely and narrow band of the possible
| outcome space, then the test is bad.
|
| To rephrase everything a bit and make the issue much
| clearer, let's set up a coin-toss competition between
| ChatGPT and a group of 100 people. Each participant goes 1:1
| against ChatGPT: both parties toss a coin and whoever has
| the most heads wins; on draws, toss again; in case a pair
| goes into an infinite loop that doesn't end before our
| allotted trial time, they get removed from the study. A
| human assistant tosses on behalf of ChatGPT on account of it
| not having arms yet.
|
| Now we ask each person how they would rate their ability vs.
| ChatGPT in a coin-toss; everyone answers 50/50, for obvious
| reasons.
|
| So we run the experiment. The line for ability plotted
| against ability is a straight diagonal line. The line for
| estimated ability vs. actual ability is a straight flat line
| at 50%.
|
| Eureka! To the presses! We have just proven the
| Dunning-Coin-Kruger effect! People who are worse at tossing
| coins tend to overestimate their ability, and people who are
| better at tossing coins underestimate their ability! What a
| marvelous bit of psychological insight; it really tells us
| something about how the human mind works, and has broader
| insights about our society! But naturally we always expected
| this outcome: people who are bad at tossing coins are dumb,
| and of course they are overconfident, unlike people who are
| good at tossing coins, who have a remarkable intellect and
| are therefore humble in their self-estimation... and so on
| and on about preconceived biases that have nothing to do
| with the actual test we performed.
| cool_dude85 wrote:
| Yeah, this must be some high-end satire where the guy
| Dunning-Krugers up an explanation of Dunning-Kruger. Since
| even an economist is supposed to understand ANOVA, I have to
| conclude that this article is a joke.
| nickelpro wrote:
| The incorrect usage of "autocorrelation" made me double take
| and wonder if this was satire the first time it was posted.
| xpe wrote:
| The rebuttal by Daniel (andersource.dev) is useful, generally.
| However, when he writes ...
|
| > The history of statistics is well out of scope for this post,
| but very succinctly, my answer is that statistics is an attempt
| to objectively quantify surprise.
|
| ... I cannot agree. Statistics is not this; it is much broader.
| One may or may not be surprised by particular statistics, sure,
| but there are _specific_ concepts that map more directly to
| surprise, such as entropy from information theory.
| vasco wrote:
| If entropy is defined as statistical disorder, then I think
| the definition of "quantifying surprise" is great.
| xpe wrote:
| You aren't suggesting that statistics as a field defined a
| notion of "order", prior to thermodynamic entropy or
| Shannon entropy, are you? To me, that would be circular.
|
| Based on my knowledge, it seems likely the first published
| quantification of disorder arose in the study of
| thermodynamic entropy. Later, Shannon defined entropy in
| information-theoretic terms, independent of physics. It can
| be interpreted as a notion of 'surprise' or what he called
| information.
|
| My claims:
|
| First, the field of statistics is _not_ historically rooted
| around concepts such as: "order/ordering" or
| "information/surprise".
|
| Second, the field of statistics, as a directed graph of
| abstractions, is not rooted in ordering or surprise.
|
| Third, in teaching statistics, practically or conceptually,
| the concept of surprise isn't foundational. The idea of
| _variation_, on the other hand, is central.
|
| I'll add a few more comments. To talk meaningfully about
| 'surprise', there has to be a stated or assumed baseline or
| 'expectation' about what is _not_ surprising. For Shannon,
| if the probability of an event is certain, there is no
| surprise. Probability and statistics work together, but
| they are conceptually separable. This is particularly clear
| when you compare descriptive statistics with, say,
| probabilities over combinatorics problems.
| vasco wrote:
| > The field of statistics is not organized around
| concepts relating to "order" or "ordering".
|
| Sure but reduced to the simplest form, statistics are
| used to predict things, the most basic thing in the
| Universe being "is this particle gonna stay put or move a
| little in a given direction", which is related to
| entropy, so to me intuitively these two things seem very
| related. The fact that in statistics we don't use the
| words "order" and "disorder" doesn't mean it doesn't
| reduce to that.
|
| Btw I'm an electrical engineer that isn't amazing at
| statistics or thermodynamics so beware I might just be
| talking nonsense.
| xpe wrote:
| > ... reduced to the simplest form, statistics are used
| to predict things
|
| Inferential statistics is not the simplest kind of
| statistics. Descriptive statistics are both simpler and
| foundational for inference.
|
| P.S. I should say that I am a bit of a stickler regarding
| discussions along the lines of e.g. "these things are
| related". Yes, many things are related, but it is really
| nice when we can clearly tease things apart and specify
| what depends on what.
| zeroonetwothree wrote:
| This rebuttal seems weak because it's using unbounded datasets
| (population). A big issue with the DK research is using bounded
| data (test scores). For example if I get 100% right it's
| mathematically impossible to have overestimated.
| kkoyung wrote:
| I agree. Using the author's terminology, the DK paper was
| trying to show that dy/dx < 1 = dx/dx, rather than the
| correlation of y-x vs x.
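|
| In other words (a small derivation, using this x/y
| notation): if perceived score y has slope b = dy/dx < 1
| against actual score x, then the slope of the error plot is
|
|   d(y - x)/dx = dy/dx - 1 = b - 1 < 0,
|
| so the "overestimate low, underestimate high" trend follows
| from any slope below one, with no need for y and x to be
| independent.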
| bradley13 wrote:
| I have to agree. You cannot separate the statistical analysis
| from the _meaning_ of the study. In the article, the author's
| random data is _exactly_ an extreme replication of
| Dunning-Kruger. Why? Because, in his random data, people with
| low test scores almost always overestimate their ability,
| while people with high test scores almost always
| underestimate.
|
| That is precisely the premise of the Dunning-Kruger effect.
| The fact that the original Dunning-Kruger paper shows a less
| extreme effect? That just shows that people are slightly
| better than random at estimating their own abilities - but
| still nowhere near accurate.
| jgilias wrote:
| So that's what the Dunning-Kruger effect basically boils down
| to, right? That people in general are just bad at assessing
| their skills.
| mnky9800n wrote:
| I really appreciate that he points out that the original
| article's use of the term "autocorrelation" is nonstandard.
| Because it is nonstandard, and it's a rather flippant way to
| dismiss the rest of the article.
| somenameforme wrote:
| I found two very interesting things in the original D-K paper
| [1] that challenge your otherwise reasonable point. The first
| is that the graph everybody associates with D-K, the one
| showing the beautifully perfect linear result, is one of 4. The
| other 3 graphs are far messier, and indeed the paper discusses
| the fact that the correlations tend to be weaker and in some
| cases nonexistent.
|
| The second thing is that the beautiful, perfectly linear
| graph everybody references was measuring 'humor'!!! Humor is
| all but guaranteed to create near-complete noise between
| self-evaluation and 'expert' evaluation (professional
| comedians, in this case). And if everybody is essentially
| randomly guessing at their performance, then it will always
| show an extremely strong D-K effect, with the top performers
| underestimating themselves and the bottom performers
| overestimating themselves.
|
| The experiment that most simply and directly measured
| 'intelligence', without complicating matters in a potentially
| confounding fashion, is #2. It was based on logic problems from
| the LSAT. And the resultant graph is just all over the place.
| Quoting the paper's evaluation of this study:
|
| ---
|
| "Participants did not, however, overestimate how many questions
| they answered correctly, M = 13.3 (perceived) vs. 12.9
| (actual), t < 1. As in Study 1, perceptions of ability were
| positively related to actual ability, although in this case,
| not to a significant degree."
|
| ---
|
| This is really looking like another Zimbardo.
|
| [1] - https://sci-hub.se/10.1037/0022-3514.77.6.1121
| mike_hearn wrote:
| Yes, D-K is another one of those "classic" psychology studies
| that everyone knows about but is actually rubbish and
| shouldn't be cited for anything. You're not the first to
| notice this, I pointed it out on HN last year:
|
| https://news.ycombinator.com/item?id=31119836
|
| At some point I should write up a proper blog post on the D-K
| paper in the hope that it eventually surfaces in search
| results, because it's past time for this paper to be put to
| bed. The problems you cite aren't even the full set. The
| whole thing was (of course) a study on a handful of psych
| undergrads, their selection method for expert comedians has
| circular logic in it and it all goes downhill from there.
| gwd wrote:
| > And D-K did indeed show a correlation between the two, just
| not as strong of one as we would expect. Rather, they showed a
| consistent bias. That's the interesting result.
|
| Right, so:
|
| 1. If the data were truly random, with no correlation, we'd
| expect the line to be straight across the middle, with the
| first quartile at 50% and the last quartile also at 50%
|
| 2. If the data were 100% accurate and precise [1], we'd expect
| the line to be diagonal, with the first quartile at 12.5% and
| the last quartile at 87.5%.
|
| 3. If the data were accurate but not precise (i.e., basically
| right but with some randomness built in), we'd expect the line
| to be in between #1 and #2 -- basically, changing from #2 into
| #1 as the randomness increases, but with the intersection at
| 50%.
|
| That's because someone in the 2nd percentile _can't_
| underestimate themselves as much as they can overestimate
| themselves, and someone in the 98th percentile can't
| overestimate themselves as much as they can underestimate
| themselves. But in any case, the "0 bias" case looks
| symmetric.
|
| 4. But what we actually see is none of the above: we see the
| 1st quartile being at (eyeballing the chart) 60%, and the last
| quartile at 75%.
|
| That shows that there is indeed some ability for
| self-evaluation, but it's off. The fourth quartile could
| indeed just be random, the effect of clipping at the top
| meaning that the upper quartile _cannot_ overestimate
| themselves as much as they underestimate themselves. But
| there's no getting around the fact that the bottom quartile
| are overestimating themselves.
|
| [1] https://en.wikipedia.org/wiki/Accuracy_and_precision
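|
| A sketch of case 3 (assumed setup, Python/numpy): unbiased
| estimates with increasing noise morph the perceived line
| from the diagonal of #2 toward the flat line of #1,
| pivoting near the middle:
|
|   import numpy as np
|
|   rng = np.random.default_rng(4)
|   n = 100_000
|   p = rng.uniform(0, 100, n)    # true percentile
|   for sd in [0, 15, 40, 1000]:  # noise standard deviation
|       perceived = np.clip(p + rng.normal(0, sd, n), 0, 100)
|       q = np.digitize(p, [25, 50, 75])
|       means = " ".join(f"{perceived[q == i].mean():5.1f}"
|                        for i in range(4))
|       print(f"sd={sd:4d}: quartile means {means}")
|
| With sd=0 the quartile means are 12.5/37.5/62.5/87.5; with
| huge noise they flatten toward 50 - but they stay symmetric
| around 50%, unlike the 65%-centered line in the actual D-K
| data.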
| bitshiftfaced wrote:
| > But there's no getting around the fact that the bottom
| quartile are overestimating themselves.
|
| It's because higher competence goes along with more accurate
| self-assessment but not less bias. So the high performers
| underestimate with less magnitude than the low performers
| overestimate, but they both under- and overestimate
| themselves with the same frequency.
| dmbche wrote:
| Isn't it ironic that they fooled themselves?
| ulizzle wrote:
| It was actually hilarious but I don't think many people here
| got the irony
| DangitBobby wrote:
| Literally the closing paragraph of TFA is about that exact
| irony.
| robwwilliams wrote:
| And here it is from OP (which made me laugh--right or
| wrong). And leave your hubris at home unless you rate
| yourself a damn fine statistician ;-)
|
| "However, there is a delightful irony to the circumstances
| of their blunder. Here are two Ivy League professors7
| arguing that unskilled people have a 'dual burden': not
| only are unskilled people 'incompetent' ... they are
| unaware of their own incompetence.
|
| "The irony is that the situation is actually reversed. In
| their seminal paper, Dunning and Kruger are the ones
| broadcasting their (statistical) incompetence by conflating
| autocorrelation for a psychological effect. In this light,
| the paper's title may still be appropriate. It's just that
| it was the authors (not the test subjects) who were
| 'unskilled and unaware of it'.
| lencastre wrote:
| Wasn't this DK effect already debunked?
| jahewson wrote:
| I don't know much about it but I'm sure you're right.
| hasch wrote:
| The article mentions 2016 somewhere. They explain a bit on
| top of that, with more depth... at least that's my rough
| take on this.
| xbar wrote:
| Yes. This article highlights the 2016, 2017 and 2020
| debunkings of DK. But it hangs on as an oft-repeated
| scientific fallacy.
|
| The fact that anyone has to ask whether it has been debunked
| shows how desirable some people find the DK myth. Even in
| the comments here, people are not willing to be skeptical of
| DK. That's interesting psychology.
| mrkeen wrote:
| Yes but some claim to have debunked the debunking also. [1]
|
| This paper (2023) claims "the magnitude of the effect was
| minimal; bringing its meaningfulness into question." [2]
|
| [1] https://andersource.dev/2022/04/19/dk-autocorrelation.html
|
| [2]
| https://www.sciencedirect.com/science/article/abs/pii/S01602...
| pie_flavor wrote:
| This take is a perfect example of Dunning-Kruger itself,
| ironically.
| https://andersource.dev/2022/04/19/dk-autocorrelation.html
| dahart wrote:
| How so? DK shows a positive correlation between confidence and
| competence.
| mewpmewp2 wrote:
| My take on Dunning Kruger:
|
| 1. People really like the idea of smart people being humble and
| arrogance meaning stupidity, so they like to believe that DK is
| true, and they like to repeat this.
|
| 2. Some smart/skilled people are humble, some are arrogant.
|
| 3. Some smart/skilled people underestimate their skills, some
| overestimate.
|
| 4. Some stupid people are humble, some are arrogant.
|
| 5. Some stupid people underestimate their skills, some
| overestimate.
|
| Overall, even if there is a correlation, you can't tell just
| from a person's arrogance whether we are dealing with DK, or
| whether it's an effect at all. People's personalities,
| skills and everything else are a bit more complex than that.
|
| Overall, bringing DK up seems like some sort of social
| justice/fairness effort rather than something that is
| actually true in any given situation where someone is
| arrogant.
| spacebacon wrote:
| Maybe this shows how effective dumb people are at keeping smart
| people hammered down with thought stopping arguments.
| greenthrow wrote:
| Lmao this article is an example of Dunning-Kruger at work. The
| author thinks they have found and are revealing something but
| they are just failing to fully understand the subject. Amazing.
| flappyeagle wrote:
| Try reading the article again and understanding the argument.
| greenthrow wrote:
| Oh I did. Completely.
| mattxxx wrote:
| Wait... but what if this is DK? What if my comment is DK??
| joefourier wrote:
| So from my understanding, the Dunning-Kruger effect paper
| doesn't show the distribution of the perceived test scores,
| nor the standard deviation - only an average, which rises
| with actual test score level.
|
| If they showed error bars for each bin, you could form very
| different conclusions. Do low-skilled people consistently
| estimate their score at around 60, or do they give
| effectively random estimates centred around 60?
|
| Assuming the latter, it could mean that low-skilled
| individuals are completely unable to evaluate their
| performance, while higher-skilled people are slightly better
| at it but still not very good, giving a slightly positive
| correlation, which... is very distinct from what the DK
| effect implied.
| xanderlewis wrote:
| Naive take: I've always felt like Dunning-Kruger is just the
| result of the fact that when guessing the value of anything
| people tend towards some common mean, and so if the true value is
| low your guess tends to be high, and vice versa. This assumes
| nothing about what is being guessed, but does assume (perhaps
| wrongly) that there is a commonly believed mean value and that
| people tend to imagine they are close to it.
| wavemode wrote:
| That's essentially the plain-language interpretation of what
| the author of this article is pointing out - when you plot
| (actual score) against (difference between estimated score
| and actual score), you will always find a trend where
| underperformers overestimate and overperformers
| underestimate - for the exact reason you state.
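|
| A tiny numerical check of this (assuming Python/numpy): for
| independent x and y with equal variance, the correlation of
| x with (y - x) is about -0.71, so that plot trends downward
| by construction:
|
|   import numpy as np
|
|   rng = np.random.default_rng(5)
|   x = rng.uniform(0, 100, 100_000)  # actual score
|   y = rng.uniform(0, 100, 100_000)  # estimate, independent of x
|   r = np.corrcoef(x, y - x)[0, 1]
|   print(f"corr(x, y - x) = {r:.2f}")  # ~ -1/sqrt(2) ~ -0.71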
| r0uv3n wrote:
| The discussion between Nicolas Boneel and the author in the
| comments of the article is interesting and Nicolas expresses the
| doubts I had when reading this. The whole point of the DK effect
| is that people are bad at estimating their skill, so if you
| assume that they randomly guess their skill level then of course
| you will replicate the results.
|
| The correct model for a world without DK should be something
| like (estimated test scores) = (actual test scores) + noise,
| and then the only form of spurious DK you'd expect is caused
| by the fact that there's a minimum and maximum test score.
| But this effect would be proportional to the variance of the
| noise, and I assume the variance on the additional dataset
| is too low to fully explain the effect seen there.
|
| Also, in this model everyone should on average still guess
| correctly which half of the distribution they are in, but
| even the bottom quartile seemed to estimate their abilities
| as above the 50th percentile.
| svnt wrote:
| Just because the data appear random doesn't mean you've gotten
| at the cause though.
|
| From those charts it could equally be low skill throughout, or
| something nuanced like lack of skill at estimating at the
| bottom, improving skill in estimating through the middle, and
| high skill and learned modesty at the top.
| Jensson wrote:
| > Also, in this model on average everyone should still guess
| correctly in which half of the distribution they are, but even
| the bottom quartile seemed to estimate their abilities as above
| the 50th percentile
|
| Depends on the noise applied. If the noise is -10% to +100% for
| everyone then you get roughly the graph Dunning-Kruger got. So
| there is no reason to believe that the best are better at
| estimating their abilities, just that you can't estimate your
| own rank as better than the best.
| tempestn wrote:
| That's a great observation. For what it's worth though, it
| does seem logical to me that the best would also be best at
| estimating their skill. Not necessarily because they're
| better at it per se (though there's likely some of that too,
| for the reasons originally posited by D-K), but also because
| they have an easier problem to solve. When you know something
| well, it's fairly obvious that that's the case. (Think of the
| experience of acing a math test. It's entirely possible you'd
| know you answered everything correctly.) When you struggle
| somewhat though, it's much more difficult to estimate how
| much you're struggling compared to how others would fare.
| jampekka wrote:
| The correct model is probably (estimated test score +
| estimation noise) = (actual test score + test noise). The test
| contains a random element, e.g. guessing, that the person can't
| estimate.
|
| https://en.m.wikipedia.org/wiki/Regression_dilution
|
| https://en.m.wikipedia.org/wiki/Errors-in-variables_models
| hn_throwaway_99 wrote:
| Previous discussion:
| https://news.ycombinator.com/item?id=31036800
| bitshiftfaced wrote:
| The authors did "X - Y vs X," but that's not even the biggest
| problem. The authors subtracted two measures that had been
| transformed and bounded from 0 to 1 (think percentiles). What
| happens at the extremes of those bounds? How much can your top
| performers overestimate their performance? They're almost at 1
| already, so not much. If they were to overestimate and
| underestimate at the same rate and by the same magnitude in terms
| of raw values, the ceiling effect on the transformed values means
| that the graph will make it look like they underestimate more
| often. The opposite problem happens for the worst performers.
|
| See "Random Number Simulations Reveal How Random Noise Affects
| the Measurements and Graphical Portrayals of Self-Assessed
| Competency." Numeracy 9, Iss. 1 (2016), particularly figures 7,
| 8, and 9.
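|
| A small sketch of that boundedness point (assumed
| parameters, Python/numpy): give everyone symmetric, unbiased
| error, then clip to [0, 1] as the transformed measures are:
|
|   import numpy as np
|
|   rng = np.random.default_rng(6)
|   n = 100_000
|   p = rng.uniform(0, 1, n)      # true (bounded) value
|   err = rng.normal(0, 0.25, n)  # unbiased raw error
|   est = np.clip(p + err, 0, 1)  # bounds applied
|
|   q = np.digitize(p, [0.25, 0.5, 0.75])
|   for i in range(4):
|       d = (est - p)[q == i].mean()
|       print(f"Q{i+1}: mean(estimate - actual) = {d:+.2f}")
|
| The bottom quartile comes out positive and the top negative
| from the clipping alone, exactly the ceiling/floor effect
| described above.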
| anonymouskimmer wrote:
| This can be dealt with to an extent by truncating the extreme
| ends. Even the middle quartiles in the graphs in the linked
| article show the same trends.
| bitshiftfaced wrote:
| Not that simple. This article demonstrates why enforcing
| bounds results in the changes in slope that you see in the
| expected grades (figures 2 and 4):
| https://www.frontiersin.org/articles/10.3389/fpsyg.2022.8401...
| ImaCake wrote:
| Thanks for stating just how much of a statistical minefield
| this is. The reference does a great job showing just how wrong
| the DK studies are. Unfortunately, most people have already
| made up their minds and are happy to link conflicting blog
| posts as evidence.
| Probiotic6081 wrote:
| Probably in another year or two they'll find another
| statistic that will render the old one moot, again and
| again.
| concordDance wrote:
| > wrong the DK studies are
|
| The DK studies are not wrong; they are misinterpreted by
| people who don't know what they're talking about (e.g. what
| the DK effect actually is), like this blogger.
|
| "People have worse self-assessment ability as their real
| ability declines" would be a valid interpretation of the DK
| data, and notably would NOT be a valid conclusion from the
| random data in the blog post.
| ImaCake wrote:
| You should read the reference we are discussing which makes
| no such mistakes.
| dclowd9901 wrote:
| I think if people at all levels of skill were reasonably good
| at measuring their own ability, we would see two curves that
| roughly overlap. Instead we see the graph given.
|
| The fact that random noise can generate a mean curve on the Y
| axis doesn't mean DK doesn't exist. It just means the mean
| self-assessment DK found resembles the mean of random guesses,
| which, if you think about it, makes sense. Most people will
| probably self-evaluate as average, regardless of their actual
| skill. This means DK is right as rain.
| expazl wrote:
| > I think if people at all levels of skill were reasonably
| good at measuring their own ability, we would see two curves
| that roughly overlap. Instead we see the graph given.
|
| Actually, due to the construction of the test, the ability to
| evaluate your own absolute ability in a subject isn't
| sufficient for the two lines to be able to overlap.
|
| It's a percentile axis, so you need to be able to reasonably
| accurately estimate the ability of everyone taking the test,
| and where you fall in the quartile range of those
| participants.
| SamBam wrote:
| Exactly, that was my thought. How would it be _possible_ to get
| anything other than the D-K effect, even if it wasn't just
| averaging to the mean?
|
| The lowest quartile can't say they're below the lowest
| quartile, so any error at all will be counted as
| "overconfidence." The top quartile can't say they're above the
| top quartile, so any error at all will be counted as
| "underconfidance."
| anonymouskimmer wrote:
| > Exactly, that was my thought. How would it be possible to
| get anything other than the D-K effect, even if it wasn't
| just averaging to the mean?
|
| Quite easily with the method they demonstrate in the study in
| figure 11. In that study test participants are not rating
| themselves in terms of population percentages, but in terms
| of the percentage correct they got on the test. In such a
| case the test could be designed to have a huge ceiling that
| even the most knowledgeable participants would have trouble
| reaching. And could have such a low floor that even the least
| knowledgeable participants would still get some answers
| correct (unless they weren't even trying, which would allow
| throwing out their data points).
|
| With 20 questions you could have four gimmes and four
| impossible questions, bounding the worst participants to
| about 20% and the best to about 80%.
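|
| A quick sanity check of those bounds (hypothetical test, my own
| numbers):
|
|     import numpy as np
|
|     rng = np.random.default_rng(2)
|     skill = rng.uniform(0, 1, 10_000)    # hypothetical ability
|     gimmes, impossible, real = 4, 4, 12
|
|     # Everyone gets the gimmes, no one gets the impossible ones;
|     # only the 12 real questions track skill.
|     correct = gimmes + rng.binomial(real, skill)
|     score = correct / (gimmes + impossible + real)
|     print(score.min(), score.max())      # floor 0.2, ceiling 0.8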
| SamBam wrote:
| Right. To clarify, I meant: with the original study design,
| how could they not have gotten the result they did? (And
| that's rhetorical.)
| anonymouskimmer wrote:
| It would have been noteworthy in the original design if
| more than one group of participants were, on average,
| within their quartiles on the guessing. I also find it
| noteworthy that the average guess of the lowest quartile
| is lower than the average guess of the second lowest
| quartile, and on up the quartiles. On one hand this shows
| some awareness of relative ability along a massively
| smooshed logarithmic scale. On the other hand I wonder whether
| this laddering persists when the averages are split into
| quintiles and deciles.
| jmpeax wrote:
| I wonder if estimating on the logit scale would solve this
| problem.
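|
| One way to read that suggestion (a sketch, not from any of the
| papers): take differences on the logit scale, where the hard 0-1
| bounds no longer squash errors near the edges.
|
|     import numpy as np
|
|     def logit(p, eps=1e-6):
|         p = np.clip(p, eps, 1 - eps)  # stay off the hard bounds
|         return np.log(p / (1 - p))
|
|     # A top performer's error is no longer capped near zero:
|     perf, self_est = 0.98, 0.99
|     print(self_est - perf)                # 0.01 on raw scale
|     print(logit(self_est) - logit(perf))  # ~0.70 on logit scale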
| dimask wrote:
| The boundedness of the data is also the main argument here
| https://www.frontiersin.org/articles/10.3389/fpsyg.2022.8401...
| wjnc wrote:
| Lognormality of data is a killer for the methods of social
| scientists. If I were to hypothesize the underlying mechanism,
| it would be this: raw skill is lognormally distributed among
| those taking tests at all (participating in these tests usually
| entails an implicit lower bound on IQ, but think also of the
| long tail of high performance in, say, sports); tests try to
| measure performance but with a reduction to normality (or 4
| categories); and then people estimate their own skills based on
| their task and grading experiences, which are also a reduction
| to a normal or constant distribution. ("I was always a B- in
| math in high school and expect that to have distribution X and
| this test to follow that distribution").
|
| It's three places where reductions in dimensionality take place
| both implicitly and explicitly. I don't envy researchers trying
| to unpeel this onion. I do like the unraveling of all these
| problems that pop up in pretty accessible designed experiments.
| It makes for better understanding.
| skue wrote:
| Dunning himself addressed this back in 2011:
|
| > 4.1. Regression to the mean
|
| > The most common critique of our metacognitive account of lack
| of self-insight into ignorance centers on the statistical
| notion of regression to the mean. Recall from elementary
| statistics classes that no two variables are ever perfectly
| correlated with one another. This means that if one selects the
| poorest performers along one variable, one will see that their
| scores on the second variable will not be so extreme.
| Similarly, if one selects the best performers along a variable,
| one is guaranteed to see that their scores on the second
| variable will be lower...
|
| His full response is longer than is appropriate to quote here,
| but you can easily find the chapter online.
|
| Dunning, David (1 January 2011). "Chapter Five - The Dunning-
| Kruger Effect: On Being Ignorant of One's Own Ignorance".
| Advances in Experimental Social Psychology. Vol. 44. Academic
| Press. pp. 247-296. doi:10.1016/B978-0-12-385522-0.00005-6.
| ISBN 9780123855220
| bitshiftfaced wrote:
| The author continues,
|
| > Some scholars observe that Fig. 5.2 looks like a regression
| effect, and then claim that this constitutes a complete
| explanation for the Dunning-Kruger phenomenon. What these
| critics miss, however, is that just dismissing the Dunning-
| Kruger effect as a regression effect is not so much
| explaining the phenomenon as it is merely relabeling it. What
| one has to do is to go further to elucidate why perception
| and reality of performance are associated so imperfectly. Why
| is the relation so regressive? What drives such a disconnect
| for top and bottom performers between what they think they
| have achieved and what they actually have? [...] As can be
| seen in the figure, correcting for measurement unreliability
| has only a negligible impact on the degree to which bottom
| performers overestimate their performance (see also Kruger &
| Dunning, 2002). The phenomenon remains largely intact.
|
| The DK effect says roughly, "low performers tend to
| overestimate their abilities." Yet when researchers analyzed
| the data, they found that high and low performers
| overestimate and underestimate with the same frequency. [0]
| It's just that high performers are more accurate than low
| performers (note how this statement differs from the DK
| effect). Since you can completely explain the "X graph" by
| the random noise combined with the ceiling effect, and since
| beginners' self evaluations are noisier than experts', you
| don't even need regression to the mean to explain why you get
| the "X graph."
|
| 0. Nuhfer, Edward, Steven Fleisher, Christopher Cogan, Karl
| Wirth, and Eric Gaze. "How Random Noise and a Graphical
| Convention Subverted Behavioral Scientists' Explanations of
| Self-Assessment Data: Numeracy Underlies Better
| Alternatives." Numeracy 10, Iss. 1 (2017): Article 4. DOI:
| http://dx.doi.org/10.5038/1936-4660.10.1.4
| chiefalchemist wrote:
| DK for me is simply: "You don't know what you don't know." When
| that happens, it's easy - surprise, surprise! - to misjudge your
| skill level. In a way, it almost feels cruel to ask someone with
| too few points of reference to say how much they know. The fact
| is whether high, low, or in the middle...they are guessing.
|
| On the other hand, with enough experience the depth and breadth
| of your context improves, as it should. At that point, mis-self-
| assessment is the result of arrogance, bravado, etc. That's a
| different problem than simply not knowing.
|
| If nothing else, DK has a case of apples v. oranges.
| thewanderer1983 wrote:
| The Dunning-Kruger effect isn't what the article first quotes it
| as. It's an effect that everyone experiences. We as humans tend
| to oversimplify things we don't understand well or at all.
| Therefore we overestimate our expertise on these subjects. We
| also tend to underestimate how much of an expert we are on
| subjects we do know well. Everyone does this. It's not just dumb
| people.
| Jensson wrote:
| > We also tend to underestimate how much of an expert we are on
| subjects we do know well
|
| Any evidence for this, except Dunning-Kruger? To me it looks
| like everyone overestimates themselves. There are a lot of
| professionals who think they are undervalued and that people
| worse than them get all the rewards and fame.
| vismwasm wrote:
| The author measures the Dunning Kruger effect on his random data
| exactly because he assumes it when generating his random data.
|
| By modelling skill and perceived skill as uniform draws between 0
| and 100, the unskilled (e.g. skill=0) will over-estimate their
| skills (estimated skill = 50, the mean on the uniform random
| variable) and the skilled (e.g. skill=100) will underestimate it
| (as 50 as well, again the mean of the same random variable). The
| only ones who will be correct (on average) are the average
| skilled ones (skill=50).
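|
| The whole construction fits in a few lines (a sketch of the
| blog's setup as I read it):
|
|     import numpy as np
|
|     rng = np.random.default_rng(4)
|     n = 100_000
|     skill = rng.uniform(0, 100, n)  # actual percentile
|     guess = rng.uniform(0, 100, n)  # self-estimate, independent
|
|     for q in range(4):
|         m = (skill >= 25 * q) & (skill < 25 * (q + 1))
|         print(q + 1, guess[m].mean().round(1))  # ~50 everywhere
|     # The bottom quartile "overestimates" and the top
|     # "underestimates" by construction, with no psychology at
|     # all.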
| beltsazar wrote:
| I don't know if I agree that it's an autocorrelation, but one way
| to explain the Dunning-Kruger effect is by acknowledging this
| simple fact:
|
| Most people think that they are an average person, but they
| can't all be average--there must be some people substantially
| below the median. Therefore, those people must overestimate
| their abilities.
|
| This also applies to other aspects, such as attractiveness. Less
| attractive people would overestimate their attractiveness.
| anonymouskimmer wrote:
| For all of the tests and rebuttals of the Dunning-Kruger effect
| the people tested are not drawing from the totality of other
| people, but trying to compare themselves solely to those who
| also took the same test.
|
| Anyone in a position to take such a test is almost guaranteed
| to be above average compared to the general population (which
| includes babies for intellectual tests, or the extremely old
| for attractiveness tests).
|
| I think this complicates personal evaluation.
| salty_biscuits wrote:
| It's just correlation; why do they keep calling it
| autocorrelation?
| stubish wrote:
| auto correlation, or self correlation. A correlation between
| different things may indicate an actual relation (smoking is
| correlated with early mortality). A self correlation is a
| tautology.
| snarkconjecture wrote:
| Nonstandard terminology warning: the author is using
| "autocorrelation" in a way I've never seen before. There is a
| much more common usage of "autocorrelation" to refer to the
| correlation of a timeseries with itself (shifted by some amount).
|
| If you use autocorrelation to refer to the thing in OP, you'll
| probably confuse people who know statistics, and vice versa.
| ketozhang wrote:
| The more common experience with autocorrelations are with time
| series, but what the author said is correct even in that
| context. A time series autocorrelation relates the same time
| series function at different times. At the simplest you plot
| the arrays X vs X where X[i] = f(t[i]). You then may complicate
| it further by some transformation g(X) vs X (e.g., moving
| average).
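|
| For reference, the usual time-series usage (a generic sketch):
|
|     import numpy as np
|
|     def autocorr(x, lag):
|         # correlation of a series with a lagged copy of itself
|         return np.corrcoef(x[:-lag], x[lag:])[0, 1]
|
|     rng = np.random.default_rng(5)
|     t = np.arange(1000)
|     x = np.sin(t / 10) + rng.normal(0, 0.3, len(t))
|     print(autocorr(x, 1))   # high: neighbours move together
|     print(autocorr(x, 31))  # negative: ~half a period apart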
| epigramx wrote:
| you might say the article author has some ..dunning-kruger on
| what autocorrelation is.
| nothrowaways wrote:
| L2 of dk
| xpe wrote:
| > Nonstandard terminology warning: the author is using
| "autocorrelation" in a way I've never seen before.
|
| That's a nice way of putting it. A more accurate description
| would be: the author is butchering the key essence of
| autocorrelation, since they don't clearly mention that it is a
| temporal relationship!
|
| > What is autocorrelation?
|
| > Autocorrelation occurs when you correlate a variable with
| itself.
|
| Groan.
|
| A standard definition is:
|
| > Autocorrelation refers to the degree of correlation of the
| same variables between two successive time intervals. It
| measures how the lagged version of the value of a variable is
| related to the original version of it in a time series.
| Autocorrelation, as a statistical concept, is also known as
| serial correlation.
| gnicholas wrote:
| What term is appropriate to describe what the author is
| referring to?
| anonymouskimmer wrote:
| > If the Dunning-Kruger effect were present, it would show up in
| Figure 11 as a downward trend in the data (similar to the trend
| in Figure 7). Such a trend would indicate that unskilled people
| overestimate their ability, and that this overestimate decreases
| with skill. Looking at Figure 11, there is no hint of a trend.
|
| There certainly _is_ a hint of a trend. Why do people, when
| visualizing data with a distinct trend, say that because the
| "error bars" from a particular statistical test overlap zero that
| no trend exists!?
|
| Freshmen _trend_ toward over-confidence. Grad students _trend_
| toward under-confidence. Undergrads in general _trend_ toward
| over-confidence (though this trend decreases as year in school
| increases), and post-graduates, whether grad students or
| professors, trend toward under-confidence.
|
| These "trends" are not statistically significant, but they
| certainly are a trend!
|
| Also, the random data distribution in figure 9 doesn't show the
| same trends as Dunning-Kruger's curve in figure 2. Perhaps there
| is at least one psycho-social mechanism here worth investigating?
| mrkeen wrote:
| > These "trends" are not statistically significant, but they
| certainly are a trend!
|
| This is an oxymoron.
| Dylan16807 wrote:
| Oxymorons only sound contradictory on a surface level.
|
| Something "certainly" being a "trend" is the definition of
| statistical significance, so this is a straight up
| contradiction.
| anonymouskimmer wrote:
| See here: https://news.ycombinator.com/item?id=38416858
|
| "Trend" has multiple meanings. Statistics doesn't get to
| claim all of the meaning.
| anonymouskimmer wrote:
| Show how.
|
| I place mechanistic theory prior to statistics in science.
| Mechanistic theory can be tested, statistics are a kind of
| test.
|
| If a statistically-insignificant result shows consistent,
| though non-significant deviations, such as the kind seen in
| Figure 11, then it tells me it's worth investigating whether
| mechanism(s) are explaining a very small portion of the
| variation that will not, in itself, show up as statistically
| significant, as it's being swamped by variation in other
| parameters.
| Dylan16807 wrote:
| Consistency is a synonym for statistical significance. If
| there's consistency beyond random alignment, then there
| should be a statistical test you can apply over your data
| to extract the signal.
|
| You can extract surprisingly small signals relative to
| variation in other parameters. But if it's _actually_
| swamped, then it might not be real, so go get more data.
| anonymouskimmer wrote:
| > Consistency is a synonym for statistical significance.
|
| So basically you're telling me that if I can visually see
| a consistency that does not show up in their statistical
| test, then they aren't running an appropriate statistical
| test on what I'm seeing.
|
| > But if it's actually swamped, then it might not be
| real, so go get more data.
|
| Even better to design other experiments.
| Dylan16807 wrote:
| > So basically you're telling me that if I can visually
| see a consistency that does not show up in their
| statistical test, then they aren't running an appropriate
| statistical test on what I'm seeing.
|
| _Either_ they're not doing the right statistics, _or_ it's a
| "consistency" that is much more likely to show up randomly than
| you naively expect, and the study needs to be repeated or
| enhanced.
|
| Sometimes you can see a pattern that's just a figment of
| chance. See also: numerology, jelly bean xkcd
| Dylan16807 wrote:
| If they're actually error bars, you can shrink them with more
| data. That will turn the hint of a trend into an observation of
| a trend. If it wasn't random noise giving a fake hint.
| anonymouskimmer wrote:
| > If they're actually error bars, you can shrink them with
| more data.
|
| Assuming the new data has the same systemic or instrumental
| bias as the old data. Even using a different test date could
| skew results enough to widen the error bars.
| abnry wrote:
| If there is a linear relationship between test score (X, ability)
| and test score self-assessment (Y, self-perception), then the
| random variables are modeled as:
|
| $$ Y \sim aX+b+N $$
|
| Where N is some statistically independent noise, mean zero.
|
| This means the covariance between them is
|
| $$ Cov(Y-X,X) = E[ ((a-1)X+b+N -(a-1)E[X]-b) (X - E[X]) ] $$
|
| Which is
|
| $$ Cov(Y-X,X) = E[(a-1)(X-E[X])(X-E[X])] + E[N(X-E[X])] =
| (a-1) Var[X] $$
|
| To get a "DK effect" we need (a-1) < 0, or a < 1. If a=0, in the
| case of the blog post, then this is absolutely true. If a=1
| (which, along with b=0, is the ideal scenario), then this is
| barely not true. If a > 1, then we'd have a whole new effect
| about arrogant experts.
|
| So the only thing that matters from this "auto-correlation
| perspective" is the rate at which an individual's self-assessment
| increases with their ability. As long as they underestimate the
| increase, a "DK effect" will occur.
|
| However, in the above analysis, we ignored the variable b. If a =
| 0.8 and b=0, we'd never have the so-called "DK effect" even
| though it matches the "auto-correlation perspective" because
| everyone would underestimate their ability.
|
| This tells me that the value of b matters. It is sort of like the
| prior ability everyone assumes they have. What the DK paper
| shows is that b > .5, which I think is in line with the spirit of
| the popular interpretation of the "DK effect". People should not
| be assuming they have, at a minimum, a capacity higher than the
| average.
|
| At the same time, the value b isn't insanely higher than .5,
| which also makes me want to cut those unskilled and unaware some
| slack. It "seems reasonable" to assume your baseline is average.
| That can't be the case, but it feels intuitive.
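|
| The identity is easy to check numerically (a sketch with
| arbitrary a, b and noise):
|
|     import numpy as np
|
|     rng = np.random.default_rng(6)
|     n = 1_000_000
|     a, b = 0.8, 0.3
|     X = rng.uniform(0, 1, n)
|     Y = a * X + b + rng.normal(0, 0.1, n)
|
|     print(np.cov(Y - X, X)[0, 1])  # empirical Cov(Y-X, X)
|     print((a - 1) * X.var())       # (a-1) Var[X] ~ -0.0167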
| concordDance wrote:
| The author fails to make his point quite badly. Of course if
| everyone's self assessment was random the bottom quartile would
| overrate themselves! And that would be half of the Dunning-Kruger
| effect and we could truthfully say "the bottom quartile of people
| overrate themselves"!
|
| The other part where those at the top have a better idea or where
| they rank noticeably does not come out in his toy example.
|
| Honestly, he comes across as not having the slightest
| understanding of how people interpret those graphs...
| im3w1l wrote:
| It's fascinating how great Elo and similar ranking systems are at
| curbing DK. You just get a number, and that's how good (bad) you
| are. It's incredibly precise too, there's just no arguing with
| it.
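|
| The machinery is tiny, too -- the standard one-game update
| (textbook formula, my own sketch):
|
|     def elo_update(r_a, r_b, score_a, k=32):
|         # score_a: 1 for a win, 0.5 for a draw, 0 for a loss
|         expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
|         delta = k * (score_a - expected_a)
|         return r_a + delta, r_b - delta
|
|     print(elo_update(1500, 1700, 1))  # upset: ~(1524.3, 1675.7)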
|
| Also since the topic is D-K I'm a bit scared that I'm the fool
| here, but isn't he misusing the term autocorrelation? What he
| describes sounds like just normal correlation?
| toasted-subs wrote:
| Idk I genuinely feel like after having to deal with 10+ doctors
| who all had different opinions. The last doctor finally made the
| same conclusion as me and he was the last person I had to see.
|
| There's always exceptions. And sometimes reading publications
| pertaining to a very specific thing should give you more say on a
| subject.
|
| I just feel bad that American taxpayer money and the best years
| of my life were spent on telling medical professionals they
| don't know what they are talking about.
| dclowd9901 wrote:
| I think what this article is missing is "the chart DK should have
| used."
|
| Instead we get a spurious explanation that doesn't make a lot of
| sense based on completely fabricated data. It's entirely natural
| for something that looks like DK to emerge from randomized data,
| especially when the Y-axis values cluster around the mean (about
| 50 in this case).
| a-dub wrote:
| i think of acf as a measure of repeating temporal structure and
| how "strong" and "long" it is, if it exists.
|
| that is, it gives you a notion of if and what order of an ar
| model should fit any repeating structure in the data.
| randomizedalgs wrote:
| Consider the imaginary world that the author describes, in which
| people's estimate of their score is independent of their actual
| score. Wouldn't it be fair to say that, in this imaginary world,
| the DK effect is real?
|
| The point of the effect is that people who score low tend to
| overestimate their score and people who score high tend to
| underestimate. Of course there are lots of rational reasons why
| this could occur (including the toy example the author gave,
| where nobody has any good sense of what their score will be), but
| the phenomenon appears to me to be correct.
| mrkeen wrote:
| If it's a statistical illusion, the correlation is still true,
| it just has no business being studied by psychologists.
|
| If I roll a die, and then roll a second die, I might study the
| behaviour of the second die and wonder why it wants to add up
| to 7 with the first die. Since they're dice, I can dismiss that
| as a stupid idea, but if they were people, I could certainly be
| led astray by psychological theories about them.
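|
| The dice version takes three lines (my own sketch): the dice are
| independent, yet the "error" of the second die correlates
| strongly with the first.
|
|     import numpy as np
|
|     rng = np.random.default_rng(7)
|     d1 = rng.integers(1, 7, 100_000)  # "skill"
|     d2 = rng.integers(1, 7, 100_000)  # "self-assessment"
|     print(np.corrcoef(d2 - d1, d1)[0, 1])  # ~ -0.71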
| skrebbel wrote:
| Whoa, of course, this is the point.
|
| The author's example with random points is bad because you
| might reasonably _expect_ people to behave differently than
| uniform random points.
|
| It'd be reasonable to expect that people who are good at a
| thing estimate that they are good at it, and that people who
| are bad at a thing, estimate that they're bad at it. I mean, my
| kids love math and always estimate themselves to do well on
| math tests (and they usually do). They have classmates who
| loudly detest math, estimate they'll do badly, and often do (at
| least somewhat). Similarly, I'm a bad cook and I have no doubt
| that if I join a cooking contest, I'll get few jury points. The
| _expected_ data is correlated.
|
| So if a study finds that, well actually, the data is not at all
| that correlated! Lots of people who estimate that they'll do
| _fine_ actually don't, and equally many people who estimate
| that they'll do badly, actually do fine (ie it looks like
| uniform random data), then that's surprising, and that's the
| D-K effect.
|
| Right? I'm no statistician at all so I might be missing
| something.
| ezekiel68 wrote:
| > However, there is a delightful irony to the circumstances of
| their blunder.
|
| Indeed. And I find the tendency of people in this comment section
| to defend the flawed theory is further confirmation of another
| scientific finding: that we decide based on emotion and then
| justify our decision using rationality.
| stubish wrote:
| Even though the article cites the 3 papers it is based on, there
| are no refutations of the published science by people who grok
| it.
| notShabu wrote:
| every domain of expertise has two "elo" systems, the niche one
| and the broader one.
|
| e.g. you can learn basic juggling in 30 minutes such that you
| are top 10% among your friends/colleagues etc...
|
| however within the juggling community itself this is known as the
| "3 ball cascade", a really simple trick relative to the ones that
| require years to master. an outsider may not be able to tell the
| difference between the 1 year expert and the 10 year master.
|
| a lot of dunning-kruger can be explained by people in one or the
| other group not understanding the other system
| lopatin wrote:
| Oh I read about the DK effect a while ago. I'm pretty much an
| expert in Psychology now, AMA.
| eagerpace wrote:
| Is this the opposite of imposter syndrome?
| markhahn wrote:
| the numeric experiment does not produce a line identical to what
| DK report. if DK's line were horizontal at 50%, it would indeed
| be nothing but autocorrelation.
| dahart wrote:
| Most people, even here on HN, do not know what the DK effect
| actually claimed to show. It does not show that confident people
| are more likely to be incompetent. Their primary result shows a
| positive correlation between confidence and supposed skill. (What
| skill, you ask?*)
|
| This article suggests DK is even simpler than autocorrelation,
| that it's just regression toward the mean.
| https://www.talyarkoni.org/blog/2010/07/07/what-the-dunning-...
|
| I don't know which statistical artifact it is, but I am quite
| convinced that the so-called DK effect is not demonstrating
| something interesting about human psychology, I don't buy that
| this is a real cognitive bias. I've read the paper several times,
| and the methodology seems to be lacking rigor. They tested a
| small handful of Cornell undergrads volunteering for extra
| credit, not a large sample, not the general population, and
| tested _nobody_ who actually fits the description of
| 'incompetent' in a meaningful way. They primarily measured how
| people rank each other, not what their absolute skill was - and
| ranking each other requires speculating on the skills of others.
| There are obvious bias problems with asking a group of pampered
| Ivy League kids how well they think they rank.
|
| * One of the four "skills" they measured was ability to get a
| joke - "appreciation of humor" - Huh? This is subjective! The
| jokes used aren't given in the paper, either. Another was a
| 'grammar' test.
| TrackerFF wrote:
| The DK effect has gotten WAY more cred than it should. Today, it
| is just another feel-good piece that people use to justify their
| feeling that they're (ironically) surrounded by loud idiots.
| austin-cheney wrote:
| The best way to differentiate DK from autocorrelation is motive.
| Low-performance people will focus on motives that reinforce the
| perception of their competence, for example preferring code style
| over code delivery because, while both may be arguably important,
| one requires less effort and risk to attain.
|
| There is research out of Stanford to qualify this. People will
| shift motives to attain compliments, and the types of compliments
| received will dictate the challenges they are willing to accept.
| When a compliment is specific to an action and measurable, people
| will strive for continuously more challenging tasks so as to
| continually receive specific compliments. When compliments are
| generic and directed at the person, they will tend to prefer
| progressively less challenging tasks so that they continue to
| shine relative to the attempted effort. The difference in
| behavior produces a natural Dunning-Kruger effect wherein people
| seeking less demanding activities are more likely to
| overestimate their potential and degree of success.
|
| This is also statistically verified in research that correlates
| predictions with confidence. The more confident a person is in
| their predictions, such as political talk radio hosts, the less
| accurate their predictions tend to be.
| James_K wrote:
| I think the issue here is a confusion about what "bias" means. If
| they are self-assessing at random, then the high performers will
| all underestimate themselves, but this is not a bias towards
| underestimation as they are choosing randomly.
|
| That said, the chart from D-K seems to show a different bias and
| line up roughly with what you would expect. Someone with no
| knowledge assumes they are average skill and hence inflates their
| position, someone who is very good doesn't want to rate
| themselves the best because they assume others know as much as
| they do. The assumption underlying both groups is that you are
| normal and others are similar to you.
|
| I hypothesise that most people think they're average, which is
| something you could easily test by asking them to rate how well
| they think the average person would do on a test and comparing it
| to that individual's test score. I'm almost certain that high
| performers will overestimate the average, and low performers
| underestimate it.
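|
| A sketch of that proposed test (the anchoring weights here are
| pure invention, just to show what the hypothesis predicts):
|
|     import numpy as np
|
|     rng = np.random.default_rng(8)
|     n = 100_000
|     score = rng.normal(50, 15, n)
|     # hypothesis: guesses about the average person anchor on
|     # one's own level
|     guess_avg = (0.6 * score + 0.4 * score.mean()
|                  + rng.normal(0, 5, n))
|
|     top = score > np.percentile(score, 75)
|     low = score < np.percentile(score, 25)
|     print(guess_avg[top].mean())  # ~61: overestimates the mean
|     print(guess_avg[low].mean())  # ~39: underestimates the mean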
| riazrizvi wrote:
| A general problem with Dunning Kruger is the assumption that if
| you score low on a test then you are bad at the subject it is
| evaluating. I've taken enough bad quizzes that purportedly
| evaluate skills that I am an expert in, to know that that is a
| leap.
| nitwit005 wrote:
| If self evaluations are random, and you group a bunch of them
| together, then you'll see values around the 50th percentile.
| That's why their self evaluation line is nearly flat.
|
| In the actual data though, the line clearly trends upward. The
| people who did well appear to be scoring themselves non-randomly.
| resource0x wrote:
| Can someone explain the difference between Dunning-Kruger effect
| and "illusory superiority" effect
| (https://en.wikipedia.org/wiki/Illusory_superiority)?
| zeroonetwothree wrote:
| DK says that skilled people tend to underestimate their skill
| while unskilled people tend to overestimate their skill. This
| is likely a statistical artifact.
|
| IS says that people tend to overestimate their own skill
| compared to how other people estimate their skill. This seems
| likely true on average but not necessarily in all cases.
| jongjong wrote:
| This makes sense. IMO, the reason why the Dunning-Kruger effect
| is so popular among the upper classes (along with Impostor
| Syndrome) is that it helps to provide justification for social
| inequalities as it corrects inner monologues:
|
| "How come I have so much given that I'm not as skilled as these
| other people? I must suffer from impostor syndrome."
|
| "Look at all these people complaining instead of taking
| responsibility for their own failures, they probably suffer from
| Dunning-Kruger effect. Their work must not be good enough."
|
| But of course this requires a certain detachment from reality
| (hence why many upper class people have blind spots). If they
| actually took a look at the evidence, they may find that some of
| these 'Dunning-Kruger people' are actually far more skilled than
| they imagine. I think it explains why people like Jurgen
| Schmidhuber who made significant contributions to AI tend to be
| ignored. Then because people are ignoring them, they are
| compelled to promote themselves harder to try to get their fair
| share of attention but they are then put in the 'Dunning-Kruger
| basket' until someone with a very good reputation like Elon Musk
| comes along and gives them credit. I think the same could be said
| about the mathematician Srinivasa Ramanujan; many mathematicians
| ignored his work or assumed he was a fraud because he seemed too
| sure of himself for someone who was completely unknown at the
| time. If such gross injustice can happen in a perfectly
| quantifiable field like math, you can be sure it can happen in
| any field.
| fnord77 wrote:
| wikipedia's article intro on this doesn't state it is invalid :/
| badrabbit wrote:
| In my experience, people abuse flattery too much, so it is hard
| to tell if their positive opinions of me are genuine and with
| merit. Generally speaking, I try to see the big picture and
| realize that no matter how well I do, in a more global sense I
| am at best top 50th percentile, slightly above average. It is
| chance, relationships and supply/demand economics that
| ultimately decide our ability to apply our talents effectively.
|
| When it comes to others, I wish more people experienced the D-K
| effect. It gets frustrating sometimes dealing with smart and
| talented people who think they are revolutionary rockstars. You
| know the kind: they see other people's work and they are shocked
| how bad everything is, but never fear, they, our heroes, are
| here to refactor everything until they leave and another hero
| looks at their work and rescues metropolis from it again.
| Patience and humility are rare virtues for all of us.
| golol wrote:
| I disagree. Dunning Kruger is not a statement about predicted
| score correlating with actual score in some way. It states that
| predicted score does not correlate well with actual score. This
| can be rephrased as the prediction error having a negative
| correlation with the actual score. The article then claims that
| this negative correlation is autocorrelation. That is true, but
| the correlation still exists. The thing is that ideally we EXPECT
| there to be no correlation of the prediction error with the
| actual score, but we find autocorrelation. Going back to
| variables where this autocorrelation is not there, we EXPECTED to
| find a 1:1 positive correlation between predicted score and
| actual score but find no correlation, or a weak correlation.
|
| So finding autocorrelation when you expected to find no
| correlation is pretty much the Dunning-Kruger effect here.
|
| In fact their example with the random data totally makes sense:
| Suppose people uniformly randomly estimate their performance.
| Then the people who are low skilled will consistently over-
| estimate and the people who are high-skilled will consistently
| underestimate. Of course there is no causation here, as the
| people choose randomly, but there is an undeniable correlation. I
| guess the question is whether you view the Dunning-Kruger effect
| as a claim that low skill CAUSES positive prediction error, or
| just correlates with it.
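|
| For the fully random case this correlation even has a closed
| form (a standard calculation, not from the post). For i.i.d. X
| and Y:
|
| $$ Corr(Y-X, X) = \frac{Cov(Y,X) - Var[X]}{\sqrt{Var[Y-X]
| Var[X]}} = \frac{-Var[X]}{\sqrt{2} Var[X]} = -\frac{1}{\sqrt{2}}
| \approx -0.71 $$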
| lifeisstillgood wrote:
| The Dunning Kruger effect is simply the same reason expensive
| projects are undertaken and never hit budget - not because we
| cannot estimate costs but because if we did we would never do
| anything.
| CalChris wrote:
| The article's definition of _autocorrelation_ :
| Autocorrelation occurs when you correlate a variable with itself.
|
| Wikipedia's definition of _autocorrelation_ :
| Autocorrelation, sometimes known as serial correlation in the
| discrete time case, is the correlation of a signal with a delayed
| copy of itself as a function of delay.
|
| Of course, 0 delay is the trivial case of time delay but really,
| the article's definition is at best inaccurate. D-K has nothing
| to do with time delay and calling it autocorrelation seems like a
| weird pun that doesn't quite land.
| hyperthesis wrote:
| _If_ unskilled and skilled self-assessed themselves the same on
| average, then unskilled overestimate, and skilled underestimate.
|
| That would be a significant result alone - that no one had any
| idea. (but as https://news.ycombinator.com/item?id=38416100
| notes, there is a correlation).
| chmod600 wrote:
| A related effect that I've wondered about is: perhaps lower-
| skilled people compare themselves to the general public, while
| perhaps skilled people compare themselves to a smaller group of
| skilled peers.
|
| In other words, if you asked me if I'm good at riding a bicycle,
| I'd compare myself to others in the general population and say
| "yes". But if you ask a weekend bicyclist, they'd be better than
| me but perhaps compare themselves to weekend bicyclists, and rate
| themselves lower. And the effect might repeat for competitive
| bicyclists.
|
| If true, this could explain why we intuitively believe the DK
| effect.
| RevEng wrote:
| What Blair Fix's article gets wrong is that there are two stark
| differences between what Fix generated with random data and what
| Dunning and Kruger observed in theirs.
|
| Fix has each person guess randomly between 0 and 99 where they
| will lie in the percentiles. They simulate every person having no
| idea and giving equal probability to being the best or the worst.
| If we then sort them by how well they really did into quartiles
| and then evaluate the average of how well they thought they would
| do, we get what we would expect: each quartile has an equal
| chance of predicting that they will do well or do poorly, with an
| average expected percentile of 50, which is what you would expect
| by a random guess.
|
| Note two key things about this:
|
| - All quartiles guessed the same: there was no correlation
| between what they guessed and how well they actually did.
|
| - All quartiles guessed the expected average percentile, 50%.
| This means they were unbiased in how well they thought they
| would do.
|
| If people were unbiased but also unaware, this is the null
| hypothesis we would expect: on average people predict themselves
| to be average and there's no correlation between how well they
| predicted they would do and how well they actually did.
|
| Now compare that to what Dunning and Kruger observed:
|
| - The quartiles did NOT guess the same. There was a bit of an
| upwards trend, which suggests that people were at least somewhat
| able to determine their actual percentiles, even if only weakly
| on average.
|
| - The predictions were biased. All groups estimated they would
| do better than the expected average. That is to say, on average,
| they thought they were above average. This is an important bias.
|
| - The differentials between quartiles are not equal. The first
| and second quartiles typically predicted the same, over-
| estimated value, implying that neither group had any idea they
| were better or worse than each other. However, the upper
| quartile consistently estimates a higher average. That is to
| say, people who perform well, on average, believe they are
| performing even better than those who don't perform well. And
| perhaps most surprisingly, there was often a statistically
| significant dip at the third quartile. Comparing their beliefs,
| people who did well believed they had done worse than the people
| who actually did worse.
|
| Fix also fails to go beyond the first figure of the paper. After
| seeing this inconsistent behaviour between the quartiles, Dunning
| and Kruger then test what happens if the respondents are given an
| opportunity to grade each other - therefore getting an idea of
| what the percentiles actually look like - and to have their
| skills improved - thereby possibly making them better able to
| judge their own and each other's abilities. Again, if Fix's
| premise were right that this is all just a result of manipulating
| the autocorrelation of an otherwise unbiased random sequence,
| then these interventions should have no discernible effect. Yet
| Dunning and Kruger find markedly significant changes after these
| interventions, and those changes are different within the
| different quartiles.
|
| It is precisely this difference between quantiles which is the
| Dunning-Kruger effect. Fix effectively makes their point for them
| by building a null model and showing what would happen if there
| were no Dunning-Kruger effect - if people were fully unaware and
| unbiased. Instead, it is the way in which Dunning and Kruger's
| observations deviate from this model that is the very effect that
| bears their name.
|
| Instead, all that Fix manages to do is point out how confusing
| the plot is that Dunning and Kruger produced. The plot can easily
| be misinterpreted to suggest that it's the difference between y
| and y-x that is important. Instead, in their writing, Dunning and
| Kruger actually focus on the differences in how y-x changes when
| the situation changes, demonstrating that it's actually dependent
| on knowledge and how different people respond to that knowledge.
| What they actually show is that delta(y-x) vs x has a nonzero
| relationship and this is particularly interesting.
|
| Perhaps if Dunning and Kruger had not included the example of
| perfect knowledge as a comparison, but instead included the
| example of unbiased and unknowledgeable that Fix produced as the
| thing to compare against, the Dunning-Kruger effect would be much
| better understood.
|
| Further, both could benefit greatly from plotting and tabulating
| not just an average, but the overall distribution within each
| group. Fix should know that variance is just as important as
| bias. Even if all groups are biased in their prediction,
| differences in variance between each group indicates their
| confidence in their belief. Knowledge should help to reduce both
| bias and variance. A guess with high variance tells us little,
| while a guess with low variance tells us quite a bit. Even if all
| quartiles predicted the same average, we wouldn't fault those
| with little ability for guessing a high number if they did so
| with low confidence. On the contrary, we would expect people with
| high ability to be more confident (and correct) in the assessment
| of their ability.
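|
| A sketch of that last point (two invented regimes, just to show
| why the spread is diagnostic even when you only have quartile
| summaries):
|
|     import numpy as np
|
|     rng = np.random.default_rng(9)
|     n = 100_000
|     skill = rng.uniform(0, 100, n)
|     unaware = rng.uniform(0, 100, n)
|     informed = np.clip(skill + rng.normal(0, 15, n), 0, 100)
|
|     def by_quartile(est):
|         for q in range(4):
|             m = (skill >= 25 * q) & (skill < 25 * (q + 1))
|             print(q + 1, est[m].mean().round(1),
|                   est[m].std().round(1))
|
|     by_quartile(unaware)   # sd ~29 in every quartile
|     by_quartile(informed)  # sd ~13-15: tighter, so informative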
| hgomersall wrote:
| The entire post is pointing out how bad the stats are in the
| original paper. If you want additional critique, go and read
| the references.
| hyperthesis wrote:
| It's Dunning-Krugers all the way down - including this self-
| referential smugness.
| epigramx wrote:
| "Autocorrelation is the statistical equivalent of stating that 5
| = 5." no sure if the author has some ..dunning-kruger there.
| 6510 wrote:
| I was curious if the self assessment is done before or after the
| test.
|
| Bing chat gave me this wild answer:
|
| > The effect is usually measured by comparing self-assessment
| with objective performance. For example, participants may take a
| quiz and estimate their performance afterward, which is then
| compared to their actual results 1. Therefore, people estimate
| their ability before the test by Dunning-Kruger.
|
| In the case estimation _is_ done before: If you've had training,
| like a soup of ingredients, that matches the priorities and
| biases of the test it would be strange if no measurable effect
| remained.
|
| If it's done after: You can create trick questions specifically
| designed to test if someone learned a specific thing. A good test
| would test for that. If someone didn't learn the specific thing
| they could give/guess the wrong answer with some confidence.
|
| The design of the test has great influence on how poorly you'll
| think you've done. I would argue that the superior test is the
| one designed to fool you. Hans Rosling famously created a
| multiple choice test with 4 answers per question with average
| results below 25%.
|
| On a more fascinating note, unskilled means all areas of
| expertise outside your own.
|
| People who are universally unskilled in all areas are of course
| more likely to think they are unskilled. In reality these people
| know little bits about many things.
|
| This in contrast with people who spend all day, every day, for
| their entire lives pondering topics inside their area of
| expertise. If you are doing one thing you aren't doing all of the
| other things.
|
| Wikipedia had hilarious instances of experts contributing to
| countless articles accidentally ending up on the wrong page.
| Suddenly they have no patience, think they know everything and
| act like children. It's funny because you can't just ban valuable
| contributors.
|
| I would love to see this DK test done with professors furthest
| removed from the area of expertise.
| civilized wrote:
| We discussed this in a previous thread. The author is basically
| hypothesizing that perhaps people are so universally terrible at
| predicting their ability, their self-rating is like an
| unconditional random variable - just a random draw that is not
| influenced by their actual ability level at all.
|
| If this is true, then when your actual ability is high, your
| self-rating is likely to be lower than your ability simply by
| random chance. For example, if ability ranges from 0-100, your
| actual ability is 99, and your self-rating is a uniform random
| number from 0-100, your self-rating is 99% likely to be lower
| than your actual ability. Conversely, if your actual ability is
| low, your self-rating is likely to exceed your actual ability
| level.
|
| When it's explained clearly and simply, the criticism raises a
| lot of questions. Are people _actually_ that bad at rating their
| own ability? I doubt it.
| zw123456 wrote:
| I know I'm not smart enough on statistics or psychology to
| evaluate the article but it always struck me that D&K seemed to
| say something similar to what my grandpa said when I was a wee
| lad, "The more you know, the more you realize how much you don't
| know", I know he wasn't the first person to say that, but he was
| the first person to say it to me. I don't know if D&K is
| autocorrelation or not, but I know that an awful lot of people
| seem to think they know more than maybe they actually do,
| probably me included. Hmmm, maybe the author of that article as
| well? I wonder if that occurred to him, seems like a glaring
| oversight not to at least recognize that possible irony.
| Arch485 wrote:
| In the article, a real study was used as a counterexample to
| the DK effect.
|
| Part of the results was a correlation that people who were
| "less capable" were also worse at predicting their own skill,
| and people who were "more capable" were better at predicting
| their own skill.
|
| While similar to the DK effect, this is different, as the DK
| effect states that "less capable" individuals specifically
| _overestimate_ their skill, as opposed to simply being wrong
| (both over and under -estimating).
|
| With relation to some people "seeming to think they know more
| than they actually know", this is likely confirmation bias in
| the sense that there are an equal number of people who don't
| know much, and know that they don't know much.
| dimask wrote:
| I would call this type of argument a case of regression to the
| mean rather than "autocorrelation". That, of course, in principle
| requires independence between performance and assessment of
| performance. In many cases, it would make little sense to assume
| that the performance and assessment of performance are
| independent. But even then, one can simulate random data with
| some correlation, and still get a DK effect merely as statistical
| artifact. An overview of similar critiques, and a similar
| argument, can be found in
| https://www.frontiersin.org/articles/10.3389/fpsyg.2022.8401... .
| 19f191ty wrote:
| That is not an autocorrelation. The OP is equating linear
| dependence with autocorrelation, which is not how we use that
| term. Autocorrelation is when a random process is correlated
| with a time-lagged version of itself.
| dilawar wrote:
| David Dunning response (2022):
| https://www.bps.org.uk/psychologist/dunning-kruger-effect-an...
| BrenBarn wrote:
| Yeah I don't buy this either.
|
| I do think the original Dunning-Kruger plot is a bit of an odd
| presentation. The way I look at it is just to say that people's
| self-estimates of their ability fall into a relatively narrow
| range (e.g., 55-75th percentile on the graph), whereas their
| actual abilities of course cover the whole range from 0-100th
| percentile. You don't really need the plot of "x versus x"
| (average score in each quartile). You just need to say "people's
| self-assessments seem to start unrealistically high and only go
| up a little, even as their ability goes up a lot".
| PeterStuer wrote:
| You can take out the x from both sides, and the y would still not
| be a horizontal line.
|
| In their eagerness to 'deconstruct' the narrative, do the authors
| merely provide another example of Dunning-Kruger by
| overestimating their own cleverness?
| eterevsky wrote:
| I think this article would've made more sense if it had a title
| "The Dunning-Kruger effect is regression toward the mean",
| because that's what the author is actually showing.
| tgv wrote:
| I think your description is the most apt.
|
| OP's own analysis shows that using random data (two variables
| uniformly distributed over the same range!) for both skill and
| self-assessment results in a _different_ graph. The original
| comparison therefore implies another effect on the second
| dimension, which could be interpreted as: people don't estimate
| their skills correctly, but drift towards the mean.
|
| But then the question becomes: what did they really ask their
| subjects? To pick the percentile or a true test score?
| mattbit wrote:
| This is not 'autocorrelation', it is regression to the mean. I
| find the article unclear and imprecise. For those interested in a
| better overview of the Dunning-Kruger effect, I recommend this
| short article by McIntosh & Della Sala instead:
|
| https://www.bps.org.uk/psychologist/persistent-irony-dunning...
| mattbit wrote:
| This is how McIntosh & Della Sala put it:
|
| > in the academic literature, it has been suggested that the
| signature pattern of the DKE (Figure 1A) might be nothing more
| than a statistical artefact. In a typical study, people's
| tendencies to under- or overestimation are analysed as a
| function of their ability for the task. This involves a 'double
| dipping' into the data because the task performance score is
| used once to rank people for ability, and then again to
| determine whether the self-estimate is an under- or over-
| estimate. This dubious double-dipping makes the analysis prone
| to a slippery statistical phenomenon called 'regression to the
| mean'.
| pmavrodiev wrote:
| No one seems to have read OP's post in its entirety. A crucial
| point was made by referencing this paper:
| https://digitalcommons.usf.edu/cgi/viewcontent.cgi?article=1....
|
| Figure 2 in this paper shows the result of an experiment where
| skill and perception of one's skill are measured independently.
| To eliminate any statistical artifact of auto-correlation. And lo
| and behold - on average, skill is uncorrelated with the accuracy
| of one's own assessment. No DK effect at all. What does show up
| actually is that more qualified people are more consistent in
| estimating their skill (i.e. their assessments are less
| variable), but the mean accuracy is still 0.
|
| So indeed, on average actual and perceived skills are
| uncorrelated. That's exactly what the numerical proof with random
| numbers shows and why in many cases we apply Occam's razor.
| psychoslave wrote:
| I went through the whole article, and I am not only very
| skeptical about the claimed debunking but wonder what kind of
| psychological trope you might label as correlative to such an
| article.
|
| I mean, "bad science built only on rhetoric" is a double-edged
| sword, you know.
|
| To start with, the graph presented at the end does not look like
| the one from the original article, where the self-assessment does
| grow significantly, though it starts higher than average and
| grows less quickly than external assessment.
|
| Also, the article focuses on a "random" data set, but we know
| that there are different classes of apparently noisy plots. The
| noisy distribution of self-assessment would actually be an
| informative figure too.
|
| So the biggest issue here is that it is kind of pretending that
| however the ordinate value is defined, if it includes the
| abscissa in its definition you'll get the same kind of plot as a
| result, which is obviously false. You could easily come up with
| arbitrary values coupled to "x" that would look radically
| different.
| rom1v wrote:
| If Y = X + estimation_error, then subtracting X (in Y-X) removes
| the correlation rather than adding it.
| Spiwux wrote:
| At the risk of sounding like a complete idiot, isn't the
| hypothesis of the original paper still true? Let's assume self
| assessment score is perfectly random between 0% and 100%, so on
| average every group will always estimate themselves to be 50%
| correct
|
| Then by definition that means people who are unskilled and often
| incorrect will overestimate themselves, while people who are
| often correct will underestimate themselves. Take a complete
| idiot for example. You always get 0% test score. Yet your self-
| assessment is random between 0% and 100%. Hence you overestimate
| yourself much more often than people who always get 100% test
| score.
|
| In fact, if the two are uncorrelated, then that still means that
|
| 1) Idiots don't recognize they're idiots
|
| 2) Skilled people don't recognize they're skilled
| bsza wrote:
| Article claims Dunning-Kruger is present in a population where
| everyone estimates their own skills based on dice rolls. Someone
| who estimates their own skills based on a dice roll is
| objectively crap at estimating their own skills. Dunning-Kruger
| claims people are objectively crap at estimating their own
| skills.
|
| Where is the contradiction?
| dudeinjapan wrote:
| So you're saying that the Dunning-Kruger effect applies to
| Dunning & Kruger.
| falserum wrote:
| Article feels like a personal attack towards D and K.
| powera wrote:
| Nope.
|
| I must object to this paragraph: "To be honest, I'm not
| particularly convinced by the analytic arguments above. It's only
| by using real data that I can understand the problem with the
| Dunning-Kruger effect. So let's have a look at some real
| numbers."
|
| He then goes on to use synthetic data.
|
| Beyond that dishonest sleight of hand, this is in the category of
| "one thought experiment didn't prove the phenomenon exists,
| therefore it must not exist" logical errors.
| zephrx1111 wrote:
| A more generalizable explanation is regression towards the mean:
| everybody thinks they are an average person.
___________________________________________________________________
(page generated 2023-11-26 23:01 UTC)