[HN Gopher] The Dunning-Kruger effect is autocorrelation
       ___________________________________________________________________
        
       The Dunning-Kruger effect is autocorrelation
        
       Author : ljosifov
       Score  : 557 points
       Date   : 2023-11-25 18:14 UTC (1 day ago)
        
 (HTM) web link (economicsfromthetopdown.com)
 (TXT) w3m dump (economicsfromthetopdown.com)
        
       | Jensson wrote:
        | Psychologists using their pet theories to explain results,
        | and people then taking those explanations as truth when they
        | should really just look at the data, is probably as large a
        | problem as the replication crisis.
        
       | glitchc wrote:
       | Geez, this is eye-opening. Thank you for sharing this.
        
       | tempestn wrote:
       | I don't buy this take, and this rebuttal does a better job than I
       | could of explaining why: https://andersource.dev/2022/04/19/dk-
       | autocorrelation.html
       | 
       | Basically, this autocorrelation take shows that if performance
       | and evaluation of performance were random and independent, you
       | would get a graph like the D-K one, and therefore it states that
       | the effect is just autocorrelation. But in reality, it would be
       | very surprising if performance and evaluation of performance were
       | independent. We expect people to be able to accurately rate their
       | own ability. And D-K did indeed show a correlation between the
       | two, just not as strong of one as we would expect. Rather, they
       | showed a consistent bias. That's the interesting result. They
       | then posit reasons for this. One could certainly debate those
       | reasons. But to say the whole effect is just a statistical
       | artifact because random, independent variables would act in a
       | similar way ignores the fact that these variables aren't expected
       | to be independent.
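        | 
        | A minimal Python sketch of that distinction (my own toy
        | parameters, not D-K's data): both a fully independent model
        | and a "correlated but biased toward ~65" model produce a
        | D-K-looking quartile plot, but they are easy to tell apart
        | by whether the estimates rise with actual skill.
        | 
        |   import numpy as np
        |   rng = np.random.default_rng(0)
        |   n = 10_000
        |   skill = rng.uniform(0, 100, n)      # actual percentile
        |   # Model A: self-estimate independent of skill
        |   est_a = rng.uniform(0, 100, n)
        |   # Model B: correlated, compressed toward ~65 (assumed bias)
        |   est_b = np.clip(65 + 0.3 * (skill - 50)
        |                   + rng.normal(0, 10, n), 0, 100)
        |   bins = np.digitize(skill, [25, 50, 75])
        |   for name, est in [("independent", est_a), ("biased", est_b)]:
        |       means = [est[bins == i].mean() for i in range(4)]
        |       print(name, [round(m) for m in means])
        |   # Both show the bottom quartile "overestimating"; only
        |   # Model B shows estimates rising with actual skill.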
        
         | Jensson wrote:
          | That the worst overestimate their skill was already known;
          | it wasn't the main result of Dunning-Kruger. That the best
          | underestimate their skill can be chalked up to
          | autocorrelation.
        
           | tempestn wrote:
           | The best don't tend to overestimate their skill; they
           | underestimate it. The D-K results show a consistent bias in
           | estimates toward (somewhere near) the mean. Hence an
           | overestimate at the bottom and an underestimate at the top.
        
             | Jensson wrote:
             | > The best don't tend to overestimate their skill; they
             | underestimate
             | 
             | I wrote the wrong word, I fixed it. The best can't
             | overestimate their rank, so of course that wasn't what I
             | meant.
        
             | anonymouskimmer wrote:
              | Dunning-Kruger posits this as a psychological effect,
              | yes? In the top half, psychological effects such as
              | imposter syndrome could come into play.
              | 
              | Have sociological factors, such as being kind or big-
              | fish-little-pond effects, been considered as likely
              | causes of the misestimates?
        
               | chiefalchemist wrote:
               | I have the same question...why do some get it so wrong?
               | Was there a nudge in the process of the study that caused
               | some to answer what they did?
               | 
                | Heck, I'm wondering if "Honestly, I can't say" was an
                | allowed response. Or were they forced to pick a
                | number? If so, then I'd want to know what happens
                | when you ask 100 people to pick a number between 0
                | and 100. I bet it's not evenly distributed. Maybe the
                | beginners give a "discounted" version of the
                | distribution?
                | 
                | Even if the autocorrelation explanation is off, there
                | do now seem to be flaws in DK, at least from the
                | perspective of pure and proper science.
        
         | svnt wrote:
         | The author of this assumes the conclusion in order to decide
         | how to analyze his data.
         | 
         | He cannot reasonably say both:
         | 
         | > we have a decision to make: what are we going to assume? How
         | are we going to quantify our surprise from the results?
         | 
         | > The first option is, as in the case of the state census, to
         | assume dependence between X and Y. I.e. to assume that,
         | generally, people are capable of self-assessing their
         | performance.
         | 
         | > The second option conforms with the Research Methods 101
         | rule-of-thumb "always assume independence." Until proven
         | otherwise, we should assume people have no ability to self-
         | assess their performance.
         | 
         | > It seems to me glaringly obvious that the first option is
         | much, much more reasonable than the second.
         | 
          | -- and --
         | 
         | > most notably the claim that the more skilled people are, the
         | better they are at self-assessing their performance. This
         | result is supported by their plot, but in any case, my issue is
         | not with objections to this claim
         | 
         | and then expect to carry any credibility.
         | 
         | The author of this piece both suggests that a key variable is
         | fixed and later admits it varies within the same dataset.
         | 
         | I guess at least they admit it, but this lacks basic self-
         | consistency.
        
           | Jensson wrote:
           | > The author of this piece both suggests that a key variable
           | is fixed and later admits it varies within the same dataset.
           | 
            | I don't see how that variable changes. Here is an
            | example of how the error variable can be exactly the
            | same for everyone and still reproduce the results:
            | 
            | Let's say the overconfidence is always that you feel 50%
            | of those better than you are actually worse than you. So
            | everyone is equally overconfident; it's just that the
            | top won't move their own placings as much as the bottom,
            | since there are far fewer people they can mistake as
            | being worse than them. Then apply noise to this and you
            | get the graph Dunning-Kruger got.
            | 
            | You could say "But they are better at estimating their
            | rank!", but that is just a mathematical artefact; it
            | isn't a psychological result. Even if everyone always
            | guessed that they are number 1, the better you are, the
            | better your guess will be. But in that case it is easy
            | to see that everyone overestimates their skill in the
            | same way, rather than the better people having a
            | fundamentally different way of evaluating themselves.
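            | 
            | A toy simulation of that rule (numbers invented): every
            | person applies the identical bias "half the people above
            | me are actually below me", i.e. perceived = p + 0.5 *
            | (100 - p), plus noise. The error rule is the same for
            | all, yet its effect shrinks with rank:
            | 
            |   import numpy as np
            |   rng = np.random.default_rng(1)
            |   p = rng.uniform(0, 100, 10_000)    # true percentile
            |   perceived = np.clip(p + 0.5 * (100 - p)
            |                       + rng.normal(0, 15, p.size), 0, 100)
            |   bins = np.digitize(p, [25, 50, 75])
            |   for i in range(4):
            |       a = p[bins == i].mean()
            |       g = perceived[bins == i].mean()
            |       print(f"quartile {i + 1}: actual {a:.0f}, perceived {g:.0f}")
            |   # One identical rule moves the bottom quartile up ~40
            |   # points but the top only a few, flattening the curve.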
        
             | raincole wrote:
              | > Let's say the overconfidence is always that you feel
              | 50% of those better than you are actually worse than
              | you. So everyone is equally overconfident; it's just
              | that the top won't move their own placings as much as
              | the bottom, since there are far fewer people they can
              | mistake as being worse than them. Then apply noise to
              | this and you get the graph Dunning-Kruger got.
             | 
              | But the data of the original D-K paper shows that the
              | top 25% of people _underestimate_ their placings. So
              | this whole paragraph, while logically true, has little
              | to do with the original D-K effect.
             | 
             | > You could say "But they are better at estimating their
             | rank!", but that is just a mathematical artefact, it isn't
             | a psychological result. Even if everyone always guessed
             | that they are number 1...
             | 
             | If everyone always guessed that they are number 1, it's a
             | huge psychological result: it means people are extremely
             | irrational when it comes to self-evaluation.
        
               | Jensson wrote:
                | > But the data of the original D-K paper shows that
                | the top 25% of people underestimate their placings.
                | So this whole paragraph, while logically true, has
                | little to do with the original D-K effect.
                | 
                | That is what you would expect under my model, due to
                | the randomness being limited upwards for the high
                | placings but still going downwards. That is the
                | effect the article we are talking about refers to
                | when it says "autocorrelation".
        
             | svnt wrote:
              | Both analyses seem to agree on one finding: people's
              | skill at estimating their own ability increases with
              | that ability.
             | It can't be a purely mathematical artifact because you
             | would see a tapering at either end, or a narrowing
             | distribution of errors at the bottom end, not just a
             | narrowing toward the top end.
             | 
             | This should be unsurprising for anyone who has become
             | sufficiently skilled at something. Beginners can't even
             | discern the differences the experts are discussing, and
             | frequently make errors in classes they don't even
             | understand.
        
               | chiefalchemist wrote:
                | Beginners, by definition, are guessing 100%. Some
                | will guess high, others low, and the rest in between.
                | But they are all guessing. Perhaps there's a cultural
                | bias to overestimate their skill? Perhaps there's a
                | nudge in the process of the study that led them to
                | overestimate?
                | 
                | The lede isn't that people overestimate their skill
                | level. The lede is: why would that be, as they have
                | nothing else to go on? What is the trigger or
                | triggers? And to say the more experienced estimate
                | better? Well, duh.
        
           | contravariant wrote:
            | I'm utterly confused. The latter statement is just the
            | author explaining which parts they didn't discuss in
            | their article; it has no bearing whatsoever on the
            | section before it.
        
             | svnt wrote:
              | It discloses the cognitive dissonance in his position.
              | He seems to be saying "skill at assessing ability is
              | random and only mathematically bounded" while admitting
              | "skill at assessing ability changes with ability."
        
         | atleastoptimal wrote:
         | The issue is people have differing personal definitions of
         | Dunning Kruger. The generally demonstrated effect in the sample
         | of people Dunning and Kruger analyzed was "people tend to
         | estimate the percentile of their own skill as closer to the
         | average than it really is, with a slight bias towards an above-
         | average mean. This leads to overestimation of relative ability
         | by those in lower percentiles, and the opposite for those in
         | higher percentiles"
         | 
         | However when people cite Dunning Kruger in popular culture they
         | mean "below average people think they're above average, and
         | above average people assume they're below average", which was
         | not shown in the original study, and wouldn't show up in an
         | analysis attempting to justify it via a misunderstanding of
         | autocorrelation.
         | 
         | The general point in the rebuttal is correct. A completely
         | noisy graph of people's estimations of their own ability would
         | show a Dunning-Kruger resembling residual graph (x-y vs x).
         | However, one wouldn't expect people in the 1st percentile to
         | have an equal distribution of perceived skill as people in the
         | 50th or 99th percentile. If that were true, it would be worth
         | reporting.
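          | 
          | The "residual graph" claim is easy to check numerically; a
          | throwaway sketch with purely random, independent data:
          | 
          |   import numpy as np
          |   rng = np.random.default_rng(2)
          |   x = rng.uniform(0, 100, 100_000)  # actual percentile
          |   y = rng.uniform(0, 100, 100_000)  # perceived, independent
          |   print(np.corrcoef(x, x - y)[0, 1])  # ~0.71 with no signal
          | 
          | Plotting x - y against x looks like a strong effect even
          | for pure noise, which is the sense in which the rebuttal
          | is right; it says nothing about whether the perceived-
          | skill distributions actually match across percentiles.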
        
           | ShamelessC wrote:
           | > "below average people think they're above average, and
           | above average people assume they're below average"
           | 
            | There's no way to know if you're wrong, but when I see
            | it used, it seems to be pointing out that "some (not
            | all) underqualified people tend to defer to their own
            | beliefs rather than the views/statements of experts,
            | even when that is demonstrably silly."
           | 
           | ^ Referring to the pop-sci interpretation, not in
           | disagreement with the general point.
        
             | staunton wrote:
             | Which also has nothing at all to do with this study by
             | Dunning and Kruger. So you agree with the general point of
             | parent.
        
               | ShamelessC wrote:
               | Yes. Just clarifying a small disagreement about the pop-
               | sci interpretation of the phrase.
        
         | crazygringo wrote:
         | Yup. Assuming the sample sizes are statistically significant,
         | the original paper clearly shows:
         | 
         | - On average, people estimate their ability around the 65th
         | percentile (actual results) rather than the 50th (simulated
         | random results) -- a significant difference
         | 
         | - That people's self-estimation _increases with their actual
         | ability_ , but only by a surprisingly small degree (actual
         | results show a slight upwards trend, simulated random results
         | are flat) -- another significant difference
         | 
         | The author's entire discussion of "autocorrelation" is a red
         | herring that has nothing to do with anything. Their randomly-
         | generated results do _not_ match what the original paper shows.
         | 
         | None of this really sheds much light on to what degree the
         | results can be or have been robustly replicated, of course. But
         | there's nothing inherently problematic whatsoever about the way
         | it's visualized. (It would be nice to see bars for variance,
         | though.)
        
           | ketozhang wrote:
            | The autocorrelation is important to show that its
            | transformation to the D-K plot will always give you the
            | D-K effect for independent variables.
            | 
            | However, the focus on autocorrelation is not very
            | illuminating. We can explain the behaviors found quite
            | easily:
            | 
            | - If everyone's self-assessment scores are (uniformly)
            | random guesses, then the average self-assessment score
            | for any quantile is 50%. Then of course those in a lower
            | quantile (less skilled) are overestimating.
            | 
            | - If the self-assessment score depends proportionally on
            | the actual score, then the average of each quantile is
            | always at least its quantile value. This is the D-K
            | effect, which gets weaker as the correlation grows.
            | 
            | - The opposite is true for a disproportional relation.
            | 
            | So, the D-K plot is extremely sensitive to correlations
            | and can easily exaggerate the weakest of correlations.
        
           | cortesoft wrote:
           | > That people's self-estimation increases with their actual
           | ability, but only by a surprisingly small degree (actual
           | results show a slight upwards trend, simulated random results
           | are flat) -- another significant difference
           | 
           | If everyone thinks they are slightly above average, isn't
           | this inevitable? If everyone thinks they are slightly above
           | average, people who are slightly above average are going to
           | be the most accurate at predicting where they land?
        
             | zuminator wrote:
             | Even if "people tend to slightly overrate their own
             | ability," was the only takeaway, it would still refute the
             | author's conclusion that DK has nothing to do with human
             | psychology.
        
             | IanCal wrote:
             | Yes but then you'd see a flat line for people's estimates,
             | which wasn't the result.
        
             | cycomanic wrote:
             | Have you not just summarized the Dunning-Kruger effect in
             | other words?
             | 
              | That essentially follows from everyone assuming they
              | are slightly above average. That's also the crux of
              | the refutation, and why the whole autocorrelation
              | argument is a red herring: even if we all self-
              | assessed completely randomly, that would actually
              | confirm the Dunning-Kruger effect is real (because if
              | we self-assess randomly, worse performers are more
              | likely to overestimate).
              | 
              | We could argue that this is not surprising, but the
              | "surprising" bit is that the curves show that better
              | performers are actually more skilled at assessing
              | their performance, which incidentally was also
              | confirmed by the followup studies.
        
               | quetzthecoatl wrote:
                | Is it though? Everyone overestimating their ability
                | a bit isn't the DK effect. It's when people with
                | less knowledge and ability vastly overestimate their
                | ability (because they don't know how little they
                | know, while others do), and the opposite for those
                | who are truly more able and knowledgeable (again,
                | because they understand how vast the topic is;
                | though they know more and are more capable than the
                | average person, they also understand how little they
                | truly know compared to what they don't know).
        
             | dahart wrote:
             | > If everyone thinks they are slightly above average, isn't
             | this inevitable? If everyone thinks they are slightly above
             | average, people who are slightly above average are going to
             | be the most accurate at predicting where they land?
             | 
             | Yes, it's inevitable. And this study only asked Cornell
             | undergrads what they think of themselves - people who were
             | taught to believe they are above average, and also people
             | who got into a selective school and probably all had higher
             | than average scores on standardized tests. Is it surprising
             | in any way that this group estimated their ability at above
             | average?
        
           | somenameforme wrote:
           | > "On average, people estimate their ability around the 65th
           | percentile (actual results) rather than the 50th (simulated
           | random results) -- a significant difference"
           | 
            | This is a different issue than D-K. The D-K hypothesis
            | is that self-assessment and actual performance are less
            | correlated for weaker than for higher-performing
            | individuals. "People think they're better than average"
            | is a different (and much less controversial) bias.
           | 
           | ---
           | 
           | [DK-Effect] : I totally know I scored at least a 30% on that
           | test, and that's certainly way better than average (it's
           | not). [Actually scored 10%]
           | 
           | [No DK-Effect] : I totally know I scored at least a 30% on
           | that test, and that's certainly way better than average (it's
           | not). [Actually scored 30%]
           | 
           | ---
        
             | kstenerud wrote:
              | > The D-K hypothesis is that self-assessment and actual
              | performance are less correlated for weaker than for
              | higher-performing individuals.
             | 
             | Isn't that what the graph shows? The bottom quartile group
             | is guessing almost 50 percentile points higher than their
             | actual performance, whereas the top quartile is at most 15
             | points off.
             | 
                | They're all guessing somewhere between the 60th and
                | 75th percentiles (i.e. "I'm a bit better than
                | average") - with some upwards trend, since the high
                | performers seem to at least know they have some
                | skill, although not very accurately. It's just that
                | for the poor performers, a guess of the 60th
                | percentile is wayyy off the mark.
        
               | somenameforme wrote:
               | EDIT: Something important for the rest of this post. In
               | case it's not clear, the graph is showing your percentile
               | ranking within the group - not your actual score.
               | 
               | Nope, because there's an interesting statistical trick in
               | play. Imagine you take 100 highly skilled physicists and
               | give them some lengthy series of otherwise relatively
               | basic physics questions. Everybody is going to rate their
               | predicted performance as high. But some people will miss
               | some questions simply due to silly mistakes or whatever.
                | And those people would end up in the bottom 10% of
                | this group, even if the difference between #1 and
                | #100 was e.g. 0.5 points. Graph it as D-K did, and
                | you'd show a huge Dunning-Kruger effect, even when
                | there is obviously nothing of the sort.
               | 
               | In fact the _fewer_ differences in ability within a
               | group, and the greater the relative ease of a task, the
                | _bigger_ the Dunning-Kruger effect you'd show. Because
               | everybody will rate themselves relatively high, but you
               | will always have a bottom 10%, even if they are
               | practically identical to the top 10%.
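                | 
                | A quick illustration of that trick with invented
                | numbers (100 near-identical experts, skill spread
                | far smaller than the noise from silly mistakes):
                | 
                |   import numpy as np
                |   rng = np.random.default_rng(3)
                |   ability = np.linspace(99.0, 99.5, 100)
                |   score = ability + rng.normal(0, 2, 100)
                |   rank = score.argsort().argsort()  # 0..99
                |   guess = np.full(100, 70.0)  # "above average"
                |   bins = np.digitize(rank, [25, 50, 75])
                |   for i in range(4):
                |       r = rank[bins == i].mean()
                |       print(f"q{i + 1}: rank {r:.0f}, guess 70")
                |   # Near-identical people plus rank noise yield a
                |   # textbook "D-K" plot with no psychology at all.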
               | 
               | You can see this most clearly in the original paper. They
               | carried out 4 experiments. The one that was most
               | objective and least subject to confounding variables was
               | #2, where they asked people a series of LSAT based logic
               | questions, and assessed their predicted vs actual
               | results. And there was very little difference. Quoting
               | the paper, "Participants did not, however, overestimate
               | how many questions they answered correctly, M = 13.3
               | (perceived) vs. 12.9 (actual), t < 1. As in Study 1,
               | perceptions of ability were positively related to actual
               | ability, although in this case, not to a significant
               | degree." Yet look at the graph for it, and again it shows
               | some seemingly large D-K effect.
               | 
                | And there are even more issues with D-K, especially
                | experiment #1 (which is the one with the prettiest
                | graph by far), but that's outside the scope of this
                | post. I'm happy to get into it if you are, though. I
                | find this all just kind of shocking and
                | exceptionally interesting! I've referenced the D-K
                | effect countless times in the past - never again
                | after today!
               | 
               | [1] - https://sci-hub.se/10.1037/0022-3514.77.6.1121
        
               | dahart wrote:
                | Yes yes yes! I'm in the very same boat, and came to
                | an epiphany that the ranking trick here, combined
                | with some subjective questions (ability to
                | appreciate humor - seriously!?), hides almost
                | everything about actual skill. Not only does it
                | amplify mistakes, it also forces the participants to
                | know something about their cohort. Having to guess
                | your ranking fully explains the less-than-perfect
                | correlation. It also undermines all claims about
                | competence and incompetence. They're not testing
                | skill; they're only testing the ability to randomly
                | guess the skill of others.
               | 
               | What about the slight bias upwards? Well, what _exactly_
               | was the question they asked? It's not given in the paper.
               | They were polling only Cornell undergrads looking for
               | extra credit. What if the question somehow accidentally
               | or subtly implied they were asking about the ranking
               | against the general population, and then they turned
               | around and tested the answers against a small Cornell
               | cohort? I just went and looked at the paper again and
               | noticed that the descriptions of the ranking question
               | changed between the various "studies" with the first one
               | comparing to the "average Cornell student" (not their
               | experiment cohort!). The others suggest they're asking a
               | question about ranking relative to the class in which
               | they're receiving extra credit. Curiously study 4 refers
               | to the ranking method of study 2 specifically, and not 3.
               | The class used in study 4 was a different subject than 2
               | & 3. How they asked this question could have an enormous
               | influence on the result, and they didn't say what they
               | actually asked.
               | 
               | Cornell undergrads are a group of kids that got accepted
               | to an elite school and were raised to believe they're
               | better than average. Whether or not all people believe
               | they're better than average, this group was primed for
               | it, and also have at least one piece of actual evidence
               | that they really are better than average. If these were
                | majority freshmen undergrads, they might be
                | especially poorly calibrated to the skills of their
                | classmates.
               | 
               | In short, the sample population is definitely biased, and
               | the potential for the study to amplify that bias is
               | enormous. The paper uses suggestions and jumps to
               | hyperbolic conclusions throughout. I'm really surprised
               | that evidence and methodology this weak claims to show
               | something about all of humanity and got so much
               | attention.
        
             | dahart wrote:
              | > The D-K hypothesis is that self-assessment and
              | actual performance are less correlated for weaker
              | than for higher-performing individuals.
             | 
             | I'm not sure that's an accurate summary. The correlation of
             | the perceived ability is effectively the slope of the line,
             | and the slope is more or less constant. The paper suggests
             | that the _bias_ of the bottom quartile is higher than the
             | bias of the upper quartile, not that the correlation is any
             | different.
             | 
             | But it's strange that the DK paper makes an example of the
             | lower performers, since the bias of the scores appears to
             | be constant; it appears the high performers have pretty
             | much the same bias as the low performers -- it's a
             | straightish line that goes through 65% in the middle rather
             | than the expected straight line that goes through 50% in
             | the middle. If the 'high performers' had a different bias,
             | then the line wouldn't be so straight.
        
               | JamesBarney wrote:
                | Yeah, my understanding is:
                | 
                | 1. The slope of self-perceived ability vs. actual
                | ability is less than one.
                | 
                | 2. The y-intercept depends on the difficulty of the
                | test.
                | 
                | Therefore, with an easier test the better testees
                | are more accurate, and with a very difficult test
                | the worse testees are more accurate, because of
                | where the lines intersect. Meaning DK is an artifact
                | of test difficulty.
                | 
                | This also means that if the test were difficult
                | enough, you could create a bizarro-DK effect where
                | the better testees were less accurate.
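                | 
                | A worked toy version of that model (coefficients
                | invented, not from the paper): with perceived =
                | a + b * actual and b < 1, the line crosses perfect
                | calibration at actual = a / (1 - b), and the test's
                | difficulty moves a:
                | 
                |   def crossing(a: float, b: float) -> float:
                |       # perceived = a + b * x meets perceived = x
                |       # at x = a / (1 - b)
                |       return a / (1.0 - b)
                | 
                |   # easy test (high intercept): crossing ~71, the
                |   # better testees sit nearest the line
                |   print(crossing(50, 0.3))
                |   # hard test (low intercept): crossing ~29, the
                |   # worse testees are the accurate ones; push
                |   # difficulty further and you get bizarro-DK
                |   print(crossing(20, 0.3))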
        
               | dahart wrote:
               | For 1, the data is based on guessing, so it's zero
               | surprise that self-perceived ability doesn't correlate
               | perfectly with actual ability. It would be extremely
               | surprising and unbelievable if the slopes were the same,
               | right?
               | 
               | For 2, the DK paper shows one thing, but the replication
                | attempts have shown this effect doesn't even exist for
               | very complex tasks, like being an engineer or lawyer. The
               | DK effect doesn't generalize, and doesn't even measure
               | exactly what it claims to measure, which is why we don't
               | need to speculate about the bizarro-DK reversal effect -
               | we already have evidence that it doesn't happen, and we
               | already have a big enough problem with people mistakenly
               | believing that DK showed an inverse correlation between
               | confidence and competence, when they did no such thing.
        
             | dragonwriter wrote:
             | > This is a different issue than D-K.
             | 
              | No, it's literally the D-K finding.
              | 
              | > The D-K hypothesis is that self-assessment and
              | actual performance are less correlated for weaker
              | than for higher-performing individuals
              | 
              | That may have been a _hypothesis_ Dunning and Kruger
              | had at some point; it's not the effect they actually
              | identified from their research. But I don't think it's
              | even that: it's an "effect" people have associated
              | with D-K because they heard discussion of the D-K
              | research that got distorted at multiple steps from the
              | original work, and then that misunderstanding, because
              | it made a nice taunt, replicated widely and became
              | popular.
        
               | dahart wrote:
                | To be fair, the paper itself uses hyperbolic
                | language that completely distorts its own data. It
                | heavily pushes and leads the reader into one
                | possible dramatic explanation for their results,
                | while downplaying and ignoring a bunch of other less
                | dramatic explanations. Using words like
                | "incompetent" is almost completely unfounded based
                | on what they actually did. Section headings like
                | "competence begets calibration", "it takes one to
                | know one", and "the burden of expertise" are
                | uncurious platitudes that jump to conclusions. I'm
                | kind of stunned at the popular longevity of this
                | paper given how unscientific it is and how often
                | replication attempts with better methodology have
                | shown conflicting results.
        
               | somenameforme wrote:
               | This is straight from their paper [1]:
               | 
               | "Perhaps more controversial is the third point, the one
               | that is the focus of this article. We argue that when
               | people are incompetent in the strategies they adopt to
               | achieve success and satisfaction, they suffer a dual
               | burden: Not only do they reach erroneous conclusions and
               | make unfortunate choices, but their incompetence robs
               | them of the ability to realize it."
               | 
               | [1] - https://sci-hub.se/10.1037/0022-3514.77.6.1121
        
           | singingfish wrote:
           | > assuming the sample sizes are statistically significant
           | 
           | Nitpick: should read "assuming the sample sizes provide
           | sufficient statistical power"
        
         | IAmGraydon wrote:
         | So what we have here is some scientists trying to prove that
         | the Dunning-Kruger effect doesn't exist and instead they give
         | us a perfect example of the Dunning-Kruger effect.
        
           | wyldfire wrote:
           | > The irony is that the situation is actually reversed. In
           | their seminal paper, Dunning and Kruger are the ones
           | broadcasting their (statistical) incompetence by conflating
           | autocorrelation for a psychological effect. In this light,
           | the paper's title may still be appropriate. It's just that it
           | was the authors (not the test subjects) who were 'unskilled
           | and unaware of it'.
        
         | t_mann wrote:
         | I was surprised by the figure from the original article, imho
         | that's the strongest rebuttal: perceived ability grows strictly
          | monotonically with actual ability, no sign of the famous non-
         | monotonic U-curve. Yeah, the slope is less than one, and it
         | grows a bit faster from the second to the third quartile than
         | from the first to the second, but none of that changes the fact
         | that people tend to slot themselves correctly. The chart is
         | interesting in that it confirms that everyone perceives
         | themselves to be slightly above average in terms of ability,
         | which of course can't be true in practice. But what it also
         | shows is that when they think they'll be below or above that
         | (false) baseline, they're actually correct about it. So pretty
         | much the exact opposite of what the Dunning-Kruger effect
         | claims.
        
           | jampekka wrote:
            | The slope will be less than one if there's e.g. any
            | random guessing in the test, even if the self-assessment
            | is perfect (apart from whether they know if their guess
            | is right or wrong, of course) [1].
            | 
            | I think this is the effect that the post is dancing
            | around but doesn't seem to really understand (and how
            | "autocorrelation" and independence are discussed is, to
            | be charitable, very nonstandard).
            | 
            | [1] https://en.m.wikipedia.org/wiki/Regression_dilution
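            | 
            | A small sketch of regression dilution (parameters
            | invented): even with perfect self-knowledge, guessing
            | noise in the measured score attenuates the fitted slope
            | toward var(true) / (var(true) + var(noise)):
            | 
            |   import numpy as np
            |   rng = np.random.default_rng(4)
            |   true = rng.normal(50, 15, 100_000)   # latent ability
            |   noise = rng.normal(0, 15, 100_000)   # lucky guesses
            |   measured = true + noise              # observed score
            |   self_est = true                      # perfect self-assessment
            |   slope = np.polyfit(measured, self_est, 1)[0]
            |   print(round(slope, 2))               # ~0.5, not 1.0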
        
             | t_mann wrote:
             | I agree, the statistical analysis in the original post
             | makes me very uneasy. I think it could be a case where the
              | conclusion is correct, even though the argument isn't
              | necessarily.
             | 
             | And yes, the fact that the slope is less than one is fairly
             | uninteresting.
             | 
             | The real problem here is that the Dunning-Kruger effect, as
             | it's classically stated, claims that if you asked four
             | people to rank themselves in terms of ability, the result
             | would be 1-3-2-4, ie the people who know a little would put
             | themselves above the people who know a lot but aren't quite
             | experts. The problem is the data shows that they'd actually
             | rank themselves correctly 1-2-3-4. But such a boring
             | finding probably wouldn't have made the authors quite as
              | famous, which might be why they tried a bit of data
              | mangling and found this really cool story that
              | everyone would secretly love to be true.
             | 
             | Which is a shame, because I think the fact that the mean of
             | perceived ability is too high (and the variance too low) is
             | really interesting too, and perfectly supported by the raw
             | data.
        
               | jampekka wrote:
               | Yes. The methodology in the original D&K is quite shoddy,
               | and vulnerable to e.g. good old regression to the mean,
               | and the interpretations are too strong. This is sadly
               | very common in psychology (and many other fields I'd
                | guess), and even researchers don't seem to care much
                | as long as the story is juicy enough.
               | 
               | The pop version of the DK effect seems to be something
               | like a 4-3-2-1 ranking, which is obviously not supported
               | by the data.
        
               | tempestn wrote:
               | But they wouldn't. They'd rank themselves something like
               | 1,2,2,3. We're not dealing with a population
               | collaborating to all rank themselves in order, but rather
               | each person individually estimating where their abilities
               | lie in the population.
               | 
               | The point is that if you ask someone in the, say, 5th
               | percentile of ability what their ability is compared to
               | the population, they might say 25th percentile. Ask
               | someone at the 25th,and they might say 40th. At the 40th
               | they could say 55th. And at the 90th, maybe they'll say
               | 80th. So yes, if you order their guesses, they will be in
               | roughly the correct order. But, crucially, that doesn't
               | mean that they are ranking themselves correctly!
        
           | tempestn wrote:
           | > The chart is interesting in that it confirms that everyone
           | perceives themselves to be slightly above average in terms of
           | ability, which of course can't be true in practice.
           | 
           | No, everyone biases their self-assessments _toward_ a point
            | slightly above the mean. That's not the same as saying
           | everyone thinks they're slightly above average, nor that
           | people's self-assessments have no predictive power
           | whatsoever. The lowest performers still think they're below
           | average, just not as much as they should. The highest
           | performers still think they're considerably above average.
           | But they all have a bias toward (slightly above) the middle.
           | 
           | So yes, people are generally correct in the direction that
           | they deviate from that median self-assessment, but that just
           | shows that people's self-assessments aren't completely
           | without basis. Which D-K certainly didn't claim.
        
             | t_mann wrote:
             | D-K claim a non-monotonic relationship, which simply isn't
             | supported by that data, as you yourself point out: people
             | rank themselves correctly (ordinally). I didn't mean to say
             | that all self-assessments are the same, if that was the
             | misunderstanding. My point is that the self-assessments
             | indeed are meaningful, even more so than D-K claim.
        
               | RevEng wrote:
                | Check the original paper by D-K. You've only focused
                | on the first plot, which has a monotonically
                | increasing trend. The later plots show varying
                | degrees of non-monotonicity, though sadly they don't
                | include error bars to indicate how statistically
                | significant the differences between groups are.
        
             | zeroonetwothree wrote:
             | But we don't know their true ability, only the results on
             | one test. It could be they accurately predicted their
             | ability but because of random chance they did better/worse
             | than their guess. Then you would get the exact data that is
             | observed.
        
               | lokar wrote:
               | I thought they were estimating their performance on the
               | test relative to others. There was no "real world"
               | element.
        
         | raincole wrote:
         | > And D-K did indeed show a correlation between the two, just
         | not as strong of one as we would expect. Rather, they showed a
         | consistent bias. That's the interesting result.
         | 
         | "D-K effect in its original form" vs "D-K effect in pop
         | culture" is the biggest D-K effect live example. Of course I
         | mean D-K effect in pop culture here.
         | 
         | Interestingly, the "interesting" part of the original result is
         | that the correlation between actual performance and perceived
         | performance is less than people intuitively think.
         | 
         | But as the "D-K effect in pop culture" spreads, people's
         | collective intuition changes. Today if you explained the
         | original D-K effect to a random person on the internet, they
         | might find it interesting because the correlation is _greater_
         | than they thought: they thought the correlation would be
         | negative!
        
           | hoosieree wrote:
           | D-K effect effect is almost as entertaining as the Butterfly
           | effect effect[1].
           | 
           | [1]: Which is the far-away effect attributed to having
           | watched the movie The Butterfly Effect.
        
         | expazl wrote:
         | > But in reality, it would be very surprising if performance
         | and evaluation of performance were independent. We expect
         | people to be able to accurately rate their own ability.
         | 
         | This seems to be attacking an irrelevant point in the analysis.
         | The argument goes as such: Researcher carries out all the
         | studies needed to prove the Dunning-Kruger effect, then trips
         | and drops all the results into a vat of acid. But he's ashamed
         | and quickly generates random numbers for the results, and
         | somehow the data still proves the Dunning-Kruger effect. Not
         | just that, repeating the same exercise again and again with
         | completely random data leads to the same result, the effect is
          | always present. So is the Dunning-Kruger effect so powerful
          | that it exists in the very fabric of the universe, devoid of
          | any human interaction, or is something amiss?
         | 
         | In this situation we are forced to look at the test we have
         | that concluded from the data that the Dunning-Kruger effect
         | exists and conclude that it's a bad test, we need something
         | different.
         | 
         | You seem to be arguing "oh no, you can't look at random data,
         | because we wouldn't expect the experiment to yield random
         | data!". But that doesn't work as an argument for why the test
         | should still be considered good. If it's supposed to have any
         | worth, then the test has to be able to come to one of two
         | conclusions: The Dunning-Kruger effect exists or the Dunning-
         | Kruger effect doesn't exist. And if the test is set up such
         | that for positive experimental results, or just random noise,
         | it comes out in the positive, and only in extremely unlikely
         | and a narrow band of the possible outcome space come out
         | negative, then the test is bad.
         | 
         | If we want to try to rephrase everything a bit to make the
         | issue much clearer. Lets set up a coin-toss competition between
         | ChatGPT and a group of 100 people. Each participant goes 1:1
         | against ChatGPT where both parties toss a coin and whoever has
         | the most heads wins, on draws toss again, in case a pair goes
         | into an infinite loop that doesn't end before our allotted
         | trial time, they get removed from the study. A human assistant
         | tosses on the behalf of ChatGPT on account of it not having
         | arms yet.
         | 
         | Now we ask each person how they would rate their ability vs.
         | ChatGPT in a coin-toss, everyone answers 50/50, for obvious
         | reasons.
         | 
         | So we run the experiment, the line for "ability plotted against
         | ability" is a straight diagonal line. The line for estimated
          | ability vs actual ability is a straight flat line at 50%.
         | 
         | Eureka! To the presses! we have just proven the Dunning-Coin-
         | Kruger effect! People who are worse at throwing coins tend to
         | over estimate their ability, and people who are better at
         | throwing coins underestimate their ability! What a marvelous
         | bit of psychological insight, it really tells us something
         | about how the human mind works, and has broader insights about
         | our society! But naturally we always expected this outcome,
          | people who are bad at tossing coins are dumb and of course
          | they are overconfident, unlike people who are good at
          | tossing coins, who have a remarkable intellect and are
          | therefore humble in their self-estimation... and so on and on
         | about preconceived biases that have nothing to do with the
         | actual test we performed.
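          | 
          | The coin-toss version is a two-line simulation (a
          | throwaway sketch, reducing the tournament to a single
          | fair 50/50 win per person):
          | 
          |   import numpy as np
          |   rng = np.random.default_rng(5)
          |   wins = rng.random(100) < 0.5     # each person vs ChatGPT
          |   estimate = np.full(100, 0.5)     # everyone says 50/50
          |   print(estimate[~wins].mean())    # losers: 0.5 > 0, "over"
          |   print(estimate[wins].mean())     # winners: 0.5 < 1, "under"
          | 
          | A guaranteed "Dunning-Coin-Kruger effect", manufactured
          | entirely by conditioning the estimation error on the
          | outcome.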
        
         | cool_dude85 wrote:
         | Yeah this must be some high end satire where the guy Dunning-
         | Krugers up an explanation of Dunning-Kruger. Since even an
         | economist is supposed to understand ANOVA I have to conclude
         | that this article is a joke.
        
           | nickelpro wrote:
           | The incorrect usage of "autocorrelation" made me double take
           | and wonder if this was satire the first time it was posted.
        
         | xpe wrote:
         | The rebuttal by Daniel (andersource.dev) is useful, generally.
         | However, when he writes ...
         | 
         | > The history of statistics is well out of scope for this post,
         | but very succinctly, my answer is that statistics is an attempt
         | to objectively quantify surprise.
         | 
         | ... I cannot agree. Statistics is not this; it is much broader.
         | One may or may not be surprised by particular statistics, sure,
         | but there are _specific_ concepts that map more directly to
         | surprise, such as entropy from information theory.
        
           | vasco wrote:
            | If entropy is defined as statistical disorder, then I
            | think the definition of "quantifying surprise" is great.
        
             | xpe wrote:
             | You aren't suggesting that statistics as a field defined a
             | notion of "order", prior to thermodynamic entropy or
             | Shannon entropy, are you? To me, that would be circular.
             | 
             | Based on my knowledge, it seems likely the first published
             | quantification of disorder arose in the study of
             | thermodynamic entropy. Later, Shannon defined entropy in
             | information-theoretic terms, independent of physics. It can
             | be interpreted as a notion of 'surprise' or what he called
             | information.
             | 
             | My claims:
             | 
              | First, the field of statistics is _not_ historically
              | rooted in concepts such as "order/ordering" or
              | "information/surprise".
             | 
             | Second, the field of statistics, as a directed graph of
             | abstractions, is not rooted in ordering nor surprise.
             | 
             | Third, in teaching statistics, practically or conceptually,
             | the concept of surprise isn't foundational. The idea of
             | _variation_, on the other hand, is central.
             | 
             | I'll add a few more comments. To talk meaningfully about
             | 'surprise', there has to be a stated or assumed baseline or
             | 'expectation' about what is _not_ surprising. For Shannon,
             | if the probability of an event is certain, there is no
             | surprise. Probability and statistics work together, but
             | they are conceptually separable. This is particularly clear
             | when you compare descriptive statistics with, say,
             | probabilities over combinatorics problems.
        
               | vasco wrote:
               | > The field of statistics is not organized around
               | concepts relating to "order" or "ordering".
               | 
                | Sure, but reduced to the simplest form, statistics
                | is used to predict things, the most basic thing in
                | the Universe being "is this particle gonna stay put
                | or move a little in a given direction", which is
                | related to entropy, so to me these two things
                | intuitively seem very related. The fact that in
                | statistics we don't use the words "order" and
                | "disorder" doesn't mean it doesn't reduce to that.
               | 
               | Btw I'm an electrical engineer that isn't amazing at
               | statistics or thermodynamics so beware I might just be
               | talking nonsense.
        
               | xpe wrote:
               | > ... reduced to the simplest form, statistics are used
               | to predict things
               | 
               | Inferential statistics is not the simplest kind of
               | statistics. Descriptive statistics are both simpler and
               | foundational for inference.
               | 
               | P.S. I should say that I am a bit of a stickler regarding
               | discussions along the lines of e.g. "these things are
               | related". Yes, many things are related, but it is really
               | nice when we can clearly tease things apart and specify
               | what depends on what.
        
         | zeroonetwothree wrote:
         | This rebuttal seems weak because it's using unbounded datasets
         | (population). A big issue with the DK research is using bounded
          | data (test scores). For example, if I get 100% right, it's
          | mathematically impossible to have overestimated.
        
         | kkoyung wrote:
          | I agree. Using the author's terminology, the DK paper was
          | trying to show that dy/dx < 1 = dx/dx, rather than the
          | correlation of y - x vs x.
        
         | bradley13 wrote:
          | I have to agree. You cannot separate the statistical
          | analysis from the _meaning_ of the study. In the article,
          | the author's random data is _exactly_ an extreme
          | replication of Dunning-Kruger. Why? Because, in his random
          | data, people with low test scores almost always
          | overestimate their ability, while people with high test
          | scores almost always underestimate.
          | 
          | That is precisely the premise of the Dunning-Kruger effect.
          | The fact that the original Dunning-Kruger paper shows a
          | less extreme effect? That just shows that people are
          | slightly better than random at estimating their own
          | abilities - but still nowhere near accurate.
        
           | jgilias wrote:
           | So that's what the Dunning-Kruger effect basically boils down
           | to, right? That people in general are just bad at assessing
           | their skills.
        
         | mnky9800n wrote:
          | I really appreciate that he points out that the use of the
          | term "autocorrelation" in the original article is
          | nonstandard. It is nonstandard, but that's a rather
          | flippant way to dismiss the rest of the article.
        
         | somenameforme wrote:
         | I found two very interesting things in the original D-K paper
         | [1] that challenge your otherwise reasonable point. The first
         | is that the graph everybody associates with D-K, the one
         | showing the beautifully perfect linear result, is one of 4. The
         | other 3 graphs are far messier, and indeed the paper discusses
         | the fact that the correlations tend to be weaker and in some
         | cases nonexistent.
         | 
         | The second thing is that that beautiful perfectly linear graph
          | everybody references was measuring 'humor'!!! Humor is going
         | to be something that's all but guaranteed to create near
         | complete noise between self evaluation and 'expert'
         | (professional comedians in this case) evaluation. And if
         | everybody is essentially randomly guessing on their
         | performance, then it will always show an extremely strong D-K
         | effect with the top performers underestimating themselves, and
         | the bottom performers overestimating themselves.
         | 
         | The experiment that most simply and directly measured
         | 'intelligence', without complicating matters in a potentially
         | confounding fashion, is #2. It was based on logic problems from
         | the LSAT. And the resultant graph is just all over the place.
         | Quoting the paper's evaluation of this study:
         | 
         | ---
         | 
         | "Participants did not, however, overestimate how many questions
         | they answered correctly, M = 13.3 (perceived) vs. 12.9
         | (actual), t < 1. As in Study 1, perceptions of ability were
         | positively related to actual ability, although in this case,
         | not to a significant degree."
         | 
         | ---
         | 
         | This is really looking like another Zimbardo.
         | 
         | [1] - https://sci-hub.se/10.1037/0022-3514.77.6.1121
        
           | mike_hearn wrote:
           | Yes, D-K is another one of those "classic" psychology studies
           | that everyone knows about but is actually rubbish and
           | shouldn't be cited for anything. You're not the first to
            | notice this; I pointed it out on HN last year:
           | 
           | https://news.ycombinator.com/item?id=31119836
           | 
           | At some point I should write up a proper blog post on the D-K
           | paper in the hope that it eventually surfaces in search
           | results, because it's past time for this paper to be put to
           | bed. The problems you cite aren't even the full set. The
           | whole thing was (of course) a study on a handful of psych
           | undergrads, their selection method for expert comedians has
           | circular logic in it and it all goes downhill from there.
        
         | gwd wrote:
         | > And D-K did indeed show a correlation between the two, just
         | not as strong of one as we would expect. Rather, they showed a
         | consistent bias. That's the interesting result.
         | 
         | Right, so:
         | 
         | 1. If the data were truly random, with no correlation, we'd
         | expect the line to be straight across the middle, with the
         | first quartile at 50% and the last quartile also at 50%
         | 
         | 2. If the data were 100% accurate and precise [1], we'd expect
         | the line to be diagonal, with the first quartile at 12.5% and
         | the last quartile at 87.5%.
         | 
         | 3. If the data were accurate but not precise (i.e., basically
         | right but with some randomness built in), we'd expect the line
         | to be in between #1 and #2 -- basically, changing from #2 into
         | #1 as the randomness increases, but with the intersection at
         | 50%.
         | 
         | That's because someone in the 2nd percentile _can't_
         | underestimate themselves as much as they can overestimate
         | themselves, and someone in the 98th percentile can't
         | overestimate themselves as much as they can underestimate
         | themselves. But in any case, the "0 bias" case looks symmetric.
         | 
         | 4. But what we actually see is none of the above: we see the
         | 1st quartile being at (eyeballing the chart) 60%, and the last
         | quartile at 75%.
         | 
         | That shows that there is indeed some ability for self-
         | evaluation, but it's off. The fourth quartile could indeed just
         | be random, the effect of clipping at the top meaning that the
         | upper quartile _cannot_ overestimate themselves as much as they
         | underestimate themselves. But there's no getting around the
         | fact that the bottom quartile are overestimating themselves.
         | 
         | [1] https://en.wikipedia.org/wiki/Accuracy_and_precision
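         | 
         | A minimal sketch of those three cases in Python (my own toy
         | simulation with a made-up noise level, not the study's data):
         | 
         |   import numpy as np
         | 
         |   rng = np.random.default_rng(0)
         |   n = 100_000
         | 
         |   # True skill percentiles, uniform on [0, 100).
         |   skill = rng.uniform(0, 100, n)
         | 
         |   def quartile_means(estimate):
         |       # Mean self-estimated percentile per true-skill quartile.
         |       q = np.digitize(skill, [25, 50, 75])  # quartile index 0-3
         |       return [estimate[q == i].mean() for i in range(4)]
         | 
         |   # Case 1: estimates independent of skill -> flat, near 50.
         |   random_est = rng.uniform(0, 100, n)
         |   # Case 2: perfect self-assessment -> 12.5, 37.5, 62.5, 87.5.
         |   perfect_est = skill.copy()
         |   # Case 3: accurate but noisy, clipped to the valid range ->
         |   # in between, pivoting around 50.
         |   noisy_est = np.clip(skill + rng.normal(0, 30, n), 0, 100)
         | 
         |   for name, est in [("random", random_est),
         |                     ("perfect", perfect_est),
         |                     ("noisy", noisy_est)]:
         |       print(name, [round(m, 1) for m in quartile_means(est)])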
        
           | bitshiftfaced wrote:
           | > But there's no getting around the fact that the bottom
           | quartile are overestimating themselves.
           | 
           | It's because higher competence goes along with more accurate
           | self-assessment but not less bias. So the high performers
           | underestimate with less magnitude than the low performers
           | overestimate, but they both under- and overestimate
           | themselves with the same frequency.
        
       | dmbche wrote:
       | Isn't it ironic that they fooled themselves?
        
         | ulizzle wrote:
         | It was actually hilarious but I don't think many people here
         | got the irony
        
           | DangitBobby wrote:
           | Literally the closing paragraph of TFA is about that exact
           | irony.
        
             | robwwilliams wrote:
             | And here it is from OP (which made me laugh--right or
             | wrong). And leave your hubris at home unless you rate
             | yourself a damn fine statistician ;-)
             | 
             | "However, there is a delightful irony to the circumstances
             | of their blunder. Here are two Ivy League professors7
             | arguing that unskilled people have a 'dual burden': not
             | only are unskilled people 'incompetent' ... they are
             | unaware of their own incompetence.
             | 
             | "The irony is that the situation is actually reversed. In
             | their seminal paper, Dunning and Kruger are the ones
             | broadcasting their (statistical) incompetence by conflating
             | autocorrelation for a psychological effect. In this light,
             | the paper's title may still be appropriate. It's just that
             | it was the authors (not the test subjects) who were
             | 'unskilled and unaware of it'.
        
       | lencastre wrote:
       | Wasn't this DK effect already debunked?
        
         | jahewson wrote:
         | I don't know much about it but I'm sure you're right.
        
         | hasch wrote:
         | The article mentions 2016 somewhere. They explain a bit on top
         | of that, with more depth... at least that's my rough take on
         | this.
        
         | xbar wrote:
         | Yes. This article highlights the 2016, 2017 and 2020 debunkings
         | of DK. But it hangs on as an oft-repeated scientific fallacy.
         | 
         | The fact that anyone has to ask if it has been debunked shows
         | how desirable some people find the DK myth. Even in the comments
         | here, people are not willing to be skeptical of DK. That's
         | interesting psychology.
        
         | mrkeen wrote:
         | Yes but some claim to have debunked the debunking also. [1]
         | 
         | This paper (2023) claims "the magnitude of the effect was
         | minimal; bringing its meaningfulness into question." [2]
         | 
         | [1] https://andersource.dev/2022/04/19/dk-autocorrelation.html
         | 
         | [2]
         | https://www.sciencedirect.com/science/article/abs/pii/S01602...
        
       | pie_flavor wrote:
       | This take is a perfect example of Dunning-Kruger itself,
       | ironically. https://andersource.dev/2022/04/19/dk-
       | autocorrelation.html
        
         | dahart wrote:
         | How so? DK shows a positive correlation between confidence and
         | competence.
        
       | mewpmewp2 wrote:
       | My take on Dunning Kruger:
       | 
       | 1. People really like the idea of smart people being humble and
       | arrogance meaning stupidity, so they like to believe that DK is
       | true, and they like to repeat this.
       | 
       | 2. Some smart/skilled people are humble, some are arrogant.
       | 
       | 3. Some smart/skilled people underestimate their skills, some
       | overestimate.
       | 
       | 4. Some stupid people are humble, some are arrogant.
       | 
       | 5. Some stupid people underestimate their skills, some
       | overestimate.
       | 
       | Overall, even if there is a correlation, you can't tell just from
       | a person's arrogance whether we are dealing with DK or whether
       | there's any effect at all. People's personalities, skills and
       | everything else are a bit more complex than that.
       | 
       | Bringing DK up seems more like some sort of social
       | justice/fairness effort than something that is actually true in
       | any given situation where someone is arrogant.
        
         | spacebacon wrote:
         | Maybe this shows how effective dumb people are at keeping smart
         | people hammered down with thought stopping arguments.
        
       | greenthrow wrote:
       | Lmao this article is an example of Dunning-Kruger at work. The
       | author thinks they have found and are revealing something but
       | they are just failing to fully understand the subject. Amazing.
        
         | flappyeagle wrote:
         | Try reading the article again and understanding the argument.
        
           | greenthrow wrote:
           | Oh I did. Completely.
        
             | mattxxx wrote:
             | Wait... but what if this is DK? What if my comment is DK??
        
       | joefourier wrote:
       | So from my understanding, the Dunning-Kruger Effect paper doesn't
       | show the distribution of the perceived test scores nor the
       | standard deviation, only an average, which rises with actual test
       | score level.
       | 
       | If they showed the spread bar in each bin, you could form very
       | different conclusions. Do low skilled people consistently
       | estimate their score at around 60, or do they give effectively
       | random results centred around 60?
       | 
       | Assuming the latter, it could mean that low skilled individuals
       | are completely unable to evaluate their performance while higher
       | skilled people are slightly better at it but still not very good,
       | giving a slightly positive correlation which... is very distinct
       | from what the DK effect implied.
        
       | xanderlewis wrote:
       | Naive take: I've always felt like Dunning-Kruger is just the
       | result of the fact that when guessing the value of anything
       | people tend towards some common mean, and so if the true value is
       | low your guess tends to be high, and vice versa. This assumes
       | nothing about what is being guessed, but does assume (perhaps
       | wrongly) that there is a commonly believed mean value and that
       | people tend to imagine they are close to it.
        
         | wavemode wrote:
         | That's essentially the plain-language interpretation of what
         | the author of this article is pointing out - when you plot
         | (actual score) against (difference between test score and
         | actual score), you will always find a trend that
         | underperformers overestimate and overperformers underestimate -
         | for the exact reason you state.
        
       | r0uv3n wrote:
       | The discussion between Nicolas Boneel and the author in the
       | comments of the article is interesting and Nicolas expresses the
       | doubts I had when reading this. The whole point of the DK effect
       | is that people are bad at estimating their skill, so if you
       | assume that they randomly guess their skill level then of course
       | you will replicate the results.
       | 
       | The correct model for a world without DK should be something like
       | (estimated test scores)=(actual test scores)+noise, and then the
       | only form of spurious DK you'd expect is caused by the fact that
       | there's a minimum and maximum test score. But this effect would
       | be proportional to the variance of the noise, and I assume the
       | variance on the additional dataset is too low to fully understand
       | the effect seen there.
       | 
       | Also, in this model on average everyone should still guess
       | correctly in which half of the distribution they are, but even
       | the bottom quartile seemed to estimate their abilities as above
       | the 50th percentile
        
         | svnt wrote:
         | Just because the data appear random doesn't mean you've gotten
         | at the cause though.
         | 
         | From those charts it could equally be low skill throughout, or
         | something nuanced like lack of skill at estimating at the
         | bottom, improving skill in estimating through the middle, and
         | high skill and learned modesty at the top.
        
         | Jensson wrote:
         | > Also, in this model on average everyone should still guess
         | correctly in which half of the distribution they are, but even
         | the bottom quartile seemed to estimate their abilities as above
         | the 50th percentile
         | 
         | Depends on the noise applied. If the noise is -10% to +100% for
         | everyone then you get roughly the graph Dunning-Kruger got. So
         | there is no reason to believe that the best are better at
         | estimating their abilities, just that you can't estimate your
         | own rank as better than the best.
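         | 
         | A quick literal check of that claim in Python (my own sketch:
         | the same skewed noise for everyone, clipped to the 0-100
         | percentile scale):
         | 
         |   import numpy as np
         | 
         |   rng = np.random.default_rng(1)
         |   n = 100_000
         |   actual = rng.uniform(0, 100, n)
         |   # Everyone gets noise drawn uniformly from [-10, +100].
         |   perceived = np.clip(actual + rng.uniform(-10, 100, n), 0, 100)
         | 
         |   # Every quartile overestimates on average, but the gap shrinks
         |   # from roughly +44 at the bottom to +10 at the top, since rank
         |   # estimates can't exceed 100.
         |   for lo in (0, 25, 50, 75):
         |       q = (actual >= lo) & (actual < lo + 25)
         |       print(f"{lo}-{lo + 25}: actual {actual[q].mean():.1f}, "
         |             f"perceived {perceived[q].mean():.1f}")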
        
           | tempestn wrote:
           | That's a great observation. For what it's worth though, it
           | does seem logical to me that the best would also be best at
           | estimating their skill. Not necessarily because they're
           | better at it per se (though there's likely some of that too,
           | for the reasons originally posited by D-K), but also because
           | they have an easier problem to solve. When you know something
           | well, it's fairly obvious that that's the case. (Think of the
           | experience of acing a math test. It's entirely possible you'd
           | know you answered everything correctly.) When you struggle
           | somewhat though, it's much more difficult to estimate how
           | much you're struggling compared to how others would fare.
        
         | jampekka wrote:
         | The correct model is probably (estimated test score +
         | estimation noise) = (actual test score + test noise). The test
         | contains a random element, e.g. guessing, that the person can't
         | estimate.
         | 
         | https://en.m.wikipedia.org/wiki/Regression_dilution
         | 
         | https://en.m.wikipedia.org/wiki/Errors-in-variables_models
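         | 
         | A tiny illustration of that regression dilution in Python (my
         | own sketch, with arbitrary noise levels):
         | 
         |   import numpy as np
         | 
         |   rng = np.random.default_rng(2)
         |   n = 100_000
         |   ability = rng.normal(0, 1, n)             # latent skill
         |   test = ability + rng.normal(0, 1, n)      # test adds noise
         |   estimate = ability + rng.normal(0, 1, n)  # so does the guess
         | 
         |   # Regress the self-estimate on the measured test score: the
         |   # slope comes out ~0.5, not 1, even though neither variable
         |   # is biased. A slope below 1 is what reads as "low scorers
         |   # overestimate, high scorers underestimate".
         |   slope = np.cov(estimate, test)[0, 1] / test.var()
         |   print(round(slope, 2))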
        
       | hn_throwaway_99 wrote:
       | Previous discussion:
       | https://news.ycombinator.com/item?id=31036800
        
       | bitshiftfaced wrote:
       | The authors did "X - Y vs X," but that's not even the biggest
       | problem. The authors subtracted two measures that had been
       | transformed and bounded from 0 to 1 (think percentiles). What
       | happens at the extremes of those bounds? How much can your top
       | performers overestimate their performance? They're almost at 1
       | already, so not much. If they were to overestimate and
       | underestimate at the same rate and by the same magnitude in terms
       | of raw values, the ceiling effect on the transformed values means
       | that the graph will make it look like they underestimate more
       | often. The opposite problem happens for the worst performers.
       | 
       | See "Random Number Simulations Reveal How Random Noise Affects
       | the Measurements and Graphical Portrayals of Self-Assessed
       | Competency." Numeracy 9, Iss. 1 (2016), particularly figures 7,
       | 8, and 9.
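       | 
       | A small Python sketch of that ceiling/floor effect (my own toy
       | code, not the Numeracy paper's simulations):
       | 
       |   import numpy as np
       | 
       |   rng = np.random.default_rng(3)
       |   n = 100_000
       |   actual = rng.uniform(0, 1, n)
       |   # The error is symmetric and zero-mean in raw terms...
       |   raw_error = rng.normal(0, 0.2, n)
       |   # ...but the bounded scale forces estimates into [0, 1].
       |   self_est = np.clip(actual + raw_error, 0, 1)
       |   err = self_est - actual
       | 
       |   # Purely because of the bounds, top performers now look like
       |   # underestimators and bottom performers like overestimators.
       |   print(f"top 10%:    mean error {err[actual > 0.9].mean():+.3f}")
       |   print(f"bottom 10%: mean error {err[actual < 0.1].mean():+.3f}")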
        
         | anonymouskimmer wrote:
         | This can be dealt with to an extent by truncating the extreme
         | ends. Even the middle quartiles in the graphs in the linked
         | article show the same trends.
        
           | bitshiftfaced wrote:
           | Not that simple. This article demonstrates why enforcing
           | bounds results in the changes in slope that you see in the
           | expected grades (figure 2 and 4): https://www.frontiersin.org
           | /articles/10.3389/fpsyg.2022.8401...
        
         | ImaCake wrote:
         | Thanks for stating just how much of a statistical minefield
         | this is. The reference does a great job showing just how wrong
         | the DK studies are. Unfortunately, most people have already
         | made up their minds and are happy to link conflicting blog
         | posts as evidence.
        
           | Probiotic6081 wrote:
           | Probably in another year or two they'll find another
           | statistic that will render the old one moot, again and
           | again.
        
           | concordDance wrote:
           | > wrong the DK studies are
           | 
           | The DK studies are not wrong, they are misinterpreted by
           | people who don't know what they're talking about (e.g. what
           | the DK effect actually is), like this blogger.
           | 
           | "People have worse self assessment ability as their real
           | ability declines" would be a valid interpretation of the DK
           | data and notably would NOT be a valid conclusion from the
           | random data in the blog post.
        
             | ImaCake wrote:
             | You should read the reference we are discussing which makes
             | no such mistakes.
        
         | dclowd9901 wrote:
         | I think if people at all levels of skill were reasonably good
         | at measuring their own ability, we would see two curves that
         | roughly overlap. Instead we see the graph given.
         | 
         | The fact that random noise can generate a mean curve on the Y
         | axis doesn't mean DK doesn't exist. It just means DK's mean
         | self analysis resembles a middling random mean, which if you
         | think about it, makes sense. Most people will probably self
         | evaluate as average, regardless of their actual skill. This
         | means DK is right as rain.
        
           | expazl wrote:
           | > I think if people at all levels of skill were reasonably
           | good at measuring their own ability, we would see two curves
           | that roughly overlap. Instead we see the graph given.
           | 
           | Actually, due to the construction of the test, the ability to
           | evaluate your own absolute ability in a subject isn't
           | sufficient for the two lines to be able to overlap.
           | 
           | It's a percentile axis, so you need to be able to reasonably
           | accurately estimate the ability of everyone taking the test,
           | and where you fall in the quartile range of those
           | participants.
        
         | SamBam wrote:
         | Exactly, that was my thought. How would it be _possible_ to get
         | anything other than the D-K effect, even if it wasn't just
         | averaging to the mean?
         | 
         | The lowest quartile can't say they're below the lowest
         | quartile, so any error at all will be counted as
         | "overconfidence." The top quartile can't say they're above the
         | top quartile, so any error at all will be counted as
         | "underconfidence."
        
           | anonymouskimmer wrote:
           | > Exactly, that was my thought. How would it be possible to
           | get anything other than the D-K effect, even if it wasn't
           | just averaging to the mean?
           | 
           | Quite easily with the method they demonstrate in the study in
           | figure 11. In that study test participants are not rating
           | themselves in terms of population percentages, but in terms
           | of the percentage correct they got on the test. In such a
           | case the test could be designed to have a huge ceiling that
           | even the most knowledgeable participants would have trouble
           | reaching. And could have such a low floor that even the least
           | knowledgeable participants would still get some answers
           | correct (unless they weren't even trying, which would allow
           | throwing out their data points).
           | 
           | With 20 questions you could have four gimmes and four
           | impossible questions, bounding the worst participants to
           | about 20% and the best to about 80%.
        
             | SamBam wrote:
             | Right. To clarify, I meant: with the original study design,
             | how could they not have gotten the result they did? (And
             | that's rhetorical.)
        
               | anonymouskimmer wrote:
               | It would have been noteworthy in the original design if
               | more than one group of participants were, on average,
               | within their quartiles on the guessing. I also find it
               | noteworthy that the average guess of the lowest quartile
               | is lower than the average guess of the second lowest
               | quartile, and on up the quartiles. On one hand this shows
               | some awareness of relative ability along a massively
               | smooshed logarithmic scale. On the other hand I wonder if
               | this laddering follows as the averages are split into
               | quintiles and deciles.
        
           | jmpeax wrote:
           | I wonder if estimating on the logit scale would solve this
           | problem.
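           | 
           | A toy version of that idea in Python (my own sketch): put
           | symmetric noise on the logit scale, where there is no hard
           | bound, and see what survives.
           | 
           |   import numpy as np
           | 
           |   def logit(p):
           |       return np.log(p / (1 - p))
           | 
           |   def expit(x):
           |       return 1 / (1 + np.exp(-x))
           | 
           |   rng = np.random.default_rng(4)
           |   n = 100_000
           |   actual = rng.uniform(0.01, 0.99, n)
           |   # expit maps back into (0, 1), so no clipping is needed.
           |   est = expit(logit(actual) + rng.normal(0, 1, n))
           | 
           |   # On the logit scale the error is unbiased in every group...
           |   err_logit = logit(est) - logit(actual)
           |   print(round(err_logit[actual > 0.9].mean(), 3))  # ~0
           |   print(round(err_logit[actual < 0.1].mean(), 3))  # ~0
           |   # ...but on the raw scale the bounds still compress errors
           |   # near 0 and 1, so the "X" pattern reappears there.
           |   err_raw = est - actual
           |   print(round(err_raw[actual > 0.9].mean(), 3))    # negative
           |   print(round(err_raw[actual < 0.1].mean(), 3))    # positive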
        
         | dimask wrote:
         | The boundedness of the data is also the main argument here
         | https://www.frontiersin.org/articles/10.3389/fpsyg.2022.8401...
        
         | wjnc wrote:
         | Lognormality of data is a killer for the methods of social
         | scientists. If I were to hypothesize the underlying mechanism,
         | it would be that raw skill is lognormally distributed among
         | those taking tests at all (participating in these tests usually
         | entails an implicit lower bound on IQ, but think also of the
         | long tail of high performance in, say, sports); tests try to
         | measure performance but with a reduction to normality (or 4
         | categories); and then people estimate their own skills based on
         | their task and grading experiences, which are also a reduction
         | to a normal or constant distribution. ("I was always a B- in
         | math in high school, expect that to have distribution X, and
         | expect this test to follow that distribution.")
         | 
         | It's three places where reductions in dimensionality take place
         | both implicitly and explicitly. I don't envy researchers trying
         | to unpeel this onion. I do like the unraveling of all these
         | problems that pop up in pretty accessible designed experiments.
         | It makes for better understanding.
        
         | skue wrote:
         | Dunning himself addressed this back in 2011:
         | 
         | > 4.1. Regression to the mean
         | 
         | > The most common critique of our metacognitive account of lack
         | of self-insight into ignorance centers on the statistical
         | notion of regression to the mean. Recall from elementary
         | statistics classes that no two variables are ever perfectly
         | correlated with one another. This means that if one selects the
         | poorest performers along one variable, one will see that their
         | scores on the second variable will not be so extreme.
         | Similarly, if one selects the best performers along a variable,
         | one is guaranteed to see that their scores on the second
         | variable will be lower...
         | 
         | His full response is longer than is appropriate to quote here,
         | but you can easily find the chapter online.
         | 
         | Dunning, David (1 January 2011). "Chapter Five - The Dunning-
         | Kruger Effect: On Being Ignorant of One's Own Ignorance".
         | Advances in Experimental Social Psychology. Vol. 44. Academic
         | Press. pp. 247-296. doi:10.1016/B978-0-12-385522-0.00005-6.
         | ISBN 9780123855220
        
           | bitshiftfaced wrote:
           | The author continues,
           | 
           | > Some scholars observe that Fig. 5.2 looks like a regression
           | effect, and then claim that this constitutes a complete
           | explanation for the Dunning-Kruger phenomenon. What these
           | critics miss, however, is that just dismissing the Dunning-
           | Kruger effect as a regression effect is not so much
           | explaining the phenomenon as it is merely relabeling it. What
           | one has to do is to go further to elucidate why perception
           | and reality of performance are associated so imperfectly. Why
           | is the relation so regressive? What drives such a disconnect
           | for top and bottom performers between what they think they
           | have achieved and what they actually have? [...] As can be
           | seen in the figure, correcting for measurement unreliability
           | has only a negligible impact on the degree to which bottom
           | performers overestimate their performance (see also Kruger &
           | Dunning, 2002). The phenomenon remains largely intact.
           | 
           | The DK effect says roughly, "low performers tend to
           | overestimate their abilities." Yet when researchers analyzed
           | the data, they found that high and low performers
           | overestimate and underestimate with the same frequency. [0]
           | It's just that high performers are more accurate than low
           | performers (note how this statement differs from the DK
           | effect). Since you can completely explain the "X graph" by
           | the random noise combined with the ceiling effect, and since
           | beginners' self evaluations are noisier than experts', you
           | don't even need regression to the mean to explain why you get
           | the "X graph."
           | 
           | 0. Nuhfer, Edward, Steven Fleisher, Christopher Cogan, Karl
           | Wirth, and Eric Gaze. "How Random Noise and a Graphical
           | Convention Subverted Behavioral Scientists' Explanations of
           | Self-Assessment Data: Numeracy Underlies Better
           | Alternatives." Numeracy 10, Iss. 1 (2017): Article 4. DOI:
           | http://dx.doi.org/10.5038/1936-4660.10.1.4
        
       | chiefalchemist wrote:
       | DK for me is simply: "You don't know what you don't know." When
       | that happens, it's easy - surprise, surprise! - to misjudge your
       | skill level. In a way, it almost feels cruel to ask someone with
       | too few points of reference to say how much they know. The fact
       | is whether high, low, or in the middle...they are guessing.
       | 
       | On the other hand, with enough experience the depth and breadth
       | of your context improves, as it should. At that point, mis-self-
       | assessment is the result of arrogance, bravado, etc. That's a
       | different problem than simply not knowing.
       | 
       | If nothing else, DK has a case of apples v. oranges.
        
       | thewanderer1983 wrote:
       | The Dunning-Kruger effect isn't what the article first quotes.
       | It's an effect that everyone experiences. We as humans tend to
       | oversimplify things we don't understand well or at all. Therefore
       | we overestimate our expertise on these subjects. We also tend to
       | underestimate how much of an expert we are on subjects we do know
       | well. Everyone does this. It's not just dumb people.
        
         | Jensson wrote:
         | > We also tend to underestimate how much of an expert we are on
         | subjects we do know well
         | 
         | Any evidence for this, except Dunning-Kruger? To me it looks
         | like everyone overestimates themselves. There are a lot of
         | professionals who think they are undervalued and that people
         | worse than them get all the rewards and fame.
        
       | vismwasm wrote:
       | The author measures the Dunning-Kruger effect on his random data
       | exactly because he assumes it when generating his random data.
       | 
       | By modelling skill and perceived skill as uniform draws between 0
       | and 100, the unskilled (e.g. skill=0) will over-estimate their
       | skills (estimated skill = 50, the mean of the uniform random
       | variable) and the skilled (e.g. skill=100) will underestimate
       | theirs (also 50, again the mean of the same random variable). The
       | only ones who will be correct (on average) are the average-skilled
       | ones (skill=50).
        
       | beltsazar wrote:
       | I don't know if I agree that it's an autocorrelation, but one way
       | to explain the Dunning-Kruger effect is by acknowledging this
       | simple fact:
       | 
       | Most people think that they are an average person, but they can't
       | all be average--there must be some people substantially below the
       | median. Therefore, those people must overestimate their
       | abilities.
       | 
       | This also applies to other aspects, such as attractiveness. Less
       | attractive people would overestimate their attractiveness.
        
         | anonymouskimmer wrote:
         | For all of the tests and rebuttals of the Dunning-Kruger effect
         | the people tested are not drawing from the totality of other
         | people, but trying to compare themselves solely to those who
         | also took the same test.
         | 
         | Anyone in a position to take such a test is almost guaranteed
         | to be above average compared to the general population (which
         | includes babies for intellectual tests, or the extremely old
         | for attractiveness tests).
         | 
         | I think this complicates personal evaluation.
        
       | salty_biscuits wrote:
       | It's just correlation; why do they keep calling it
       | autocorrelation?
        
         | stubish wrote:
         | auto correlation, or self correlation. A correlation between
         | different things may indicate an actual relation (smoking is
         | correlated with early mortality). A self correlation is a
         | tautology.
        
       | snarkconjecture wrote:
       | Nonstandard terminology warning: the author is using
       | "autocorrelation" in a way I've never seen before. There is a
       | much more common usage of "autocorrelation" to refer to the
       | correlation of a timeseries with itself (shifted by some amount).
       | 
       | If you use autocorrelation to refer to the thing in OP, you'll
       | probably confuse people who know statistics, and vice versa.
        
         | ketozhang wrote:
         | The more common experience with autocorrelation is with time
         | series, but what the author said is correct even in that
         | context. A time series autocorrelation relates the same time
         | series function at different times. At its simplest you plot
         | the arrays X vs X, where X[i] = f(t[i]). You may then
         | complicate it further with some transformation, g(X) vs X
         | (e.g., a moving average).
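         | 
         | For contrast, the usual time-series usage looks like this (my
         | own minimal sketch):
         | 
         |   import numpy as np
         | 
         |   rng = np.random.default_rng(5)
         |   # AR(1) series: each value is 0.9 times the previous plus
         |   # noise, so the theoretical autocorrelation at lag k is 0.9^k.
         |   x = np.zeros(5000)
         |   for t in range(1, len(x)):
         |       x[t] = 0.9 * x[t - 1] + rng.normal()
         | 
         |   def autocorr(x, k):
         |       # Correlate the series with a copy of itself lagged by k.
         |       return np.corrcoef(x[:-k], x[k:])[0, 1]
         | 
         |   print(round(autocorr(x, 1), 2))   # ~0.9
         |   print(round(autocorr(x, 10), 2))  # ~0.9^10 = 0.35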
        
         | epigramx wrote:
         | you might say the article author has some... dunning-kruger on
         | what autocorrelation is.
        
           | nothrowaways wrote:
           | L2 of dk
        
         | xpe wrote:
         | > Nonstandard terminology warning: the author is using
         | "autocorrelation" in a way I've never seen before.
         | 
         | That's a nice way of putting it. A more accurate description
         | would be: the author is butchering the key essence of
         | autocorrelation, since they don't clearly mention that it is a
         | temporal relationship!
         | 
         | > What is autocorrelation?
         | 
         | > Autocorrelation occurs when you correlate a variable with
         | itself.
         | 
         | Groan.
         | 
         | A standard definition is:
         | 
         | > Autocorrelation refers to the degree of correlation of the
         | same variables between two successive time intervals. It
         | measures how the lagged version of the value of a variable is
         | related to the original version of it in a time series.
         | Autocorrelation, as a statistical concept, is also known as
         | serial correlation.
        
         | gnicholas wrote:
         | What term is appropriate to describe what the author is
         | referring to?
        
       | anonymouskimmer wrote:
       | > If the Dunning-Kruger effect were present, it would show up in
       | Figure 11 as a downward trend in the data (similar to the trend
       | in Figure 7). Such a trend would indicate that unskilled people
       | overestimate their ability, and that this overestimate decreases
       | with skill. Looking at Figure 11, there is no hint of a trend.
       | 
       | There certainly _is_ a hint of a trend. Why do people, when
       | visualizing data with a distinct trend, say that because the
       | "error bars" from a particular statistical test overlap zero,
       | no trend exists!?
       | 
       | Freshmen _trend_ to over-confidence. Grad students _trend_ to
       | under-confidence. Undergrads in general _trend_ to over-
       | confidence (though this trend decreases as year in school
       | increases), and post-graduates, whether grad students or
       | professors, trend to under-confidence.
       | 
       | These "trends" are not statistically significant, but they
       | certainly are a trend!
       | 
       | Also, the random data distribution in figure 9 doesn't show the
       | same trends as Dunning-Kruger's curve in figure 2. Perhaps there
       | is at least one psycho-social mechanism here worth investigating?
        
         | mrkeen wrote:
         | > These "trends" are not statistically significant, but they
         | certainly are a trend!
         | 
         | This is an oxymoron.
        
           | Dylan16807 wrote:
           | Oxymorons only sound contradictory on a surface level.
           | 
           | Something "certainly" being a "trend" is the definition of
           | statistical significance, so this is a straight up
           | contradiction.
        
             | anonymouskimmer wrote:
             | See here: https://news.ycombinator.com/item?id=38416858
             | 
             | "Trend" has multiple meanings. Statistics doesn't get to
             | claim all of the meaning.
        
           | anonymouskimmer wrote:
           | Show how.
           | 
           | I place mechanistic theory prior to statistics in science.
           | Mechanistic theory can be tested, statistics are a kind of
           | test.
           | 
           | If a statistically-insignificant result shows consistent,
           | though non-significant deviations, such as the kind seen in
           | Figure 11, then it tells me it's worth investigating whether
           | mechanism(s) are explaining a very small portion of the
           | variation that will not, in itself, show up as statistically
           | significant, as it's being swamped by variation in other
           | parameters.
        
             | Dylan16807 wrote:
             | Consistency is a synonym for statistical significance. If
             | there's consistency beyond random alignment, then there
             | should be a statistical test you can apply over your data
             | to extract the signal.
             | 
             | You can extract surprisingly small signals relative to
             | variation in other parameters. But if it's _actually_
             | swamped, then it might not be real, so go get more data.
        
               | anonymouskimmer wrote:
               | > Consistency is a synonym for statistical significance.
               | 
               | So basically you're telling me that if I can visually see
               | a consistency that does not show up in their statistical
               | test, then they aren't running an appropriate statistical
               | test on what I'm seeing.
               | 
               | > But if it's actually swamped, then it might not be
               | real, so go get more data.
               | 
               | Even better to design other experiments.
        
               | Dylan16807 wrote:
               | > So basically you're telling me that if I can visually
               | see a consistency that does not show up in their
               | statistical test, then they aren't running an appropriate
               | statistical test on what I'm seeing.
               | 
                | _Either_ they're not doing the right statistics, _or_
                | it's a "consistency" that is much more likely to show up
                | randomly than you naively expect, and the study needs to
                | be repeated or enhanced.
               | 
               | Sometimes you can see a pattern that's just a figment of
               | chance. See also: numerology, jelly bean xkcd
        
         | Dylan16807 wrote:
         | If they're actually error bars, you can shrink them with more
         | data. That will turn the hint of a trend into an observation of
         | a trend. If it wasn't random noise giving a fake hint.
        
           | anonymouskimmer wrote:
           | > If they're actually error bars, you can shrink them with
           | more data.
           | 
           | Assuming the new data has the same systemic or instrumental
           | bias as the old data. Even using a different test date could
           | skew results enough to widen the error bars.
        
       | abnry wrote:
       | If there is a linear relationship between test score (X, ability)
       | and test score self-assessment (Y, self-perception), then the
       | random variables are modeled as:
       | 
       | $$ Y \sim aX+b+N $$
       | 
       | Where N is some statistically independent noise, mean zero.
       | 
       | This means the covariance between them is
       | 
       | $$ Cov(Y-X,X) = E[ ((a-1)X+b+N -(a-1)E[X]-b) (X - E[X]) ] $$
       | 
       | Which is
       | 
       | $$ Cov(Y-X,X) = E[(a-1)(X-E[X])(X-E[X])] + E[N(X-E[X])]= (a-1)
       | Var[X] $$
       | 
       | To get a "DK effect" we need (a-1) < 0, or a < 1. If a=0, in the
       | case of the blog post, then this is absolutely true. If a=1
       | (which, along with b=0, is the ideal scenario), then this is
       | barely not true. If a > 1, then we'd have a whole new effect
       | about arrogant experts.
       | 
       | So the only thing that matters from this "auto-correlation
       | perspective" is the rate at which an individual's self-assessment
       | increases with their ability. As long as they underestimate the
       | increase, a "DK effect" will occur.
       | 
       | However, in the above analysis, we ignored the variable b. If a =
       | 0.8 and b=0, we'd never have the so-called "DK effect" even
       | though it matches the "auto-correlation perspective" because
       | everyone would underestimate their ability.
       | 
       | This tells me that the value of b matters. It is sort of like the
       | prior ability everyone assumes they have. What the DK paper shows
       | is that b > .5, which I think is in line with the spirit of the
       | popular interpretation of the "DK effect". People should not be
       | assuming they have, at a minimum, a capacity higher than the
       | average.
       | 
       | At the same time, the value b isn't insanely higher than .5,
       | which also makes me want to cut those unskilled and unaware some
       | slack. It "seems reasonable" to assume your baseline is average.
       | That can't be the case, but it feels intuitive.
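       | 
       | A quick numerical check of that covariance identity in Python (my
       | sketch, with arbitrary a, b and noise):
       | 
       |   import numpy as np
       | 
       |   rng = np.random.default_rng(6)
       |   n = 1_000_000
       |   a, b = 0.8, 0.3
       | 
       |   X = rng.uniform(0, 1, n)    # ability
       |   N = rng.normal(0, 0.1, n)   # independent zero-mean noise
       |   Y = a * X + b + N           # self-assessment
       | 
       |   # Cov(Y - X, X) should equal (a - 1) * Var(X) = -0.2 / 12.
       |   lhs = np.cov(Y - X, X)[0, 1]
       |   rhs = (a - 1) * X.var()
       |   print(round(lhs, 4), round(rhs, 4))  # both ~ -0.0167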
        
       | concordDance wrote:
       | The author quite badly fails to make his point. Of course if
       | everyone's self assessment was random, the bottom quartile would
       | overrate themselves! And that would be half of the Dunning-Kruger
       | effect, and we could truthfully say "the bottom quartile of people
       | overrate themselves"!
       | 
       | The other part, where those at the top have a better idea of where
       | they rank, noticeably does not come out in his toy example.
       | 
       | Honestly, he comes across as not having the slightest
       | understanding of how people interpret those graphs...
        
       | im3w1l wrote:
       | It's fascinating how great Elo and similar ranking systems are at
       | curbing DK. You just get a number, and that's how good (bad) you
       | are. It's incredibly precise too, there's just no arguing with
       | it.
       | 
       | Also since the topic is D-K I'm a bit scared that I'm the fool
       | here, but isn't he misusing the term autocorrelation? What he
       | describes sounds like just normal correlation?
        
       | toasted-subs wrote:
       | Idk, I genuinely feel like after having to deal with 10+ doctors
       | who all had different opinions, the last doctor finally came to
       | the same conclusion as me, and he was the last person I had to
       | see.
       | 
       | There's always exceptions. And sometimes reading publications
       | pertaining to a very specific thing should give you more say on a
       | subject.
       | 
       | I just feel bad that American taxpayer money and the best years of
       | my life were spent on telling medical professionals they don't
       | know what they are talking about.
        
       | dclowd9901 wrote:
       | I think what this article is missing is "the chart DK should have
       | used."
       | 
       | Instead we get a spurious explanation that doesn't make a lot of
       | sense, based on completely fabricated data. It's entirely natural
       | for something that looks like DK to emerge from randomized data,
       | especially when the Y axis is represented by some number near the
       | mean (about 50 in this case).
        
       | a-dub wrote:
       | i think of acf as a measure of repeating temporal structure and
       | how "strong" and "long" it is, if it exists.
       | 
       | that is, it gives you a notion of whether, and at what order, an
       | ar model should fit any repeating structure in the data.
        
       | randomizedalgs wrote:
       | Consider the imaginary world that the author describes, in which
       | people's estimate of their score is independent of their actual
       | score. Wouldn't it be fair to say that, in this imaginary world,
       | the DK effect is real?
       | 
       | The point of the effect is that people who score low tend to
       | overestimate their score and people who score high tend to
       | underestimate. Of course there are lots of rational reasons why
       | this could occur (including the toy example the author gave,
       | where nobody has any good sense of what their score will be), but
       | the phenomenon appears to me to be correct.
        
         | mrkeen wrote:
         | If it's a statistical illusion, the correlation is still true,
         | it just has no business being studied by psychologists.
         | 
         | If I roll a die, and then roll a second die, I might study the
         | behaviour of the second die and wonder why it wants to add up
         | to 7 with the first die. Since they're dice, I can dismiss that
         | as a stupid idea, but if they were people, I could certainly be
         | led astray by psychological theories about them.
        
         | skrebbel wrote:
         | Woa of course, this is the point.
         | 
         | The author's example with random points is bad because you
         | might reasonably _expect_ people to behave differently than
         | uniform random points.
         | 
         | It'd be reasonable to expect that people who are good at a
         | thing estimate that they are good at it, and that people who
         | are bad at a thing, estimate that they're bad at it. I mean, my
         | kids love math and always estimate themselves to do well on
         | math tests (and they usually do). They have classmates who
         | loudly detest math, estimate they'll do badly, and often do (at
         | least somewhat). Similarly I'm a bad cook and I have no doubt
         | that if
         | I join a cooking contest, I'll get few jury points. The
         | _expected_ data is correlated.
         | 
         | So if a study finds that, well actually, the data is not at all
         | that correlated! Lots of people who estimate that they'll do
         | _fine_ actually don't, and equally many people who estimate
         | that they'll do badly, actually do fine (ie it looks like
         | uniform random data), then that's surprising, and that's the
         | D-K effect.
         | 
         | Right? I'm no statistician at all so I might be missing
         | something.
        
       | ezekiel68 wrote:
       | > However, there is a delightful irony to the circumstances of
       | their blunder.
       | 
       | Indeed. And I find the tendency of people in this comment section
       | to defend the flawed theory is further confirmation of another
       | scientific finding: that we decide based on emotion and then
       | justify our decision using rationality.
        
         | stubish wrote:
         | Even when the article cites the 3 papers it is based on, there
         | are no refutations of the published science by people who grok
         | it.
        
       | notShabu wrote:
       | every domain of expertise has two "elo" systems, the niche one
       | and the broader one.
       | 
       | e.g. you can learn basic juggling in 30 minutes and be top 10%
       | among your friends/colleagues etc...
       | 
       | however within the juggling community itself this is known as the
       | "3 ball cascade", a really simple trick relative to the ones that
       | require years to master. an outsider may not be able to tell the
       | difference between the 1 year expert and the 10 year master.
       | 
       | a lot of dunning-kruger can be explained by people in one system
       | or the other not understanding the other system
        
       | lopatin wrote:
       | Oh I read about the DK effect a while ago. I'm pretty much an
       | expert in Psychology now, AMA.
        
       | eagerpace wrote:
       | Is this the opposite of imposter syndrome?
        
       | markhahn wrote:
       | the numeric experiment does not produce a line identical to what
       | DK report. if DK's line were horizontal at 50%, it would indeed
       | be nothing but autocorrelation.
        
       | dahart wrote:
       | Most people, even here on HN, do not know what the DK effect
       | actually claimed to show. It does not show that confident people
       | are more likely to be incompetent. Their primary result shows a
       | positive correlation between confidence and supposed skill. (What
       | skill, you ask?*)
       | 
       | This article suggests DK is even simpler than autocorrelation,
       | that it's just regression toward the mean.
       | https://www.talyarkoni.org/blog/2010/07/07/what-the-dunning-...
       | 
       | I don't know which statistical artifact it is, but I am quite
       | convinced that the so-called DK effect is not demonstrating
       | something interesting about human psychology, I don't buy that
       | this is a real cognitive bias. I've read the paper several times,
       | and the methodology seems to be lacking rigor. They tested a
       | small handful of Cornell undergrads volunteering for extra
       | credit, not a large sample, not the general population, and
       | tested _nobody_ who actually fits the description of
       | 'incompetent' in a meaningful way. They primarily measured how
       | people rank each other, not what their absolute skill was - and
       | ranking each other requires speculating on the skills of others.
       | There are obvious bias problems with asking a group of pampered
       | Ivy League kids how well they think they rank.
       | 
       | * One of the four "skills" they measured was ability to get a
       | joke - "appreciation of humor" - Huh? This is subjective! The
       | jokes used aren't given in the paper, either. Another was a
       | 'grammar' test.
        
       | TrackerFF wrote:
       | The DK effect has gotten WAY more cred than it should. Today, it
       | is just another feel-good piece that people use to justify their
       | feeling that they're (ironically) surrounded by loud idiots.
        
       | austin-cheney wrote:
       | The best way to differentiate DK from autocorrelation is motive.
       | Low performance people will focus on motives that reinforce the
       | perception of their competence, for example preferring code style
       | over code delivery, because while both may be arguably important,
       | one requires less effort and risk to attain.
       | 
       | There is research out of Stanford to qualify this. People will
       | shift motives to attain compliments, and the types of compliments
       | received will dictate the challenges they are willing to accept.
       | When a compliment is specific to an action and measurable, people
       | will strive for continuously more challenging tasks so as to
       | continually receive specific compliments. When compliments are
       | generic and directed at the person, they will tend to prefer
       | progressively less challenging tasks so that they continue to
       | shine relative to the attempted effort. The differences in
       | behavior produce a natural Dunning-Kruger effect wherein people
       | seeking less qualified activities are more likely to overestimate
       | their potential and degree of success.
       | 
       | This is also statistically verified in research that correlates
       | predictions to confidence. The more confident a person is in
       | their predictions, such as political talk radio hosts, the less
       | accurate their predictions tend to be.
        
       | James_K wrote:
       | I think the issue here is a confusion about what "bias" means. If
       | they are self-assessing at random, then the high performers will
       | all underestimate themselves, but this is not a bias towards
       | underestimation as they are choosing randomly.
       | 
       | That said, the chart from D-K seems to show a different bias and
       | line up roughly with what you would expect. Someone with no
       | knowledge assumes they are average skill and hence inflates their
       | position, someone who is very good doesn't want to rate
       | themselves the best because they assume others know as much as
       | they do. The assumption underlying both groups is that you are
       | normal and others are similar to you.
       | 
       | I hypothesise that most people think they're average, which is
       | something you could easily test by asking them to rate how well
       | they think the average person would do on a test and comparing it
       | to that individual's test score. I'm almost certain that high
       | performers will overestimate the average, and low performers
       | underestimate it.
        
       | riazrizvi wrote:
       | A general problem with Dunning Kruger is the assumption that if
       | you score low on a test then you are bad at the subject it is
       | evaluating. I've taken enough bad quizzes that purportedly
       | evaluate skills that I am an expert in, to know that that is a
       | leap.
        
       | nitwit005 wrote:
       | If self evaluations are random, and you group a bunch of them
       | together, then you'll see values around the 50th percentile.
       | That's why their self evaluation line is nearly flat.
       | 
       | In the actual data though, the line clearly trends upward. The
       | people who did well appear to be scoring themselves non-randomly.
        
       | resource0x wrote:
       | Can someone explain the difference between Dunning-Kruger effect
       | and "illusory superiority" effect
       | (https://en.wikipedia.org/wiki/Illusory_superiority)?
        
         | zeroonetwothree wrote:
         | DK says that skilled people tend to underestimate their skill
         | while unskilled people tend to overestimate their skill. This
         | is likely a statistical artifact.
         | 
         | IS says that people tend to overestimate their own skill
         | compared to how other people estimate their skill. This seems
         | likely true on average but not necessarily in all cases.
        
       | jongjong wrote:
       | This makes sense. IMO, the reason why Dunning-Kruger effect is so
       | popular among the upper classes (along with Impostor Syndrome) is
       | that it helps to provide justification for social inequalities as
       | it corrects inner monologues.
       | 
       | "How come I have so much given that I'm not as skilled as these
       | other people? I must suffer from impostor syndrome."
       | 
       | "Look at all these people complaining instead of taking
       | responsibility for their own failures, they probably suffer from
       | Dunning-Kruger effect. Their work must not be good enough."
       | 
       | But of course this requires a certain detachment from reality
       | (hence why many upper class people have blind spots). If they
       | actually took a look at the evidence, they may find that some of
       | these 'Dunning-Kruger people' are actually far more skilled than
       | they imagine. I think it explains why people like Jurgen
       | Schmidhuber who made significant contributions to AI tend to be
       | ignored. Then because people are ignoring them, they are
       | compelled to promote themselves harder to try to get their fair
       | share of attention but they are then put in the 'Dunning-Kruger
       | basket' until someone with a very good reputation like Elon Musk
       | comes along and gives them credit. I think the same could be said
       | about the mathematician Srinivasa Ramanujan; many mathematicians
       | ignored his work or assumed he was a fraud because he seemed too
       | sure of himself for someone who was completely unknown at the
       | time. If such gross injustice can happen in a perfectly-
       | quantifiable field like math, you can be sure it can happen in
       | any field.
        
       | fnord77 wrote:
       | wikipedia's article intro on this doesn't state it is invalid :/
        
       | badrabbit wrote:
       | In my experience, people abuse flattery too much, so it is hard to
       | tell if their positive opinions of me are genuine and with merit.
       | Generally speaking, I try to see the big picture and realize that
       | no matter how well I do, in a more global sense I am at best top
       | 50th percentile, slightly above average. It is chance,
       | relationships and supply/demand economics that ultimately decide
       | our ability to apply our talents effectively.
       | 
       | When it comes to others, I wish more people experienced the D-K
       | effect. It gets frustrating sometimes dealing with smart and
       | talented people who think they are revolutionary rockstars. You
       | know the kind, they see other people's work and they are shocked
       | how bad everything is, but never fear, they, our heroes are here
       | to refactor everything until they leave and another hero looks at
       | their work and rescues metropolis from it again. Patience and
       | humility are a rare virtue for all of us.
        
       | golol wrote:
       | I disagree. Dunning Kruger is not a statement about predicted
       | score correlating with actual score in some way. It states that
       | predicted score does not correlate well with actual score. This
       | can be rephrased as the prediction error having a negative
       | correlation with the actual score. The article then claims that
       | this negative correlation is autocorrelation. That is true, but
       | the correlation still exists. The thing is that ideally we EXPECT
       | there to be no correlation of the prediction error with the
       | actual score, but we find autocorrelation. Going back to
       | variables where this autocorrelation is not there, we EXPECTED to
       | find a 1:1 positive correlation between predicted score and
       | actual score but find no correlation, or a weak correlation.
       | 
       | So finding autocorrelation when you expected to find no
       | correlation is pretty much the Dunning-Kruger effect here.
       | 
       | In fact their example with the random data totally makes sense:
       | Suppose people uniformly randomly estimate their performance.
       | Then the people who are low skilled will consistently over-
       | estimate and the people who are high-skilled will consistently
       | underestimate. Of course there is no causation here, as the
       | people choose randomly, but there is an undeniable correlation. I
       | guess the question is whether you view the Dunning-Kruger effect
       | as a claim that low skill CAUSES positive prediction error, or
       | just correlates with it.
        
       | lifeisstillgood wrote:
       | The Dunning Kruger effect is simply the same reason expensive
       | projects are undertaken and never hit budget - not because we
       | cannot estimate costs but because if we did we would never do
       | anything.
        
       | CalChris wrote:
       | The article's definition of _autocorrelation_ :
       | Autocorrelation occurs when you correlate a variable with itself.
       | 
       | Wikipedia's definition of _autocorrelation_ :
       | Autocorrelation, sometimes known as serial correlation in the
       | discrete time case, is the correlation of a signal with a delayed
       | copy of itself as a function of delay.
       | 
       | Of course, 0 delay is the trivial case of time delay but really,
       | the article's definition is at best inaccurate. D-K has nothing
       | to do with time delay and calling it autocorrelation seems like a
       | weird pun that doesn't quite land.
        
       | hyperthesis wrote:
       | _If_ unskilled and skilled self-assessed themselves the same on
       | average, then unskilled overestimate, and skilled underestimate.
       | 
       | That would be a significant result on its own - that no one had
       | any idea. (But as https://news.ycombinator.com/item?id=38416100
       | notes, there is a correlation.)
        
       | chmod600 wrote:
       | A related effect that I've wondered about: perhaps lower-
       | skilled people compare themselves to the general public, while
       | skilled people compare themselves to a smaller group of skilled
       | peers.
       | 
       | In other words, if you asked me if I'm good at riding a bicycle,
       | I'd compare myself to others in the general population and say
       | "yes". But if you ask a weekend bicyclist, they'd be better than
       | me but perhaps compare themselves to weekend bicyclists, and rate
       | themselves lower. And the effect might repeat for competitive
       | bicyclists.
       | 
       | If true, this could explain why we intuitively believe the DK
       | effect.
        
       | RevEng wrote:
       | What Blair Fix's article gets wrong is that there are two stark
       | differences between what Fix generated with random data and what
       | Dunning and Kruger observed in theirs.
       | 
       | Fix has each person guess randomly between 0 and 99 where they
       | will lie in the percentiles. They simulate every person having no
       | idea and giving equal probability to being the best or the worst.
       | If we then sort them by how well they really did into quartiles
       | and then evaluate the average of how well they thought they would
       | do, we get what we would expect: each quartile has an equal
       | chance of predicting that they will do well or do poorly, with an
       | average expected percentile of 50, which is what you would expect
       | by a random guess.
       | 
       | Note two key things about this:
       | 
       | - All quartiles guessed the same: there was no correlation
       | between what they guessed and how well they actually did.
       | 
       | - All quartiles guessed the expected average percentile, 50%.
       | This means they were unbiased in how well they thought they
       | would do.
       | 
       | If people were unbiased but also unaware, this is the null
       | hypothesis we would expect: on average people predict themselves
       | to be average and there's no correlation between how well they
       | predicted they would do and how well they actually did.
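       | 
       | A rough sketch of that null model (uniform random guesses;
       | hypothetical code, not Fix's actual script):
       | 
       |     import numpy as np
       | 
       |     rng = np.random.default_rng(1)
       |     n = 100_000
       |     actual = rng.uniform(0, 100, n)  # true percentile
       |     guess = rng.integers(0, 100, n)  # "no idea" estimate
       | 
       |     # Sort people into quartiles by actual performance, then
       |     # average what each quartile guessed.
       |     quartile = np.digitize(actual, [25, 50, 75])
       |     for q in range(4):
       |         print(q + 1, guess[quartile == q].mean())  # all ~50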
       | 
       | Now compare that to what Dunning and Kruger observed:
       | 
       | - The quartiles did NOT guess the same. There was a bit of an
       | upward trend, which suggests that people were at least somewhat
       | able to determine their actual percentiles, even if only weakly
       | on average.
       | 
       | - The predictions were biased. All groups estimated they would
       | do better than the expected average. That is to say, on
       | average, they thought they were above average. This is an
       | important bias.
       | 
       | - The differentials between quartiles are not equal. The first
       | and second quartiles typically predicted the same,
       | over-estimated value, implying that neither group had any idea
       | they were better or worse than each other. However, the upper
       | quartile consistently estimates a higher average. That is to
       | say, people who perform well, on average, believe they are
       | performing even better than those who don't perform well. And
       | perhaps most surprisingly, there was often a statistically
       | significant dip at the third quartile: comparing their beliefs,
       | people who did well believed they had done worse than the
       | people who actually did worse.
       | 
       | Fix also fails to go beyond the first figure of the paper. After
       | seeing this inconsistent behaviour between the quartiles, Dunning
       | and Kruger then test what happens if the respondents are given an
       | opportunity to grade each other - therefore getting an idea of
       | what the percentiles actually look like - and to have their
       | skills improved - thereby possibly making them better able to
       | judge their own and each other's abilities. Again, if Fix's
       | premise were correct - that this is all just a result of
       | manipulating the autocorrelation of an otherwise unbiased
       | random sequence - then these interventions should have no
       | discernible effect. Yet Dunning and Kruger find markedly
       | significant changes after these interventions, and those
       | changes differ across the quartiles.
       | 
       | It is precisely this difference between quartiles which is the
       | Dunning-Kruger effect. Fix effectively makes their point for them
       | by building a null model and showing what would happen if there
       | were no Dunning-Kruger effect - if people were fully unaware and
       | unbiased. Instead, it is the way in which Dunning and Kruger's
       | observations deviate from this model that is the very effect that
       | bears their name.
       | 
       | Instead, all that Fix manages to do is point out how confusing
       | the plot is that Dunning and Kruger produced. The plot can easily
       | be misinterpreted to suggest that it's the difference between y
       | and y-x that is important. Instead, in their writing, Dunning and
       | Kruger actually focus on the differences in how y-x changes when
       | the situation changes, demonstrating that it's actually dependent
       | on knowledge and how different people respond to that knowledge.
       | What they actually show is that delta(y-x) vs x has a nonzero
       | relationship and this is particularly interesting.
       | 
       | Perhaps if Dunning and Kruger had not included the example of
       | perfect knowledge as a comparison, but instead included the
       | example of unbiased and unknowledgeable that Fix produced as the
       | thing to compare against, the Dunning-Kruger effect would be much
       | better understood.
       | 
       | Further, both could benefit greatly from plotting and tabulating
       | not just an average, but the overall distribution within each
       | group. Fix should know that variance is just as important as
       | bias. Even if all groups are biased in their predictions,
       | differences in variance between the groups indicate their
       | confidence in their beliefs. Knowledge should help to reduce both
       | bias and variance. A guess with high variance tells us little,
       | while a guess with low variance tells us quite a bit. Even if all
       | quartiles predicted the same average, we wouldn't fault those
       | with little ability for guessing a high number if they did so
       | with low confidence. On the contrary, we would expect people with
       | high ability to be more confident (and correct) in the assessment
       | of their ability.
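       | 
       | To illustrate that last point, a toy comparison of per-quartile
       | spread under the two models (the noise schedule below is a
       | made-up assumption, purely for illustration):
       | 
       |     import numpy as np
       | 
       |     rng = np.random.default_rng(2)
       |     n = 100_000
       |     actual = rng.uniform(0, 100, n)
       |     quartile = np.digitize(actual, [25, 50, 75])
       | 
       |     # Model A: unaware and unbiased -- uniform random guesses.
       |     blind = rng.uniform(0, 100, n)
       |     # Model B: informed but noisy -- noise shrinks with skill.
       |     noise = rng.standard_normal(n) * (40 - 0.3 * actual)
       |     aware = np.clip(actual + noise, 0, 100)
       | 
       |     for q in range(4):
       |         m = quartile == q
       |         # Model A's spread is flat; Model B's falls with skill.
       |         print(q + 1, blind[m].std().round(1),
       |               aware[m].std().round(1))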
        
         | hgomersall wrote:
          | The entire post is pointing out how bad the stats are in the
         | original paper. If you want additional critique, go and read
         | the references.
        
       | hyperthesis wrote:
       | It's Dunning-Krugers all the way down - including this self-
       | referential smugness.
        
       | epigramx wrote:
       | "Autocorrelation is the statistical equivalent of stating that 5
       | = 5." no sure if the author has some ..dunning-kruger there.
        
       | 6510 wrote:
       | I was curious whether the self-assessment is done before or
       | after the test.
       | 
       | Bing chat gave me this wild answer:
       | 
       | > The effect is usually measured by comparing self-assessment
       | with objective performance. For example, participants may take a
       | quiz and estimate their performance afterward, which is then
       | compared to their actual results 1. Therefore, people estimate
       | their ability before the test by Dunning-Kruger.
       | 
       | In the case estimation _is_ done before: If you've had training,
       | like a soup of ingredients, that matches the priorities and
       | biases of the test, it would be strange if no measurable effect
       | remained.
       | 
       | If it's done after: You can create trick questions specifically
       | designed to test if someone learned a specific thing. A good test
       | would test for that. If someone didn't learn the specific thing
       | they could give/guess the wrong answer with some confidence.
       | 
       | The design of the test has great influence on how poorly you'll
       | think you've done. I would argue that the superior test is the
       | one designed to fool you. Hans Rosling famously created a
       | multiple-choice test with 4 answers per question on which
       | average results fell below the 25% you'd get by random guessing.
       | 
       | On a more fascinating note, unskilled means all areas of
       | expertise outside your own.
       | 
       | People who are universally unskilled in all areas are of course
       | more likely to think they are unskilled. In reality these people
       | know little bits about many things.
       | 
       | This in contrast with people who spend all day, every day, for
       | their entire lives pondering topics inside their area of
       | expertise. If you are doing one thing you aren't doing all of the
       | other things.
       | 
       | Wikipedia had hilarious instances of experts contributing to
       | countless articles accidentally ending up on the wrong page.
       | Suddenly they have no patience, think they know everything and
       | act like children. It's funny because you can't just ban
       | valuable contributors.
       | 
       | I would love to see this DK test done with professors on topics
       | furthest removed from their own area of expertise.
        
       | civilized wrote:
       | We discussed this in a previous thread. The author is basically
       | hypothesizing that people are so universally terrible at
       | predicting their ability that their self-rating is like an
       | unconditional random variable - just a random draw that is not
       | influenced by their actual ability level at all.
       | 
       | If this is true, then when your actual ability is high, your
       | self-rating is likely to be lower than your ability simply by
       | random chance. For example, if ability ranges from 0-100, your
       | actual ability is 99, and your self-rating is a uniform random
       | number from 0-100, your self-rating is 99% likely to be lower
       | than your actual ability. Conversely, if your actual ability is
       | low, your self-rating is likely to exceed your actual ability
       | level.
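       | 
       | (That 99% figure is just P(U < 99) for U uniform on 0-100; a
       | quick Monte Carlo check of the arithmetic:)
       | 
       |     import numpy as np
       | 
       |     u = np.random.default_rng(0).uniform(0, 100, 1_000_000)
       |     print((u < 99).mean())  # ~0.99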
       | 
       | When it's explained clearly and simply, the criticism raises a
       | lot of questions. Are people _actually_ that bad at rating their
       | own ability? I doubt it.
        
       | zw123456 wrote:
       | I know I'm not smart enough on statistics or psychology to
       | evaluate the article, but it always struck me that D&K seemed to
       | say something similar to what my grandpa said when I was a wee
       | lad: "The more you know, the more you realize how much you don't
       | know." I know he wasn't the first person to say that, but he was
       | the first person to say it to me. I don't know if D&K is
       | autocorrelation or not, but I do know that an awful lot of
       | people seem to think they know more than maybe they actually do,
       | probably me included. Hmmm, maybe the author of that article as
       | well? I wonder if that occurred to him; it seems like a glaring
       | oversight not to at least acknowledge that possible irony.
        
         | Arch485 wrote:
         | In the article, a real study was used as a counterexample to
         | the DK effect.
         | 
          | Part of the results was a correlation showing that people who
          | were "less capable" were also worse at predicting their own
          | skill, and people who were "more capable" were better at
          | predicting their own skill.
         | 
         | While similar to the DK effect, this is different, as the DK
         | effect states that "less capable" individuals specifically
         | _overestimate_ their skill, as opposed to simply being wrong
         | (both over and under -estimating).
         | 
          | As for some people "seeming to think they know more than they
          | actually know", this is likely confirmation bias, in the
          | sense that there are an equal number of people who don't know
          | much and know that they don't know much.
        
       | dimask wrote:
       | I would call this type of argument a case of regression to the
       | mean rather than "autocorrelation". That, of course, in principle
       | requires independence between performance and assessment of
       | performance. In many cases, it would make little sense to assume
       | that the performance and assessment of performance are
       | independent. But even then, one can simulate random data with
       | some correlation and still get a DK effect merely as a
       | statistical artifact. An overview of similar critiques, and a
       | similar argument, can be found in
       | https://www.frontiersin.org/articles/10.3389/fpsyg.2022.8401... .
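       | 
       | A sketch of that point: draw correlated skill/assessment pairs
       | and the DK-style pattern still appears. The correlation of 0.4
       | is an arbitrary choice of mine:
       | 
       |     import numpy as np
       | 
       |     rng = np.random.default_rng(3)
       |     n = 100_000
       |     rho = 0.4  # imperfect but real correlation
       |     cov = [[1, rho], [rho, 1]]
       |     skill, assess = rng.multivariate_normal([0, 0], cov, n).T
       | 
       |     # Rank-transform both to percentiles.
       |     skill_pct = skill.argsort().argsort() * 100.0 / n
       |     assess_pct = assess.argsort().argsort() * 100.0 / n
       | 
       |     quartile = np.digitize(skill_pct, [25, 50, 75])
       |     for q in range(4):
       |         # Bottom quartile "overestimates", top
       |         # "underestimates", purely from regression to the mean.
       |         print(q + 1, assess_pct[quartile == q].mean())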
        
       | 19f191ty wrote:
       | That is not autocorrelation. The OP is equating linear
       | dependence with autocorrelation, which is not how we use that
       | term. Autocorrelation is when a random process is correlated
       | with a time-lagged version of itself.
        
       | dilawar wrote:
       | David Dunning's response (2022):
       | https://www.bps.org.uk/psychologist/dunning-kruger-effect-an...
        
       | BrenBarn wrote:
       | Yeah I don't buy this either.
       | 
       | I do think the original Dunning-Kruger plot is a bit of an odd
       | presentation. The way I look at it is just to say that people's
       | self-estimates of their ability fall into a relatively narrow
       | range (e.g., 55-75th percentile on the graph), whereas their
       | actual abilities of course cover the whole range from 0-100th
       | percentile. You don't really need the plot of "x versus x"
       | (average score in each quartile). You just need to say "people's
       | self-assessments seem to start unrealistically high and only go
       | up a little, even as their ability goes up a lot".
        
       | PeterStuer wrote:
       | You can take out the x from both sides, and the y would still not
       | be a horizontal line.
       | 
       | In their eagerness to 'deconstruct' the narrative, do the authors
       | merely provide another example of Dunning-Kruger by
       | overestimating their own cleverness?
        
       | eterevsky wrote:
       | I think this article would've made more sense if it had a title
       | "The Dunning-Kruger effect is regression toward the mean",
       | because that's what the author is actually showing.
        
         | tgv wrote:
         | I think your description is the most apt.
         | 
          | OP's own analysis shows that using random data (two variables
          | uniformly distributed over the same range!) for both skill
          | and self-assessment results in a _different_ graph. The
          | original comparison therefore implies another effect on the
          | second dimension, which could be interpreted as: people don't
          | estimate their skills correctly, but drift towards the mean.
          | 
          | But then the question becomes: what did they really ask their
          | subjects? To pick a percentile or a true test score?
        
       | mattbit wrote:
       | This is not 'autocorrelation', it is regression to the mean. I
       | find the article unclear and imprecise. For those interested in a
       | better overview of the Dunning-Kruger effect, I recommend this
       | short article by McIntosh & Della Sala instead:
       | 
       | https://www.bps.org.uk/psychologist/persistent-irony-dunning...
        
         | mattbit wrote:
         | This is how McIntosh & Della Sala put it:
         | 
         | > in the academic literature, it has been suggested that the
         | signature pattern of the DKE (Figure 1A) might be nothing more
         | than a statistical artefact. In a typical study, people's
         | tendencies to under- or overestimation are analysed as a
         | function of their ability for the task. This involves a 'double
         | dipping' into the data because the task performance score is
         | used once to rank people for ability, and then again to
         | determine whether the self-estimate is an under- or over-
         | estimate. This dubious double-dipping makes the analysis prone
         | to a slippery statistical phenomenon called 'regression to the
         | mean'.
        
       | pmavrodiev wrote:
       | No one seems to have read OP's post in its entirety. A crucial
       | point was made by referencing this paper:
       | https://digitalcommons.usf.edu/cgi/viewcontent.cgi?article=1....
       | 
       | Figure 2 in this paper shows the result of an experiment where
       | skill and perception of one's skill are measured independently,
       | to eliminate any statistical artifact of autocorrelation. And lo
       | and behold: on average, skill is uncorrelated with the accuracy
       | of one's own assessment. No DK effect at all. What does show up
       | is that more qualified people are more consistent in estimating
       | their skill (i.e. their assessments are less variable), but the
       | mean accuracy is still 0.
       | 
       | So indeed, on average actual and perceived skills are
       | uncorrelated. That's exactly what the numerical proof with random
       | numbers shows and why in many cases we apply Occam's razor.
        
       | psychoslave wrote:
       | I went through the whole article, and I am not only very
       | skeptical about the claimed debunking but also wonder what kind
       | of psychological trope you might label as correlative to such
       | an article.
       | 
       | I mean, "bad science built only on rhetoric" is a double-edged
       | sword, you know.
       | 
       | To start with, the graph presented at the end does not look like
       | the one from the original article, where the self-assessment
       | does grow significantly, though it starts higher than average
       | and grows less quickly than the external assessment.
       | 
       | Also, the article focuses on a "random" data set, but we know
       | that there are different classes of apparently noisy plots. A
       | noisy distribution of self-assessments would actually be an
       | informative figure too.
       | 
       | So the biggest issue here is that it kind of pretends that
       | whatever the ordinate value is coupled to, if it includes the
       | abscissa in its definition you'll get the same kind of plot as a
       | result, which is obviously false. You could easily come up with
       | arbitrary values coupled to "x" that would look radically
       | different.
        
       | rom1v wrote:
       | If Y = X + estimation_error, then subtracting X (in Y-X) removes
       | the correlation rather than adding it.
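       | 
       | A quick check of that claim, assuming the error is independent
       | of X:
       | 
       |     import numpy as np
       | 
       |     rng = np.random.default_rng(0)
       |     x = rng.uniform(0, 100, 100_000)
       |     y = x + rng.normal(0, 10, 100_000)  # Y = X + error
       |     print(np.corrcoef(y, x)[0, 1])      # strongly positive
       |     print(np.corrcoef(y - x, x)[0, 1])  # ~0: removed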
        
       | Spiwux wrote:
       | At the risk of sounding like a complete idiot, isn't the
       | hypothesis of the original paper still true? Let's assume the
       | self-assessment score is perfectly random between 0% and 100%,
       | so on average every group will always estimate itself to be 50%
       | correct.
       | 
       | Then by definition that means people who are unskilled and often
       | incorrect will overestimate themselves, while people who are
       | often correct will underestimate themselves. Take a complete
       | idiot, for example, who always gets a 0% test score, yet whose
       | self-assessment is random between 0% and 100%. They will
       | overestimate themselves much more often than people who always
       | get a 100% test score.
       | 
       | In fact, if the two are uncorrelated, then that still means that
       | 
       | 1) Idiots don't recognize they're idiots
       | 
       | 2) Skilled people don't recognize they're skilled
        
       | bsza wrote:
       | Article claims Dunning-Kruger is present in a population where
       | everyone estimates their own skills based on dice rolls. Someone
       | who estimates their own skills based on a dice roll is
       | objectively crap at estimating their own skills. Dunning-Kruger
       | claims people are objectively crap at estimating their own
       | skills.
       | 
       | Where is the contradiction?
        
       | dudeinjapan wrote:
       | So you're saying that the Dunning-Kruger effect applies to
       | Dunning & Kruger.
        
       | falserum wrote:
       | The article feels like a personal attack on D and K.
        
       | powera wrote:
       | Nope.
       | 
       | I must object to this paragraph: "To be honest, I'm not
       | particularly convinced by the analytic arguments above. It's only
       | by using real data that I can understand the problem with the
       | Dunning-Kruger effect. So let's have a look at some real
       | numbers."
       | 
       | He then goes on to use synthetic data.
       | 
       | Beyond that dishonest sleight of hand, this is in the category
       | of "one thought experiment didn't prove the phenomenon exists,
       | therefore it must not exist" logical errors.
        
       | zephrx1111 wrote:
       | A more generalizable explanation is regression towards the mean:
       | everybody thinks they are an average person.
        
       ___________________________________________________________________
       (page generated 2023-11-26 23:01 UTC)