https://lemire.me/blog/2023/03/15/precision-recall-and-why-you-shouldnt-crank-up-the-warnings-to-11/

Daniel Lemire's blog

Daniel Lemire is a computer science professor at the Data Science
Laboratory of the Université du Québec (TELUQ) in Montreal. His
research is focused on software performance and data engineering. He
is a techno-optimist and a free-speech advocate.

Precision, recall and why you shouldn't crank up the warnings to 11

Recently, the code hosting site GitHub widely deployed a tool called
CodeQL with rather aggressive settings. It does static analysis on the
code and attempts to flag problems. I use the phrase "static
analysis" to refer to an analysis that does not run the code. Static
analysis is limited: it can identify a range of actual bugs, but it
also tends to produce false positives: code patterns that it thinks
are bugs but aren't.

Recently, several Intel engineers proposed code to add AVX-512
support to a library I help maintain. We got the following scary
warnings:

[Screenshot of the CodeQL warnings]

CodeQL is complaining that we take as input a pointer to 8-byte
words and treat it as if it were a pointer to 64-byte words.
If you work with AVX-512 and are providing optimized replacements
for existing functions, such code is standard. And no compiler that I
know of, even at the most extreme settings, will ever issue a
warning, let alone a scary "High severity Check Failure".

On its own, this is merely a small annoyance that I can ignore.
However, I fear that it is part of a larger trend where people come
to rely more and more on overbearing static analysis to judge code
quality. The more warnings, the better, they think.

And surely, the more warnings a linter/checker can generate, the
better it is?

No. It is incorrect for several reasons:

 1. Warnings and false errors may consume considerable programmer
    time. You may say 'ignore them' but even if you do, others will
    be exposed to the same warnings and they will be tempted to
    either try to fix your code, or to report the issue to you. Thus,
    unless you work strictly alone or in a closed group, it is
    difficult to escape the time sink.
 2. Training young programmers to avoid non-issues may make them less
    productive. The two most important features of software are (in
    order): correctness (whether it does what you say it does) and
    performance (whether it is efficient). Fixing shallow warnings is
    easy work, but it often contributes to neither correctness (it
    does not fix bugs) nor performance (it does not make the code any
    faster). You may feel productive, and it may look like you are
    changing a lot of code, but what are you gaining?
 3. Modifying code to fix a non-issue has a non-zero chance of
    introducing a real bug. If you have code that has been running in
    production for a long time, without error... trying to "fix it"
    (when it is not broken) may actually break it. You should be
    conservative about fixing code without strong evidence that there
    is a real issue. Your default behaviour should be to refuse to
    change the code unless you can see the benefits. There are
    exceptions, but almost all code changes should either fix a real
    bug, introduce a new feature or improve performance.
 4. When programming, you need to clear your mental space.
    Distractions are bad. They make you dumber. So your programming
    environment should not have unnecessary features.

Let us use some mathematics. Suppose that my code has bugs, and that
a static checker has some probability of catching a bug each time it
issues a warning. In my experience, this probability can be low... but
the exact percentage is not important to the big picture. Let me use
a reasonable model. Given B bugs per 1000 lines, the probability that
a warning has caught a bug follows a logistic function, say
p = 1/(1 + exp(10 - B)). So if I have 10 bugs per 1000 lines of code,
then each warning has a 50% probability of being useful. That is
quite optimistic.
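The logistic model is easy to evaluate directly. The following Python sketch simply tabulates p = 1/(1 + exp(10 - B)) for a few bug densities; the function name `warning_usefulness` is a hypothetical label, not something taken from the original code.

```python
import math


def warning_usefulness(bugs_per_kloc: float) -> float:
    """Probability that a single warning points at a real bug,
    using the logistic model p = 1 / (1 + exp(10 - B))."""
    return 1.0 / (1.0 + math.exp(10.0 - bugs_per_kloc))


# Tabulate the model for a few bug densities (bugs per 1000 lines).
for b in (1, 5, 10, 15, 20):
    print(f"B = {b:2d} bugs/kLOC -> p = {warning_usefulness(b):.4f}")
```

For B = 10 it prints exactly 0.5000, matching the 50% figure; the curve is close to 0 for clean code and close to 1 for very buggy code.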

The precision is the fraction of warnings that point at real bugs; the
recall is the fraction of the bugs that I catch. If I have 20 bugs
per 1000 lines of code, then issuing a million warnings all but
guarantees that every bug is caught. But human beings would need to
do a lot of work to review all those warnings.
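How recall grows with the number of warnings is not spelled out above, so the sketch below assumes a simple model: each warning points at a real bug with probability p (from the logistic curve), and when it does, the bug is picked uniformly at random among the N bugs, so later warnings often rediscover bugs that were already found. Under that assumption, the expected recall after W warnings is 1 - (1 - p/N)^W. The names `warning_usefulness` and `expected_recall` are hypothetical.

```python
import math


def warning_usefulness(bugs_per_kloc: float) -> float:
    # Logistic model: p = 1 / (1 + exp(10 - B)).
    return 1.0 / (1.0 + math.exp(10.0 - bugs_per_kloc))


def expected_recall(bugs: int, warnings: int, p: float) -> float:
    # Assumption: each warning hits a real bug with probability p, and the
    # bug it hits is drawn uniformly from the `bugs` bugs (repeats allowed).
    # One warning misses a given bug with probability 1 - p / bugs.
    return 1.0 - (1.0 - p / bugs) ** warnings


bugs = 20  # 20 bugs in 1000 lines of code
p = warning_usefulness(bugs)
for w in (10, 100, 1_000, 1_000_000):
    print(f"{w:>9} warnings to review -> recall {expected_recall(bugs, w, p):.4f}")
```

With 20 bugs, recall in this model is essentially 1 well before a million warnings, yet every one of those warnings still has to be read by someone.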

So given B, how many warnings should I issue? Of course, in the real
world I do not know B, and I do not know that the usefulness of the
warnings follows a logistic function, but humour me.

A reasonable answer is that we want to maximize the F-score: the
harmonic mean of the precision P and the recall R, F = 2PR/(P + R).

I hastily coded a model in Python, where I vary the number of
warnings. The recall always increases while the precision always
falls. The F-score rises and then falls: having no warnings is
terrible, but having too many is just as bad. The F-score is
maximized with a small number of warnings.
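The original Python model is not reproduced here. What follows is a hypothetical reconstruction of one model with the behaviour described, not the author's script: it reuses the logistic usefulness curve, assumes warnings land uniformly at random on the N bugs (duplicates are wasted effort), defines precision as the expected number of distinct bugs found per warning, and sweeps the number of warnings W to find where the F-score peaks.

```python
import math


def warning_usefulness(bugs_per_kloc: float) -> float:
    # Logistic model: p = 1 / (1 + exp(10 - B)).
    return 1.0 / (1.0 + math.exp(10.0 - bugs_per_kloc))


def precision_recall(bugs: int, warnings: int, p: float):
    # Assumed model: each warning hits a real bug with probability p, landing
    # on one of the bugs uniformly at random, so later warnings often repeat
    # bugs that were already found.
    recall = 1.0 - (1.0 - p / bugs) ** warnings
    found = bugs * recall           # expected number of distinct bugs caught
    precision = found / warnings    # distinct findings per warning issued
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


bugs = 10  # B = 10 bugs per 1000 lines, so p = 0.5
p = warning_usefulness(bugs)
scores = {}
for w in range(1, 201):
    prec, rec = precision_recall(bugs, w, p)
    scores[w] = f_score(prec, rec)

best = max(scores, key=scores.get)
print(f"best number of warnings: {best}, F-score {scores[best]:.3f}")
for w in (1, best, 50, 200):
    prec, rec = precision_recall(bugs, w, p)
    print(f"{w:>3} warnings: precision {prec:.3f}, recall {rec:.3f}, F {scores[w]:.3f}")
```

With B = 10 (so p = 0.5), this sketch peaks at 17 warnings with an F-score of about 0.431, and the F-score drops below 0.1 by 200 warnings. Under this particular model the F-score simplifies to 2NR/(N + W): the recall R saturates at 1 while the denominator keeps growing with W, which is the trade-off in a single expression.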

[plot]

A more intuitive description of the issue is that the more warnings
you produce, the more likely you are to waste programmer time. You
are also more likely to catch bugs. One is negative, one is positive.
There is a trade-off. When there is a trade-off, you need to seek the
sweet spot in the middle.

The trend toward an ever-increasing number of warnings does not
improve productivity. In fact, at the margin, disabling the warnings
entirely might be just as productive as keeping them: at that point,
the analysis has zero value.

I hope that it is not a symptom of a larger trend where programming
becomes bureaucratic. Software programming is one of the key industries
where productivity has been fantastic and where we have been able to
innovate at great speed.

Published by Daniel Lemire, a computer science professor at the
University of Quebec (TELUQ).

Posted on March 15, 2023
