[HN Gopher] Data scientists need to learn about significant digits
___________________________________________________________________
Data scientists need to learn about significant digits
Author : ibobev
Score : 29 points
Date : 2023-12-25 20:13 UTC (2 hours ago)
(HTM) web link (lemire.me)
(TXT) w3m dump (lemire.me)
| tobinfricke wrote:
| Omg yes. My coworkers often specify parameters (the results of
| what we would have called "regression" or "fitting" but what is
| now called "machine learning") to like 14 decimal places when we
| only know them to 20%. It would have yielded a failing grade in
| my high school chemistry class.
|
| They are like "Chesterton's digits" when they appear unexplained
| in parameter files. Do we really know this number or is it just a
| wild-assed guess?
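|
| A rough sketch of the alternative in Python (the helper name
| is made up; the rule of thumb is sig figs ~ -log10(relative
| error) + 1):
|
|     import math
|
|     def trim_to_uncertainty(value, rel_err):
|         """Keep roughly the digits a relative error justifies."""
|         if rel_err <= 0:
|             return value
|         sig_figs = max(1, math.floor(-math.log10(rel_err)) + 1)
|         return float(f"{value:.{sig_figs}g}")
|
|     # A parameter "known to 20%" earns about one significant figure:
|     print(trim_to_uncertainty(0.73482919331847, 0.20))  # 0.7
|     print(trim_to_uncertainty(0.73482919331847, 0.01))  # 0.735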
| dickersnoodle wrote:
| _puts on grumpy old man hat_ Yeah, back in my day you _had_ to
| learn about significant digits because we still used slide
| rules and the calculators didn't have as many functions and we
| _liked_ it! _removes grumpy old man hat_
| bigbillheck wrote:
| > Do we really know this number or is it just a wild-assed
| guess?
|
| Does it matter? You have 23 bits of mantissa to fill one way or
| the other, and if you don't know the lower, say, 12, one
| pattern of bits is as good as any other pattern.
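|
| To put a number on how little those low bits carry, here's a
| quick sketch (Python, treating the value as a float32 via
| struct; the 20% figure is from the comment upthread):
|
|     import struct
|
|     def zero_low_mantissa_bits(x, n):
|         """Clear the n lowest of a float32's 23 mantissa bits."""
|         (bits,) = struct.unpack("<I", struct.pack("<f", x))
|         bits &= ~((1 << n) - 1)
|         (y,) = struct.unpack("<f", struct.pack("<I", bits))
|         return y
|
|     x = 0.73482919331847
|     y = zero_low_mantissa_bits(x, 12)
|     print(abs(x - y) / x)  # ~3e-4, noise next to a 20% error bar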
| 7402 wrote:
| I was taught about significant digits in Junior High School
| science classes (in a US public school, late 1960s). Answers
| were marked _wrong_ if they had too many digits that weren't
| significant. Is this no longer taught in school?
| atkion wrote:
| This was definitely still taught in my public high school in
| the 2010s - if they stopped teaching this, it was very
| recently.
|
| Granted, the teacher that tried to teach me this had no
| understanding of it themselves and would incorrectly mark
| answers wrong when they had the correct number of significant
| digits, so it was pretty botched in my case. Still, it was
| attempted in the curriculum.
| 2muchcoffeeman wrote:
| I think some of the problem is that you can get into computing
| through many pathways now, and you don't have to come through
| some science and math stream from high school. Even when I was
| doing physics I always had to remind myself about accuracy.
| koito17 wrote:
| Significant figures have always been taught in chemistry
| courses. I remember students getting confused by notation like
| "100. g" representing 100 grams with 3 significant figures.
| SoftTalker wrote:
| I don't remember it in junior high, which for me was the late
| 1970s. High school yes, definitely.
| lamename wrote:
| Do people really report 5 decimal places?
|
| In my experience reporting models in industry, even 1 or 2
| decimal places are irrelevant to the overall decision to be made,
| at least if the number is around "87%" as stated in the example.
|
| Of course it's context dependent. If you're evaluating
| incremental improvements, perhaps that warrants more
| precision.
|
| After all, by definition each decimal place is 10x less
| important.
| ta988 wrote:
| Highly field dependent. In most of biology, two figures is
| often beyond the pipetting skills of most mortals (and robots).
| In physics, to my meager understanding, they are looking at
| close to 10 significant figures in some cases.
| lamename wrote:
| Agreed. Based on the title, I took the context of the article
| as generic data science/analytics in industry, not cutting
| edge bio or physics.
| fipar wrote:
| > After all, by definition each decimal place is 10x less
| important.
|
| I'm not sure about that, since it depends on the error margin
| tolerance one has.
|
| Regardless, my biggest gripe with most reports of quantitative
| information I stumble across is that, lacking any mention of
| significant digits or accuracy, it's very easy (and I've seen
| this happen a lot) for people to attribute more accuracy to a
| value than is actually there.
|
| One example: AWS CloudWatch reports external replication delay
| in MySQL Aurora in milliseconds, which is impossible, since the
| value reported by MySQL is in seconds. Yet many times I've been
| involved in discussions with people who say "our system can
| tolerate at most 500 ms of replication delay for reads", because
| they see the value reported in that unit.
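|
| A tiny illustration of that false precision (whether the source
| value is truncated or rounded to whole seconds is my assumption
| about MySQL's behavior, but either way the extra digits are
| fiction):
|
|     lag_seconds = 1        # integer, as MySQL reports it
|     lag_ms = lag_seconds * 1000
|     print(f"{lag_ms} ms")  # "1000 ms" really means "1 s, give
|                            # or take ~500 ms", so a 500 ms SLO
|                            # can't be checked from this metric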
| SamBam wrote:
| The other thing people get wrong is treating significant figures
| and decimal places as the same thing.
|
| 123.45 kg and 7.32 kg have the same number of decimal places,
| but the former has more significant figures and implies greater
| precision.
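|
| Put numerically, with the usual half-unit-in-the-last-place
| reading of a written value:
|
|     for kg in (123.45, 7.32):
|         rel = 0.005 / kg  # implied absolute uncertainty is the same
|         print(f"{kg} kg -> +/-0.005 kg -> {rel:.3%} relative")
|
|     # 123.45 kg -> +/-0.005 kg -> 0.004% relative
|     # 7.32 kg -> +/-0.005 kg -> 0.068% relative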
| prepend wrote:
| I hate how values are rounded for presentation and precision is
| lost.
| ayhanfuat wrote:
| I think a distinction should be made between measurements and
| predictions. If you are measuring something then you have a
| scale, and you probably have a sense of the margin of error. That
| is easy to translate into significant figures. When you make
| predictions, things are much harder. There is also the question
| of whether you should take the variance of the model into
| account, or its accuracy.
| montebicyclelo wrote:
| In addition to the uncertainty of the value, there's also the
| context of how it will be used. For a quick business decision,
| it may be fine to round to 1 sig fig: 71.244% -> 70% -> the
| majority.
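|
| For reference, rounding to n significant figures is a one-liner
| in Python (a sketch; it blows up for x == 0):
|
|     from math import floor, log10
|
|     def round_sig(x, n=1):
|         return round(x, n - 1 - floor(log10(abs(x))))
|
|     print(round_sig(71.244))  # 70.0 -> "the majority"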
| uxp8u61q wrote:
| How can someone call themselves a "scientist" and not know about
| significant digits? "Data technician" would be a more accurate
| job title.
___________________________________________________________________
(page generated 2023-12-25 23:00 UTC)