[HN Gopher] Data scientists need to learn about significant digits
       ___________________________________________________________________
        
       Data scientists need to learn about significant digits
        
       Author : ibobev
       Score  : 29 points
       Date   : 2023-12-25 20:13 UTC (2 hours ago)
        
 (HTM) web link (lemire.me)
 (TXT) w3m dump (lemire.me)
        
       | tobinfricke wrote:
       | Omg yes. My coworkers often specify parameters (the results of
       | what we would have called "regression" or "fitting" but what is
       | now called "machine learning") to like 14 decimal places when we
       | only know them to 20%. It would have yielded a failing grade in
       | my high school chemistry class.
       | 
       | They are like "Chesterton's digits" when they appear unexplained
       | in parameter files. Do we really know this number or is it just a
       | wild-assed guess?
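
One way to avoid writing 14 decimal places for a parameter known to about 20% is to round it at the leading digit of its uncertainty. The following is a minimal sketch under that assumption; the function name `round_to_uncertainty` and the fitted value are hypothetical and do not come from the thread.

```python
import math

def round_to_uncertainty(value: float, rel_uncertainty: float) -> float:
    """Round value at the decimal position of its uncertainty's leading digit."""
    if value == 0 or rel_uncertainty <= 0:
        return value
    abs_uncertainty = abs(value) * rel_uncertainty
    # Power of ten of the leading digit of the absolute uncertainty.
    exponent = math.floor(math.log10(abs_uncertainty))
    return round(value, -exponent)

fitted = 0.73619284750138  # hypothetical fitted parameter, known to ~20%
print(round_to_uncertainty(fitted, 0.20))  # 0.7
```

With a 20% uncertainty, the absolute error here is about 0.15, so only the first decimal carries information.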
        
         | dickersnoodle wrote:
         | _puts on grumpy old man hat_ Yeah, back in my day you _had_ to
         | learn about significant digits because we still used slide
         | rules and the calculators didn't have as many functions and we
         | _liked_ it! _removes grumpy old man hat_
        
         | bigbillheck wrote:
         | > Do we really know this number or is it just a wild-assed
         | guess?
         | 
         | Does it matter? You have 23 bits of mantissa to fill one way or
         | the other, and if you don't know the lower, say, 12, one
         | pattern of bits is as good as any other pattern.
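
As a rough illustration of the storage side of this argument, the sketch below assumes IEEE 754 single precision (23 stored mantissa bits plus one implicit bit) and shows how many decimal digits that corresponds to. The helper `to_float32` is a hypothetical name; it only uses the standard `struct` module.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 23 stored bits plus the implicit leading bit give 24 bits of precision.
decimal_digits = 24 * math.log10(2)
print(round(decimal_digits, 2))  # 7.22

# Two inputs that differ only in the 10th decimal land on the same float32.
print(to_float32(0.7) == to_float32(0.7000000001))  # True
```

The bits are always filled, but a printed value with 14 decimals still reads as a claim about precision, which is a separate question from storage.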
        
       | 7402 wrote:
       | I was taught about significant digits in Junior High School
       | science classes (in a public school in the US, late 1960s). Answers
       | were marked _wrong_ if they had too many digits that weren't
       | significant. Is this no longer taught in school?
        
         | atkion wrote:
         | This was definitely still taught in my public high school in
         | the 2010s - if they stopped teaching this, it was very
         | recently.
         | 
         | Granted, the teacher that tried to teach me this had no
         | understanding of it themselves and would incorrectly mark
         | answers wrong when they had the correct number of significant
         | digits, so it was pretty botched in my case. Still was
         | attempted in the curriculum though.
        
         | 2muchcoffeeman wrote:
         | I think some of the problem is that you can get into computing
         | through many pathways now, and you don't have to come through
         | some science and math stream from high school. Even when I was
         | doing physics I always had to remind myself of the accuracy.
        
         | koito17 wrote:
         | Significant figures have always been taught in chemistry
         | courses. I remember students getting confused at notation like
         | "100. g" representing 100 grams with 3 significant figures.
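
The trailing-dot notation can be reproduced with Python's format mini-language, which may help show why it exists. This is a small sketch; `with_sig_figs` is a hypothetical helper name.

```python
def with_sig_figs(value: float, sig_figs: int) -> tuple[str, str]:
    """Return scientific and 'alternate general' renderings with sig_figs digits."""
    # Scientific notation: one digit before the point, sig_figs - 1 after it.
    sci = format(value, f".{sig_figs - 1}e")
    # '#' keeps the decimal point and trailing zeros in 'g' formatting.
    plain = format(value, f"#.{sig_figs}g")
    return sci, plain

print(with_sig_figs(100.0, 3))  # ('1.00e+02', '100.')
```

Both forms state three significant figures explicitly, whereas a bare "100" leaves the count ambiguous.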
        
         | SoftTalker wrote:
         | I don't remember it in junior high, which for me was the late
         | 1970s. High school yes, definitely.
        
       | lamename wrote:
       | Do people really report 5 decimal places?
       | 
       | In my experience reporting models in industry, even 1 or 2
       | decimal places are irrelevant to the overall decision to be made,
       | at least if the number is around "87%" as stated in the example.
       | 
       | Of course it's context dependent. If you're evaluating
       | incremental improvements perhaps that warrants more careful
       | precision.
       | 
       | After all, by definition each decimal place is 10x less
       | important.
        
         | ta988 wrote:
         | Highly field dependent. In most of biology, two figures is
         | often way above the pipetting skills of most mortals (and
         | robots). In physics, in my meager understanding, they are
         | looking at close to 10 in some cases.
        
           | lamename wrote:
           | Agreed. Based on the title, I took the context of the article
           | as generic data science/analytics in industry, not
           | cutting-edge bio or physics.
        
         | fipar wrote:
         | > After all, by definition each decimal place is 10x less
         | important.
         | 
         | I'm not sure about that, since it depends on the error margin
         | tolerance one has.
         | 
         | Regardless, my biggest gripe with most reports of quantitative
         | information I stumble across is that, lacking any mention of
         | accuracy or significant digits, it's very easy (and I've seen
         | this happen a lot) for people to attribute more accuracy to a
         | value than is actually there.
         | 
         | One example: AWS CloudWatch reports external replication delay
         | in MySQL Aurora in milliseconds, which is impossible, since the
         | value reported by MySQL is in seconds. Yet many times I've been
         | involved in discussions with people who say "our system can
         | tolerate at most 500ms" of replication delay for reads, because
         | they see the value reported in that unit.
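
The effect described here can be sketched with made-up numbers: if a source only knows whole seconds, converting to milliseconds changes the unit but not the granularity. The sample values and the helper `apparent_resolution` below are hypothetical and are not taken from any AWS or MySQL API.

```python
from functools import reduce
from math import gcd

def apparent_resolution(samples_ms: list[int]) -> int:
    """Largest step that divides every sample; hints at the real granularity."""
    return reduce(gcd, samples_ms, 0)

# Hypothetical lag readings from a source that reports whole seconds.
lag_seconds = [0, 1, 0, 2, 1]
lag_ms = [s * 1000 for s in lag_seconds]

print(lag_ms)                       # [0, 1000, 0, 2000, 1000]
print(apparent_resolution(lag_ms))  # 1000
```

Under these assumptions a threshold of 500 ms cannot be observed: a reading is either 0 or at least 1000.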
        
       | SamBam wrote:
       | The other thing people get wrong is treating significant figures
       | and decimal places as the same thing.
       | 
       | 123.45 kg and 7.32 kg have the same number of decimal places,
       | but the former has more significant figures, and implies greater
       | precision.
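
The distinction can be checked mechanically with the standard `decimal` module. This is a sketch with hypothetical helper names; it assumes the number is given as text and counts trailing zeros as significant, which is a convention rather than a rule.

```python
from decimal import Decimal

def decimal_places(text: str) -> int:
    """Number of digits written after the decimal point."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)

def significant_figures(text: str) -> int:
    """Number of stored digits; leading zeros are not counted."""
    return len(Decimal(text).as_tuple().digits)

for text in ("123.45", "7.32"):
    print(text, decimal_places(text), significant_figures(text))
# 123.45 2 5
# 7.32 2 3
```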
        
       | prepend wrote:
       | I hate how values are rounded for presentation and precision is
       | lost.
        
       | ayhanfuat wrote:
       | I think a distinction should be made between measurements and
       | predictions. If you are measuring something then you have a
       | scale, and you probably have a sense of the margin of error. That
       | is easy to translate into significant figures. When you make
       | predictions, things are much harder. There is also the question
       | of whether you should take the variance of the model into
       | account, or its accuracy.
        
       | montebicyclelo wrote:
       | In addition to the uncertainty of the value, there's also the
       | context of how it will be used. E.g. for some quick business
       | decision that needs to be made, it may be fine to round to 1 sig
       | fig, e.g. 71.244% -> 70% -> the majority.
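
The rounding step in this example (71.244% to 70%) can be written as a generic "keep n significant figures" helper. A minimal sketch; `round_sig` is a hypothetical name.

```python
import math

def round_sig(x: float, n: int) -> float:
    """Round x to n significant figures."""
    if x == 0:
        return 0.0
    # Position of the leading digit, as a power of ten.
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, n - 1 - magnitude)

print(round_sig(71.244, 1))  # 70.0
print(round_sig(71.244, 2))  # 71.0
```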
        
       | uxp8u61q wrote:
       | How can someone call themselves a "scientist" and not know about
       | significant digits? "Data technician" would be a more accurate
       | job title.
        
       ___________________________________________________________________
       (page generated 2023-12-25 23:00 UTC)