[HN Gopher] New Computer Language Benchmarks Game metric: time +...
       ___________________________________________________________________
        
       New Computer Language Benchmarks Game metric: time + source code
       size
        
       Author : benstrumental
       Score  : 37 points
       Date   : 2022-06-01 15:19 UTC (7 hours ago)
        
 (HTM) web link (benchmarksgame-team.pages.debian.net)
 (TXT) w3m dump (benchmarksgame-team.pages.debian.net)
        
       | sidkshatriya wrote:
       | Geometric mean of (time + gzipped source code size in bytes)
       | seems statistically wrong.
       | 
        | What if you expressed time in nanoseconds, or source code
        | size in megabytes? The rankings could change. The culprit is
        | the '+'.
        | 
        | I would think the geometric mean of (time x gzipped source
        | code size) is the correct way to compare languages. It would
        | not matter what the units of time or size are in that case.
       | 
       | [Here the geometric mean is the geometric mean of (time x gzipped
       | size) of all benchmark programs of a particular language.]
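        | 
        | A quick sketch in Python, with made-up numbers for two
        | hypothetical languages, showing that the sum's ranking
        | depends on the unit of time while the product's does not:
        | 
        |     # (time, gzipped size in bytes) -- made-up numbers
        |     a = (2.0, 900)   # language A: fast but verbose
        |     b = (5.0, 300)   # language B: slower but terse
        | 
        |     for scale in (1, 1000):  # seconds, then milliseconds
        |         print(a[0]*scale + a[1] < b[0]*scale + b[1],  # sum
        |               a[0]*scale * a[1] < b[0]*scale * b[1])  # product
        |     # prints "False False", then "True False": the sum's
        |     # ranking flips with the unit change; the product's doesn't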
        
         | ntoskrnl wrote:
          | Yep, this is correct. Adding disparate units is almost
          | always nonsensical. You can confirm with a scientific
          | calculator like insect:
          | 
          |     $ insect '5s + 10MB'
          |     Conversion error:
          |       Cannot convert unit MB (base units: bit) to unit s
          | 
          |     $ insect '5s * 10MB'
          |     50 s*MB
        
           | smegsicle wrote:
           | units, frink, insect oh my
        
         | igouy wrote:
         | > The culprit is the '+'
         | 
         | That annotation does seem to have caused much frothing and
         | gnashing.
         | 
         | Here's how the calculation is made -- "How not to lie with
         | statistics: The correct way to summarize benchmark results."
         | 
         | [pdf]
         | http://www.cse.unsw.edu.au/~cs9242/11/papers/Fleming_Wallace...
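          | 
          | A minimal sketch of the paper's recommendation, in Python
          | with made-up ratios: normalize each benchmark result
          | against a reference, then summarize with the geometric
          | mean, whose ranking does not depend on which reference is
          | chosen.
          | 
          |     from math import prod
          | 
          |     def geomean(xs):
          |         # nth root of the product of n values
          |         return prod(xs) ** (1 / len(xs))
          | 
          |     # made-up per-benchmark ratios vs. a reference program
          |     print(geomean([1.4, 0.8, 2.1, 1.0]))  # ~1.24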
        
           | yorwba wrote:
            | That paper only covers the reasoning behind taking the
            | geometric mean; it has nothing to say about the "time +
            | gzipped source code size in bytes" part.
        
         | dwattttt wrote:
          | It's not necessarily wrong to add disparate units like
          | this; it implicitly weights one unit against the other.
          | Changing to nanoseconds just gives more weight to the time
          | metric in the unified benchmark. You could instead weight
          | them explicitly without changing units: if you cared more
          | about size, you could add a multiplier to it.
        
           | sidkshatriya wrote:
            | You really don't know what the right weight is to
            | balance time and gzipped size. Multiplying them together
            | sidesteps the whole issue and puts time and size on a
            | par with each other, regardless of the individual unit
            | scaling.
            | 
            | The whole point of benchmarks is to protect against
            | accidental bias in your calculations. Adding them seems
            | totally against my intuition. If you _did_ want to give
            | time more weight, I would raise it to some power.
            | Example: the geometric mean of (time x time x source
            | size) would give time much more importance, in an
            | arguably more principled way.
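            | 
            | A sketch of that idea in Python, with hypothetical
            | numbers: raising time to a power favors the faster
            | program, and a uniform unit change still scales every
            | score equally, leaving the ranking intact.
            | 
            |     def score(time, size, time_weight=2):
            |         # weight time more by raising it to a power
            |         return time ** time_weight * size
            | 
            |     # fast-but-verbose vs. slow-but-terse (made up)
            |     print(score(2.0, 900, 1), score(5.0, 300, 1))
            |     # 1800.0 1500.0 -- unweighted, the terse one wins
            |     print(score(2.0, 900), score(5.0, 300))
            |     # 3600.0 7500.0 -- weighted, the fast one wins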
        
             | dwattttt wrote:
              | Multiplying them is another way of expressing them as
              | a unified value. It's not a question of accidental
              | bias: you're explicitly choosing how important one
              | second is compared to one byte.
             | 
             | You could imagine there's a 1 sec/byte multiplier on the
             | bytes value, saying in effect "for every byte of gzipped
             | source, penalise the benchmark by one second".
        
               | sidkshatriya wrote:
               | > You could imagine there's a 1 sec/byte multiplier on
               | the bytes value, saying in effect "for every byte of
               | gzipped source, penalise the benchmark by one second".
               | 
                | Your explanation makes sense. However, the main
                | issue is that we don't know whether this "penalty"
                | is fair, correct, or has some justifiable basis. In
                | the absence of any explanation, it would make more
                | sense to multiply them together as a "sane default".
                | Later, having done some research, we could attach
                | some weighting, perhaps appealing to physical laws
                | or information theory. Even then, I doubt that '+'
                | would be the operator I would use to combine them.
        
             | igouy wrote:
             | > Adding them...
             | 
             | Read '+' as '&'.
        
       | Thaxll wrote:
        | The thing they should change is to forbid nonsense like:
       | 
       | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
       | 
        | Actually, if you look at the top .NET Core submissions, the
        | only fast ones are the ones using low-level intrinsics, etc.
        
         | spullara wrote:
         | All of the languages now have that trash in them. I'd like a
         | "naive" benchmarks game where you write the code straight
         | forwardly in a normal style for the language.
        
           | igouy wrote:
           | "simple" (2nd link on the homepage.)
           | 
            | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
             | weberer wrote:
             | >Java: 40 seconds
             | 
             | >Python 3: 1h 09 minutes
             | 
             | Well damn.
        
         | igouy wrote:
          | > Actually, if you look at the top .NET Core submissions,
          | the only fast ones are the ones using low-level
          | intrinsics, etc.
          | 
          | Do you mean "fast" like a C program using low-level
          | intrinsics?
        
       | NeutralForest wrote:
        | This presentation is pretty bad: there should be more context,
       | some kind of color scheme or labels instead of text in the
       | background, spacing between the languages represented, other
       | benchmarks than the geometric mean, etc.
        
         | gus_massa wrote:
         | > _other benchmarks than the geometric mean_
         | 
         | The text is not clear enough, but "geometric mean" is not the
         | benchmark. The 11 problems are listed in
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | 
          | The results of the 11 problems are combined using the
          | "geometric mean" into a single number. Some people prefer
          | the "geometric mean", other people prefer the "arithmetic
          | mean" to combine the numbers, other people prefer the
          | maximum, and there are many other methods (like the
          | average excluding both extremes).
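          | 
          | For illustration, in Python with made-up numbers, the
          | combining methods mentioned above:
          | 
          |     from math import prod
          |     from statistics import mean
          | 
          |     results = [3.2, 1.1, 4.8, 2.0, 9.5]  # made-up times
          | 
          |     geometric  = prod(results) ** (1 / len(results))
          |     arithmetic = mean(results)
          |     worst      = max(results)
          |     # "excluding both extremes": drop min and max, average
          |     trimmed    = mean(sorted(results)[1:-1])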
        
           | NeutralForest wrote:
           | >The text is not clear enough, but "geometric mean" is not
           | the benchmark.
           | 
            | Thanks, that makes more sense; that's another issue of
            | missing context, then. I don't have anything against
            | geometric means, but there should be basic statistics
            | like average, max, min, ... available as well.
        
             | igouy wrote:
             | > ... basic statistics like...
             | 
             | median, quartiles
             | 
              | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
               | NeutralForest wrote:
                | Could you guide me to the ones I mentioned? I'm not
                | seeing them.
        
               | igouy wrote:
               | The bar in the middle of the box is an "average" - the
               | median.
               | 
               | https://www.merriam-webster.com/dictionary/average
               | 
                | https://www.itl.nist.gov/div898/handbook/eda/section3/boxplo...
        
               | gus_massa wrote:
                | It would be nice to be able to see the numbers in a
                | table (perhaps on an auxiliary page, instead of the
                | main page). Sometimes people want to rearrange the
                | data or use another representation. (log scale? sort
                | by the 75th percentile? ...)
        
               | igouy wrote:
                | For people who want to rearrange the data or use
                | another representation, there are data files --
                | 
                | https://salsa.debian.org/benchmarksgame-team/benchmarksgame/...
        
               | stonemetal12 wrote:
               | The linked page has box and whisker plots. On a box and
               | whisker plot the lower bar is the min, the upper bar is
               | the max. The box goes from 25th percentile to 75th
               | percentile while the bar in the middle of the box is the
               | 50th percentile.
               | 
                | Therefore the stats you mentioned are all there:
                | min, max, and average, with two different
                | definitions of average given (the geometric mean and
                | the 50th percentile).
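                | 
                | In Python terms, a sketch of the plotted statistics:
                | 
                |     from statistics import quantiles
                | 
                |     data = [3.2, 1.1, 4.8, 2.0, 9.5]  # made-up
                |     q1, median, q3 = quantiles(data, n=4)
                |     whiskers = min(data), max(data)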
        
       | IshKebab wrote:
       | With a totally arbitrary conversion of 1 second = 1 gzipped byte.
       | 
       | This is basically meaningless. I don't see why you'd even need to
       | do this. You can easily show code size _and_ performance on the
       | same graph.
        
       | kibwen wrote:
       | For comparing multiple implementations of a single benchmark in a
       | single language, this sort of data would be interesting as a 2D
       | plot, to see how many lines it takes to improve performance by
       | how much. But for cross-language benchmarking this seems somewhat
       | confounding, as the richness of standard libraries varies between
        | languages (and counting the lines of external dependencies
        | sounds extremely annoying: not only do you have to decide
        | whether to include standard libraries (including libc), you
        | also need to find a way not to penalize those with many
        | lines devoted to tests).
        
         | mrtranscendence wrote:
         | I'm not sure I see the problem. What does it matter that
         | program A is shorter than program B because language A has a
         | richer standard library? Program A still required less code.
        
           | kibwen wrote:
            | Because that's not all that's being measured here:
            | you're also mixing in performance, and it's impossible
            | to tell at a glance whether a score is attributable to
            | one, the other, or both.
        
         | benstrumental wrote:
          | Not exactly what you're looking for, but here are some 2D
          | plots of code size vs. execution time, with geometric
          | means of the fastest entries and of the smallest-code-size
          | entries for each language:
         | 
         | https://twitter.com/ChapelLanguage/status/152442889069266944...
        
         | simion314 wrote:
          | And when you want to make the code readable, you space
          | things out, split things into small functions, and use
          | longer, clearer variable names. I guess this amounts to
          | asking people to run the code through a minifier so their
          | implementation gains some points.
        
           | benstrumental wrote:
            | I can't find the documentation for it, but you can see
            | here that they measure the size of the source file after
            | gzip compression, which reduces the advantage of
            | code-golf solutions:
            | 
            | https://salsa.debian.org/benchmarksgame-team/benchmarksgame/...
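            | 
            | Something along these lines, as a sketch (the site's
            | exact gzip settings may differ):
            | 
            |     import gzip
            | 
            |     def gzipped_size(path):
            |         # size of the source file after gzip compression
            |         with open(path, "rb") as f:
            |             return len(gzip.compress(f.read()))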
        
             | igouy wrote:
             | "How source code size is measured"
             | 
              | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
           | kibwen wrote:
           | On other benchmarks they measure the size of source code
           | after it's been run through compression, as a way to
           | normalize that. Not sure if that's been done here, but it
           | should be.
        
             | igouy wrote:
             | Yes, they're the same measurements.
        
       | [deleted]
        
       | _b wrote:
        | I'd be interested to see "C compiled with Clang" added as
        | another language in the benchmarks game. In part because
        | digging into Clang vs gcc benchmarks is always interesting,
        | and in part because, as Rust and Clang share the same LLVM
        | backend, it would shed light on how much of the C vs Rust
        | difference comes from frontend language stuff vs backend
        | codegen stuff.
        
         | igouy wrote:
         | Already done:
         | 
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | 
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
       | hexo wrote:
        | I don't buy these results at all. Julia in second place
        | looks like a plain lie and complete nonsense, to the point
        | that I'm gonna look into this and run it myself.
        | 
        | After trying hard to use Julia for about a year, I came to
        | the conclusion that it's one of the slowest things around.
        | Maybe things have changed? Maybe, but Julia code still
        | remains incorrect.
        | 
        | I hope they fix both things: speed (including start-up
        | speed; it counts A LOT) and correctness.
        
         | ChrisRackauckas wrote:
          | Note that these benchmarks include compilation time for
          | Julia, while they do not include compilation time for C,
          | Rust, etc.
        
           | igouy wrote:
           | Julia is presented like this --
           | 
           | "Julia features optional typing, multiple dispatch, and good
           | performance, _achieved using type inference and just-in-time
           | (JIT) compilation_ , implemented using LLVM."
           | 
           | Julia 1.7 Documentation, Introduction
           | 
           | https://docs.julialang.org/en/v1/
        
       | cpurdy wrote:
       | Predictably, "The Computer Language Benchmarks Game" once again
       | proves the worthlessness of "The Computer Language Benchmarks
       | Game".
       | 
        | This thing has been a long-running joke in the software
        | industry, exceeded only by the level of their defensiveness.
       | 
       | SMH.
        
       | agentgt wrote:
        | I really wish they aggregated the metric of build time (+
        | whatever).
        | 
        | That is a huge metric I care about.
        | 
        | You can figure it out somewhat by clicking on each language
        | benchmark, but it is not aggregated.
        | 
        | BTW, as a biased guy in the Java world, I can tell you this
        | is one area where Java is actually mostly the winner, even
        | beating out many scripting languages, apparently.
        
         | igouy wrote:
         | Do Java "build time" measurements include class loading and JIT
         | compilation? :-)
        
       ___________________________________________________________________
       (page generated 2022-06-01 23:01 UTC)