[HN Gopher] Rust is now overall faster than C in benchmarks
       ___________________________________________________________________
        
       Rust is now overall faster than C in benchmarks
        
       Author : wiineeth
       Score  : 319 points
       Date   : 2021-01-03 18:16 UTC (4 hours ago)
        
 (HTM) web link (benchmarksgame-team.pages.debian.net)
 (TXT) w3m dump (benchmarksgame-team.pages.debian.net)
        
       | [deleted]
        
       | Animats wrote:
       | Two charts with different languages. Unclear what is being
       | measured. Is this a humor article?
        
         | nindalf wrote:
         | This site has a page where they explain what they're doing -
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | 
         | Basically, it's toy programs written in each of these
         | languages. They measure how fast each one is executed. This
         | methodology does have its limitations, which that page is
         | upfront about.
        
       | moldavi wrote:
       | Shouldn't they be about the same? (At least until the LLVM
       | immutability optimizations happen)
       | 
       | I suspect surprising factors are in play.
        
         | jug wrote:
         | I'd look at the algorithms here. Looking a bit fishy with
         | sometimes quite serious differences...
        
         | skohan wrote:
         | The ceilings should be similar, but there is an argument that
         | there are cases where idiomatic Rust might outperform idiomatic
         | C or vice versa.
         | 
         | For instance, in real-world usage, you might do a bit more
         | redundant copying in C in places you would avoid it with the
         | borrow checker in Rust.
         | 
         | Conversely, enforcing RAII in Rust might cost performance in
         | some scenarios relative to C.
        
       | yudlejoza wrote:
       | When Rust is faster than C in a benchmark in which C++ is also
       | faster than C, I know I can safely ignore such benchmark.
        
         | [deleted]
        
         | tetromino_ wrote:
         | The problem is not with C the language, but with the C standard
         | library, which has many inefficiency warts. Examples:
         | 
         | * Strings. Any form of string access other than scanning the
         | string's characters one by one from start to finish can benefit
         | from knowing the string's length in advance. In C++,
         | std::string and std::string_view know their lengths; plain C
         | strings don't. Thus, in performance-optimal plain C, _almost
         | any_ function that takes a string parameter ought to also take
         | the string length as a separate parameter - but most C standard
         | library functions neglect to do so.
         | 
         | * Callbacks. When the callback is static, you want to give the
         | compiler the opportunity to inline it into the call site. In
         | C++, this is natural: pass the callback as a template
         | parameter. In performance-optimal C, you'd want to provide
         | macro versions of functions that take callbacks. But the C
         | standard library only supports callbacks as function pointers
         | (e.g. in bsearch(3), qsort(3) etc.), which means unnecessary
         | pointer dereferences and which makes inlining impossible.
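
The string-length point above can be sketched as follows. This is a minimal illustration, not code from the benchmarks; the function names `ends_with_cstr` and `ends_with_view` are hypothetical. The C-style version has to scan both strings with `strlen` before it can look at the tail, while the `std::string_view` version already carries both lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// C-style: the lengths are unknown, so both strings are scanned to their
// terminators before the suffix comparison can even start.
bool ends_with_cstr(const char* s, const char* suffix) {
    assert(s != nullptr && suffix != nullptr);
    const std::size_t n = std::strlen(s);       // O(n) scan
    const std::size_t m = std::strlen(suffix);  // O(m) scan
    return m <= n && std::memcmp(s + (n - m), suffix, m) == 0;
}

// Length-carrying: size() is O(1), so only the last suffix.size() bytes
// of s are ever touched.
bool ends_with_view(std::string_view s, std::string_view suffix) {
    return suffix.size() <= s.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

Both functions return the same answer; the difference is only in how much of the input they have to read to get there.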
        
           | gambiting wrote:
           | >>plain C strings
           | 
           | But.....there is no such thing. There are arrays of chars,
           | the whole principle of C is "if you want to know the length
           | of a string....just store it yourself". It's a bit like
           | saying that two wooden planks don't have the same
           | functionality as a cupboard. Like, you're technically
           | correct, but the whole idea is that you can use the planks to
           | build your own cupboard or literally anything else.
        
         | coldtea wrote:
         | > _I know I can safely ignore such benchmark._
         | 
         | C++ makes certain intentions clear to the compiler in ways that
         | C doesn't, which makes certain optimizations possible.
        
           | guenthert wrote:
           | That has always been the case, but it appears that the GNU
           | compiler finally makes good on that promise. Kudos to all
           | involved!
        
         | lambda wrote:
         | > I know I can safely ignore such benchmark
         | 
         | And yet, rather than ignoring it, you are commenting on it,
         | with a pithy retort which dismisses the entire benchmark
         | without actually providing any additional insight.
         | 
         | Programming languages, compilers, library ecosystems, the
         | groups of people who decide to sit down and try to produce a
         | better result for a given language, and the benchmark
         | maintainers who decide what submissions count for a given
         | language (does a C solution that just uses entirely inline ASM
         | count?) are incredibly complex systems. Any single metric is
         | never going to capture the full richness of the language, is
         | never going to be representative of the experience you will
         | have for every single program, etc.
         | 
         | But does that make metrics useless? No, it just means that you
         | should be informed about their limitations. You shouldn't just
         | look at a single number, but instead make sure you understand
         | well enough what is being measured to know how well that number
         | represents anything useful.
         | 
         | So rather than just dismissing this benchmark, it would be
         | useful to ask "why are the C++ results better than the C
         | results on this benchmark?"
         | 
         | Some benchmark challenges like this allow pretty much any
         | program that accepts the right input and produces the right
         | output; which means you get results in which no computation is
         | actually done, the output is simply hard-coded and you are
         | basically just measuring the startup time or request time of
         | the language or library.
         | 
         | This particular set of benchmarks imposes some constraints to
         | avoid that kind of behavior. Programs have to follow the same
         | basic algorithm, so you can't figure out some clever
         | algorithmic optimization which applies only for the particular
         | input used in this benchmark. For things like the regex
         | challenge, you are expected to use either the built-in regex in
         | your language's standard library, or a common general-purpose
         | regex implementation, not a specialized regex implementation
         | optimized just for this particular benchmark.
         | 
         | The goal of this set of benchmarks is to provide a reasonable
         | set of reasonably realistic small problems, implemented using
         | the same algorithm, and using the normal language and library
         | features. It uses small simple problems in order to make it
         | easy to read the programs and learn about the performance
         | characteristics of the language.
         | 
         | So rather than dismissing, why don't we take a look at the
         | fastest C and C++ implementations of some of the problems?
         | 
         | Here's the fastest implementation of the k-nucleotide problem
         | in C and C++:
         | 
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | 
         | I haven't sat down to do detailed profiling and performance
         | comparison of each; but just off the top of my head, here are a
         | few things I see which could be relevant.
         | 
         | The C++ implementation takes advantage of a number of C++
         | features; it uses a hash table from GNU pb_ds, a C++ container
         | library which allows for a good amount of customization based
         | on template parameters. The C implementation uses khash, a very
         | fast hash table implementation which uses macros for similar
         | customization.
         | 
         | The C++ implementation makes note of using move semantics in a
         | number of places, which potentially allows for certain
         | optimizations that wouldn't be possible if the compiler had to
         | copy data.
         | 
         | The C++ implementation uses insertion into a templated ordered
         | map to sort the results, while the C implementation uses the
         | standard library qsort. This allows the comparison function to
         | be inlined into the C++ sort, while it's called through a
         | function pointer in the C implementation.
         | 
         | Without actually doing some experimentation and profiling, it's
         | hard to say which of these makes a difference, or if it's
         | something else. But this does show that C++ provides facilities
         | for generic container and algorithm types that C does not. In
         | the C implementation, macros are used to work around this for
         | the hash table case, while function pointers are used for
         | sorting.
         | 
         | Anyhow, rather than simply dismissing these results, why not
         | dig into where the difference really lies, and provide a better
         | implementation in C if you think that you can?
         | 
         | No one set of benchmark results should be taken as gospel. But
         | I think this particular benchmark game is fairly useful for
         | getting a rough sense of "if I write all of my code in this
         | particular language, using either the standard library or
         | commonly available off the shelf libraries, how much of a
         | performance penalty am I likely to pay?"
         | 
         | I also find the grouping of languages that he does, based
         | on the minima of the kernel density estimation of their
         | geometric mean scores, to be a bit more informative than
         | absolute ranking within those groups. That gives a sense of the
         | general class of languages. There's one group for C, C++, and
         | Rust; languages which allow for performance without compromise,
         | at the expense of lack of safety, higher complexity or learning
         | curve, or both.
         | 
         | There's a next big group with a lot of languages; most of them
         | have been around for a while, or been designed with an eye
         | towards performance, but still have some amount of overhead due
         | to GC or pointer chasing or greater thread synchronization
         | overhead or any number of other reasons; this group includes
         | Fortran, Ada, C#, Java, Go, Haskell, etc.
         | 
         | Then there are a few groups of fairly high-level, dynamic
         | languages, designed for scripting or rapid development, and
         | which require you to trade off a fairly significant amount of
         | performance for this. Dart, PHP, Python, Erlang, Ruby.
         | 
         | And finally, there's Matz's Ruby, all alone in a group at the
         | end, slower than pretty much everything else. I'm not quite
         | sure why it's separated out from Ruby, which seems to refer to
         | yarv, but maybe it's so people who come here wondering what
         | they can do about their slow Ruby programs can see that they
         | can at least get a big boost by switching to yarv.
         | 
         | Anyhow, this benchmark and this grouping helps if you're
         | considering what to do about some performance bottleneck you
         | have in some code base; or if you're starting a project for
         | something which will potentially be performance critical.
         | Moving between languages in one group isn't all that likely to
         | make a substantial difference; but moving to a language in
         | another group would. For example, it lets you know that there's
         | a pretty good chance that just rewriting a Python program
         | that's a performance bottleneck in Go would improve that
         | performance; but rewriting a Go program in Java, or vice versa,
         | is less likely to be a performance win.
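
For readers unfamiliar with the geometric mean scores mentioned above, here is a minimal sketch of how such a score can be computed from per-benchmark ratios. It is an illustration only: the function name `geometric_mean` is hypothetical, and the site's exact scoring and kernel density grouping procedure is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of positive ratios (e.g. "time relative to the fastest
// program" per benchmark), computed in log space to avoid overflow.
double geometric_mean(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double log_sum = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);  // log is only defined for positive ratios
        log_sum += std::log(r);
    }
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

With ratios 1.0 and 4.0 the geometric mean is 2.0, where the arithmetic mean would be 2.5; a single slow benchmark pulls the score up less than it would with a plain average.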
        
           | steveklabnik wrote:
           | Don't know why you're downvoted, this is a great comment.
           | 
           | By the way, YARV has been "ruby" since 2007; these benchmarks
           | were run with ruby 3.0.0preview1. The name has probably just
           | never been changed.
        
         | zarkov99 wrote:
         | Why is that? There are cases where c++ is faster than C, a
         | typical example is qsort vs std::sort.
        
           | skohan wrote:
           | How does c++ manage to win here? I'm not doubting, just
           | curious
        
             | steveklabnik wrote:
             | The classic example is quicksort. qsort takes a function
             | pointer, but std::sort is a template. This means that the
             | machinery can be inlined more often in C++ than C.
             | 
             | https://stackoverflow.com/questions/18002087/qsort-vs-stdsor...
        
             | zarkov99 wrote:
             | In the case of std::sort the C++ compiler can inline the
             | comparison function when it generates the actual sort
             | implementation from the std::sort template. In C the qsort
             | implementation is fixed and must call the comparison via an
             | indirect reference.
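
The qsort vs std::sort difference described above can be sketched side by side. The wrapper names `sort_with_qsort` and `sort_with_std_sort` are hypothetical; whether the comparison actually gets inlined depends on the compiler and optimization level, so treat this as an illustration of the two call shapes rather than a measured result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort only ever receives this as a function pointer, and each element
// arrives as a type-erased const void*.
int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // avoids the overflow of "x - y"
}

void sort_with_qsort(std::vector<int>& v) {
    if (v.empty()) return;  // do not hand qsort a possibly null pointer
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    assert(std::is_sorted(v.begin(), v.end()));
}

// std::sort is a template: the lambda's type is part of the instantiation,
// so the comparison is visible to the optimizer inside the sort itself.
void sort_with_std_sort(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    assert(std::is_sorted(v.begin(), v.end()));
}
```

Both wrappers produce the same ordering; the element size and comparator are runtime arguments in the first and compile-time facts in the second.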
        
               | gpderetta wrote:
               | Technically libc could have an inline version of qsort in
               | the header or the linker (even the dynamic linker) could
               | do LTO, but benchmarks are done on actual implementations,
               | not the mythical sufficiently good compiler.
        
               | rurban wrote:
               | Don't use a libc qsort then. The compiler could generate
               | it, e.g. Or the CTL is a header only STL for C, which
               | inlines the sort function and the comparator. And because
               | of the C++ bloat and indirect calls the C version is
               | faster.
               | 
               | Inlining with macros == constexpr with templates.
               | 
               | You can decide which syntax you prefer.
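
The "inlining with macros" approach can be sketched like this. It is a generic illustration and does not reproduce the actual API of CTL or khash; `DEFINE_INSERTION_SORT`, `INT_LESS` and `sort_ints` are hypothetical names. The macro is written so that it compiles as C++ here, but the same pattern works in plain C with `size_t` and `<assert.h>`.

```cpp
#include <cassert>
#include <cstddef>

// Stamp out a type-specific insertion sort. LESS(a, b) is pasted textually
// into the loop body, so the compiler sees the comparison directly instead
// of calling through a function pointer.
#define DEFINE_INSERTION_SORT(NAME, TYPE, LESS)                 \
    void NAME(TYPE* data, std::size_t n) {                      \
        assert(data != nullptr || n == 0);                      \
        for (std::size_t i = 1; i < n; ++i) {                   \
            TYPE key = data[i];                                 \
            std::size_t j = i;                                  \
            while (j > 0 && LESS(key, data[j - 1])) {           \
                data[j] = data[j - 1];                          \
                --j;                                            \
            }                                                   \
            data[j] = key;                                      \
        }                                                       \
    }

#define INT_LESS(a, b) ((a) < (b))

// Generates: void sort_ints(int* data, std::size_t n)
DEFINE_INSERTION_SORT(sort_ints, int, INT_LESS)
```

Insertion sort is used only to keep the macro short; a real header-only library would stamp out a faster algorithm the same way.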
        
             | tlb wrote:
             | C++ can inline the comparison function (which is often just
             | a few instructions) at compile time. It can also statically
             | optimize for the stride, using shifts or addressing modes
             | instead of multiplies. C can't do either with qsort, which
             | takes the stride and comparison function pointer as
             | arguments.
        
             | _jordan wrote:
             | The constexpr specifier should give C++ an advantage
        
               | mhh__ wrote:
               | Probably not, the compiler doesn't need permission to
               | constant fold of its own accord if it doesn't launch any
               | missiles.
        
         | jandrewrogers wrote:
         | C++ is faster than C now for most things because C++ enables
         | many optimizations that are impractical in C.
        
           | arcticbull wrote:
           | Certainly constexpr haha.
        
         | [deleted]
        
       | FartyMcFarter wrote:
       | Looking at the _reverse-complement_ code, it appears that the
       | Rust and C implementations are using different algorithms:
       | 
       | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
       | 
       | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
       | 
       | On a quick inspection:
       | 
       | - The Rust code is about twice as long.
       | 
       | - The Rust code has CPU feature detection and SSE intrinsics,
       | while the C code is more idiomatic.
       | 
       | - The lookup table is larger in the Rust code.
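
For context, a plain lookup-table reverse-complement can be sketched as below. This is a simplified illustration, not either of the linked programs: it handles only A, C, G and T (the real task also covers other nucleotide codes and FASTA framing), and the names `make_complement_table` and `reverse_complement` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Build a 256-entry table indexed by byte value. Unknown bytes map to
// themselves; lowercase input is mapped to an uppercase complement
// (a choice made for this sketch).
std::array<char, 256> make_complement_table() {
    std::array<char, 256> table{};
    for (std::size_t i = 0; i < table.size(); ++i) {
        table[i] = static_cast<char>(i);
    }
    const char from[] = "ACGTacgt";
    const char to[]   = "TGCATGCA";
    for (std::size_t i = 0; from[i] != '\0'; ++i) {
        table[static_cast<unsigned char>(from[i])] = to[i];
    }
    return table;
}

// Walk the input backwards, writing the complement of each base forwards.
std::string reverse_complement(const std::string& seq) {
    static const std::array<char, 256> table = make_complement_table();
    std::string out(seq.size(), '\0');
    for (std::size_t i = 0; i < seq.size(); ++i) {
        const unsigned char c =
            static_cast<unsigned char>(seq[seq.size() - 1 - i]);
        out[i] = table[c];
    }
    assert(out.size() == seq.size());
    return out;
}
```

A larger table, as noted for the Rust program, would typically translate more than one byte per lookup; this sketch does one byte at a time.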
        
         | fsloth wrote:
         | What does "idiomatic" C even mean? It's a high level assembler
         | and as such _should_ not limit the creativity of programmers
         | using it.
         | 
           | C code that pretends it does not need to care about its
         | platform is not idiomatic, it's just suboptimal.
        
           | FartyMcFarter wrote:
           | By "idiomatic C" I meant any of the following:
           | 
           | - Code that most C books/courses would teach you how to write
           | 
           | - Portable C code (arguably portability is one of C's biggest
           | successes!)
           | 
           | - Code that you'd expect to find in the K&R book
        
             | pjmlp wrote:
             | One of C's biggest marketing successes.
             | 
             | All high level languages are portable, including some older
             | than C.
             | 
             | It helps when one drags the OS the language was born on,
             | along via parallel standards like POSIX.
        
         | gameswithgo wrote:
         | cpu feature detection is way easier in rust. it's just built
         | into the core libs.
        
           | bsder wrote:
           | And?
           | 
           | These are the kind of things that make working in one
           | language different from another.
           | 
           | I'd actually like to see the SSE version in C so I can
           | compare the two implementations and how much grief you have
           | to go through.
        
           | colejohnson66 wrote:
           | Doesn't gcc have a CPU feature branching system? You use it
           | by attaching attributes to functions that say what
           | instructions are required. Granted, it's not as "elegant" as
           | Rust, but it does exist.
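
A rough sketch of the GCC-style approach mentioned above: a function marked with `__attribute__((target("ssse3")))` plus a runtime check via `__builtin_cpu_supports`. Assumptions: GCC or Clang on x86; on anything else the code falls back to the scalar path. The function names and the `HAS_X86_DISPATCH` macro are hypothetical, and this byte reversal is only a stand-in for the kind of kernel the benchmark programs use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAS_X86_DISPATCH 1
#include <immintrin.h>
#else
#define HAS_X86_DISPATCH 0
#endif

// Portable fallback: reverse n bytes in place.
void reverse_scalar(std::uint8_t* data, std::size_t n) {
    std::reverse(data, data + n);
}

#if HAS_X86_DISPATCH
// Compiled with SSSE3 enabled for this one function only.
__attribute__((target("ssse3")))
void reverse_ssse3(std::uint8_t* data, std::size_t n) {
    // Shuffle mask that reverses the 16 bytes of a vector.
    const __m128i flip =
        _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    std::size_t lo = 0, hi = n;
    while (hi - lo >= 32) {  // swap one 16-byte block from each end
        const __m128i front =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + lo));
        const __m128i back =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + hi - 16));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(data + lo),
                         _mm_shuffle_epi8(back, flip));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(data + hi - 16),
                         _mm_shuffle_epi8(front, flip));
        lo += 16;
        hi -= 16;
    }
    std::reverse(data + lo, data + hi);  // fewer than 32 bytes left
}
#endif

// Runtime dispatch: pick the SSSE3 path only if the CPU reports support.
void reverse_bytes(std::uint8_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
#if HAS_X86_DISPATCH
    if (__builtin_cpu_supports("ssse3")) {
        reverse_ssse3(data, n);
        return;
    }
#endif
    reverse_scalar(data, n);
}
```

Callers only ever see `reverse_bytes`; the feature check and the choice of code path stay inside it.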
        
         | vitus wrote:
         | I recall an anecdote about how Haskell actually outperformed C
         | on various tree benchmarks because it was using a better
         | implementation. At some point, the C programmers got fed up
         | with the airs of superiority from Haskell programmers, ported
         | the Haskell implementation, and reclaimed their position.
         | 
         | I wouldn't be surprised if there's something similar happening
         | here.
        
           | _wldu wrote:
           | Yes, it seems everyone is always trying to beat C, but
           | really, no one can.
        
             | pjmlp wrote:
             | Well, junior Assembly developers had it pretty easy on 80's
             | hardware.
             | 
             | Also C++ and Fortran don't have to spend much effort to
             | achieve that.
        
             | lawn wrote:
             | To be fair, if there's a language that has a fair chance
             | it's Rust.
        
           | loxias wrote:
           | It's exactly the same thing. I've been seeing this tossed
           | around for years now, and one of these days it'll make me
           | grumpy enough to fix the benchmark.
        
           | Blikkentrekker wrote:
           | The _Haskell_ code on some pathological examples of these
           | implementations that I have seen has so many unsafe
           | constructs, strictness annotations, inlining annotations and
           | so forth that it's practically _C_ in a different syntax.
           | 
           | It is not idiomatic _Haskell_ at all and loses all of the
           | touted benefits.
           | 
           | Is there a separate benchmark that only accepts idiomatic
           | code?
        
             | pjmlp wrote:
             | Just like C is pimped up BCPL, a language designed to
             | bootstrap the CPL compiler, 10 years younger than
             | ESPOL/NEWP, 10 years younger than Jovial, 6 years younger
             | than PL/I, among other possible examples from the 60's.
             | 
             | Also C was known for not being a fast language on 80's home
             | hardware, ruled by Assembly.
             | 
             | So low level coding for OS isn't coding like C and is just
             | the way for mechanical sympathy, regardless of the
             | language.
        
           | harporoeder wrote:
           | There was a long period where a fairly unknown theorem
           | proving language ATS (1) was beating C in many test cases on
           | the benchmark game (2) the benchmarks were removed though
           | (3). I expect many languages could be made to win with
           | sufficient effort.
           | 
           | 1. http://www.ats-lang.org/
           | 2. http://web.archive.org/web/20121218042116/http://shootout.al...
           | 3. https://stackoverflow.com/questions/26958969/why-was-the-ats...
        
             | steinuil wrote:
             | ATS is not really a theorem proving language, it's almost a
             | superset of C with a very long list of type system and
             | language features that lets you write anything from C code
             | with no safety to very high level recursive functional code
             | with lifetime tracking which will translate to efficient C
             | code, if the transformations are proved to be correct. It's
             | a weird beast, but I'm not surprised it outperformed some C
             | implementations, because it _is_ basically C with a lot
             | more features.
        
           | spiffytech wrote:
           | That kind of tit-for-tat in benchmarks seems like it's
           | counter to the goal of benchmarks: what kind of performance
           | could I expect to see using technology $FOO? Crucially, that
           | question depends on how someone will realistically implement
           | $FOO.
           | 
           | I like PyPy as an example: on the surface, implementing a
           | Python runtime in Python and expecting performance gains
           | seems crazy. PyPy manages to outperform CPython because
           | although a C implementation should theoretically be faster,
           | realistically the increased expressiveness of Python lets the
           | PyPy devs opt into optimizations the CPython devs find out of
           | reach.
           | 
           | I don't know C or Rust well enough to comment on these
           | specific scenarios, but if two technologies _can_ be fast,
           | and one makes that speed accessible while the other
           | encourages implementors to leave performance on the table,
           | that's much more useful information to me than seeing a
           | back-and-forth where folks compete to implement optimizations
           | I'll never make in my own projects.
        
             | vitus wrote:
             | You raise a good point. I've always been fascinated by
             | PyPy's performance, personally -- anecdotally, I've
             | achieved ~10x speedups from just running a script with
             | `pypy` instead of `python`. I always attributed that to
             | better performance of the JIT, but I could be wrong.
             | 
             | I have nothing against Rust personally, but it's ultimately
             | not an apples-to-apples comparison if they're not
             | implementing the same algorithm, or even if they're not
             | using the same mechanisms (e.g. Rust explicitly using SSE
             | intrinsics, which are certainly available to C in just as
             | idiomatic a fashion).
        
               | gameswithgo wrote:
               | comparing apples to apples is pointless, they are both
               | apples.
               | 
               | and you can always compare "equivalent" algorithms as
               | some languages may not be able to efficiently express the
               | same algorithm as another. i know what you are after, but
               | trying to have some benchmark that is "fair" according to
               | some spec that is important to your needs will just be
               | seen as pointless to others. benchmark game at least lets
               | us see what the performance ceiling is of various
               | languages, and anytime you object to a language having a
               | worse implementation you may go fix it.
               | 
               | i think what anyone with a clue understands is that C C++
               | and Rust all have roughly equivalent performance
               | ceilings. Nobody really thinks Rust is faster than C now.
        
               | vitus wrote:
               | > Nobody really thinks Rust is faster than C now.
               | 
               | The submission title would beg to differ.
               | 
               | (I know, it explicitly calls out "benchmarks" as the
               | context.)
               | 
               | I think languages like Rust or Swift have significant
               | advantages around safety over C/C++, while not
               | sacrificing much in terms of performance. But if one
               | language's benchmark contributors are willing to put in
               | more effort than another's to eke out additional
               | performance, then you're going to see skewed results in
               | favor of whichever has the more fervent evangelists or
               | whichever language has more to prove.
               | 
               | If the goal is to compare performance of two languages
               | which can express the same optimization in exactly the
               | same way, and only one uses it, then the benchmarks fail
               | in that respect.
        
               | gameswithgo wrote:
               | go fix the c implementation then.
               | 
               | no single person can maintain optimally implemented
               | solutions over dozens of languages. it's up to we the
               | people to help out. you want perfectly equivalent
               | comparisons, make it happen
        
               | sli wrote:
               | I admit I'm destroying the metaphor here but there are a
               | lot of useful comparisons to be made between different
               | cultivars of apple and those comparisons have
               | implications on their applications.
        
               | zeroimpl wrote:
               | That's the whole point of the metaphor - comparing apples
               | to apples is good, apples to oranges is bad.
        
               | gameswithgo wrote:
               | and it's a bad metaphor. there are many reasonable
               | comparisons to be made between any two things even when
               | both are not fruits
        
             | FartyMcFarter wrote:
             | > I like PyPy as an example: on the surface, implementing a
             | Python runtime in Python and expecting performance gains
             | seems crazy.
             | 
             | Not if it includes a JIT compiler, as PyPy does. I don't
             | know much about PyPy, but if most of the time is spent in
             | JITted machine code, the fact that it's written in Python
             | may not affect performance much.
        
               | pjmlp wrote:
               | Kind of, PyPy is a metacircular JIT compiler, it uses
               | RPython, a Python subset.
        
             | cygx wrote:
             | _realistically the increased expressiveness of Python lets
             | the PyPy devs opt into optimizations the CPython devs find
             | out of reach_
             | 
             | Not really: It's the compilation strategy (that whole
             | meta-tracing JIT compiler thing, compared to a simple bytecode
             | interpreter) that makes the difference, not the surface
             | syntax of the implementation language - which is actually
             | not 'real' Python, but a restricted, less dynamic subset
             | known as RPython.
             | 
             | Also note that CPython is deliberately kept 'dumb'.
        
             | mkl wrote:
             | > That kind of tit-for-tat in benchmarks seems like it's
             | counter to the goal of benchmarks
             | 
             | Yep. There are good reasons why this set of benchmarks is
             | called The Computer Language Benchmarks _Game_.
        
             | moonchild wrote:
             | > PyPy manages to outperform CPython because although a C
             | implementation should theoretically be faster,
             | realistically the increased expressiveness of Python lets
             | the PyPy devs opt into optimizations the CPython devs find
             | out of reach.
             | 
             | Pypy outperforms cpython for the simple reason that pypy is
             | a jit where cpython is a basic bytecode interpreter.
             | Anything else is icing on the cake.
             | 
             | The only reason python seems slow as an implementation
             | language is because it was traditionally slow; the reason
             | it was traditionally slow is that cpython, the only major
             | implementation, is slow. Common lisp, for instance, is
             | similarly dynamic (moreso, in fact), and is also generally
             | natively compiled by itself, and is quite performant.
        
           | FartyMcFarter wrote:
           | > I wouldn't be surprised if there's something similar
           | happening here.
           | 
           | In this case it seems like benchmark code is allowed to use
           | intrinsics, which can degenerate into a situation where a
           | benchmark in language X is more "glorified x86 Assembly code"
           | than actual code in language X.
           | 
           | This is not very useful for comparing languages IMO.
           | Especially since all of Rust, C, C++ can use this strategy
           | and become almost identical in both code and performance.
        
             | gameswithgo wrote:
             | So three languages with identical performance ceilings end
             | up with essentially identical performance results. sounds
             | like a job well done.
             | 
             | if you want to know something more than that, like how
             | performance tends to end up after inputs of identical
             | effort and talent you will need to recruit a lot of people
             | and money to run really large scale experiments and when
             | you are done people on HN will just endlessly find nits to
             | pick with your results whenever they don't agree with their
             | preconceived notions
        
         | pornel wrote:
         | Nothing stops someone from copying and submitting other
         | implementation's algorithm. There are multiple implementations
         | of each benchmark for every language:
         | 
         | * https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | 
         | * https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | 
         | * https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | 
         | It's possible that someone has already submitted both
         | algorithms for both languages, and different approaches won for
         | language-specific reasons.
        
           | FartyMcFarter wrote:
           | These are all the C versions I can find:
           | 
           | https://benchmarksgame-
           | team.pages.debian.net/benchmarksgame/...
           | 
           | https://benchmarksgame-
           | team.pages.debian.net/benchmarksgame/...
           | 
           | https://benchmarksgame-
           | team.pages.debian.net/benchmarksgame/...
           | 
           | https://benchmarksgame-
           | team.pages.debian.net/benchmarksgame/...
           | 
           | None of them have SSE intrinsics or are quite as long as the
           | Rust version.
           | 
           | I find it doubtful that SSE intrinsics wouldn't help the C
           | version, if they are indeed helping the Rust version. This
           | seems fairly easy to check as the Rust version has a non-SSE
           | fallback code path - I'd do it myself but am not able to at
           | the moment.
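
As a rough illustration of what "CPU feature detection with a non-SSE fallback" can look like in Rust, here is a minimal sketch. It is not the benchmark's code: the function names `sum`, `sum_sse`, and `sum_scalar` are hypothetical, and summing a slice merely stands in for the real workload.

```rust
// Hypothetical example: sum a slice of f32 with an SSE path and a scalar fallback.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Portable fallback path: plain iterator sum.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// SSE path: adds four lanes at a time, then folds the lanes and the tail.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse")]
unsafe fn sum_sse(xs: &[f32]) -> f32 {
    let mut acc = _mm_setzero_ps();
    let chunks = xs.chunks_exact(4);
    let tail = chunks.remainder();
    for c in chunks {
        // Unaligned load of 4 floats, added lane-wise into the accumulator.
        acc = _mm_add_ps(acc, _mm_loadu_ps(c.as_ptr()));
    }
    let mut lanes = [0.0f32; 4];
    _mm_storeu_ps(lanes.as_mut_ptr(), acc);
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

/// Picks the SSE path at runtime when the CPU reports support for it.
fn sum(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse") {
            // Safe to call: the feature was just detected at runtime.
            return unsafe { sum_sse(xs) };
        }
    }
    sum_scalar(xs)
}

fn main() {
    let xs: Vec<f32> = (1..=10).map(|i| i as f32).collect();
    println!("sum = {}", sum(&xs));
}
```

Forcing `sum` to always call `sum_scalar` is the kind of quick experiment the comment above suggests for checking how much the intrinsics actually contribute.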
        
         | mhh__ wrote:
         | > The Rust code has CPU feature detection and SSE intrinsics,
         | while the C code is more idiomatic.
         | 
         | These benchmarks are almost always either useless or a scam -
         | you either end up rewriting the same implementation n
         | times or you don't utilize the capabilities of the language,
         | either way you're not really measuring much of anything
         | intrinsic to the language itself - Rust and C both have the
         | same backends and if you really care about performance you're
         | going to take it to the max anyway, so inference by the
         | compiler isn't that important.
        
         | qart wrote:
         | I have seen this happening so often: C/C++/Rust often end up
         | using CPU-specific features, and the code starts looking more
         | and more like assembly code, and less like idiomatic high-level
         | language code. Basically, comparisons of programs written in
         | all the other languages against these three become meaningless.
         | This, in turn, hurts benchmarksgame as a resource for comparing
         | languages.
         | 
         | If I had to write a performant library at work, I too might
         | rely on CPU-specific assembly wrappers in my code. But IMO,
         | such code has no place in a general-purpose cross-language
         | benchmark site.
        
       | etaioinshrdlu wrote:
       | I know it's a meme, but it really does seem like most C or C++
       | code would be better off transitioning to Rust at some point.
       | That includes the entire Linux kernel, web browsers, entire
       | OS's...
        
       | dilap wrote:
       | My experience using ripgrep and fd is that this is also true in
       | real-world programs. :-)
        
       | lmilcin wrote:
       | Which is not at all surprising. Rust has a much larger compilation
       | unit and knows more about what can read/write a particular piece
       | of memory. This opens up optimizations where a C compiler must be
       | conservative.
       | 
       | A simpler example of this is Fortran, which can be faster for
       | numerical loads because it disallows aliasing of function
       | arguments. C, on the other hand, must pay the price of treating
       | arguments conservatively just in case they overlap.
        
         | the8472 wrote:
         | Wouldn't the C compiler be allowed to make similar assumptions
         | with -flto or -fwhole-program?
        
           | pjmlp wrote:
           | Yeah, but again that isn't C, but rather a specific
           | implementation.
        
       | rurban wrote:
       | Because Rust does alloca for all locals, and this is of course
       | faster. Everyone else avoids it for security reasons. Just search
       | the Rust bugtracker for stack overflows.
        
       | nynx wrote:
       | The fastest n-body program is written in very idiomatic rust.
       | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
         | FartyMcFarter wrote:
         | n-body in C compiled by clang runs just as fast as Rust
         | apparently:
         | 
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
           | pjscott wrote:
           | It's not entirely surprising that a carefully-optimized C
           | program using explicit SSE intrinsics, plus a fancy trick
           | involving a low-precision square root instruction fixed up
           | with two iterations of Newton's method, would be fast. :-)
           | 
           | What impresses me is that the Rust version didn't do any of
           | that stuff, just wrote very boring, straightforward code --
           | and got the same speed anyway. Some impressive compilation
           | there!
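
The "low-precision square root fixed up with two iterations of Newton's method" trick can be sketched as follows. This is an illustration only, not the benchmark's code: `refine_rsqrt` and `rsqrt_two_steps` are hypothetical names, and the rough starting value is simulated by perturbing the exact result by 0.1% rather than coming from a hardware estimate instruction.

```rust
/// One Newton-Raphson step for y ~ 1/sqrt(x):  y' = y * (1.5 - 0.5 * x * y * y).
/// Each step roughly squares the relative error of the estimate.
fn refine_rsqrt(x: f64, y: f64) -> f64 {
    y * (1.5 - 0.5 * x * y * y)
}

/// Applies two refinement steps to a rough estimate of 1/sqrt(x).
fn rsqrt_two_steps(x: f64, rough: f64) -> f64 {
    refine_rsqrt(x, refine_rsqrt(x, rough))
}

fn main() {
    let x = 2.0_f64;
    let exact = 1.0 / x.sqrt();
    // Assumption: stand-in for a cheap hardware estimate, off by 0.1%.
    let rough = exact * 1.001;
    let refined = rsqrt_two_steps(x, rough);
    println!("rough relative error:   {:e}", ((rough - exact) / exact).abs());
    println!("refined relative error: {:e}", ((refined - exact) / exact).abs());
}
```

The appeal of the trick is that a cheap estimate plus a few multiplications can be faster than a full-precision square root and division.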
        
             | FartyMcFarter wrote:
             | Good point! It would be interesting to find out where the
             | Rust version gets most of its speed from.
        
           | awestroke wrote:
           | The code for the C-Clang version is terrifying, compared to
           | the Rust version. Which one would you rather maintain?
        
             | berkut wrote:
             | Well, given the comments at the top:
             | 
             | // Contributed by Mark C. Lewis.
             | 
             | // Modified slightly by Chad Whipkey.
             | 
             | // Converted from Java to C++ and added SSE support by
             | Branimir Maksimovic.
             | 
             | // Converted from C++ to C by Alexey Medvedchikov.
             | 
             | // Modified by Jeremy Zerfas.
             | 
             | It sounds like no-one bothered to actually write a from-
             | scratch version of many of these things :)
        
       | [deleted]
        
       | 1vuio0pswjnm7 wrote:
       | The reasons I use C versus comparable alternatives are not
       | limited to speed. For example, the size of the compiler
       | toolchain, the speed of compilation and the size of the resulting
       | executables are all factors I have to consider. I do lots of work
       | on systems with limited resources. How does Rust compare on those
       | points versus, say, GCC.
       | 
       | https://dev.to/aakatev/executable-size-rust-go-c-and-c-1bna
        
         | steveklabnik wrote:
         | I mean, you'll get drastically different numbers with all of
         | those examples if you actually ask for options that produce
         | small binaries. That the default flags don't try to make things
         | small (across any of these toolchains) means this isn't really
         | a fair comparison.
         | 
         | The smallest "hello world" binary rustc has ever produced was
         | 137 bytes. https://github.com/tormol/tiny-rust-executable
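
For reference, "asking for options that produce small binaries" in a Cargo project usually means a release profile along these lines. This is a sketch of commonly used settings, not the configuration of the linked project, and `strip` in particular assumes a reasonably recent Cargo.

```toml
# Cargo.toml (fragment): release profile tuned for size rather than speed.
[profile.release]
opt-level = "z"     # optimize for size
lto = true          # link-time optimization across crates
codegen-units = 1   # fewer units: more room for optimization, slower builds
panic = "abort"     # drop the unwinding machinery
strip = true        # strip symbols from the final binary
```

Each of these trades build time or debuggability for size, which is the reason they are not the defaults.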
        
           | fmntf wrote:
           | If that's possible, why should not binaries be small (without
           | too many downsides, eg execution speed) by default?
        
             | steveklabnik wrote:
             | Because everything is a tradeoff. Getting the binary to be
             | smaller means taking more compile time, because compilers
             | have to do work to reduce the size. It is much faster to
             | produce larger binaries.
             | 
             | Additionally, most compilers produce something that's
             | useful for development and debugging by default, and that
             | means including _extra_ stuff for that purpose.
        
             | mlyle wrote:
             | A big part of why binaries are "big" is dynamic linking and
             | external symbols.
             | 
             | That Rust hello world is just hand-crafted to never be
             | linkable to anything else at runtime and invoke a system
             | call with a buffer. This isn't really how we'd like to do
             | most things.
        
               | steveklabnik wrote:
               | Interestingly enough, I would say that dynamic linking
               | makes binaries smaller, that code no longer lives in the
               | binary, but another place instead.
        
               | mlyle wrote:
               | Sure. But there's a lot of overhead that comes with it,
               | too.
        
               | djeiasbsbo wrote:
               | I'm pretty sure that for the classic "Hello World"
               | example that's not the case, because we can simply use
               | the linux write system call.
               | 
               | For example, an assembly program that just calls write
               | and then exits is much smaller than even an unlinked
               | version of it that uses printf, because the ELF binary
               | doesn't need to contain information for the linker.
        
               | steveklabnik wrote:
               | Yes, on the very very very low end, this is the case, but
               | generally, in more real programs, it's not.
               | 
               | (The program I linked is an example of exactly what
               | you're talking about)
        
             | pjmlp wrote:
             | Just like C compilers, it is a matter of use case, and
             | compiling for size has tradeoffs regarding execution speed.
        
       | api wrote:
       | One of Rust's performance advantages is the compiler's ability to
       | unambiguously determine memory aliasing. Aliasing is why many
       | numeric kernels are written in Fortran, a much older language
       | that also enforces strict aliasing as it simply doesn't allow
       | overlapping references.
       | 
       | There are probably others as well, but this is the advantage I'm
       | familiar with. C's ambiguity makes it harder to achieve some
       | optimizations that can really matter on modern CPUs.
        
         | hawk_ wrote:
         | Unfortunately that's only on paper due to LLVM issues.
        
         | nynx wrote:
         | LLVM's support for this is bugged, so rust does not currently
         | take (edit: full) advantage of it.
        
           | runevault wrote:
           | I really hope this gets fixed at some point, just because I'd
           | like to see how much faster rust can get. In particular I
           | wonder how much it would impact the compiler itself since it
           | is also written in rust.
        
           | kzrdude wrote:
           | Does not fully take advantage of it.
        
           | steveklabnik wrote:
           | Rust does not take _full_ advantage of it; that is, &T will
           | still get noalias, it's &mut T that's currently disabled.
           | 
           | The tracking bug is https://github.com/rust-
           | lang/rust/issues/54878
        
             | nynx wrote:
             | Ah, my mistake. Thank you for the correction.
        
               | steveklabnik wrote:
               | It's all good; most people don't draw the distinction. I
               | myself didn't know that &T was also noalias for years.
        
             | lambda wrote:
             | Here is an example which, while trivial, demonstrates some
             | of the optimization issues you can run into in C and C++
             | but not in Rust, even with noalias currently only applying
             | to &T.
             | 
             | C++: https://godbolt.org/z/dTPsbh
             | 
             | Rust: https://rust.godbolt.org/z/Mdx87h
             | 
             | In C and C++, compilers are allowed to infer "noalias"
             | based on pointer types; two pointers of different type are
             | not allowed to alias. This is known as type-based alias
             | analysis. But char * is given special treatment, because of
             | its use as a generic pointer type; so if one of your
             | arguments is char * or even signed char * , that disables
             | any optimizations which were relying on type-based alias
             | analysis.
             | 
             | This provides both for performance and undefined-behavior
             | footguns. If you ever try to use some pointer type other
             | than char * or signed char * to refer to data of another
             | type, you may inadvertently cause type-based alias analysis
             | to kick in, causing invalid optimizations and
             | miscompilation. On the other hand, if you have a function
             | which takes both a char * and another pointer, the compiler
             | may not apply optimizations that it otherwise could because
             | char * is allowed to alias anything.
             | 
             | In Rust, there is no such undefined behavior footgun.
             | Because of the LLVM bug, noalias isn't applied to &mut
             | pointers so there are still some cases which could be
             | better optimized, though it sounds like there is some
             | progress being made on the LLVM front so it should be fixed
             | at some point, and there are already places where the
             | compiler can do better optimizations with better safety due
             | to the stronger semantics of &T.
        
       | jolux wrote:
       | I think benchmarking C vs C++ vs Rust must only really be useful
       | for researchers. They're all making a similar tradeoff for
       | performance: forcing you to consider how you use memory. Does
       | anyone work in a field where the performance difference between
       | these specific three platforms matters? I'm genuinely curious.
       | Edit: also, if you could explain briefly why and what makes
       | particular choices out of the three unsuitable, that would be
       | awesome too.
        
         | jandrewrogers wrote:
         | It matters for real-world software development, though the
         | reason may not be intuitive. In theory, for any particular bit
         | of software, you can write code in any of these three languages
         | that has nearly identical performance. In practice, the
         | complexity of _expressing_ equivalent performance can vary
         | considerably depending on what you are trying to do.
         | 
         | There are finite limits to the complexity cost developers are
         | willing to pay for performance. Because the cost in each of
         | these three languages is different to express some thing,
         | sometimes there will be a threshold where in one or more of
         | these languages most developers will choose a less optimal
         | design. This manifests as practical performance differences in
         | real software even though in theory they are equally expressive
         | with enough effort.
         | 
         | This comes with another tradeoff. Efficient expressiveness
         | relative to software performance comes at a cost of language
         | complexity. C is a simple language that has enough efficient
         | expressiveness for simple software architectures. C++ is at the
         | extreme opposite; you can do mind-boggling magic with its
         | metaprogramming facilities that can express almost anything
         | optimally but god help you if you are trying to _learn_ how to
         | do this yourself. Rust sits in the middle; much more capable
         | than C, not as expressive as C++.
         | 
         | This suggests the appropriate language is partly a function of
         | the investment a developer is willing to make in learning a
         | language and what they need to do with it. Today, I use modern
         | C++ even though it is a very (unnecessarily) complex language
         | and write very complex software. Once you pay the steep price
         | of learning C++ well, you acutely feel the limitations of what
         | other languages _can't_ express easily when it comes to high
         | performance software design.
         | 
         | I used to write a lot of C. It still has critical niches but
         | not for high-performance code in 2021, giving up far too much
         | expressiveness. C++17/20 is incredibly powerful but few
         | developers really learn how to wield that power, though usage
         | of it has been growing rapidly in industry as scale and
         | efficiency have become more important. Rust is in many ways an
         | heir apparent to C and/or Java, for different reasons. Or at
         | least, that is how I loosely categorize them in my head, having
         | a moderate amount of contact with all three. They all have use
         | cases where you probably wouldn't want to use the others.
        
           | steveklabnik wrote:
           | > In practice, the complexity of expressing equivalent
           | performance can vary considerably depending on what you are
           | trying to do.
           | 
           | Yep. Rust's safety means that in some cases, you can be
           | _more_ aggressive because you know the compiler has your
           | back. And in other cases, it makes harder things tractable.
           | The example of Stylo is instructive; Mozilla tried multiple
           | times to pull the architecture off in C++, but couldn't
           | manage to do it until Rust.
        
         | pixel_fcker wrote:
         | Graphics software.
        
         | klysm wrote:
         | Yeah, databases
        
           | JimBlackwood wrote:
           | Could you elaborate a little? I'd be interested in this
           | answer
        
             | jandrewrogers wrote:
             | All three languages are different enough that depending on
             | the language you use it fundamentally alters your database
             | architecture to better fit the mechanics of the language.
             | Databases naturally have very high levels of internal
             | complexity, so there is a strong incentive to align the
             | design with what the language can express without adding
             | substantially more complexity.
             | 
             | I work on database engines and the impact of language
             | choice on the design, architecture, and implementation of
             | database engines is very evident. In the specific case of
             | database engines, these differences in design can have a
             | very large performance impact.
        
         | londons_explore wrote:
         | I do some graphics stuff. As soon as you get to "this chunk of
         | code needs to be run for every pixel of every 4k frame at
         | 60fps", suddenly the number of clock cycles and registers
         | matters... Some of my platforms don't have GPU's, so it really
         | is squeezing everything possible out of the language and
         | compiler...
        
           | djeiasbsbo wrote:
           | I do audio stuff and it is the same there. DSP is easy until
           | it has to be in real time and there can only be minimal
           | latency...
        
           | mrec wrote:
           | I'm curious, what platforms support 4K output but don't have
           | any kind of GPU?
        
         | steveklabnik wrote:
         | I think what's most interesting about your comment is the
         | assumption that the three are in the same ballpark. You're not
         | wrong, but it just reminds me of how far we've come. That is,
         | the key is not "which of these three can eke out the last tiny
         | ounce of things," but that Rust has successfully landed across
         | that gap you see in the graph. That it's "(C/C++/Rust) vs
         | everything else" is in and of itself an interesting result. You can
         | see some skepticism of the premise elsewhere in this thread,
         | even.
        
         | kzrdude wrote:
         | It's mostly an exercise that's useful for Rust: Internally, to
         | prove that "it works"* and externally, to make a credible name
         | for Rust.
         | 
         | (*) of course part of the output is also looking at the Rust
         | code and evaluating style and `unsafe`-wise what the price for
         | winning was.
        
       | acje wrote:
       | It struck me a while ago that the most powerful feature of rust
       | is the strong contracts libraries can and must express. This
       | allows people with much deeper knowledge than me to make awesome
       | stuff I can depend on.
        
       | seeekr wrote:
       | From the page: "... a pretty solid study on the boredom of
       | performance-oriented software engineers grouped by programming
       | language." I find this both funny and consider it true to some
       | degree. There's nothing like a good old friendly arms race for
       | the benefit of all (languages and their users, in this case)
       | involved!
        
       | pmarin wrote:
       | The only conclusion I have got about this web site is how much
       | some programmers like to write benchmark code in Rust.
        
       | indymike wrote:
       | Let's talk about speed when we are implementing the same
       | algorithm and optimizations, please. If $1 was donated to cure
       | cancer every time a developer games a comparison like this, there
       | would be no more cancer.
        
       | harporoeder wrote:
       | I was wondering if perhaps this was actually measuring a
       | difference between LLVM and GCC, but they also provide a set of
       | benchmarks of C Clang vs C GCC (1) and Clang is generally slower
       | in those tests. There is some correlation, though, between the
       | ones Clang wins in C and the ones Rust wins.
       | 
       | 1. https://benchmarksgame-
       | team.pages.debian.net/benchmarksgame/...
        
         | arcticbull wrote:
         | Rust can be faster than C because in general C compilers have
         | to assume that pointers to memory locations can overlap (unless
         | you mark them __restrict). Rust forbids aliasing pointers. This
         | opens up a whole world of optimizations in the Rust compiler.
         | Broadly speaking this is why Rust can genuinely be faster than
         | C. Same is true in FORTRAN, for what it's worth.
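
A minimal sketch of the aliasing rule being described, with hypothetical names (`add_into` is not from any benchmark). In safe Rust a `&mut` slice and a `&` slice cannot overlap, so the caller has to split the buffer explicitly. Whether the optimizer actually exploits that guarantee depends on the compiler and LLVM version.

```rust
/// Adds each element of `src` into the matching element of `dst`.
/// `dst` is `&mut` and `src` is `&`, so the borrow checker guarantees
/// that the two slices do not overlap.
fn add_into(dst: &mut [i32], src: &[i32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}

fn main() {
    let mut data = [1, 2, 3, 4, 5, 6];

    // Rejected by the compiler: `data` would be borrowed mutably and immutably at once.
    // add_into(&mut data[..3], &data[3..]);

    // Accepted: split_at_mut hands out two provably disjoint halves.
    let (left, right) = data.split_at_mut(3);
    add_into(left, right);

    println!("{:?}", data);
}
```

In C the equivalent function would need `restrict` on both pointers to give the compiler the same information, and nothing would check that callers honor it.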
        
           | alerighi wrote:
           | Well you are saying that even in C you can use the restrict
           | keyword to tell the compiler that 2 memory locations can
           | overlap. Of course is in the hand of the programmer to tell
           | the compiler to do so.
           | 
           | I don't think there is a fair comparison between Rust and C:
           | C is just a higher-level assembler; if the programmer knows
           | what he's doing, he can use 100% of the hardware's
           | potential. That is the reason why C is still used in all the
           | embedded applications where you have ridiculously low-power
           | microcontrollers and you must squeeze out the best
           | performance.
           | 
           | That is the difference between C and Rust to me: for each
           | fast Rust program you are guaranteed that you can write an
           | equivalent performant program in C (or assembly). Worst case
           | scenario you use inline assembly in C and you get that.
           | 
           | Thus the contrary cannot be true for Rust: if I give you a
           | heavily optimized C program not always you can produce an
           | equivalent version in Rust.
           | 
           | Also, these optimizations are not always what you want. In C
           | you can choose the level of optimization, and most of the
           | time, at least in the programs that I write, I choose a low
           | level of optimization. The reason is that performance is often
           | not the only thing that matters; what may matter more is the
           | stability of the code (code compiled with optimizations is
           | more likely to contain bugs) or the ability to debug (and thus
           | the readability of the assembly output of the compiler).
           | 
           | Rust emits horrible assembly code that is impossible to debug
           | or to check for correctness. You just have to hope that the
           | compiler doesn't contain bugs. For the same reason
           | Rust is the ideal language to write viruses, since it's
           | difficult to reverse engineer.
        
             | arcticbull wrote:
             | I think you'd be hard pressed to find more than a handful
             | of usual C programmers, even in embedded, who know what the
             | __restrict keyword does, let alone are rigorous in its
             | application.
        
               | steveklabnik wrote:
               | I've always wondered if the reason why Rust keeps running
               | into llvm bugs around restrict is that it is used so
               | sparingly, and the semantics being unclear enough, that
               | these codepaths just aren't exercised as often.
        
             | steveklabnik wrote:
             | > Thus the contrary cannot be true for Rust: if I give you
             | a heavily optimized C program not always you can produce an
             | equivalent version in Rust.
             | 
             | Do you have some evidence here? Or something more specific?
             | 
             | (The rest of your comment is opinions, which you can of
             | course have, but does not match my personal experience,
             | FWIW.)
        
             | saagarjha wrote:
             | > For the same reason Rust is the ideal language to write
             | viruses, since it's difficult to reverse engineer.
             | 
             | Eh, not really.
        
             | efaref wrote:
             | There's a translator for C99 to (unsafe) Rust:
             | https://github.com/immunant/c2rust
             | 
             | So probably you _can_ give me any C program and I'll be
             | able to give you an equivalent Rust program. It'll probably
             | perform about the same, too.
        
             | pjscott wrote:
             | > Well you are saying that even in C you can use the
             | restrict keyword to tell the compiler that 2 memory
             | locations can overlap. Of course is in the hand of the
             | programmer to tell the compiler to do so.
             | 
             | It says that they _can't_ overlap, but yes, you can get
             | the compiler to optimize based on this if you provide the
             | aliasing information and remember to keep it accurate. You
             | probably won't do that, though, for anything except the
             | most performance-critical of inner loops. A compiler that
             | can infer more about aliasing can provide more
             | optimization, safely, in the >99% of code that doesn't have
             | explicit aliasing annotations, and that's probably worth
             | some decent speedups in practice.
             | 
             | There are two main things you might be talking about when
             | you call a programming language fast:
             | 
             | 1. Given some really performance-critical code and enough
             | time to hand-optimize it, how close can you get the
             | compiler's output to the optimal machine code?
             | 
             | 2. If you write code normally, not making any effort to
             | micro-optimize, how fast will it usually be?
             | 
             | Both of these matter. #1 matters when you've got some
             | bottlenecks that you really care about, and #2 matters when
             | you've got a flatter profile -- and both situations are
             | common.
             | 
             | Another illustrative example of the #2 type of performance
             | with Rust is the default collections compared to the ones
             | in C++. Rust's default HashMap is faster than C++'s
             | std::unordered_map because of some ill-advised API
             | constraints in the C++ STL. You _can_ get similar
             | performance by using a custom C++ hash table
             | implementation, and in fact Rust's HashMap is a port of a
             | C++ hash table that Google wrote for that purpose, but most
             | people probably won't bother.
             | 
             | So, a semantic question: if you _can_ get the same speed in
             | one language as you can in another, but in practice usually
             | don't, is one language faster than the other?
        
               | hashingroll wrote:
               | > but most people probably won't bother.
               | 
               | Most people who don't bother with that probably also
               | don't bother with the performance of their hashmap.
        
             | pjmlp wrote:
             | People think C is a high-level assembler, but that was only
             | true on the PDP-11 and on 8- and 16-bit CPUs.
             | 
             | Also, in C you cannot retrofit restrict into existing
             | programs, especially if they depend on existing binary
             | libraries.
        
           | walki wrote:
           | > C compilers have to assume that pointers to memory
           | locations can overlap, unless you mark them __restrict...
           | 
           | What I don't fully understand is: "GCC has the option
           | -fstrict-aliasing which enables aliasing optimizations
           | globally and expects you to ensure that nothing gets
           | illegally aliased. This optimization is enabled for -O2 and
           | -O3 I believe." (source: https://stackoverflow.com/a/7298596)
           | 
           | Doesn't this mean that C++ programs compiled in release mode
           | behave as if all pointers are marked with __restrict?
        
             | jart wrote:
             | Not for char. Compiler always assumes non-restrict for char
             | pointers and arrays, which is important to remember if
             | you're ever operating on a RGB or YCbCr matrix or
             | something.
        
               | bsder wrote:
               | Huh.
               | 
               | Does that also hold for a "uint8_t"--which is often just
               | a renamed unsigned char rather than being a genuine type
               | of its own?
        
               | jart wrote:
               | Yes
        
               | Hello71 wrote:
               | not according to the standard, but in practice yes,
               | currently. there are arguments that it should be
               | considered not for optimization reasons, but there is
               | likely too much existing code relying on it to change
               | behavior. (see related llvm, gcc bugs)
        
             | murderfs wrote:
             | restrict and strict aliasing have to do with the same
             | general concept, but aren't the same. They both have to do
             | with allowing the compiler to optimize around assuming that
             | writes to one pointer won't be visible while reading from
             | another. As a concrete example, can the following branches
             | be merged?
             | 
             |     void foo(/*restrict*/ bool* x, int* y) {
             |       if (*x) {
             |         printf("foo\n");
             |         *y = 0;
             |       }
             |       if (*x) {
             |         printf("bar\n");
             |       }
             |     }
             | 
             | Enabling strict aliasing is effectively an assertion that
             | pointers of incompatible types will never point to the same
             | data, so a write to y will never touch *x. restrict is an
             | assertion to the compiler on that specific pointer that no
             | other pointer aliases to it.
        
               | walki wrote:
               | OK thanks, indeed Clang is able to generate better
               | assembly using __restrict__. And -O3 generates the same
               | assembly as -O3 -fstrict-aliasing (which is not as good
               | as __restrict__).
               | 
               | I wish there was a C/C++ compiler flag for treating all
               | pointers as __restrict__. However I guess that C/C++
               | standard libraries wouldn't work with this compiler
               | option (and therefore this compiler option wouldn't be
               | useful in practice).
        
           | shepmaster wrote:
           | Why does the Rust compiler not optimize code assuming that
           | two mutable references cannot alias?
           | --https://stackoverflow.com/q/57259126/155423
        
             | arcticbull wrote:
             | Follow along here for re-enabling:
             | https://github.com/rust-lang/rust/issues/54878
             | 
             | For what it's worth I did say "can be" not "is" because I
             | wasn't sure of the current state of this feature. I was
             | just passing on the theory.
             | 
             | There are certainly other potential reasons, for instance
             | constant expressions and generics. And, of course, the
             | prohibited undefined behavior makes other optimizations
             | possible, and potentially better.
        
       | mh7 wrote:
       | Some of the Rust versions call C libraries for their heavy
       | lifting (gmp, pcre), so I wouldn't take this too seriously.
        
       | Matthias247 wrote:
       | Apart from those benchmark games a lot of real world C is a lot
       | less performant than people think it might be. I spent a fair
       | amount of time reviewing C code in the last 5 years - and things
       | that pop up in nearly every review are costly string operations.
       | Linear counts due to the use of null terminated strings and extra
       | allocations for substrings to attach null terminators, or just
       | deep copies because ownership can't be determined are far more
       | common than exceptional. This happens because null terminated
       | strings feel like idiomatic C to most people.
       | 
       | Rust avoids those from the start by making slices idiomatic.
       | 
       | Another thing I commonly see is the usage of suboptimal
       | containers (like arrays with linear search) - just because
       | there's no better alternative at hand (standard library doesn't
       | offer ones and dependency management is messy). Which also makes
       | it less surprising that code in higher level languages might
       | perform better.
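To illustrate the string point above, here is a rough sketch of a length-carrying slice in C. The `str_slice` type and its helpers are hypothetical names made up for this example; the idea is only that a substring becomes a pointer plus a length, so no copy, allocation, or terminator is needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: pointer plus explicit length, no terminator. */
typedef struct {
    const char *ptr;
    size_t len;
} str_slice;

/* Wrap a NUL-terminated string; this is the only place that pays for strlen. */
str_slice slice_from_cstr(const char *s) {
    str_slice out = { s, strlen(s) };
    return out;
}

/* Sub-slice [start, end) without allocating or copying; bounds are clamped. */
str_slice slice_sub(str_slice s, size_t start, size_t end) {
    if (end > s.len) end = s.len;
    if (start > end) start = end;
    str_slice out = { s.ptr + start, end - start };
    return out;
}

/* Equality is a length check followed by memcmp, with no linear scan for NUL. */
int slice_eq(str_slice a, str_slice b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

The slice borrows the original buffer, so the caller has to keep that buffer alive; C does not check this, which is the ownership problem the comment describes.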
        
         | frutiger wrote:
         | Mostly agree with your comment but linear search through arrays
         | of size less than a few hundred will typically beat more
         | sophisticated structures such as red-black trees or hashtables.
         | This is due to prefetching and avoidance of unpredictable
         | pointer traversals. Asymptotic complexity is only that:
         | asymptotic.
         | 
         | In many programs in many domains the sizes of these data
         | structures will rarely exceed this limit.
        
           | vbezhenar wrote:
           | It might be fast, but it'll load CPU caches with that data
           | and it'll evict other useful data. Which means that while
           | this particular code will be fast or at least not very slow,
           | some other code will be slow because its data has to be
           | fetched again.
           | 
           | I have no idea whether that matters or is even easy to
           | measure...
        
             | forrestthewoods wrote:
             | You're stretching there imho.
             | 
             | L1 and L2 are per core. A cacheline is 64 bytes and per-
             | core cache size is on the order of 32kb and 256kb for
             | L1/L2.
             | 
             | Reading data from L1 is on the order of 1 nanosecond (or
             | less) and RAM on the order of 50 nanoseconds.
             | 
             | If you're scanning an array and load a dozen cachelines
             | that's almost certainly preferable to several cache-misses
             | (and lines).
             | 
             | Memory access is very often an application's bottleneck.
             | The answer is almost always more arrays and fewer pointers.
        
         | michaelmior wrote:
         | > null terminated strings feel like idiomatic C to most people
         | 
         | Doesn't that mean that null terminated strings _are_ idiomatic
         | C? That is, my understanding of the term idiomatic is that it
         | is defined by whatever is most natural to users of a language
         | regardless of whether it is the most performant.
        
           | rcoveson wrote:
           | Null terminated strings are often called cstrings. They're
           | beyond idiomatic; they're part of the C standard library.
        
             | akkartik wrote:
             | Also Unix. Syscalls return null-terminated strings all over
             | the place. So any language running on Unix has to deal with
             | null-terminated strings.
             | 
             | I know because I run a userland that uses length-prefixed
             | strings as far as possible: https://github.com/akkartik/mu
        
             | michaelmior wrote:
             | True. I was skipping over that, but the fact that functions
             | manipulating null terminated strings are part of the
             | standard library is certainly a reason in itself to
             | consider them idiomatic.
        
               | akkartik wrote:
               | They're agreeing with you.
        
         | martincmartin wrote:
         | Also lack of generics can make it slow, e.g. qsort() requires a
         | function call for each comparison. So C++'s std::sort() can be
         | significantly faster on an array of integers.
        
           | bluecalm wrote:
           | I benchmarked it several times in the past and couldn't
           | replicate std::sort being faster (GCC with high optimization
           | settings). Anyway both are slow. If you need fast sort you
           | need an implementation without any function calls (no
           | recursive calls) and both the pivot choice and the chunk size
           | at which insert sort kicks in optimized to your data and
           | hardware. My experience is that you can beat built in sort by
           | 2x to 3x.
        
             | lumost wrote:
             | How would you perform this optimization? If it's the same
             | data getting sorted, why not put it in an ordered data
             | structure?
        
               | mhh__ wrote:
               | Andrei Alexandrescu has a talk on doing this - he calls
               | them metaparameters e.g. where a hybrid sort chooses to
               | change algorithm.
               | 
               | One library I have exploits the fact that D templates are
               | embarrassingly better than C++'s, so you can actually
               | benchmark a template against its parameters in a clean
               | manner without overhead - that could be anything from a
               | size_t parameter for a sort or a datastructure for
               | example.
               | 
               |     enum cpuidRange = iota(1, 10).map!(ctfeRepeater).array;
               |     @TemplateBenchmark!(0, cpuidRange)
               |     @FunctionBenchmark!("Measure", iota(1, 10), (_) => [1, 2, 3, 4])(meas)
               |     static int sum(string asmLine)(inout int[] input)
               |     {
               |         int tmp;
               |         foreach (i; input)
               |         {
               |             tmp += i;
               |             mixin("asm { ", asmLine, ";}");
               |         }
               |         return tmp;
               |     }
               | 
               | This made-up (pointless) benchmark measures how inserting
               | a number of cpuid instructions into the loop of a summing
               | function affects its runtime. My library writes the code
               | from your specification as above to generate the
               | instantiations and loop to measure the performance. As
               | you might guess, the answer is a lot (CPUID is slow and
               | serializing).
        
           | mh7 wrote:
           | It's more to do with the fact that std::sort's definition is
           | visible to the compiler and qsort() is not. Put qsort() code
           | in stdlib.h, make it static and write a static intcmp() and
           | you'll see the compiler inline that no problem.
        
             | jjgreen wrote:
             | I've done this a few times using
             | http://www.corpit.ru/mjt/qsort.html
        
             | ryanianian wrote:
             | Sure you can hard-code intcmp into qsort but then it would
             | only work for arrays of ints.
             | 
             | You could do some macro magic instead of templates e.g.
             | `DEFINE_QSORT(int, intcmp)` which could stamp out
             | `qsort_int` but that's not a part of the stdlib.
             | 
             | C++ arguably gets this right since sort<int> and
             | sort<string> will be separate functions, although templates
             | are of course a footgun. And of course duping the logic for
             | std::sort<T> for a bunch of different T impls increases the
             | binary size.
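The macro idea mentioned above could look roughly like this. `DEFINE_SORT`, `sort_int`, and `int_less` are hypothetical names for this sketch, and the body is an insertion sort for brevity rather than a real quicksort; the point is only that each expansion is a separate function in which the comparison is visible to the compiler and can be inlined.

```c
#include <stddef.h>

/* Hypothetical macro: expands to sort_<T>, specialised for one element
   type T and one comparison LESS. Insertion sort keeps the sketch short. */
#define DEFINE_SORT(T, LESS)                                  \
    void sort_##T(T *a, size_t n) {                           \
        for (size_t i = 1; i < n; i++) {                      \
            T key = a[i];                                     \
            size_t j = i;                                     \
            while (j > 0 && LESS(key, a[j - 1])) {            \
                a[j] = a[j - 1];                              \
                j--;                                          \
            }                                                 \
            a[j] = key;                                       \
        }                                                     \
    }

/* Comparison defined in the same translation unit as the expansion. */
static inline int int_less(int x, int y) { return x < y; }

/* Stamps out: void sort_int(int *a, size_t n) */
DEFINE_SORT(int, int_less)
```

As with templates, every extra `DEFINE_SORT(...)` line adds another copy of the sorting code to the binary.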
        
               | TheNewAndy wrote:
               | The poster you are replying to didn't suggest hardcoding
               | intcmp into qsort - just making it so the implementation
               | of qsort is available to the compiler when the comparison
               | function is known (i.e. just like with C++).
               | 
               | When this is done, the compiler can inline qsort, and
               | replace the indirect function call with an inlined
               | version of intcmp, and then things are equivalent.
        
               | marvy wrote:
               | I think mh7 did not mean to hard-code intcmp into qsort.
               | The idea is to move the definition of qsort directly into
               | the stdlib.h header file. That way, the compiler can see
               | the definition of qsort and intcmp at the same time.
               | 
               | In that case, the compiler could make a specialized qsort
               | using intcmp automatically.
        
             | [deleted]
        
       | [deleted]
        
       | Svetlitski wrote:
       | Once LLVM fixes some bugs with `noalias`, at which point Rust
       | will begin using it again in more circumstances [1], I'd expect
       | to see Rust get even faster in these benchmarks, given that the
       | Rust compiler knows _much_ more about which pointers do/do-not
       | alias than most other programming languages [2] and the myriad
       | optimizations this knowledge allows.
       | 
       | [1] https://github.com/rust-lang/rust/issues/54878#issuecomment-...
       | 
       | [2] https://doc.rust-lang.org/nomicon/aliasing.html
        
         | stabbles wrote:
         | I doubt there's any performance to be gained that way, but if
         | so, the C implementation can just use `restrict` to the same
         | effect.
        
           | [deleted]
        
           | roca wrote:
           | "Just use 'restrict'" isn't as easy as it sounds. You need to
           | only use 'restrict' where you are 100% sure pointers can't
           | alias. Wherever you get this wrong you have introduced a
           | subtle bug that a) only shows up in optimized code b) may or
           | may not show up at all depending on compiler version and
           | target architecture and c) only show up when two pointers
           | actually alias at runtime, which may be rare, and may be
           | intermittent ... in other words these bugs will be _hell_ to
           | debug. So in fact in a large codebase it will be a lot of
           | work to figure out where you can safely put 'restrict' and
           | you will probably introduce some very nasty bugs. Most people
           | aren't going to be willing to do this.
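For reference, a small sketch of what the `restrict` promise looks like in practice. `scale_add` is a hypothetical function written for this example. The qualifier tells the compiler that `dst` and `src` never overlap during the call, which lets it reorder or vectorize the loop; nothing checks that promise at the call site.

```c
#include <stddef.h>

/* restrict: the caller promises that dst and src do not overlap for the
   duration of the call. The compiler may rely on that when optimizing. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
}
```

A call with two separate arrays is fine. A call such as `scale_add(a + 1, a, k, n)` breaks the promise and is undefined behavior, yet it compiles without a warning and may appear to work until the optimization level or compiler changes.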
        
           | pornel wrote:
           | Rust's problem with aliasing in LLVM is caused exactly by
           | limited usefulness of C's restrict.
           | 
           | LLVM implements only coarse per-function aliasing information
           | needed by C, and doesn't properly preserve fine-grained
           | aliasing information that Rust can provide.
        
             | zucker42 wrote:
             | I thought the problem was there is simply a bug in the
             | implementation because restrict isn't used frequently
             | enough to have exposed the bug earlier.
        
             | jcranmer wrote:
             | Note that Rust's issue was that LLVM is buggy with restrict
             | (because it isn't exercised frequently).
             | 
             | That said, there is a series of patches in-flight to get
             | actual working full restrict support:
             | https://reviews.llvm.org/D69542
        
           | vvanders wrote:
           | Have you ever used restrict in anger?
           | 
           | I've done it when we really needed that performance for an
           | inner loop (particle system). It can be a real bastard to keep
           | the non-alias constraint held constant in a large, multi-
           | person codebase and the error cases are really gnarly to
           | chase down.
           | 
           | Compare that to Rust which has this knowledge built in since
           | it naturally falls out of the ownership model.
        
             | Gibbon1 wrote:
             | I agree with that, no aliasing in C is an ugly kludge that
             | bitches about perfectly fine code that doesn't benefit from
             | it. And it's hard to ensure that it actually works in code
             | that does. Worse, the failure is completely silent.
        
               | jart wrote:
               | Is there something special about the noalias keyword that
               | it really brings out the most unprofessional pig language
               | in developers?
               | https://www.lysator.liu.se/c/dmr-on-noalias.html
        
             | Blikkentrekker wrote:
             | Isn't every pointer `restrict` in _Fortran_, which was often
             | cited as why it still outperformed _C_ in many cases for a
             | very long time?
        
             | cma wrote:
             | When people use the unsafe keyword are they always taking
             | into account aliasing?
             | 
             | At least you can audit only those places though.
        
               | zamalek wrote:
               | Yeah, `unsafe` is the developer signing a contract to
               | uphold all the guarantees that safe rustc provided (at
               | least from the perspective of the public interface, the
               | invariants can be broken in private code).
               | 
               | That's why Rust developers don't take kindly to
               | unnecessary unsafe usage, because the audit surface area
               | (and interaction complexity) increases.
        
               | slaymaker1907 wrote:
               | Rust takes const with pointers very seriously. It is
               | undefined behavior to mutate anything non-mut (pointer or
               | reference) unless it is through an UnsafeCell. While you
               | can break compiler guarantees through pointers in Rust,
               | dereferencing these pointers must still obey the above
               | guarantee (basically think of it as you need to go back
               | to references to actually use the aliased pointers which
               | would be undefined due to the aliasing).
               | 
               | Therefore, the optimizer can always assume mutations
               | don't alias. Even UnsafeCell isn't allowed to break the
               | alias rules, it just provides flexibility going from
               | immutable to mutable.
        
         | nindalf wrote:
         | How often does benchmark code have a function that takes two
         | pointers that could potentially alias each other? If it's as
         | rare as I think it is, it might not have that much of an impact
         | on Rust's position in the benchmarks game.
         | 
         | Still, real world performance will probably benefit from this
         | fix so it's a positive change regardless.
        
           | jerf wrote:
           | I've been fairly convinced for a while that once Rust matures
           | (which is probably fairly close to "now", but I've held this
           | opinion for years) that it's going to have a performance
           | advantage in real code that's going to be hard to capture in
           | benchmarks, because it's easy in a small benchmark to be very
           | careful and ensure that you don't have aliasing, avoid extra
           | copies, etc.
           | 
           | Where I expect Rust to really shine performance-wise is at
           | the larger scale of real code, where it affords code that
           | copies less often because the programmer isn't sure in this
           | particular function whether or not they own this so they just
           | have to take a copy, or the compiler can't work out aliasing,
           | etc. Ensuring at _scale_ that you don't take extra copies,
           | or have an aliasing problem, or that you don't have to take
           | copies of things just to "be sure" in multithreading
           | situations, is hard, and drains a lot of performance.
        
             | slaymaker1907 wrote:
             | One big advantage when writing average code is that while
             | Rust emphasizes generics and makes them easy to write, C++
             | makes it so difficult that you try and avoid it at all
             | costs. Big, complex projects are going to probably use
             | polymorphism all over the place in my experience which is
             | not really captured by benchmarks.
             | 
             | However, one advantage for C and C++ for such projects is
             | that they make it far more pleasant to do bulk allocations.
             | With C, you just need to group your mallocs and cast the
             | memory and C++ offers placement new where you pass in the
             | memory explicitly. With Rust, you need to reach into unsafe
             | Rust (rightfully so), but unsafe Rust is very unpleasant to
             | write.
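"Group your mallocs and cast the memory" might look something like the sketch below. The `particles` struct and both functions are hypothetical names for this example: three arrays are carved out of a single allocation, with the `double` arrays placed first so the `int` array that follows stays suitably aligned.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical container: three arrays backed by one allocation. */
typedef struct {
    void *block;   /* the single allocation; free only this */
    double *xs;
    double *ys;
    int *ids;
} particles;

/* Returns 0 on success, -1 on allocation failure.
   Size overflow for very large n is not handled in this sketch. */
int particles_alloc(particles *p, size_t n) {
    size_t bytes = 2 * n * sizeof(double) + n * sizeof(int);
    p->block = malloc(bytes ? bytes : 1);
    if (p->block == NULL) {
        return -1;
    }
    p->xs = (double *)p->block;      /* doubles first: strictest alignment */
    p->ys = p->xs + n;
    p->ids = (int *)(p->ys + n);     /* offset is a multiple of sizeof(double) */
    return 0;
}

/* One free releases all three arrays. */
void particles_free(particles *p) {
    free(p->block);
    p->block = NULL;
    p->xs = NULL;
    p->ys = NULL;
    p->ids = NULL;
}
```

The layout arithmetic is the part that has to be right by hand; getting the order or alignment wrong is exactly the kind of mistake that safe Rust rules out and unsafe Rust makes you spell out.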
        
             | throwaway894345 wrote:
             | I agree with this. Benchmark code differs from real code in
             | that it approximates the performance ceiling for a
             | language implementation; it's not "ordinary code" or even
             | "somewhat optimized" but usually the most optimal code one
             | can conceive of with little respect paid to competing
             | concerns, like maintainability. Rust aspires to make
             | idiomatic, maintainable code almost as performant as
             | benchmark code by way of zero-cost abstractions; probably
             | more so than any other language, including C and C++, and
             | all the while keeping the ceiling high relative to other
             | languages.
             | 
             | Unfortunately, I'm still of the opinion that "idiomatic
             | Rust" is quite a lot harder to write than idiomatic Go or
             | C# or etc, and many applications absolutely index on
             | developer velocity and performance and quality are "nice-
             | to-haves". Many companies are making money hand over fist
             | with Python and JavaScript, and Go and C# are quite a lot
             | more performant and typesafe than those languages already;
             | Rust is better still in these areas, but returns diminish.
             | If C# and Go preclude 95% of the errors found in Python and
             | JS, it's not worth trading tens or hundreds of percents of
             | developer velocity for that extra 4% or so improvement in
             | quality (a Go developer could recoup a bit more of that 4%
             | by writing tests in the time they save over Rust).
             | 
             | Of course, this is all subjective, and I've met smart
             | people who argue with a straight face that Rust is better
             | for developer velocity than languages like Go, so YMMV.
        
             | roca wrote:
             | I agree that the aliasing control and other features of
             | Rust enable all kinds of interesting optimizations that
             | currently aren't being done, and 'restrict' is only part of
             | it. For example we also know that the data behind a shared
             | reference is really immutable (apart from UnsafeCell etc).
             | 
             | Unfortunately we may need an entirely new generation of
             | compiler infrastructure to exploit these opportunities
             | because LLVM is really a C/C++ compiler at heart and may
             | not be happy about taking big changes that can't benefit
             | C/C++.
        
           | kolbe wrote:
           | The problem is that it's "rare" and not "impossible."
           | Compilers have to be logically sound, not probabilistic when
           | it comes to defined behaviors.
        
           | loeg wrote:
           | Benchmark hackers can just sprinkle `restrict` all over the
           | place, too. Real world C code doesn't get restrict by default
           | and often isn't fully annotated, so the benchmarks may
           | artificially hide the real-world difference.
        
       | notorandit wrote:
       | I am not sure I can buy such a comparison. Someone smarter than
       | me already argued about test implementations. Someone else also
       | put compilers and interpreters into perspective. Of course
       | language expressiveness can gauge in but, IMHO, comparing the
       | same sort algorithm or the same hash table implementation (or
       | n-queens algo) could make much more sense especially with
       | comparable compilers.
       | 
       | If the Rust implementation is faster than C's, kudos go to the
       | compiler, not to the language.
        
       | ma2rten wrote:
       | I'm having a hard time making sense of this page. Why is this
       | comparing fastest implementation with slowest implementation? Why
       | is the metric busy time/least busy? Why is C++ so much better
       | than C?
        
         | jeffbee wrote:
         | Speaking generally and about no specific program, you should
         | expect C++ to be faster than C. C++ has more ways for the
         | programmer to communicate with the compiler.
        
           | vitus wrote:
           | At the same time, a lot of those mechanics involve additional
           | overhead (e.g. vtable lookups for dynamic dispatch per
           | inheritance).
           | 
           | But yes, some of these do provide hints for the compiler,
           | e.g. constexpr, ownership semantics per unique_ptr. There's
           | nothing stopping a human from writing equivalent C, so my
           | suspicion is that the performance gap is primarily due to the
           | benchmark implementation.
        
             | jandrewrogers wrote:
             | While nothing prevents you from writing C that will
             | generate the equivalent code, it requires several times
             | more lines of C than the equivalent C++. At which point, it
             | becomes an economics discussion. C is usually a choice when
             | pure performance is secondary to other considerations like
             | portability.
             | 
             | Most C++ code I see has few vtables, and what vtables exist
             | are mostly removed by the compiler. Java-style inheritance
             | hierarchies are not idiomatic in C++. The code gen for C++
             | looks a lot like hyper-optimized C code, but without having
             | to write hyper-optimized code. C++ naturally provides much
             | more information about intent to the compiler than C does,
             | and modern compilers are excellent at using that
             | information to provide nearly optimal code gen.
        
               | morelisp wrote:
               | > Most C++ code I see has few vtables, and what vtables
               | exist are mostly removed by the compiler. Java-style
               | inheritance hierarchies are not idiomatic in C++.
               | 
               | FWIW I feel like this is a fairly recent (good!)
               | development. Up through TR/TR1 I think at least half the
               | C++ code I saw was pretty heavy on the "Java envy" and
               | not at all concerned about the cost of dynamic dispatch
               | or memory allocation. Avoiding vtables was derided as
               | part of "C with classes" - today that usually refers more
               | to templates and exceptions but STL implementations were
               | not mature enough and too much code not exception-safe
               | for those to be universally viable back then. Something I
               | heard more than once was that if you had a destructor it
               | should _always_ be virtual just in case someone wanted to
               | subclass it later.
        
               | pjmlp wrote:
               | Quite natural given that Java was created to be
               | attractive to 90's C++ developers, which carried their
               | development practices into Java.
        
               | pjmlp wrote:
               | Just a small correction: you mean 90's style C++ GUI
               | framework inheritance is no longer idiomatic in modern
               | C++.
               | 
               | Naturally we have to ignore that wxWidgets, Gtkmm, MFC,
               | ATL, Qt, COM, DirectX, IO Kit, DriverKit, Skia are still
               | around.
        
             | jeffbee wrote:
             | You've got it backwards. For the same amount of
             | polymorphism in the design of a program C++ is likely to be
             | faster because a C++ compiler can often devirtualize calls
             | but a C compiler faced with a home-grown vtable (the struct
             | of function pointers that every large C program eventually
             | uses) will never be able to do so.
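For readers who have not met one, this is roughly what a home-grown vtable in C looks like. All names here (`shape`, `shape_ops`, `total_area`, and so on) are made up for the sketch. Every call goes through a function pointer loaded at run time, which is the indirect call the comment says a C compiler has a hard time removing.

```c
#include <stddef.h>

typedef struct shape shape;

/* The "vtable": a struct of function pointers shared by all objects of a kind. */
typedef struct {
    double (*area)(const shape *self);
} shape_ops;

struct shape {
    const shape_ops *ops;   /* plays the role of the C++ vptr */
    double w, h;
};

double rect_area(const shape *s) { return s->w * s->h; }
double tri_area(const shape *s)  { return 0.5 * s->w * s->h; }

const shape_ops RECT_OPS = { rect_area };
const shape_ops TRI_OPS  = { tri_area };

/* Each iteration makes an indirect call through ops->area. */
double total_area(const shape *shapes, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += shapes[i].ops->area(&shapes[i]);
    }
    return sum;
}
```

Whether a given compiler can see through the pointer depends on how much of the program is visible to it at once, so treat the comment's "never" as a rule of thumb rather than a guarantee.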
        
               | vitus wrote:
               | Why bother to write the vtable in the first place?
               | 
               | My experience is that C++ that's written for performance
               | will often prefer the use of templates over inheritance
               | since the cost is then paid upfront by the compiler.
               | What's stopping a C programmer from hand-coding template
               | instantiations (via macro or otherwise)?
        
               | jeffbee wrote:
               | Sure you are welcome to reimplement C++ in C macros. When
               | your time is worthless anything is possible.
        
       | albertzeyer wrote:
       | Interestingly, C++ seems to be the fastest overall.
        
         | agumonkey wrote:
         | After the rust blossom storm I didn't track cpp implementation
         | evolutions.. did cpp compiler perf/libs improve, or was it
         | simply faster all along and still is?
        
       | kowlo wrote:
       | Searched the page for "Rust" which returned nothing... and the
       | box labels are awkward. Why not label them conventionally?
        
         | gus_massa wrote:
         | You can see the data in
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | but it has no graphic comparison.
        
       | topspin wrote:
       | Is it just me or has that 'benchmarks game' site been growing
       | less navigable over time? It used to be easy to use for comparing
       | benchmarks across several languages. If that capability still
       | exists somewhere it's buried and I'm not interested in puzzling
       | it out. There are no side bars or menus or anything helpful.
        
       | kzrdude wrote:
       | Rust seems to be using parallelism better. In one benchmark
       | though (fasta), C gcc is using all 4 cpus and Rust only two, and
       | still wins.
       | 
       | (Looking at just C gcc vs Rust)
       | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
         | FartyMcFarter wrote:
         | The C version uses OpenMP, while the Rust version doesn't.
         | 
         | I tried running the C code with 2 and 4 threads, wall-time and
         | CPU time don't change much in either case which is strange
         | (this is cygwin with gcc 10.2.0):
         | 
         | 2 threads:
         | 
         |     $ /usr/bin/gcc -pipe -Wall -O3 -fomit-frame-pointer \
         |         -march=ivybridge -fopenmp fasta.c -o fasta.gcc-2.gcc_run \
         |         && time ./fasta.gcc-2.gcc_run 25000000 | md5sum
         |     fd55b9e8011c781131046b6dd87511e1 *-
         |     real    0m0.724s
         |     user    0m1.468s
         |     sys     0m0.108s
         | 
         | 4 threads:
         | 
         |     $ /usr/bin/gcc -pipe -Wall -O3 -fomit-frame-pointer \
         |         -march=ivybridge -fopenmp fasta.c -o fasta.gcc-2.gcc_run \
         |         && time ./fasta.gcc-2.gcc_run 25000000 | md5sum
         |     fd55b9e8011c781131046b6dd87511e1 *-
         |     real    0m0.670s
         |     user    0m1.514s
         |     sys     0m0.046s
        
       ___________________________________________________________________
       (page generated 2021-01-03 23:00 UTC)