[HN Gopher] Rust is now overall faster than C in benchmarks
___________________________________________________________________
Rust is now overall faster than C in benchmarks
Author : wiineeth
Score : 319 points
Date : 2021-01-03 18:16 UTC (4 hours ago)
(HTM) web link (benchmarksgame-team.pages.debian.net)
(TXT) w3m dump (benchmarksgame-team.pages.debian.net)
| [deleted]
| Animats wrote:
| Two charts with different languages. Unclear what is being
| measured. Is this a humor article?
| nindalf wrote:
| This site has a page where they explain what they're doing -
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| Basically, it's toy programs written in each of these
| languages. They measure how fast each one is executed. This
| methodology does have its limitations, which that page is
| upfront about.
| moldavi wrote:
| Shouldn't they be about the same? (At least until the LLVM
| immutability optimizations happen)
|
| I suspect surprising factors are in play.
| jug wrote:
| I'd look at the algorithms here. Looking a bit fishy with
| sometimes quite serious differences...
| skohan wrote:
| The ceilings should be similar, but there is an argument that
| there are cases where idiomatic Rust might outperform idiomatic
| C or vice versa.
|
| For instance, in real-world usage, you might do a bit more
| redundant copying in C in places you would avoid it with the
| borrow checker in Rust.
|
| Conversely, enforcing RAII in Rust might cost performance in
| some scenarios relative to C.
| yudlejoza wrote:
| When Rust is faster than C in a benchmark in which C++ is also
| faster than C, I know I can safely ignore such benchmark.
| [deleted]
| tetromino_ wrote:
| The problem is not with C the language, but with the C standard
| library, which has many inefficiency warts. Examples:
|
| * Strings. Any form of string access other than scanning the
| string's characters one by one from start to finish can benefit
| from knowing the string's length in advance. In C++,
| std::string and std::string_view know their lengths; plain C
| strings don't. Thus, in performance-optimal plain C, _almost
| any_ function that takes a string parameter ought to also take
| the string length as a separate parameter - but most C standard
| library functions neglect to do so.
|
| * Callbacks. When the callback is static, you want to give the
| compiler the opportunity to inline it into the call site. In
| C++, this is natural: pass the callback as a template
| parameter. In performance-optimal C, you'd want to provide
| macro versions of functions that take callbacks. But the C
| standard library only supports callbacks as function pointers
| (e.g. in bsearch(3), qsort(3) etc.), which means unnecessary
| pointer dereferences and which makes inlining impossible.
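
For comparison, here is a minimal Rust sketch of the string-length point above. A slice is a (pointer, length) pair, so a suffix check can jump straight to the tail without first scanning for a terminator. The function name `has_suffix` is hypothetical and only for illustration; the standard library already offers `ends_with`.

```rust
// A byte slice carries its length, so `len()` is O(1) and no scan for a
// terminating NUL is needed before indexing from the end.
fn has_suffix(s: &[u8], suffix: &[u8]) -> bool {
    s.len() >= suffix.len() && &s[s.len() - suffix.len()..] == suffix
}

fn main() {
    // Hypothetical inputs, chosen only to exercise both outcomes.
    println!("{}", has_suffix(b"benchmark.rs", b".rs"));
    println!("{}", has_suffix(b"rs", b".rs"));
}
```

In plain C, the equivalent function would need either a `strlen` pass over both strings or explicit length parameters, which is the interface change the comment argues for.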
| gambiting wrote:
| >>plain C strings
|
| But.....there is no such thing. There are arrays of chars,
| the whole principle of C is "if you want to know the length
| of a string....just store it yourself". It's a bit like
| saying that two wooden planks don't have the same
| functionality as a cupboard. Like, you're technically
| correct, but the whole idea is that you can use the planks to
| build your own cupboard or literally anything else.
| coldtea wrote:
| > _I know I can safely ignore such benchmark._
|
| C++ makes certain intentions clear to the compiler in ways that
| C doesn't, which makes certain optimizations possible.
| guenthert wrote:
| That has always been the case, but it appears that the GNU
| compiler finally makes good on that promise. Kudos to all
| involved!
| lambda wrote:
| > I know I can safely ignore such benchmark
|
| And yet, rather than ignoring it, you are commenting on it,
| with a pithy retort which dismisses the entire benchmark
| without actually providing any additional insight.
|
| Programming languages, compilers, library ecosystems, the
| groups of people who decide to sit down and try to produce a
| better result for a given language, and the benchmark
| maintainers who decide what submissions count for a given
| language (does a C solution that just uses entirely inline ASM
| count?) are incredibly complex systems. Any single metric is
| never going to capture the full richness of the language, is
| never going to be representative of the experience you will
| have for every single program, etc.
|
| But does that make metrics useless? No, it just means that you
| should be informed about their limitations. You shouldn't just
| look at a single number, but instead make sure you understand
| well enough what is being measured to know how well that number
| represents anything useful.
|
| So rather than just dismissing this benchmark, it would be
| useful to ask "why are the C++ results better than the C
| results on this benchmark?"
|
| Some benchmark challenges like this allow pretty much any
| program that accepts the right input and produces the right
| output; which means you get results in which no computation is
| actually done, the output is simply hard-coded and you are
| basically just measuring the startup time or request time of
| the language or library.
|
| This particular set of benchmarks imposes some constraints to
| avoid that kind of behavior. Programs have to follow the same
| basic algorithm, so you can't figure out some clever
| algorithmic optimization which applies only for the particular
| input used in this benchmark. For things like the regex
| challenge, you are expected to use either the built-in regex in
| your language's standard library, or a common general-purpose
| regex implementation, not a specialized regex implementation
| optimized just for this particular benchmark.
|
| The goal of this set of benchmarks is to provide a reasonable
| set of reasonably realistic small problems, implemented using
| the same algorithm, and using the normal language and library
| features. It uses small simple problems in order to make it
| easy to read the programs and learn about the performance
| characteristics of the language.
|
| So rather than dismissing, why don't we take a look at the
| fastest C and C++ implementations of some of the problems?
|
| Here's the fastest implementation of the k-nucleotide problem
| in C and C++:
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| I haven't sat down to do detailed profiling and performance
| comparison of each; but just off the top of my head, here are a
| few things I see which could be relevant.
|
| The C++ implementation takes advantage of a number of C++
| features; it uses a hash table from GNU pb_ds, a C++ container
| library which allows for a good amount of customization based
| on template parameters. The C implementation uses khash, a very
| fast hash table implementation which uses macros for similar
| customization.
|
| The C++ implementation makes note of using move semantics in a
| number of places, which potentially allows for certain
| optimizations that wouldn't be possible if the compiler had to
| copy data.
|
| The C++ implementation uses insertion into a templated ordered
| map to sort the results, while the C implementation uses the
| standard library qsort. This allows the comparison function to
| be inlined into the C++ sort, while it's called through a
| function pointer in the C implementation.
|
| Without actually doing some experimentation and profiling, it's
| hard to say which of these makes a difference, or if it's
| something else. But this does show that C++ provides facilities
| for generic container and algorithm types that C does not. In
| the C implementation, macros are used to work around this for
| the hash table case, while function pointers are used for
| sorting.
|
| Anyhow, rather than simply dismissing these results, why not
| dig into where the difference really lies, and provide a better
| implementation in C if you think that you can?
|
| No one set of benchmark results should be taken as gospel. But
| I think this particular benchmark game is fairly useful for
| getting a rough sense of "if I write all of my code in this
| particular language, using either the standard library or
| commonly available off the shelf libraries, how much of a
| performance penalty am I likely to pay?"
|
| I also find the grouping of languages that he does, based
| on the minima of the kernel density estimation of their
| geometric mean scores, to be a bit more informative than
| absolute ranking within those groups. That gives a sense of the
| general class of languages. There's one group for C, C++, and
| Rust; languages which allow for performance without compromise,
| at the expense of lack of safety, higher complexity or learning
| curve, or both.
|
| There's a next big group with a lot of languages; most of them
| have been around for a while, or been designed with an eye
| towards performance, but still have some amount of overhead due
| to GC or pointer chasing or greater thread synchronization
| overhead or any number of other reasons; this group includes
| Fortran, Ada, C#, Java, Go, Haskell, etc.
|
| Then there are a few groups of fairly high-level, dynamic
| languages, designed for scripting or rapid development, and
| which require you to trade off a fairly significant amount of
| performance for this. Dart, PHP, Python, Erlang, Ruby.
|
| And finally, there's Matz's Ruby, all alone in a group at the
| end, slower than pretty much everything else. I'm not quite
| sure why it's separated out from Ruby, which seems to refer to
| yarv, but maybe it's so people who come here wondering what
| they can do about their slow Ruby programs can see that they
| can at least get a big boost by switching to yarv.
|
| Anyhow, this benchmark and this grouping helps if you're
| considering what to do about some performance bottleneck you
| have in some code base; or if you're starting a project for
| something which will potentially be performance critical.
| Moving between languages in one group isn't all that likely to
| make a substantial difference; but moving to a language in
| another group would. For example, it lets you know that there's
| a pretty good chance that just rewriting a Python program
| that's a performance bottleneck in Go would improve that
| performance; but rewriting a Go program in Java, or vice versa,
| is less likely to be a performance win.
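
The comment above mentions grouping languages by their geometric mean scores. As a rough sketch of what such a score is, the function below computes the geometric mean of a set of per-benchmark ratios. The name `geometric_mean` and the sample values are assumptions for illustration, not the site's actual code or data.

```rust
// Geometric mean of positive ratios, computed as exp(mean(ln x)) so that
// multiplying many values together cannot overflow.
fn geometric_mean(ratios: &[f64]) -> Option<f64> {
    if ratios.is_empty() || ratios.iter().any(|&r| r <= 0.0) {
        return None; // undefined for empty input or non-positive values
    }
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    Some((log_sum / ratios.len() as f64).exp())
}

fn main() {
    // Hypothetical slowdowns relative to the fastest entry on three tasks.
    let slowdowns = [1.0, 2.0, 4.0];
    println!("{:?}", geometric_mean(&slowdowns));
}
```

Unlike an arithmetic mean, this treats "twice as slow on one task" and "twice as fast on another" as cancelling out, which is why it is a common choice for summarizing ratios.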
| steveklabnik wrote:
| Don't know why you're downvoted, this is a great comment.
|
| By the way, YARV has been "ruby" since 2007; these benchmarks
| were run with ruby 3.0.0preview1. The name has probably just
| never been changed.
| zarkov99 wrote:
| Why is that? There are cases where c++ is faster than C, a
| typical example is qsort vs std::sort.
| skohan wrote:
| How does c++ manage to win here? I'm not doubting, just
| curious
| steveklabnik wrote:
| The classic example is quicksort. qsort takes a function
| pointer, but std::sort is a template. This means that the
| machinery can be inlined more often in C++ than C.
|
| https://stackoverflow.com/questions/18002087/qsort-vs-stdsor...
| zarkov99 wrote:
| In the case of std::sort the C++ compiler can inline the
| comparison function when it generates the actual sort
| implementation from the std::sort template. In C the qsort
| implementation is fixed and must call the comparison via an
| indirect reference.
| gpderetta wrote:
| Technically libc could have an inline version of qsort in
| the header or the linker (even the dynamic linker) could
| do LTO, but benchmarks are done on actual implementations,
| not the mythical sufficiently good compiler.
| rurban wrote:
| Don't use a libc qsort then. The compiler could generate
| it, e.g. Or use the CTL, a header-only STL for C, which
| inlines the sort function and the comparator. And because
| of the C++ bloat and indirect calls, the C version is
| faster.
|
| Inlining with macros == constexpr with templates.
|
| You can decide which syntax you prefer.
| tlb wrote:
| C++ can inline the comparison function (which is often just
| a few instructions) at compile time. It can also statically
| optimize for the stride, using shifts or addressing modes
| instead of multiplies. C can't do either with qsort, which
| takes the stride and comparison function pointer as
| arguments.
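
The qsort versus std::sort contrast described in the comments above can be sketched in Rust, which offers both styles. This is only an illustration under stated assumptions: the two insertion sorts and the `by_value` comparator are hypothetical names, and whether the comparison is actually inlined depends on the compiler and optimization level.

```rust
use std::cmp::Ordering;

// Analogue of C's qsort: the comparator arrives as a plain function
// pointer, so a single compiled copy of the sort serves every caller and
// the comparison is an indirect call unless the optimizer sees through it.
fn insertion_sort_fnptr(v: &mut [i32], cmp: fn(&i32, &i32) -> Ordering) {
    for i in 1..v.len() {
        let mut j = i;
        while j > 0 && cmp(&v[j - 1], &v[j]) == Ordering::Greater {
            v.swap(j - 1, j);
            j -= 1;
        }
    }
}

// Analogue of std::sort: generic over the comparator type, so a separate
// copy is generated per closure and the comparison can be inlined into it.
fn insertion_sort_generic<T, F>(v: &mut [T], mut cmp: F)
where
    F: FnMut(&T, &T) -> Ordering,
{
    for i in 1..v.len() {
        let mut j = i;
        while j > 0 && cmp(&v[j - 1], &v[j]) == Ordering::Greater {
            v.swap(j - 1, j);
            j -= 1;
        }
    }
}

// Comparator passed through the function-pointer interface.
fn by_value(a: &i32, b: &i32) -> Ordering {
    a.cmp(b)
}

fn main() {
    let mut a = [5, 2, 9, 1];
    let mut b = a; // arrays of i32 are Copy, so this is an independent copy
    insertion_sort_fnptr(&mut a, by_value);
    insertion_sort_generic(&mut b, |x, y| x.cmp(y));
    println!("{:?} {:?}", a, b);
}
```

Both functions produce the same result; the difference is only in how much the compiler knows about the comparator when it compiles the sort loop.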
| _jordan wrote:
| The constexpr specifier should give C++ an advantage
| mhh__ wrote:
| Probably not, the compiler doesn't need permission to
| constant fold of its own accord if it doesn't launch any
| missiles.
| jandrewrogers wrote:
| C++ is faster than C now for most things because C++ enables
| many optimizations that are impractical in C.
| arcticbull wrote:
| Certainly constexpr haha.
| [deleted]
| FartyMcFarter wrote:
| Looking at the _reverse-complement_ code, it appears that the
| Rust and C implementations are using different algorithms:
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| On a quick inspection:
|
| - The Rust code is about twice as long.
|
| - The Rust code has CPU feature detection and SSE intrinsics,
| while the C code is more idiomatic.
|
| - The lookup table is larger in the Rust code.
| fsloth wrote:
| What does "idiomatic" C even mean? It's a high level assembler
| and as such _should_ not limit the creativity of programmers
| using it.
|
| C code that pretends it does not need to care about its
| platform is not idiomatic, it's just suboptimal.
| FartyMcFarter wrote:
| By "idiomatic C" I meant any of the following:
|
| - Code that most C books/courses would teach you how to write
|
| - Portable C code (arguably portability is one of C's biggest
| successes!)
|
| - Code that you'd expect to find in the K&R book
| pjmlp wrote:
| One of C's biggest marketing successes.
|
| All high level languages are portable, including some older
| than C.
|
| It helps when one drags the OS the language was born on,
| along via parallel standards like POSIX.
| gameswithgo wrote:
| cpu feature detection is way easier in rust. it's just built
| into the core libs.
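
A minimal sketch of the run-time CPU feature detection mentioned above, using the standard library's `is_x86_feature_detected!` macro and the `#[target_feature]` attribute. The functions `sum`, `sum_scalar` and `sum_avx2` are hypothetical and are not taken from the benchmark program; the AVX2 variant relies on the optimizer to vectorize rather than on explicit intrinsics.

```rust
// Scalar fallback: works on every target.
fn sum_scalar(data: &[u32]) -> u32 {
    data.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

// Same body, but compiled with AVX2 enabled so the optimizer may use wider
// vector registers. It is `unsafe` to call because the CPU must actually
// support AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[u32]) -> u32 {
    data.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

// Hypothetical dispatcher: picks an implementation at run time.
pub fn sum(data: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound because the feature was checked just above.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}

fn main() {
    let v: Vec<u32> = (1..=100).collect();
    println!("{}", sum(&v));
}
```

On non-x86 targets the `cfg` attributes remove the AVX2 path entirely, so the same source still compiles and falls back to the scalar version.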
| bsder wrote:
| And?
|
| These are the kind of things that make working in one
| language different from another.
|
| I'd actually like to see the SSE version in C so I can
| compare the two implementations and how much grief you have
| to go through.
| colejohnson66 wrote:
| Doesn't gcc have a CPU feature branching system? You use it
| by attaching attributes to functions that say what
| instructions are required. Granted, it's not as "elegant" as
| Rust, but it does exist.
| vitus wrote:
| I recall an anecdote about how Haskell actually outperformed C
| on various tree benchmarks because it was using a better
| implementation. At some point, the C programmers got fed up
| with the airs of superiority from Haskell programmers, ported
| the Haskell implementation, and reclaimed their position.
|
| I wouldn't be surprised if there's something similar happening
| here.
| _wldu wrote:
| Yes, it seems everyone is always trying to beat C, but
| really, no one can.
| pjmlp wrote:
| Well, junior Assembly developers had it pretty easy on 80's
| hardware.
|
| Also C++ and Fortran don't have to spend much effort to
| achieve that.
| lawn wrote:
| To be fair, if there's a language that has a fair chance
| it's Rust.
| loxias wrote:
| It's exactly the same thing. I've been seeing this tossed
| around for years now, and one of these days it'll make me
| grumpy enough to fix the benchmark.
| Blikkentrekker wrote:
| The _Haskell_ code on some pathological examples of these
| implementations that I have seen has so many unsafe
| constructs, strictness annotations, inlining annotations and
| so forth that it's practically _C_ in a different syntax.
|
| It is not idiomatic _Haskell_ at all and loses all of the
| touted benefits.
|
| Is there a separate benchmark that only accepts idiomatic
| code?
| pjmlp wrote:
| Just like C is pimped up BCPL, a language designed to
| bootstrap the CPL compiler, 10 years younger than
| ESPOL/NEWP, 10 years younger than Jovial, 6 years younger
| than PL/I, among other possible examples from the 60's.
|
| Also C was known for not being a fast language on 80's home
| hardware, ruled by Assembly.
|
| So low level coding for OS isn't coding like C and is just
| the way for mechanical sympathy, regardless of the
| language.
| harporoeder wrote:
| There was a long period where a fairly unknown theorem
| proving language ATS (1) was beating C in many test cases on
| the benchmark game (2) the benchmarks were removed though
| (3). I expect many languages could be made to win with
| sufficient effort.
|
| 1. http://www.ats-lang.org/
| 2. http://web.archive.org/web/20121218042116/http://shootout.al...
| 3. https://stackoverflow.com/questions/26958969/why-was-the-ats...
| steinuil wrote:
| ATS is not really a theorem proving language, it's almost a
| superset of C with a very long list of type system and
| language features that lets you write anything from C code
| with no safety to very high level recursive functional code
| with lifetime tracking which will translate to efficient C
| code, if the transformations are proved to be correct. It's
| a weird beast, but I'm not surprised it outperformed some C
| implementations, because it _is_ basically C with a lot
| more features.
| spiffytech wrote:
| That kind of tit-for-tat in benchmarks seems like it's
| counter to the goal of benchmarks: what kind of performance
| could I expect to see using technology $FOO? Crucially, that
| question depends on how someone will realistically implement
| $FOO.
|
| I like PyPy as an example: on the surface, implementing a
| Python runtime in Python and expecting performance gains
| seems crazy. PyPy manages to outperform CPython because
| although a C implementation should theoretically be faster,
| realistically the increased expressiveness of Python lets the
| PyPy devs opt into optimizations the CPython devs find out of
| reach.
|
| I don't know C or Rust well enough to comment on these
| specific scenarios, but if two technologies _can_ be fast,
| and one makes that speed accessible while the other
| encourages implementors to leave performance on the table,
| that's much more useful information to me than seeing a
| back-and-forth where folks compete to implement optimizations
| I'll never make in my own projects.
| vitus wrote:
| You raise a good point. I've always been fascinated by
| PyPy's performance, personally -- anecdotally, I've
| achieved ~10x speedups from just running a script with
| `pypy` instead of `python`. I always attributed that to
| better performance of the JIT, but I could be wrong.
|
| I have nothing against Rust personally, but it's ultimately
| not an apples-to-apples comparison if they're not
| implementing the same algorithm, or even if they're not
| using the same mechanisms (e.g. Rust explicitly using SSE
| intrinsics, which are certainly available to C in just as
| idiomatic a fashion).
| gameswithgo wrote:
| comparing apples to apples is pointless, they are both
| apples.
|
| and you can always compare "equivalent" algorithms as
| some languages may not be able to efficiently express the
| same algorithm as another. i know what you are after, but
| trying to have some benchmark that is "fair" according to
| some spec that is important to your needs will just be
| seen as pointless to others. benchmark game at least lets
| us see what the performance ceiling is of various
| languages, and anytime you object to a language having a
| worse implementation you may go fix it.
|
| i think what anyone with a clue understands is that C, C++
| and Rust all have roughly equivalent performance
| ceilings. Nobody really thinks Rust is faster than C now.
| vitus wrote:
| > Nobody really thinks Rust is faster than C now.
|
| The submission title would beg to differ.
|
| (I know, it explicitly calls out "benchmarks" as the
| context.)
|
| I think languages like Rust or Swift have significant
| advantages around safety over C/C++, while not
| sacrificing much in terms of performance. But if one
| language's benchmark contributors are willing to put in
| more effort than another's to eke out additional
| performance, then you're going to see skewed results in
| favor of whichever has the more fervent evangelists or
| whichever language has more to prove.
|
| If the goal is to compare performance of two languages
| which can express the same optimization in exactly the
| same way, and only one uses it, then the benchmarks fail
| in that respect.
| gameswithgo wrote:
| go fix the c implementation then.
|
| no single person can maintain optimally implemented
| solutions over dozens of languages. it's up to we the
| people to help out. you want perfectly equivalent
| comparisons, make it happen
| sli wrote:
| I admit I'm destroying the metaphor here but there are a
| lot of useful comparisons to be made between different
| cultivars of apple and those comparisons have
| implications on their applications.
| zeroimpl wrote:
| That's the whole point of the metaphor - comparing apples
| to apples is good, apples to oranges is bad.
| gameswithgo wrote:
| and it's a bad metaphor. there are many reasonable
| comparisons to be made between any two things even when
| both are not fruits
| FartyMcFarter wrote:
| > I like PyPy as an example: on the surface, implementing a
| Python runtime in Python and expecting performance gains
| seems crazy.
|
| Not if it includes a JIT compiler, as PyPy does. I don't
| know much about PyPy, but if most of the time is spent in
| JITted machine code, the fact that it's written in Python
| may not affect performance much.
| pjmlp wrote:
| Kind of, PyPy is a metacircular JIT compiler, it uses
| RPython a Python subset.
| cygx wrote:
| _realistically the increased expressiveness of Python lets
| the PyPy devs opt into optimizations the CPython devs find
| out of reach_
|
| Not really: It's the compilation strategy (that whole meta-
| tracing JIT compiler thing, compared to a simple bytecode
| interpreter) that makes the difference, not the surface
| syntax of the implementation language - which is actually
| not 'real' Python, but a restricted, less dynamic subset
| known as RPython.
|
| Also note that the CPython is deliberately kept 'dumb'.
| mkl wrote:
| > That kind of tit-for-tat in benchmarks seems like it's
| counter to the goal of benchmarks
|
| Yep. There are good reasons why this set of benchmarks is
| called The Computer Language Benchmarks _Game_.
| moonchild wrote:
| > PyPy manages to outperform CPython because although a C
| implementation should theoretically be faster,
| realistically the increased expressiveness of Python lets
| the PyPy devs opt into optimizations the CPython devs find
| out of reach.
|
| Pypy outperforms cpython for the simple reason that pypy is
| a jit where cpython is a basic bytecode interpreter.
| Anything else is icing on the cake.
|
| The only reason python seems slow as an implementation
| language is because it was traditionally slow; the reason
| it was traditionally slow is that cpython, the only major
| implementation, is slow. Common lisp, for instance, is
| similarly dynamic (moreso, in fact), is also generally
| natively compiled by itself, and is quite performant.
| FartyMcFarter wrote:
| > I wouldn't be surprised if there's something similar
| happening here.
|
| In this case it seems like benchmark code is allowed to use
| intrinsics, which can degenerate into a situation where a
| benchmark in language X is more "glorified x86 Assembly code"
| than actual code in language X.
|
| This is not very useful for comparing languages IMO.
| Especially since all of Rust, C, C++ can use this strategy
| and become almost identical in both code and performance.
| gameswithgo wrote:
| So three languages with identical performance ceilings end
| up with essentially identical performance results. sounds
| like a job well done.
|
| if you want to know something more than that, like how
| performance tends to end up after inputs of identical
| effort and talent you will need to recruit a lot of people
| and money to run really large scale experiments and when
| you are done people on HN will just endlessly find nits to
| pick with your results whenever they don't agree with their
| preconceived notions
| pornel wrote:
| Nothing stops someone from copying and submitting other
| implementation's algorithm. There are multiple implementations
| of each benchmark for every language:
|
| * https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| * https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| * https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| It's possible that someone has already submitted both
| algorithms for both languages, and different approaches won for
| language-specific reasons.
| FartyMcFarter wrote:
| These are all the C versions I can find:
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| None of them have SSE intrinsics or are quite as long as the
| Rust version.
|
| I find it doubtful that SSE intrinsics wouldn't help the C
| version, if they are indeed helping the Rust version. This
| seems fairly easy to check as the Rust version has a non-SSE
| fallback code path - I'd do it myself but am not able to at
| the moment.
| mhh__ wrote:
| > The Rust code has CPU feature detection and SSE intrinsics,
| while the C code is more idiomatic.
|
| These benchmarks are almost always either useless or a scam -
| you either end up rewriting the same implementation n
| times or you don't utilize the capabilities of the language,
| either way you're not really measuring much of anything
| intrinsic to the language itself - Rust and C both have the
| same backends and if you really care about performance you're
| going to take it to the max anyway, so inference by the
| compiler isn't that important.
| qart wrote:
| I have seen this happening so often: C/C++/Rust often end up
| using CPU-specific features, and the code starts looking more
| and more like assembly code, and less like idiomatic high-level
| language code. Basically, comparisons of programs written in
| all the other languages against these three become meaningless.
| And that, in turn, hurts benchmarksgame as a resource for comparing
| languages.
|
| If I had to write a performant library at work, I too might
| rely on CPU-specific assembly wrappers in my code. But IMO,
| such code has no place in a general-purpose cross-language
| benchmark site.
| etaioinshrdlu wrote:
| I know it's a meme, but it really does seem like most C or C++
| code would be better off transitioning to Rust at some point.
| That includes the entire Linux kernel, web browsers, entire
| OS's...
| dilap wrote:
| My experience using ripgrep and fd is that this is also true in
| real-world programs. :-)
| lmilcin wrote:
| Which is not at all surprising. Rust has a much larger compilation
| unit and knows more about what can read/write a particular piece
| of memory. This allows some occasions for optimization where a C
| compiler must be conservative.
|
| A simpler example of this is Fortran, which can be
| faster for numerical loads due to the fact that Fortran disallows
| aliasing of function arguments. C on the other hand, must pay the
| price of having to be conservative with how it treats arguments
| just in case they overlap.
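
A small Rust sketch of the aliasing point above. The function `add_twice` is hypothetical. Because `a` is an exclusive reference and `b` a shared one, the language guarantees they never refer to the same integer, so a compiler is permitted to read `*b` once and reuse it. Whether a given compiler version actually exploits this is a separate question.

```rust
// `a` and `b` cannot overlap: an exclusive (&mut) borrow rules out any
// other live reference to the same value.
fn add_twice(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}

fn main() {
    let mut x = 1;
    let y = 10;
    add_twice(&mut x, &y);
    println!("{}", x); // prints 21

    // The aliased call below is rejected by the borrow checker, which is
    // what makes the no-overlap assumption safe:
    // add_twice(&mut x, &x);
}
```

In C, the equivalent `void add_twice(int *a, const int *b)` may legally be called with both pointers equal, so without `restrict` the compiler has to reload `*b` after the first store.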
| the8472 wrote:
| Wouldn't the C compiler be allowed to make similar assumptions
| with -flto or -fwhole-program?
| pjmlp wrote:
| Yeah, but again that isn't C rather a specific
| implementation.
| rurban wrote:
| Because Rust does alloca for all locals, and this is of course
| faster. Everyone else avoids it for security reasons. Just search
| the Rust bugtracker for stack overflows.
| nynx wrote:
| The fastest n-body program is written in very idiomatic rust.
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
| FartyMcFarter wrote:
| n-body in C compiled by clang runs just as fast as Rust
| apparently:
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
| pjscott wrote:
| It's not entirely surprising that a carefully-optimized C
| program using explicit SSE intrinsics, plus a fancy trick
| involving a low-precision square root instruction fixed up
| with two iterations of Newton's method, would be fast. :-)
|
| What impresses me is that the Rust version didn't do any of
| that stuff, just wrote very boring, straightforward code --
| and got the same speed anyway. Some impressive compilation
| there!
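
The "low-precision square root fixed up with Newton's method" trick mentioned above can be sketched as follows. This is an illustration only, not the benchmark program's code: the names are hypothetical, and a single-precision square root stands in for the approximate hardware instruction.

```rust
// One Newton-Raphson step for y ~ 1/sqrt(x):
//   y_next = y * (1.5 - 0.5 * x * y * y)
fn newton_step(x: f64, y: f64) -> f64 {
    y * (1.5 - 0.5 * x * y * y)
}

// Hypothetical helper: refine a rough estimate with two Newton steps.
// Assumes x is positive, finite, and within f32 range.
fn rsqrt_refined(x: f64) -> f64 {
    // Stand-in for a fast, approximate hardware estimate.
    let estimate = (1.0 / (x as f32).sqrt()) as f64;
    newton_step(x, newton_step(x, estimate))
}

fn main() {
    let x = 2.0_f64;
    let exact = 1.0 / x.sqrt();
    println!("error = {:e}", (rsqrt_refined(x) - exact).abs());
}
```

Each step roughly squares the relative error, which is why a cheap estimate plus a couple of multiply-add steps can replace a full-precision square root and division.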
| FartyMcFarter wrote:
| Good point! It would be interesting to find out where the
| Rust version gets most of its speed from.
| awestroke wrote:
| The code for the C-Clang version is terrifying, compared to
| the Rust version. Which one would you rather maintain?
| berkut wrote:
| Well, given the comments at the top:
|
| // Contributed by Mark C. Lewis.
|
| // Modified slightly by Chad Whipkey.
|
| // Converted from Java to C++ and added SSE support by
| Branimir Maksimovic.
|
| // Converted from C++ to C by Alexey Medvedchikov.
|
| // Modified by Jeremy Zerfas.
|
| It sounds like no-one bothered to actually write a from-
| scratch version of many of these things :)
| [deleted]
| 1vuio0pswjnm7 wrote:
| The reasons I use C versus comparable alternatives are not
| limited to speed. For example, the size of the compiler
| toolchain, the speed of compilation and the size of the resulting
| executables are all factors I have to consider. I do lots of work
| on systems with limited resources. How does Rust compare on those
| points versus, say, GCC?
|
| https://dev.to/aakatev/executable-size-rust-go-c-and-c-1bna
| steveklabnik wrote:
| I mean, you'll get drastically different numbers with all of
| those examples if you actually ask for options that produce
| small binaries. That the default flags don't try to make things
| small (across any of these toolchains) means this isn't really
| a fair comparison.
|
| The smallest "hello world" binary rustc has ever produced was
| 137 bytes. https://github.com/tormol/tiny-rust-executable
| fmntf wrote:
| If that's possible, why should not binaries be small (without
| too many downsides, eg execution speed) by default?
| steveklabnik wrote:
| Because everything is a tradeoff. Getting the binary to be
| smaller means taking more compile time, because compilers
| have to do work to reduce the size. It is much faster to
| produce larger binaries.
|
| Additionally, most compilers produce something that's
| useful for development and debugging by default, and that
| means including _extra_ stuff for that purpose.
| mlyle wrote:
| A big part of why binaries are "big" is dynamic linking and
| external symbols.
|
| That Rust hello world is just hand-crafted to never be
| linkable to anything else at runtime and invoke a system
| call with a buffer. This isn't really how we'd like to do
| most things.
| steveklabnik wrote:
| Interestingly enough, I would say that dynamic linking
| makes binaries smaller, that code no longer lives in the
| binary, but another place instead.
| mlyle wrote:
| Sure. But there's a lot of overhead that comes with it,
| too.
| djeiasbsbo wrote:
| I'm pretty sure that for the classic "Hello World"
| example that's not the case, because we can simply use
| the linux write system call.
|
| For example, an assembly program that just calls write
| and then exits is much smaller than even an unlinked
| version of it that uses printf, because the ELF binary
| doesn't need to contain information for the linker.
| steveklabnik wrote:
| Yes, on the very very very low end, this is the case, but
| generally, in more real programs, it's not.
|
| (The program I linked is an example of exactly what
| you're talking about)
| pjmlp wrote:
| Just like C compilers, it is a matter of use case, and
| compiling for size has tradeoffs regarding execution speed.
| api wrote:
| One of Rust's performance advantages is the compiler's ability to
| unambiguously determine memory aliasing. Aliasing is why many
| numeric kernels are written in Fortran, a much older language
| that also enforces strict aliasing as it simply doesn't allow
| overlapping references.
|
| There are probably others as well, but this is the advantage I'm
| familiar with. C's ambiguity makes it harder to achieve some
| optimizations that can really matter on modern CPUs.
| hawk_ wrote:
| Unfortunately that's only on paper due to LLVM issues.
| nynx wrote:
| LLVM's support for this is bugged, so rust does not currently
| take (edit: full) advantage of it.
| runevault wrote:
| I really hope this gets fixed at some point, just because I'd
| like to see how much faster rust can get. In particular I
| wonder how much it would impact the compiler itself since it
| is also written in rust.
| kzrdude wrote:
| Does not fully take advantage of it.
| steveklabnik wrote:
| Rust does not take _full_ advantage of it; that is, &T will
| still get noalias, it's &mut T that's currently disabled.
|
| The tracking bug is
| https://github.com/rust-lang/rust/issues/54878
| nynx wrote:
| Ah, my mistake. Thank you for the correction.
| steveklabnik wrote:
| It's all good; most people don't draw the distinction. I
| myself didn't know that &T was also noalias for years.
| lambda wrote:
| Here is an example which, while trivial, demonstrates some
| of the optimization issues you can run into in C and C++
| but not in Rust, even with noalias currently only applying
| to &T.
|
| C++: https://godbolt.org/z/dTPsbh Rust:
| https://rust.godbolt.org/z/Mdx87h
|
| In C and C++, compilers are allowed to infer "noalias"
| based on pointer types; two pointers of different type are
| not allowed to alias. This is known as type-based alias
| analysis. But char * is given special treatment, because of
| its use as a generic pointer type; so if one of your
| arguments is char * or even signed char * , that disables
| any optimizations which were relying on type-based alias
| analysis.
|
| This creates both performance and undefined-behavior
| footguns. If you ever try to use some pointer type other
| than char * or signed char * to refer to data of another
| type, you may inadvertently cause type-based alias analysis
| to kick in, causing invalid optimizations and
| miscompilation. On the other hand, if you have a function
| which takes both a char * and another pointer, the compiler
| may not apply optimizations that it otherwise could because
| char * is allowed to alias anything.
|
| In Rust, there is no such undefined behavior footgun.
| Because of the LLVM bug, noalias isn't applied to &mut
| pointers so there are still some cases which could be
| better optimized, though it sounds like there is some
| progress being made on the LLVM front so it should be fixed
| at some point, and there are already places where the
| compiler can do better optimizations with better safety due
| to the stronger semantics of &T.
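|
| A rough Rust counterpart to the char * case, as a sketch only (this
| is not the code behind the godbolt links, and `bump_all` is a made-up
| name). Because `step` is a shared `&u8`, it cannot change while the
| function runs, so the compiler is allowed to read it once instead of
| reloading it after every byte store. Whether a particular rustc/LLVM
| version actually does so is not guaranteed.

```rust
/// Adds `*step` to every byte of `bytes`, wrapping on overflow.
///
/// `bytes` is an exclusive borrow and `step` is a shared borrow, so the two
/// can never overlap. The equivalent C signature with two `unsigned char *`
/// parameters would have to allow for overlap.
pub fn bump_all(bytes: &mut [u8], step: &u8) {
    for b in bytes.iter_mut() {
        *b = b.wrapping_add(*step);
    }
}
```

| Calling it with a buffer of [1, 2, 255] and a step of 2 leaves
| [3, 4, 1] in the buffer.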
| jolux wrote:
| I think benchmarking C vs C++ vs Rust must only really be useful
| for researchers. They're all making a similar tradeoff for
| performance: forcing you to consider how you use memory. Does
| anyone work in a field where the performance difference between
| these specific three platforms matters? I'm genuinely curious.
| Edit: also, if you could explain briefly why and what makes
| particular choices out of the three unsuitable, that would be
| awesome too.
| jandrewrogers wrote:
| It matters for real-world software development, though the
| reason may not be intuitive. In theory, for any particular bit
| of software, you can write code in any of these three languages
| that has nearly identical performance. In practice, the
| complexity of _expressing_ equivalent performance can vary
| considerably depending on what you are trying to do.
|
| There are finite limits to the complexity cost developers are
| willing to pay for performance. Because the cost of expressing
| a given thing differs across these three languages,
| sometimes there will be a threshold where in one or more of
| these languages most developers will choose a less optimal
| design. This manifests as practical performance differences in
| real software even though in theory they are equally expressive
| with enough effort.
|
| This comes with another tradeoff. Efficient expressiveness
| relative to software performance comes at a cost of language
| complexity. C is a simple language that has enough efficient
| expressiveness for simple software architectures. C++ is at the
| extreme opposite; you can do mind-boggling magic with its
| metaprogramming facilities that can express almost anything
| optimally but god help you if you are trying to _learn_ how to
| do this yourself. Rust sits in the middle; much more capable
| than C, not as expressive as C++.
|
| This suggests the appropriate language is partly a function of
| the investment a developer is willing to make in learning a
| language and what they need to do with it. Today, I use modern
| C++ even though it is a very (unnecessarily) complex language
| and write very complex software. Once you pay the steep price
| of learning C++ well, you acutely feel the limitations of what
| other languages _can't_ express easily when it comes to high
| performance software design.
|
| I used to write a lot of C. It still has critical niches but
| not for high-performance code in 2021, giving up far too much
| expressiveness. C++17/20 is incredibly powerful but few
| developers really learn how to wield that power, though usage
| of it has been growing rapidly in industry as scale and
| efficiency have become more important. Rust is in many ways an
| heir apparent to C and/or Java, for different reasons. Or at
| least, that is how I loosely categorize them in my head, having
| a moderate amount of contact with all three. They all have use
| cases where you probably wouldn't want to use the others.
| steveklabnik wrote:
| > In practice, the complexity of expressing equivalent
| performance can vary considerably depending on what you are
| trying to do.
|
| Yep. Rust's safety means that in some cases, you can be
| _more_ aggressive because you know the compiler has your
| back. And in other cases, it makes harder things tractable.
| The example of Stylo is instructive; Mozilla tried multiple
| times to pull the architecture off in C++, but couldn't
| manage to do it until Rust.
| pixel_fcker wrote:
| Graphics software.
| klysm wrote:
| Yeah, databases
| JimBlackwood wrote:
| Could you elaborate a little? I'd be interested in this
| answer
| jandrewrogers wrote:
| All three languages are different enough that depending on
| the language you use it fundamentally alters your database
| architecture to better fit the mechanics of the language.
| Databases have very high levels of internal complexity
| naturally, so there is a strong incentive to align the
| design with what the language can express without adding
| substantially more complexity.
|
| I work on database engines and the impact of language
| choice on the design, architecture, and implementation of
| database engines is very evident. In the specific case of
| database engines, these differences in design can have a
| very large performance impact.
| londons_explore wrote:
| I do some graphics stuff. As soon as you get to "this chunk of
| code needs to be run for every pixel of every 4k frame at
| 60fps", suddenly the number of clock cycles and registers
| matters... Some of my platforms don't have GPU's, so it really
| is squeezing everything possible out of the language and
| compiler...
| djeiasbsbo wrote:
| I do audio stuff and it is the same there. DSP is easy until
| it has to be in real time and there can only be minimal
| latency...
| mrec wrote:
| I'm curious, what platforms support 4K output but don't have
| any kind of GPU?
| steveklabnik wrote:
| I think what's most interesting about your comment is the
| assumption that the three are in the same ballpark. You're not
| wrong, but it just reminds me of how far we've come. That is,
| the key is not "which of these three can eke out the last tiny
| ounce of things," but that Rust has successfully landed across
| that gap you see in the graph. That it's "(C/C++/Rust) vs
| everything else" is in and of itself an interesting result. You can
| see some skepticism of the premise elsewhere in this thread,
| even.
| kzrdude wrote:
| It's mostly an exercise that's useful for Rust: Internally, to
| prove that "it works"* and externally, to make a credible name
| for Rust.
|
| (*) of course part of the output is also looking at the Rust
| code and evaluating style and `unsafe`-wise what the price for
| winning was.
| acje wrote:
| It struck me a while ago that the most powerful feature of rust
| is the strong contracts libraries can and must express. This
| allows people with much deeper knowledge than me to make awesome
| stuff I can depend on.
| seeekr wrote:
| From the page: "... a pretty solid study on the boredom of
| performance-oriented software engineers grouped by programming
| language." I find this both funny and consider it true to some
| degree. There's nothing like a good old friendly arms race for
| the benefit of all (languages and their users, in this case)
| involved!
| pmarin wrote:
| The only conclusion I have got about this web site is how much
| some programmers like to write benchmark code in Rust.
| indymike wrote:
| Let's talk about speed when we are implementing the same
| algorithm and optimizations, please. If $1 was donated to cure
| cancer every time a developer games a comparison like this, there
| would be no more cancer.
| harporoeder wrote:
| I was wondering if perhaps this was actually measuring a
| difference between LLVM and GCC, but they also provide a set of
| benchmarks of C Clang vs C GCC (1) and Clang is generally slower
| in those tests. Although there is some correlation between the
| ones Clang wins in C and Rust.
|
| 1. https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
| arcticbull wrote:
| Rust can be faster than C because in general C compilers have
| to assume that pointers to memory locations can overlap (unless
| you mark them __restrict). Rust forbids aliasing pointers. This
| opens up a whole world of optimizations in the Rust compiler.
| Broadly speaking this is why Rust can genuinely be faster than
| C. Same is true in FORTRAN, for what it's worth.
| alerighi wrote:
| Well you are saying that even in C you can use the restrict
| keyword to tell the compiler that 2 memory locations can
| overlap. Of course is in the hand of the programmer to tell
| the compiler to do so.
|
| I don't think there is a fair comparison between Rust and C:
| C is just a higher-level assembler; if the programmer knows
| what he's doing he can use the hardware to 100% of its
| potential. That is the reason why C is still used in all the
| embedded applications where you have ridiculously low-power
| microcontrollers and you must squeeze out the best
| performance.
|
| That is the difference between C and Rust to me: for each
| fast Rust program you are guaranteed that you can write an
| equivalent performant program in C (or assembly). Worst case
| scenario you use inline assembly in C and you get that.
|
| Thus the contrary cannot be true for Rust: if I give you a
| heavily optimized C program not always you can produce an
| equivalent version in Rust.
|
| Also, these optimizations are not always what you want. In C
| you can choose the level of optimization, and most of the
| time, at least on the programs that I write, I choose a low
| level of optimization. The reason is that a lot of the time
| performance is not the only thing that matters; what may
| matter more is the stability of the code (code compiled with
| optimizations is more likely to contain bugs) or the ability
| to debug (and thus the readability of the assembly output of
| the compiler).
|
| Rust gives out horrible assembly code that is impossible
| to debug or to check for correctness. You just have to hope
| that the compiler doesn't contain bugs. For the same reason
| Rust is the ideal language to write viruses, since it's
| difficult to reverse engineer.
| arcticbull wrote:
| I think you'd be hard pressed to find more than a handful
| of usual C programmers, even in embedded, who know what the
| __restrict keyword does, let alone are rigorous in its
| application.
| steveklabnik wrote:
| I've always wondered if the reason why Rust keeps running
| into llvm bugs around restrict is that it is used so
| sparingly, and the semantics being unclear enough, that
| these codepaths just aren't exercised as often.
| steveklabnik wrote:
| > Thus the contrary cannot be true for Rust: if I give you
| a heavily optimized C program not always you can produce an
| equivalent version in Rust.
|
| Do you have some evidence here? Or something more specific?
|
| (The rest of your comment is opinions, which you can of
| course have, but does not match my personal experience,
| FWIW.)
| saagarjha wrote:
| > For the same reason Rust is the ideal language to write
| viruses, since it's difficult to reverse engineer.
|
| Eh, not really.
| efaref wrote:
| There's a translator for C99 to (unsafe) Rust:
| https://github.com/immunant/c2rust
|
| So probably you _can_ give me any C program and I'll be
| able to give you an equivalent Rust program. It'll probably
| perform about the same, too.
| pjscott wrote:
| > Well you are saying that even in C you can use the
| restrict keyword to tell the compiler that 2 memory
| locations can overlap. Of course is in the hand of the
| programmer to tell the compiler to do so.
|
| It says that they _can't_ overlap, but yes, you can get
| the compiler to optimize based on this if you provide the
| aliasing information and remember to keep it accurate. You
| probably won't do that, though, for anything except the
| most performance-critical of inner loops. A compiler that
| can infer more about aliasing can provide more
| optimization, safely, in the >99% of code that doesn't have
| explicit aliasing annotations, and that's probably worth
| some decent speedups in practice.
|
| There are two main things you might be talking about when
| you call a programming language fast:
|
| 1. Given some really performance-critical code and enough
| time to hand-optimize it, how close can you get the
| compiler's output to the optimal machine code?
|
| 2. If you write code normally, not making any effort to
| micro-optimize, how fast will it usually be?
|
| Both of these matter. #1 matters when you've got some
| bottlenecks that you really care about, and #2 matters when
| you've got a flatter profile -- and both situations are
| common.
|
| Another illustrative example of the #2 type of performance
| with Rust is the default collections compared to the ones
| in C++. Rust's default HashMap is faster than C++'s
| std::unordered_map because of some ill-advised API
| constraints in the C++ STL. You _can_ get similar
| performance by using a custom C++ hash table
| implementation, and in fact Rust's HashMap is a port of a
| C++ hash table that Google wrote for that purpose, but most
| people probably won't bother.
|
| So, a semantic question: if you _can_ get the same speed in
| one language as you can in another, but in practice usually
| don't, is one language faster than the other?
| hashingroll wrote:
| > but most people probably won't bother.
|
| Most people who don't bother with that probably also
| don't bother with the performance of their hashmap.
| pjmlp wrote:
| People think C is a high-level assembler, but that was only
| true on the PDP-11 and on 8- and 16-bit CPUs.
|
| Also, in C you cannot retrofit restrict into existing
| programs, especially if they depend on existing binary
| libraries.
| walki wrote:
| > C compilers have to assume that pointers to memory
| locations can overlap, unless you mark them __restrict...
|
| What I don't fully understand is: "GCC has the option
| -fstrict-aliasing which enables aliasing optimizations
| globally and expects you to ensure that nothing gets
| illegally aliased. This optimization is enabled for -O2 and
| -O3 I believe." (source: https://stackoverflow.com/a/7298596)
|
| Doesn't this mean that C++ programs compiled in release mode
| behave as if all pointers are marked with __restrict?
| jart wrote:
| Not for char. Compiler always assumes non-restrict for char
| pointers and arrays, which is important to remember if
| you're ever operating on a RGB or YCbCr matrix or
| something.
| bsder wrote:
| Huh.
|
| Does that also hold for a "uint8_t"--which is often just
| a renamed unsigned char rather than being a genuine type
| of its own?
| jart wrote:
| Yes
| Hello71 wrote:
| not according to the standard, but in practice yes,
| currently. there are arguments that it should be
| considered not for optimization reasons, but there is
| likely too much existing code relying on it to change
| behavior. (see related llvm, gcc bugs)
| murderfs wrote:
| restrict and strict aliasing have to do with the same
| general concept, but aren't the same. They both have to do
| with allowing the compiler to optimize around assuming that
| writes to one pointer won't be visible while reading from
| another. As a concrete example, can the following branches
| be merged?
|
|     void foo(/*restrict*/ bool* x, int* y) {
|       if (*x) {
|         printf("foo\n");
|         *y = 0;
|       }
|       if (*x) {
|         printf("bar\n");
|       }
|     }
|
| Enabling strict aliasing is effectively an assertion that
| pointers of incompatible types will never point to the same
| data, so a write to y will never touch *x. restrict is an
| assertion to the compiler on that specific pointer that no
| other pointer aliases to it.
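|
| For comparison, a hypothetical Rust rendering of the same `foo`
| (a sketch; it returns the messages instead of printing them so the
| result can be checked). Here the no-overlap guarantee comes from the
| reference types rather than from an annotation: `x` is a shared
| borrow and `y` an exclusive one, so the write through `y` cannot
| change `*x`, and merging the two branches is a legal transformation.
| Whether the optimizer actually merges them is a separate question.

```rust
/// Rust rendering of the C `foo`; returns what would have been printed.
pub fn foo(x: &bool, y: &mut i32) -> Vec<&'static str> {
    let mut printed = Vec::new();
    if *x {
        printed.push("foo");
        *y = 0; // cannot touch `*x`: `x` is shared, `y` is exclusive
    }
    if *x {
        printed.push("bar");
    }
    printed
}
```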
| walki wrote:
| OK thanks, indeed Clang is able to generate better
| assembly using __restrict__. And -O3 generates the same
| assembly as -O3 -fstrict-aliasing (which is not as good
| as __restrict__).
|
| I wish there was a C/C++ compiler flag for treating all
| pointers as __restrict__. However I guess that C/C++
| standard libraries wouldn't work with this compiler
| option (and therefore this compiler option wouldn't be
| useful in practice).
| shepmaster wrote:
| Why does the Rust compiler not optimize code assuming that
| two mutable references cannot alias?
| --https://stackoverflow.com/q/57259126/155423
| arcticbull wrote:
| Follow along here for re-enabling:
| https://github.com/rust-lang/rust/issues/54878
|
| For what it's worth I did say "can be" not "is" because I
| wasn't sure of the current state of this feature. I was
| just passing on the theory.
|
| There are certainly other potential reasons, for instance
| constant expressions and generics. And, of course, the
| prohibited undefined behavior makes other optimizations
| possible, and potentially better.
| mh7 wrote:
| Some of the Rust versions call C libraries for their heavy lifting
| (gmp, pcre) so I wouldn't take this too seriously.
| Matthias247 wrote:
| Apart from those benchmark games a lot of real world C is a lot
| less performant than people think it might be. I spent a fair
| amount of time reviewing C code in the last 5 years - and things
| that pop up in nearly every review are costly string operations.
| Linear counts due to the use of null terminated strings and extra
| allocations for substrings to attach null terminators, or just
| deep copies because ownership can't be determined are far more
| common than exceptional. This happens because null terminated
| strings feel like idiomatic C to most people.
|
| Rust avoids those from the start by making slices idiomatic.
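|
| A small sketch of what that looks like in practice (`key_value` is
| a made-up helper, not taken from any reviewed code). Both returned
| substrings are views into the original buffer that carry their own
| length, so nothing is copied and no terminator has to be written.

```rust
/// Splits "key=value" into two trimmed sub-slices of `line`.
/// No bytes are copied and no terminator is written.
pub fn key_value(line: &str) -> Option<(&str, &str)> {
    // Split at the first '='; `?` returns None when there is no separator.
    let (key, value) = line.split_once('=')?;
    Some((key.trim(), value.trim()))
}
```

| For example, key_value("name = ferris") yields ("name", "ferris"),
| and a line without '=' yields None.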
|
| Another thing I commonly see is the usage of suboptimal
| containers (like arrays with linear search) - just because
| there's no better alternative at hand (the standard library doesn't
| offer one and dependency management is messy). Which also makes
| it less surprising that code in higher level languages might
| perform better.
| frutiger wrote:
| Mostly agree with your comment but linear search through arrays
| of size less than a few hundred will typically beat more
| sophisticated structures such as red-black trees or hashtables.
| This is due to prefetching and avoidance of unpredictable
| pointer traversals. Asymptotic complexity is only that:
| asymptotic.
|
| In many programs in many domains the sizes of these data
| structures will rarely exceed this limit.
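|
| To make the two options concrete, here is a sketch of the same
| lookup done both ways (function names are made up). It only shows
| that the two are interchangeable at the call site; which one is
| faster for a given table size is something to measure, not
| something this snippet demonstrates.

```rust
use std::collections::HashMap;

/// Linear scan over a small, contiguous table of (key, value) pairs.
pub fn find_linear<'a>(table: &[(u32, &'a str)], key: u32) -> Option<&'a str> {
    table.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

/// The same data loaded into a hash table for keyed lookup.
pub fn build_map<'a>(table: &[(u32, &'a str)]) -> HashMap<u32, &'a str> {
    table.iter().copied().collect()
}
```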
| vbezhenar wrote:
| It might be fast, but it'll load CPU caches with that data
| and evict other useful data. Which means that while
| this particular code will be fast or at least not very slow,
| some other code will be slow because its data has to be
| fetched again.
|
| I have no idea whether that matters or is even easy to
| measure...
| forrestthewoods wrote:
| You're stretching there imho.
|
| L1 and L2 are per core. A cacheline is 64 bytes and per-
| core cache size is on the order of 32kb and 256kb for
| L1/L2.
|
| Reading data from L1 is on the order of 1 nanosecond (or
| less) and RAM on the order of 50 nanoseconds.
|
| If you're scanning an array and load a dozen cachelines
| that's almost certainly preferable to several cache-misses
| (and lines).
|
| Memory access is very often an application's bottleneck.
| The answer is almost always more arrays and fewer pointers.
| michaelmior wrote:
| > null terminated strings feel like idiomatic C to most people
|
| Doesn't that mean that null terminated strings _is_ idiomatic
| C? That is, my understanding of the term idiomatic is that it
| is defined by whatever is most natural to users of a language
| regardless of whether it is the most performant.
| rcoveson wrote:
| Null terminated strings are often called cstrings. They're
| beyond idiomatic; they're part of the C standard library.
| akkartik wrote:
| Also Unix. Syscalls return null-terminated strings all over
| the place. So any language running on Unix has to deal with
| null-terminated strings.
|
| I know because I run a userland that uses length-prefixed
| strings as far as possible: https://github.com/akkartik/mu
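|
| As an illustration of that boundary in a language whose native
| strings carry a length, here is a sketch using Rust's standard
| `CString`/`CStr` types (the two helper names are made up, and this
| is unrelated to how mu does it):

```rust
use std::ffi::{CStr, CString};

/// Rust string -> owned NUL-terminated buffer for a C or syscall interface.
/// Fails if `s` already contains an interior NUL byte.
pub fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

/// NUL-terminated bytes -> borrowed `&str`, without copying.
/// Fails if the terminator is missing or misplaced, or the bytes are not UTF-8.
pub fn from_c_bytes(bytes: &[u8]) -> Option<&str> {
    CStr::from_bytes_with_nul(bytes).ok()?.to_str().ok()
}
```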
| michaelmior wrote:
| True. I was skipping over that, but the fact that functions
| manipulating null terminated strings are part of the
| standard library is certainly a reason in itself to
| consider them idiomatic.
| akkartik wrote:
| They're agreeing with you.
| martincmartin wrote:
| Also lack of generics can make it slow, e.g. qsort() requires a
| function call for each comparison. So C++'s std::sort() can be
| significantly faster on an array of integers.
| bluecalm wrote:
| I benchmarked it several times in the past and couldn't
| replicate std::sort being faster (GCC with high optimization
| settings). Anyway both are slow. If you need fast sort you
| need an implementation without any function calls (no
| recursive calls) and both the pivot choice and the chunk size
| at which insert sort kicks in optimized to your data and
| hardware. My experience is that you can beat built in sort by
| 2x to 3x.
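|
| A minimal sketch of that shape, written in Rust for illustration:
| a quicksort that uses an explicit stack instead of recursive calls
| and hands small ranges to insertion sort. The cutoff is a
| compile-time parameter (`CUTOFF`, my naming) so different values
| can be benchmarked; the middle-element pivot is a placeholder
| choice, not a tuned one, and nothing here is claimed to reach the
| 2x to 3x figure.

```rust
/// Insertion sort, efficient for short slices.
fn insertion_sort(v: &mut [i32]) {
    for i in 1..v.len() {
        let mut j = i;
        while j > 0 && v[j - 1] > v[j] {
            v.swap(j - 1, j);
            j -= 1;
        }
    }
}

/// Hybrid quicksort: ranges of at most `CUTOFF` elements go to insertion sort.
pub fn hybrid_sort<const CUTOFF: usize>(v: &mut [i32]) {
    // Explicit stack of half-open (start, end) ranges instead of recursion.
    let mut stack = vec![(0usize, v.len())];
    while let Some((lo, hi)) = stack.pop() {
        // A cutoff of 0 is treated as 1 so ranges of length 0 or 1 stop here.
        if hi - lo <= CUTOFF.max(1) {
            insertion_sort(&mut v[lo..hi]);
            continue;
        }
        // Placeholder pivot choice: the middle element, moved to the end.
        let mid = lo + (hi - lo) / 2;
        v.swap(mid, hi - 1);
        let pivot = v[hi - 1];
        // Lomuto partition: everything smaller than the pivot moves left.
        let mut store = lo;
        for i in lo..hi - 1 {
            if v[i] < pivot {
                v.swap(i, store);
                store += 1;
            }
        }
        v.swap(store, hi - 1); // pivot lands in its final position
        stack.push((lo, store));
        stack.push((store + 1, hi));
    }
}
```

| Usage is `hybrid_sort::<16>(&mut data)`; trying several cutoffs on
| representative data is the tuning step described above.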
| lumost wrote:
| How would you perform this optimization? If it's the same
| data getting sorted, why not put it in an ordered data
| structure?
| mhh__ wrote:
| Andrei Alexandrescu has a talk on doing this - he calls
| them metaparameters e.g. where a hybrid sort chooses to
| change algorithm.
|
| One library I have exploits the fact that D templates are
| embarrassingly better than C++'s, so you can actually
| benchmark a template against its parameters in a clean
| manner without overhead - that could be anything from a
| size_t parameter for a sort or a datastructure for
| example.
|
|     enum cpuidRange = iota(1, 10).map!(ctfeRepeater).array;
|
|     @TemplateBenchmark!(0, cpuidRange)
|     @FunctionBenchmark!("Measure", iota(1, 10), (_) => [1, 2, 3, 4])(meas)
|     static int sum(string asmLine)(inout int[] input)
|     {
|         int tmp;
|         foreach (i; input)
|         {
|             tmp += i;
|             mixin("asm { ", asmLine, ";}");
|         }
|         return tmp;
|     }
|
| This made-up (pointless) benchmark measures how inserting a
| number of cpuid instructions into the loop of a summing
| function affects its runtime. My library writes the code
| from your specification as above to generate the
| instantiations and loop to measure the performance. As
| you might guess, the answer is a lot (CPUID is slow and
| serializing).
| mh7 wrote:
| It's more to do with the fact that std::sort's definition is
| visible to the compiler and qsort() is not. Put qsort() code
| in stdlib.h, make it static and write a static intcmp() and
| you'll see the compiler inline that no problem.
| jjgreen wrote:
| I've done this a few times using
| http://www.corpit.ru/mjt/qsort.html
| ryanianian wrote:
| Sure you can hard-code intcmp into qsort but then it would
| only work for arrays of ints.
|
| You could do some macro magic instead of templates e.g.
| `DEFINE_QSORT(int, intcmp)` which could stamp out
| `qsort_int` but that's not a part of the stdlib.
|
| C++ arguably gets this right since sort<int> and
| sort<string> will be separate functions, although templates
| are of course a footgun. And of course duping the logic for
| std::sort<T> for a bunch of different T impls increases the
| binary size.
| TheNewAndy wrote:
| The poster you are replying to didn't suggest hardcoding
| intcmp into qsort - just making it so the implementation
| of qsort is available to the compiler when the comparison
| function is known (i.e. just like with C++).
|
| When this is done, the compiler can inline qsort, and
| replace the indirect function call with an inlined
| version of intcmp, and then things are equivalent.
| marvy wrote:
| I think mh7 did not mean to hard-code intcmp into qsort.
| The idea is to move the definition of qsort directly into
| the stdlib.h header file. That way, the compiler can see
| the definition of qsort and intcmp at the same time.
|
| In that case, the compiler could make a specialized qsort
| using intcmp automatically.
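|
| The same distinction can be sketched in Rust (function names are
| made up for illustration). The first version receives the
| comparator as a plain function pointer, like qsort; the second is
| generic over the comparator type, like std::sort, so each closure
| passed in gets its own specialized copy of the sort. How much the
| optimizer recovers in the first case depends on what it can see
| and inline.

```rust
use std::cmp::Ordering;

/// qsort-style: the comparator arrives as a plain function pointer, so each
/// comparison is an indirect call unless the optimizer can see through it.
pub fn sort_with_fn_pointer(v: &mut [i32], cmp: fn(&i32, &i32) -> Ordering) {
    v.sort_unstable_by(cmp);
}

/// std::sort-style: generic over the comparator type, so each distinct
/// closure gets its own copy of the sort with the comparison visible to inline.
pub fn sort_generic<F>(v: &mut [i32], cmp: F)
where
    F: FnMut(&i32, &i32) -> Ordering,
{
    v.sort_unstable_by(cmp);
}

/// Plain comparison function, usable with either version.
pub fn int_cmp(a: &i32, b: &i32) -> Ordering {
    a.cmp(b)
}
```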
| [deleted]
| [deleted]
| Svetlitski wrote:
| Once LLVM fixes some bugs with `noalias`, at which point Rust
| will begin using it again in more circumstances [1], I'd expect
| to see Rust get even faster in these benchmarks, given that the
| Rust compiler knows _much_ more about which pointers do/do-not
| alias than most other programming languages [2] and the myriad
| optimizations this knowledge allows.
|
| [1] https://github.com/rust-lang/rust/issues/54878#issuecomment-...
|
| [2] https://doc.rust-lang.org/nomicon/aliasing.html
| stabbles wrote:
| I doubt there's any performance to be gained that way, but if
| so, the C implementation can just use `restrict` to the same
| effect.
| [deleted]
| roca wrote:
| "Just use 'restrict'" isn't as easy as it sounds. You need to
| only use 'restrict' where you are 100% sure pointers can't
| alias. Wherever you get this wrong you have introduced a
| subtle bug that a) only shows up in optimized code b) may or
| may not show up at all depending on compiler version and
| target architecture and c) only show up when two pointers
| actually alias at runtime, which may be rare, and may be
| intermittent ... in other words these bugs will be _hell_ to
| debug. So in fact in a large codebase it will be a lot of
| work to figure out where you can safely put 'restrict' and
| you will probably introduce some very nasty bugs. Most people
| aren't going to be willing to do this.
| pornel wrote:
| Rust's problem with aliasing in LLVM is caused exactly by
| limited usefulness of C's restrict.
|
| LLVM implements only coarse per-function aliasing information
| needed by C, and doesn't properly preserve fine-grained
| aliasing information that Rust can provide.
| zucker42 wrote:
| I thought the problem was there is simply a bug in the
| implementation because restrict isn't used frequently
| enough to have exposed the bug earlier.
| jcranmer wrote:
| Note that Rust's issue was that LLVM is buggy with restrict
| (because it isn't exercised frequently).
|
| That said, there is a series of patches in-flight to get
| actual working full restrict support:
| https://reviews.llvm.org/D69542
| vvanders wrote:
| Have you ever used restrict in anger?
|
| I've done it when we really needed that performance for an
| inner loop(particle system). It can be a real bastard to keep
| the non-alias constraint held constant in a large, multi-
| person codebase and the error cases are really gnarly to
| chase down.
|
| Compare that to Rust which has this knowledge built in since
| it naturally falls out of the ownership model.
| Gibbon1 wrote:
| I agree with that, no aliasing in C is an ugly kludge that
| bitches about perfectly fine code that doesn't benefit from
| it. And it's hard to ensure that it actually works in code
| that does. Worse, the failure is completely silent.
| jart wrote:
| Is there something special about the noalias keyword that
| it really brings out the most unprofessional pig language
| in developers?
| https://www.lysator.liu.se/c/dmr-on-noalias.html
| Blikkentrekker wrote:
| Isn't every pointer `restrict` in _Fortran_, which was often
| cited as why it still outperformed _C_ in many cases for a
| very long time?
| cma wrote:
| When people use the unsafe keyword are they always taking
| into account aliasing?
|
| At least you can audit only those places though.
| zamalek wrote:
| Yeah, `unsafe` is the developer signing a contract to
| uphold all the guarantees that safe rustc provided (at
| least from the perspective of the public interface, the
| invariants can be broken in private code).
|
| That's why Rust developers don't take kindly to
| unnecessary unsafe usage, because the audit surface area
| (and interaction complexity) increases.
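|
| The textbook illustration of that contract is a hand-written
| version of the standard library's `split_at_mut`; the sketch below
| follows the well-known example from the Rust book (the `_sketch`
| suffix is mine). The function body creates two mutable slices from
| one raw pointer, which safe Rust would reject, and the `assert!`
| plus the non-overlapping ranges are what keep the public signature
| honest.

```rust
use std::slice;

/// Splits one mutable slice into two non-overlapping mutable halves.
pub fn split_at_mut_sketch(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();
    assert!(mid <= len, "mid out of bounds");
    // SAFETY: `mid <= len`, so both ranges lie inside the original slice,
    // and [0, mid) and [mid, len) do not overlap, so the two `&mut` slices
    // never alias each other.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

| Callers only see a safe function; the unsafe block is the one
| place an auditor has to check.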
| slaymaker1907 wrote:
| Rust takes const with pointers very seriously. It is
| undefined behavior to mutate anything non-mut (pointer or
| reference) unless it is through an UnsafeCell. While you
| can break compiler guarantees through pointers in Rust,
| dereferencing these pointers must still obey the above
| guarantee (basically think of it as you need to go back
| to references to actually use the aliased pointers which
| would be undefined due to the aliasing).
|
| Therefore, the optimizer can always assume mutations
| don't alias. Even UnsafeCell isn't allowed to break the
| alias rules, it just provides flexibility going from
| immutable to mutable.
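|
| A small sketch of that "immutable to mutable" flexibility, using
| `std::cell::Cell`, which is a safe wrapper built on UnsafeCell
| (`HitCounter` is a made-up type). The method takes `&self`, yet the
| count changes, and because the field's type says so the compiler
| does not assume the value is frozen behind a shared reference.

```rust
use std::cell::Cell;

/// Counter that can be bumped through a shared reference.
pub struct HitCounter {
    hits: Cell<u32>,
}

impl HitCounter {
    pub fn new() -> Self {
        HitCounter { hits: Cell::new(0) }
    }

    /// Takes `&self`, not `&mut self`; the mutation goes through the `Cell`.
    pub fn record(&self) -> u32 {
        let next = self.hits.get() + 1;
        self.hits.set(next);
        next
    }
}
```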
| nindalf wrote:
| How often does benchmark code have a function that takes two
| pointers that could potentially alias each other? If it's as
| rare as I think it is, it might not have that much of an impact
| on Rust's position in the benchmarks game.
|
| Still, real world performance will probably benefit from this
| fix so it's a positive change regardless.
| jerf wrote:
| I've been fairly convinced for a while that once Rust matures
| (which is probably fairly close to "now", but I've held this
| opinion for years) that it's going to have a performance
| advantage in real code that's going to be hard to capture in
| benchmarks, because it's easy in a small benchmark to be very
| careful and ensure that you don't have aliasing, avoid extra
| copies, etc.
|
| Where I expect Rust to really shine performance-wise is at
| the larger scale of real code, where it affords code that
| copies less often because the programmer isn't sure in this
| particular function whether or not they own this so they just
| have to take a copy, or the compiler can't work out aliasing,
| etc. Ensuring at _scale_ that you don't take extra copies,
| or have an aliasing problem, or that you don't have to take
| copies of things just to "be sure" in multithreading
| situations, is hard, and drains a lot of performance.
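|
| One concrete form of "copy only when you have to", sketched with
| the standard `Cow` type (`untabify` is a made-up example function).
| The signature tells callers that the result may simply borrow the
| input, and the borrow checker enforces that the input outlives it,
| so nobody needs a defensive copy "just to be sure".

```rust
use std::borrow::Cow;

/// Replaces tabs with single spaces, allocating only if a tab is present.
pub fn untabify(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', " "))
    } else {
        // Common case: hand back a view of the caller's data, no copy.
        Cow::Borrowed(input)
    }
}
```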
| slaymaker1907 wrote:
| One big advantage when writing average code is that while
| Rust emphasizes generics and makes them easy to write, C++
| makes it so difficult that you try and avoid it at all
| costs. Big, complex projects are going to probably use
| polymorphism all over the place in my experience which is
| not really captured by benchmarks.
|
| However, one advantage for C and C++ for such projects is
| that they make it far more pleasant to do bulk allocations.
| With C, you just need to group your mallocs and cast the
| memory and C++ offers placement new where you pass in the
| memory explicitly. With Rust, you need to reach into unsafe
| Rust (rightfully so), but unsafe Rust is very unpleasant to
| write.
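|
| For readers unfamiliar with the two flavours of polymorphism being
| discussed, a rough sketch follows. The Shape trait and both functions
| are invented for illustration. The generic version is compiled
| separately for each concrete type (so calls can be inlined), while
| the `dyn` version goes through a vtable.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
}

impl Shape for Rect {
    fn area(&self) -> f64 { self.0 * self.1 }
}

// Static dispatch: monomorphized once per concrete `S`.
fn total_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Dynamic dispatch: one compiled copy, each call resolved via a vtable.
fn total_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let squares = [Square(2.0), Square(3.0)];
    println!("{}", total_static(&squares));

    let mixed: Vec<Box<dyn Shape>> =
        vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    println!("{}", total_dyn(&mixed));
}
```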
| throwaway894345 wrote:
| I agree with this. Benchmark code differs from real code
| in that it approximates the performance ceiling for a
| language implementation; it's not "ordinary code" or even
| "somewhat optimized" but usually the most optimal code one
| can conceive of with little respect paid to competing
| concerns, like maintainability. Rust aspires to make
| idiomatic, maintainable code almost as performant as
| benchmark code by way of zero-cost abstractions; probably
| more so than any other language, including C and C++, and
| all the while keeping the ceiling high relative to other
| languages.
|
| Unfortunately, I'm still of the opinion that "idiomatic
| Rust" is quite a lot harder to write than idiomatic Go, C#,
| etc., and many applications absolutely index on developer
| velocity, with performance and quality as "nice-to-haves".
| Many companies are making money hand over fist with Python
| and JavaScript, and Go and C# are quite a lot more
| performant and typesafe than those languages already; Rust
| is better still in these areas, but returns diminish. If C#
| and Go preclude 95% of the errors found in Python and JS,
| it's not worth trading tens or hundreds of percent of
| developer velocity for that extra 4% or so improvement in
| quality (a Go developer could recoup a bit more of that 4%
| by writing tests in the time they save over Rust).
|
| Of course, this is all subjective, and I've met smart
| people who argue with a straight face that Rust is better
| for developer velocity than languages like Go, so YMMV.
| roca wrote:
| I agree that the aliasing control and other features of
| Rust enable all kinds of interesting optimizations that
| currently aren't being done, and 'restrict' is only part of
| it. For example we also know that the data behind a shared
| reference is really immutable (apart from UnsafeCell etc).
|
| Unfortunately we may need an entirely new generation of
| compiler infrastructure to exploit these opportunities
| because LLVM is really a C/C++ compiler at heart and may
| not be happy about taking big changes that can't benefit
| C/C++.
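|
| A hedged illustration of the "shared reference is really immutable"
| point, with function names invented for the example. Behind a plain
| `&i32` the value cannot change during the call, so the second read
| could in principle be folded into the first. Behind `&Cell<i32>`
| (which is built on UnsafeCell) the value may change between reads,
| so both loads must stay.

```rust
use std::cell::Cell;

// `x` is a shared reference to a plain i32: nothing `f` does can
// change `*x`, so both reads observe the same value.
fn read_twice(x: &i32, f: impl Fn()) -> i32 {
    let first = *x;
    f();
    first + *x
}

// `x` is a shared reference to a Cell: `f` may legally change the
// contents between the two reads.
fn read_twice_cell(x: &Cell<i32>, f: impl Fn()) -> i32 {
    let first = x.get();
    f();
    first + x.get()
}

fn main() {
    let n = 5;
    println!("{}", read_twice(&n, || {}));

    let c = Cell::new(5);
    // The closure mutates `c` through a shared reference.
    println!("{}", read_twice_cell(&c, || c.set(7)));
}
```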
| kolbe wrote:
| The problem is that it's "rare" and not "impossible."
| Compilers have to be logically sound, not probabilistic when
| it comes to defined behaviors.
| loeg wrote:
| Benchmark hackers can just sprinkle `restrict` all over the
| place, too. Real world C code doesn't get restrict by default
| and often isn't fully annotated, so the benchmarks may
| artificially hide the real-world difference.
| notorandit wrote:
| I am not sure I can buy such a comparison. Someone smarter than
| me already argued about test implementations. Someone else also
| put compilers and interpreters into perspective. Of course
| language expressiveness can weigh in but, IMHO, comparing the
| same sort algorithm or the same hash table implementation (or
| n-queens algo) could make much more sense especially with
| comparable compilers.
|
| If the Rust implementation is faster than C's, kudos go to the
| compiler, not to the language.
| ma2rten wrote:
| I'm having a hard time making sense of this page. Why is this
| comparing fastest implementation with slowest implementation? Why
| is the metric busy time/least busy? Why is C++ so much better
| than C?
| jeffbee wrote:
| Speaking generally and about no specific program, you should
| expect C++ to be faster than C. C++ has more ways for the
| programmer to communicate with the compiler.
| vitus wrote:
| At the same time, a lot of those mechanics involve additional
| overhead (e.g. vtable lookups for dynamic dispatch per
| inheritance).
|
| But yes, some of these do provide hints for the compiler,
| e.g. constexpr, ownership semantics per unique_ptr. There's
| nothing stopping a human from writing equivalent C, so my
| suspicion is that the performance gap is primarily due to the
| benchmark implementation.
| jandrewrogers wrote:
| While nothing prevents you from writing C that will
| generate the equivalent code, it requires several times
| more lines of C than the equivalent C++. At which point, it
| becomes an economics discussion. C is usually a choice when
| pure performance is secondary to other considerations like
| portability.
|
| Most C++ code I see has few vtables, and what vtables exist
| are mostly removed by the compiler. Java-style inheritance
| hierarchies are not idiomatic in C++. The code gen for C++
| looks a lot like hyper-optimized C code, but without having
| to write hyper-optimized code. C++ naturally provides much
| more information about intent to the compiler than C does,
| and modern compilers are excellent at using that
| information to provide nearly optimal code gen.
| morelisp wrote:
| > Most C++ code I see has few vtables, and what vtables
| exist are mostly removed by the compiler. Java-style
| inheritance hierarchies are not idiomatic in C++.
|
| FWIW I feel like this is a fairly recent (good!)
| development. Up through TR/TR1 I think at least half the
| C++ code I saw was pretty heavy on the "Java envy" and
| not at all concerned about the cost of dynamic dispatch
| or memory allocation. Avoiding vtables was derided as
| part of "C with classes" - today that usually refers more
| to templates and exceptions but STL implementations were
| not mature enough and too much code not exception-safe
| for those to be universally viable back then. Something I
| heard more than once was that if you had a destructor it
| should _always_ be virtual just in case someone wanted to
| subclass it later.
| pjmlp wrote:
| Quite natural given that Java was created to be
| attractive to 90's C++ developers, which carried their
| development practices into Java.
| pjmlp wrote:
| Just a small correction: you mean that 90's-style C++ GUI
| framework inheritance hierarchies are no longer idiomatic in
| modern C++.
|
| Naturally we have to ignore that wxWidgets, Gtkmm, MFC,
| ATL, Qt, COM, DirectX, IO Kit, DriverKit, Skia are still
| around.
| jeffbee wrote:
| You've got it backwards. For the same amount of
| polymorphism in the design of a program C++ is likely to be
| faster because a C++ compiler can often devirtualize calls
| but a C compiler faced with a home-grown vtable (the struct
| of function pointers that every large C program eventually
| uses) will never be able to do so.
| vitus wrote:
| Why bother to write the vtable in the first place?
|
| My experience is that C++ that's written for performance
| will often prefer the use of templates over inheritance
| since the cost is then paid upfront by the compiler.
| What's stopping a C programmer from hand-coding template
| instantiations (via macro or otherwise)?
| jeffbee wrote:
| Sure you are welcome to reimplement C++ in C macros. When
| your time is worthless anything is possible.
| albertzeyer wrote:
| Interestingly, C++ seems to be the fastest overall.
| agumonkey wrote:
| After the Rust blossom storm I didn't track how C++
| implementations evolved. Did C++ compiler performance/libraries
| improve, or was C++ simply faster already and still is?
| kowlo wrote:
| Searched the page for "Rust" which returned nothing... and the
| box labels are awkward. Why not label them conventionally?
| gus_massa wrote:
| You can see the data in
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
| but it has no graphic comparison.
| topspin wrote:
| Is it just me or has that 'benchmarks game' site been growing
| less navigable over time? It used to be easy to use for comparing
| benchmarks across several languages. If that capability still
| exists somewhere it's buried and I'm not interested in puzzling
| it out. There are no side bars or menus or anything helpful.
| kzrdude wrote:
| Rust seems to be using parallelism better. In one benchmark
| though (fasta), C gcc is using all 4 cpus and Rust only two, and
| still wins.
|
| (Looking at just C gcc vs Rust)
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
| FartyMcFarter wrote:
| The C version uses OpenMP, while the Rust version doesn't.
|
| I tried running the C code with 2 and 4 threads. Wall-time and
| CPU time don't change much in either case, which is strange
| (this is cygwin with gcc 10.2.0):
|
| 2 threads:
|
|   $ /usr/bin/gcc -pipe -Wall -O3 -fomit-frame-pointer
|     -march=ivybridge -fopenmp fasta.c -o fasta.gcc-2.gcc_run &&
|     time ./fasta.gcc-2.gcc_run 25000000 | md5sum
|   fd55b9e8011c781131046b6dd87511e1 *-
|   real 0m0.724s
|   user 0m1.468s
|   sys  0m0.108s
|
| 4 threads:
|
|   $ /usr/bin/gcc -pipe -Wall -O3 -fomit-frame-pointer
|     -march=ivybridge -fopenmp fasta.c -o fasta.gcc-2.gcc_run &&
|     time ./fasta.gcc-2.gcc_run 25000000 | md5sum
|   fd55b9e8011c781131046b6dd87511e1 *-
|   real 0m0.670s
|   user 0m1.514s
|   sys  0m0.046s
___________________________________________________________________
(page generated 2021-01-03 23:00 UTC)