[HN Gopher] Speeding up Ruby by rewriting C in Ruby
       ___________________________________________________________________
        
       Speeding up Ruby by rewriting C in Ruby
        
       Author : todsacerdoti
       Score  : 184 points
       Date   : 2024-12-04 12:31 UTC (10 hours ago)
        
 (HTM) web link (jpcamara.com)
 (TXT) w3m dump (jpcamara.com)
        
       | ksec wrote:
       | >This got me thinking that it would be interesting to see a kind
       | of "YJIT standard library" emerge, where core ruby functionality
       | run in C could be swapped out for Ruby implementations for use by
       | people using YJIT.
       | 
       | This actually makes me feel sad because it reminded me of Chris
       | Seaton. The idea isn't new and Chris has been promoting it during
       | his time working on TruffleRuby. I think the idea goes back even
       | further to Rubinius.
       | 
       | It is also nice to see TruffleRuby being very fast and YJIT still
       | has lots of headroom to grow. I remember one obstacle with it
       | running rails was memory usage. I wonder if that is still the
       | case.
        
         | Asmod4n wrote:
         | One of the amazing things TruffleRuby does is handle C
         | extensions like Ruby code, meaning the C is interpreted and not
         | compiled in the traditional sense.
         | 
         | This makes way for JIT-compiling the C code, which can make it
         | way faster than the author wrote it.
        
           | pantulis wrote:
           | Amazing indeed!
        
         | 0x457 wrote:
         | Yup, Rubinius was probably the most widely known implementation
         | of Ruby's standard library in Ruby. Too bad it was slower than
         | MRI.
        
         | e12e wrote:
         | I thought maybe mruby had a mostly Ruby stdlib - but I guess
         | it's C ported over from MRI?
        
       | Imustaskforhelp wrote:
       | Super interesting. I am actually also a contributor to
       | https://github.com/bddicken/languages, and after I had tried to
       | create a Lua approach, I started to think of TruffleRuby, as it
       | was mentioned somewhere. Unfortunately, when I ran the code
       | of main.rb, there was virtually no significant difference between
       | TruffleRuby and normal Ruby (sometimes normal Ruby was faster than
       | TruffleRuby).
       | 
       | I am not sure whether the benchmark you provided showing the
       | speed of TruffleRuby was run after the changes that you
       | made.
       | 
       | I would really appreciate it if I could verify the benchmark
       | 
       | and maybe try to add it to the main
       | https://github.com/bddicken/languages as a commit as well,
       | because the TruffleRuby implementation is actually faster than
       | Node.js and gets close to Bun or even Go for that matter,
       | which is nuts.
       | 
       | This was a fun post to skim through, definitely bookmarking it.
        
       | jeremy_k wrote:
       | Super interesting. I didn't know that YJIT was written in Rust.
        
       | jerf wrote:
       | "In most ways, these types of benchmarks are meaningless. Python
       | was the slowest language in the benchmark, and yet at the same
       | time it's the most used language on Github as of October 2024."
       | 
       | First, this indicates some sort of deep confusion about the
       | purpose of benchmarks in the first place. Benchmarks are
       | performance tests, not popularity tests. And I don't think I'm
       | just jumping on a bit of bad wording, because I see this idea in
       | its various forms poking out in a lot of conversations.
       | Python is popular because there are many aspects to it, among
       | which is the fact that yes, it really is a rather slow language,
       | but the positives _outweigh_ it for many purposes. They don't
       | _cancel_ it. Python's other positive aspects do not speed it up;
       | indeed, they're actually critically tied to why it is slow in the
       | first place. If they were not, Python would not be slow. It has
       | had a lot of work done on it over the years, after all.
       | 
       | Secondly, I think people sort of chant "microbenchmarks are
       | _useless_", but they aren't _useless_. I find that this
       | microbenchmark actually gives a fairly realistic
       | representation of the relative performance of those various
       | languages. What they are not is totally determinative. You can't
       | divide one language's microbenchmark on this test by another to
       | get a "Python is 160x slower than C". This is, in fact, not an
       | accurate assessment; if you want a single unified number, 40-50
       | is much closer. But "useless" is _way_ too strong. No language is
       | so wonderful on all other dimensions that it can have something
       | as basic as a function call be dozens of times slower than some
       | other language and yet keep up with that other language in
       | general. (Assuming both languages have had production-quality
       | optimizations applied to them and one of them isn't some very
       | very young language.) It is a real fact about these languages, it
       | is not a huge outlier, and it is a problem I've encountered in
       | real codebases before when I needed to literally optimize out
       | function calls in a dynamic scripting language to speed up
       | certain code to acceptable levels, because function calls in
       | dynamic scripting languages really are expensive in a way that
       | really can matter. It shouldn't be overestimated and used to
       | derive silly "x times faster/slower" values, but at the same
       | time, if you're dismissing these sorts of things, you're throwing
       | away real data. There are no languages that are just as fast as
       | C, except gee golly they just happen to have this one thing where
       | function calls are 1000 times slower for no reason even though
       | everything else is C-speed. These performance differences are
       | reasonably correlated.
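
A rough way to see the function-call cost described above is to time the same arithmetic with and without a method call per iteration. The sketch below is an illustration only: `add_mod`, `sum_inline`, and `sum_with_calls` are hypothetical names, and the timings depend heavily on the Ruby implementation and on whether a JIT such as YJIT is enabled.

```ruby
require "benchmark"

# Hypothetical helper: exists only to force a method call per iteration.
def add_mod(acc, j, u)
  acc + j % u
end

# The arithmetic written inline in the loop body.
def sum_inline(n, u)
  total = 0
  j = 0
  while j < n
    total += j % u
    j += 1
  end
  total
end

# The same arithmetic routed through a method call on every iteration.
def sum_with_calls(n, u)
  total = 0
  j = 0
  while j < n
    total = add_mod(total, j, u)
    j += 1
  end
  total
end

n = 1_000_000
u = 7
inline_time = Benchmark.realtime { sum_inline(n, u) }
call_time = Benchmark.realtime { sum_with_calls(n, u) }
puts format("inline: %.4fs, with method calls: %.4fs", inline_time, call_time)
```

Running it under different implementations, or with and without `ruby --yjit`, gives one data point each rather than a single definitive ratio.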
        
         | chikere232 wrote:
         | very true.
         | 
         | Also, for a lot of the areas where languages like python or
         | ruby aren't great choices because of performance, they would
         | _also_ not be great choices because of the cost of maintaining
         | untyped code, or in Python's case the cost of maintaining code
         | in a language that keeps making breaking changes in minor
         | versions.
         | 
         | Script with scripting languages, build other things in other
         | languages
        
         | mlyle wrote:
         | > First, this indicates some sort of deep confusion about the
         | purpose of benchmarks in the first place. Benchmarks are
         | performance tests, not popularity tests.
         | 
         | I don't think it indicates a deep confusion. I think it leaves
         | a simple point unsaid because it's so strongly implied (related
         | to what you say):
         | 
         | Python may be very low in benchmarks, but clearly it has
         | _acceptable performance_ for a very large subset of
         | applications. As a result, a whole lot of us can ignore the
         | benchmarks.
         | 
         | Even in domains where one would have shuddered at this before.
         | My students are launching a satellite into low earth orbit that
         | has its primary flight computer running python. Yes, sometimes
         | this does waste a few hundred milliseconds and it wastes
         | several milliwatts on average. But even in the constrained
         | environment of a tiny microcontroller in low earth orbit,
         | language performance doesn't really matter to us.
         | 
         | We wouldn't pay any kind of cost (financial or giving up any
         | features) to make it 10x better.
        
           | ModernMech wrote:
           | "My students" - so there's really nothing on the line except
           | a grade then, yeah? That's why you wouldn't pay any cost to
           | make it 10x better, because there's no catastrophic
           | consequence if it fails. But sometimes wasting a few
           | milliwatts on average is the difference between success and
           | failure.
           | 
           | I've built an autonomous drone using Matlab. It worked but it
           | was a research project, so when it came down to making the
           | thing real and putting our reputation on the line, we
           | couldn't keep going down that route -- we couldn't afford the
           | interpreter overhead, the GC pauses, and all the other
           | nonsense. That aircraft was designed to be as efficient as
           | possible, so we could literally measure the inefficiency from
           | the choice of language in terms of how much it cost in extra
           | battery weight and therefore decreased range.
           | 
           | If you can afford that, great, you have the freedom to run
           | your satellite in whatever language. If not, then yeah you're
           | going to choose a different language if it means extra
           | performance, more runtime, greater range, etc.
        
             | mlyle wrote:
             | > "My students" - so there's really nothing on the line
             | except a grade then, yeah? That's why you wouldn't pay any
             | cost to make it 10x better, because there's no catastrophic
             | consequence if it fails. But sometimes wasting a few
             | milliwatts on average is the difference between success and
             | failure.
             | 
             | Years of effort from a large team is worth something, as is
             | the tens of thousands of dollars we're spending. We expect
             | a return on that investment of data and mission success.
             | We're spending a lot of money to improve odds of success.
             | 
             | But even in this power constrained application, a few
             | milliwatts is nothing. (Nearly half the time, it's
             | literally nothing, because we'd have to use power to run
             | heaters anyways. Most of the rest of the time, we're in the
             | sun, so there's a lot of power around, too). The marginal
             | benefit to saving a milliwatt is zero, so unless the
             | marginal cost is also zero we're not doing it.
             | 
             | > That aircraft was designed to be as efficient as
             | possible, so we could literally measure the inefficiency
             | from the choice of language in terms of how much it cost in
             | extra battery weight and therefore decreased range
             | 
             | If this is a powered-lift drone, that seems silly. It's
             | hard to waste enough power to be more than rounding error
             | compared to what large brushless motors take.
        
           | igouy wrote:
           | _otoh_ When performance doesn't matter, it doesn't matter.
           | 
           | _otoh_ When the title is "Speeding up Ruby" we are kind-of
           | presuming it matters.
        
           | jerf wrote:
           | I wouldn't jump on it except for the number of times I've
           | been discussing this online and people completely seriously
           | counter "Python is a fairly slow language" with "But it's
           | popular!"
           | 
           | Fuzzy one-dimensional thinking that classifies languages on a
           | "good" and "bad" axis is quite endemic in this industry. And
           | for those people, you can counter "X is slow" with "X has
           | good library support", and disprove "X lacks good tooling"
           | with "But X has a good type system", because all they hear is
           | that you said something is "good" but they have a reason why
           | it's "bad", or vice versa.
           | 
           | Keep an eye out for it.
        
         | igouy wrote:
         | > people sort of chant "microbenchmarks are useless", but they
         | aren't useless.
         | 
         | They might be !
         | 
         | (They aren't necessarily useless. It depends. It depends what
         | one is looking for. It depends _etc etc_)
         | 
         | > You can't divide one language's microbenchmark on this test
         | by another to get a "Python is 160x slower than C".
         | 
         | Sure you can !
         | 
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
         | 
         | -- and --
         | 
         | Table 4, page 139
         | 
         | https://dl.acm.org/doi/pdf/10.1145/3687997.3695638
         | 
         | -- and then one has -- "_[A]_ Python is 160x slower than C"
         | not "_[THE]_ Python is 160x slower than C".
         | 
         | Something multiple and tentative not something singular and
         | definitive.
        
       | davidw wrote:
       | It seems like it's been a while since I've seen one of these
       | language benchmark things.
       | 
       | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
       | seems like the latest iteration of what used to be a pretty
       | popular one, now with fewer languages and more self-deprecation.
        
         | igouy wrote:
         | > fewer languages
         | 
         | Maybe you've only noticed the dozen in-your-face on the home
         | page?
         | 
         | The charts have shown ~27 for a decade or so.
         | 
         | There's another half-dozen more in the site map.
        
       | igouy wrote:
       | > a fun visualization of each language's performance
       | 
       | The effect is similar to dragging a string past a cat: complete
       | distraction -- unable to avoid focusing on the movement -- unable
       | to extract any information from the movement.
       | 
       | To understand the measurements, cover the "fun visualization" and
       | read the numbers in the single column data table.
       | 
       | (Unfortunately we aren't able to scan down the column of numbers,
       | because the language implementation name is shown first.)
       | 
       | Previously: <blink>
       | 
       | https://developer.mozilla.org/en-US/docs/Glossary/blink_elem...
        
         | chikere232 wrote:
         | It does visualise how big the difference is though
        
           | igouy wrote:
           | Cover up the single column of lang/secs and then try to read
           | how big the difference is between java and php from the
           | moving circles.
           | 
           | You would have no problem doing that with a [_typo_
           | histogram should say bar chart].
        
             | MeetingsBrowser wrote:
             | Cover the labels on the histogram and try to read how big
             | the difference is between java and php....
        
               | igouy wrote:
               | We can read the relative difference from the length of
               | the bars because the bars are stable.
        
               | MeetingsBrowser wrote:
               | I can see the relative difference in speed between the
               | two balls.
        
               | igouy wrote:
               | "The first principle is that you must not fool yourself
               | and you are the easiest person to fool."
               | 
               | :-)
        
             | chikere232 wrote:
             | PHP looks much slower
        
               | igouy wrote:
               | The question is: How much slower?
               | 
               | We could try to count how many times the java circle
               | crosses left-to-right and right-to-left, in the time it
               | takes for the PHP circle to cross left-to-right once.
               | 
               | That's error prone but should be approximately correct
               | after a couple of attempts.
               | 
               | That's work we're forced to do because the "fun
               | visualization" is uninformative.
        
       | tiffanyh wrote:
       | Dart - I see it mentioned (and perf looks impressive), but is it
       | widely adopted?
       | 
       | Also, would have loved to see LuaJIT (interpreted lang) & Crystal
       | (static, Ruby-like language) included just for comparison's sake.
        
         | igouy wrote:
         | fwiw
         | 
         | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
        
         | suby wrote:
         | It looks like a more complete breakdown is here. Crystal ranks
         | just below Dart at 0.5413 (Dart was 0.5295). Luajit was 0.8056.
         | I'm surprised Luajit does worse than Dart. Actually I am
         | surprised Dart is beating out languages like C# too.
         | 
         | http://benjdd.com/languages2
        
           | igouy wrote:
           | Maybe that dozen lines of code isn't sufficient to
           | characterize performance differences?
           | 
           | Nearly 25 years ago, nested loops and fibs.
           | 
           | https://web.archive.org/web/20010424150558/http://www.bagley....
           | 
           | https://web.archive.org/web/20010124092800/http://www.bagley....
           | 
           | It's been a long time since the benchmarks game showed those.
        
           | saurik wrote:
           | Dart's VM was designed by the team (I think not just the one
           | guy, but maybe I'm wrong on that and it really is just Lars
           | Bak) that designed most of the truly notable VMs that have
           | ever existed: Self, Smalltalk Strongtalk, Java Hotspot, and
           | JavaScript V8. It also features an ahead-of-time compiler
           | mode in addition to a world-class JIT and interpreter,
           | allowing for hot reload during development.
           | 
           | https://en.m.wikipedia.org/wiki/Lars_Bak_(computer_programme....
           | 
           | It was stuck with a bad rep for being the language that was
           | never going to replace JavaScript in the browser, and then
           | was merely a transpiler no one was going to use, before it
           | found a new life as the language for Flutter, which has
           | driven a lot of its syntax and semantics improvements since,
           | with built-in VM support for extremely efficient object
           | templating (used by the reactive UI framework).
        
           | lern_too_spel wrote:
           | Runtime startup isn't amortized.
        
             | igouy wrote:
             | How do you know?
        
           | neonsunset wrote:
           | This nested loops microbenchmark only measures in-loop
           | integer division optimizations on ARM64 - there are division
           | fault handling differences which are ARM64 specific which
           | introduce significant variance between compilers of
           | comparable capability.
           | 
           | On x86_64 I expect the numbers would have been much closer
           | and within measurement error. The top half is within
           | 0.5-0.59s - there really isn't much you can do inside such a
           | loop, almost nothing happens there.
           | 
           | As Isaac pointed out in a sibling comment - it's best to pick
           | specific microbenchmarks, a selection of languages and
           | implementations that interest you and dissect those - it will
           | tell you much more.
        
           | ModernMech wrote:
           | I wonder why C++ isn't in that list but a bunch of languages
           | no one uses are.
        
         | Alifatisk wrote:
         | Been using pure Dart since last year, it's a lovely language
         | that has its quirks. I like it.
         | 
         | It's fast and flexible.
        
           | contagiousflow wrote:
           | Have you used it for anything other than Flutter? I recently
           | did a Flutter project and I'm interested in using dart more
           | now.
        
             | Alifatisk wrote:
             | Yes, that's what I meant by pure Dart. I've created CLIs
             | with it and a little API-only server.
        
       | coliveira wrote:
       | This kind of benchmark doesn't make sense for Python because it
       | is measuring the speed of pure code written in the language.
       | However, and here is the important point, most Python code relies
       | on compiled libraries to run fast. The heavy lifting in ML code
       | is done in C, and Python is used only as a glue language. Even
       | for web development this is also the case: Python is only calling
       | a bunch of libraries, many of those being written in C.
        
         | igouy wrote:
         | _aka_ Python is as fast as C when it is C.
        
         | knowitnone wrote:
         | I don't know. Ruby is able to call C too so it's a wash?
        
         | chucke wrote:
         | That's not true. Sure, many hot path functions dealing with
         | tensor calculations are done in numpy functions, but etl and
         | args/results are python objects and functions. And most web
         | development libs are pure python (flask, django, etc)
        
           | coliveira wrote:
           | For performance, hot paths are the only ones that matter.
        
             | IshKebab wrote:
             | Sure, but only a small subset of problems _have_ a hot
             | path. You can easily offload huge tensor operations to C.
             | That's the best possible case. More usually the "hot path"
             | is fairly evenly distributed through your entire codebase.
             | If you offload the hot path to C you'll end up rewriting
             | the whole thing in C.
        
         | int_19h wrote:
         | Any language with FFI (which is like all of them, these days)
         | has the same exact issue, the only difference being how common
         | it is to drop into C or other fast compiled language for parts
         | of the code.
         | 
         | And this kind of benchmark is the one that tells you _why_ this
         | is different across different languages.
        
         | ModernMech wrote:
         | Yes, if you pull out all the optimization tricks for Python, it
         | will be faster than vanilla Python. And yet it's _still_ 6x
         | slower (by my measurement) than naive code written in a
         | compiled language like Rust without any libraries.
        
       | Alifatisk wrote:
       | Woah, Ruby has become fast, like really fast. What's even more
       | impressive is TruffleRuby, damn!
        
         | knowitnone wrote:
         | It's Oracle https://github.com/oracle/truffleruby Double Damn!
        
         | tiffanyh wrote:
         | Note that Rails doesn't work on Truffle and from what I
         | understand, won't anytime soon.
         | 
         | Which is disappointing since it has the highest likelihood of
         | making the biggest impact to Ruby perf.
        
           | uamgeoalsk wrote:
           | Huh, what exactly doesn't work? Their own readme says
           | "TruffleRuby runs Rails and is compatible with many gems,
           | including C extensions."
           | (https://github.com/oracle/truffleruby)
        
             | tiffanyh wrote:
             | Truffle:
             | 
             |     TruffleRuby is not 100% compatible with MRI 3.2 yet
             | 
             | Rails:
             | 
             |     Rails 8 will require Ruby 3.2.0 or newer
             | 
             | https://github.com/oracle/truffleruby
             | 
             | https://rubyonrails.org/2024/9/27/this-week-in-rails
        
               | hotpocket777 wrote:
               | Is it possible that those two statements taken together
               | mean TruffleRuby can run Rails 8?
        
       | Lammy wrote:
       | > There was a PR to improve the performance of `Integer#succ` in
       | early 2024, which helped me understand why anyone would ever use
       | it: "We use `Integer#succ` when we rewrite loop methods in Ruby
       | (e.g. `Integer#times` and `Array#each`) because `opt_succ (i =
       | i.succ)` is faster to dispatch on the interpreter than `putobject
       | 1; opt_plus (i += 1)`."
       | 
       | I find myself using `#succ` most often for readability reasons,
       | not just for performance. Here's an example where I use it twice
       | in my UUID library's `#bytes` method to keep my brain in "bit
       | slicing mode" when reading the code. I need to loop 16 times
       | (`0xF.succ`) and then within that loop divide things by 256
       | (`0xFF.succ`):
       | https://github.com/okeeblow/DistorteD/blob/ba48d10/Globe%20G...
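
As a sketch of the pattern quoted above, here is a counting loop written in plain Ruby that advances with `Integer#succ`. `times_in_ruby` is a hypothetical name used for illustration, not the actual implementation of `Integer#times`; the last lines show the `0xF.succ` and `0xFF.succ` values mentioned in the comment.

```ruby
# Hypothetical helper for illustration; not the actual source of Integer#times.
def times_in_ruby(n)
  i = 0
  while i < n
    yield i
    i = i.succ # same result as i += 1 for Integers
  end
  n
end

seen = []
times_in_ruby(0xF.succ) { |i| seen << i }
puts seen.length # 16 iterations, since 0xF.succ == 0x10
puts 0xFF.succ   # 256
```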
        
         | e12e wrote:
         | Why do you find 0xF.succ better than 0x10 in this case?
        
       | knowitnone wrote:
       | I'm a little surprised that Node is beating Deno. Interesting
       | that Java would be faster than Kotlin since both run on jvm.
        
         | entropicdrifter wrote:
         | I mean, the JVM's been optimized specifically for Java since
         | the Bronze Ages at this point, it's not _that_ surprising
        
         | pjmlp wrote:
         | That is one of the differences between a platform systems
         | language, and guest languages.
         | 
         | You only have to check the additional bytecode that gets
         | generated, to work around the features not natively supported.
        
       | Someone wrote:
       | FTA: The loop example iterates 1 billion times, utilizing a
       | nested loop:
       | 
       |     u = ARGV[0].to_i
       |     r = rand(10_000)
       |     a = Array.new(10_000, 0)
       |     (0...10_000).each do |i|
       |       (0...100_000).each do |j|
       |         a[i] += j % u
       |       end
       |       a[i] += r
       |     end
       |     puts a[r]
       | 
       | Weird benchmark. Hand-optimized, I guess this benchmark will
       | spend over 99% of its time in the first two lines.
       | 
       | If you do liveness analysis on array elements you'll discover
       | that it is possible to remove the entire outer loop, turning the
       | program into:
       | 
       |     u = ARGV[0].to_i
       |     r = rand(10_000)
       |     a = 0
       |     (0...100_000).each do |j|
       |       a += j % u
       |     end
       |     a += r
       |     puts a
       | Are there compilers that do this kind of analysis?
       | 
       | Even though _u_ isn't known at compile time, that inner loop can
       | be replaced by a few instructions, too, but that's a more
       | standard optimization that, I suspect, the likes of _clang_ may
       | be close to making.
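
One way the inner loop could be replaced by a few instructions: `j % u` cycles through `0..u-1`, so the sum over `0...n` is some number of full cycles plus one partial cycle. A minimal sketch, assuming `u` is a positive Integer; `mod_sum_loop` and `mod_sum_closed_form` are hypothetical names, and this is not a claim about what any particular compiler actually emits.

```ruby
# Reference: the inner loop as written in the benchmark.
def mod_sum_loop(n, u)
  total = 0
  (0...n).each { |j| total += j % u }
  total
end

# Closed form, assuming u is a positive Integer.
# Each full cycle 0..u-1 sums to u*(u-1)/2; the leftover values
# 0..leftover-1 sum to leftover*(leftover-1)/2.
def mod_sum_closed_form(n, u)
  full_cycles, leftover = n.divmod(u)
  full_cycles * (u * (u - 1) / 2) + leftover * (leftover - 1) / 2
end

u = 7
puts mod_sum_loop(100_000, u) == mod_sum_closed_form(100_000, u) # true
```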
        
         | IshKebab wrote:
         | Compilers don't do liveness analysis on individual array
         | elements. It's too much data to keep track of and would
         | probably only be useful in incorrect code like this.
         | 
         | I used to work on an AI compiler where liveness analysis of
         | individual tensor elements actually _would_ have been useful.
         | We still didn't do it because the compilation time/memory
         | requirements would be insane.
        
       | smileson2 wrote:
       | Game changing for my advent of code solutions which look
       | surprisingly similar
        
       | coliveira wrote:
       | Although it is slow, Python has a saving grace: it doesn't have a
       | huge virtual machine like Java, so it can in many situations
       | provide a better experience.
        
         | igouy wrote:
         | Does JavaME have a "huge virtual machine" ?
         | 
         | https://www.oracle.com/java/technologies/javameoverview.html
         | 
         | Do you mean CPython or PyPy or MicroPython or ?
        
           | coliveira wrote:
           | > Does JavaME have a "huge virtual machine"
           | 
           | Yes, compared to Python.
           | 
           | > Do you mean CPython or PyPy
           | 
           | Python's standard virtual machine is called CPython; just look
           | at the official web page.
        
       ___________________________________________________________________
       (page generated 2024-12-04 23:00 UTC)