[HN Gopher] Speeding up Ruby by rewriting C in Ruby
___________________________________________________________________
Speeding up Ruby by rewriting C in Ruby
Author : todsacerdoti
Score : 184 points
Date : 2024-12-04 12:31 UTC (10 hours ago)
(HTM) web link (jpcamara.com)
(TXT) w3m dump (jpcamara.com)
| ksec wrote:
| >This got me thinking that it would be interesting to see a kind
| of "YJIT standard library" emerge, where core ruby functionality
| run in C could be swapped out for Ruby implementations for use by
| people using YJIT.
|
| This actually makes me feel sad because it reminded me of Chris
| Seaton. The idea isn't new and Chris has been promoting it during
| his time working on TruffleRuby. I think the idea goes back even
| further to Rubinius.
|
| It is also nice to see TruffleRuby being very fast and YJIT still
| has lots of headroom to grow. I remember one obstacle with it
| running rails was memory usage. I wonder if that is still the
| case.
| Asmod4n wrote:
| One of the amazing things TruffleRuby does is handle C
| extensions like Ruby code, meaning the C is interpreted rather
| than compiled in the traditional sense.
|
| This opens the way to JIT-compiling the C code, making it
| faster than the way the author wrote it.
| pantulis wrote:
| Amazing indeed!
| 0x457 wrote:
| Yup, Rubinius was probably the most widely known implementation
| of Ruby's standard library in Ruby. Too bad it was slower than
| MRI.
| e12e wrote:
| I thought maybe mruby had a mostly-Ruby stdlib - but I guess
| it's C ported over from MRI?
| Imustaskforhelp wrote:
| Super interesting. I am actually also a contributor to
| https://github.com/bddicken/languages, and after I had tried to
| create a Lua approach, I started to think of TruffleRuby as it
| was mentioned somewhere. But unfortunately, when I ran the code
| of main.rb, there was virtually no significant difference
| between TruffleRuby and normal Ruby (sometimes normal Ruby was
| faster than TruffleRuby).
|
| I am not sure if the benchmarks you provided showing the speed
| of TruffleRuby were made after the changes that you have made.
|
| I would really appreciate it if I could verify the benchmark,
| and maybe try to add it to the main
| https://github.com/bddicken/languages as a commit as well,
| because the TruffleRuby implementation is actually faster than
| Node.js and comes close to Bun or even Go, which is nuts.
|
| This was a fun post to skim through; definitely bookmarking it.
| jeremy_k wrote:
| Super interesting. I didn't know that YJIT was written in Rust.
| jerf wrote:
| "In most ways, these types of benchmarks are meaningless. Python
| was the slowest language in the benchmark, and yet at the same
| time it's the most used language on Github as of October 2024."
|
| First, this indicates some sort of deep confusion about the
| purpose of benchmarks in the first place. Benchmarks are
| performance tests, not popularity tests. And I don't think I'm
| just jumping on a bit of bad wording, because I see this idea in
| its various forms a lot poking out in a lot of conversations.
| Python is popular because there are many aspects to it, among
| which is the fact that yes, it really is a rather slow language,
| but the positives _outweigh_ it for many purposes. They don't
| _cancel_ it. Python's other positive aspects do not speed it up;
| indeed, they're actually critically tied to why it is slow in the
| first place. If they were not, Python would not be slow. It has
| had a lot of work done on it over the years, after all.
|
| Secondly, I think people sort of chant "microbenchmarks are
| _useless_ ", but they aren't _useless_. I find that
| microbenchmark actually represents some fairly realistic
| representation of the relative performance of those various
| languages. What they are not is totally determinative. You can 't
| divide one language's microbenchmark on this test by another to
| get a "Python is 160x slower than C". This is, in fact, not an
| accurate assessment; if you want a single unified number, 40-50
| is much closer. But "useless" is _way_ too strong. No language is
| so wonderful on all other dimensions that it can have something
| as basic as a function call be dozens of times slower than some
| other language and yet keep up with that other language in
| general. (Assuming both languages have had production-quality
| optimizations applied to them and one of them isn't some very
| very young language.) It is a real fact about these languages, it
| is not a huge outlier, and it is a problem I've encountered in
| real codebases before when I needed to literally optimize out
| function calls in a dynamic scripting language to speed up
| certain code to acceptable levels, because function calls in
| dynamic scripting languages really are expensive in a way that
| really can matter. It shouldn't be overestimated and used to
| derive silly "x times faster/slower" values, but at the same
| time, if you're dismissing these sorts of things, you're throwing
| away real data. There are no languages that are just as fast as
| C, except gee golly they just happen to have this one thing where
| function calls are 1000 times slower for no reason even though
| everything else is C-speed. These performance differences are
| reasonably correlated.
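The function-call cost described above is easy to observe directly. A minimal sketch in Ruby (method name and iteration count are illustrative; only the standard Benchmark module is used), comparing the same arithmetic with and without a method call per iteration:

```ruby
require "benchmark"

# A trivial method whose only job is to be called many times.
def add(a, b)
  a + b
end

n = 2_000_000

# Same work, once through a method call per iteration...
t_calls = Benchmark.realtime do
  s = 0
  n.times { |i| s = add(s, i) }
end

# ...and once inlined.
t_inline = Benchmark.realtime do
  s = 0
  n.times { |i| s += i }
end

puts format("with calls: %.4fs  inlined: %.4fs", t_calls, t_inline)
```

On plain CRuby the called version is typically measurably slower; a JIT like YJIT narrows the gap by inlining the call.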
| chikere232 wrote:
| very true.
|
| Also, for a lot of the areas where languages like python or
| ruby aren't great choices because of performance, they would
| _also_ not be great choices because of the cost of maintaining
| untyped code, or in python's case the cost of maintaining code
| in a language that keeps making breaking changes in minor
| versions.
|
| Script with scripting languages, build other things in other
| languages
| mlyle wrote:
| > First, this indicates some sort of deep confusion about the
| purpose of benchmarks in the first place. Benchmarks are
| performance tests, not popularity tests.
|
| I don't think it indicates a deep confusion. I think it leaves
| a simple point unsaid because it's so strongly implied (related
| to what you say):
|
| Python may be very low in benchmarks, but clearly it has
| _acceptable performance_ for a very large subset of
| applications. As a result, a whole lot of us can ignore the
| benchmarks.
|
| Even in domains where one would have shuddered at this before.
| My students are launching a satellite into low earth orbit that
| has its primary flight computer running python. Yes, sometimes
| this does waste a few hundred milliseconds and it wastes
| several milliwatts on average. But even in the constrained
| environment of a tiny microcontroller in low earth orbit,
| language performance doesn't really matter to us.
|
| We wouldn't pay any kind of cost (financial or giving up any
| features) to make it 10x better.
| ModernMech wrote:
| "My students" - so there's really nothing on the line except
| a grade then, yeah? That's why you wouldn't pay any cost to
| make it 10x better, because there's no catastrophic
| consequence if it fails. But sometimes wasting a few
| milliwatts on average is the difference between success and
| failure.
|
| I've built an autonomous drone using Matlab. It worked but it
| was a research project, so when it came down to making the
| thing real and putting our reputation on the line, we
| couldn't keep going down that route -- we couldn't afford the
| interpreter overhead, the GC pauses, and all the other
| nonsense. That aircraft was designed to be as efficient as
| possible, so we could literally measure the inefficiency from
| the choice of language in terms of how much it cost in extra
| battery weight and therefore decreased range.
|
| If you can afford that, great, you have the freedom to run
| your satellite in whatever language. If not, then yeah you're
| going to choose a different language if it means extra
| performance, more runtime, greater range, etc.
| mlyle wrote:
| > "My students" - so there's really nothing on the line
| except a grade then, yeah? That's why you wouldn't pay any
| cost to make it 10x better, because there's no catastrophic
| consequence if it fails. But sometimes wasting a few
| milliwatts on average is the difference between success and
| failure.
|
| Years of effort from a large team is worth something, as is
| the tens of thousands of dollars we're spending. We expect
| a return on that investment of data and mission success.
| We're spending a lot of money to improve odds of success.
|
| But even in this power constrained application, a few
| milliwatts is nothing. (Nearly half the time, it's
| literally nothing, because we'd have to use power to run
| heaters anyways. Most of the rest of the time, we're in the
| sun, so there's a lot of power around, too). The marginal
| benefit to saving a milliwatt is zero, so unless the
| marginal cost is also zero we're not doing it.
|
| > That aircraft was designed to be as efficient as
| possible, so we could literally measure the inefficiency
| from the choice of language in terms of how much it cost in
| extra battery weight and therefore decreased range
|
| If this is a powered-lift drone, that seems silly. It's
| hard to waste enough power to be more than rounding error
| compared to what large brushless motors take.
| igouy wrote:
| _otoh_ When performance doesn't matter, it doesn't matter.
|
| _otoh_ When the title is "Speeding up Ruby" we are kind-of
| presuming it matters.
| jerf wrote:
| I wouldn't jump on it except for the number of times I've
| been discussing this online and people completely seriously
| counter "Python is a fairly slow language" with "But it's
| popular!"
|
| Fuzzy one-dimensional thinking that classifies languages on a
| "good" and "bad" axis is quite endemic in this industry. And
| for those people, you can counter "X is slow" with "X has
| good library support", and disprove "X lacks good tooling"
| with "But X has a good type system", because all they hear is
| that you said something is "good" but they have a reason why
| it's "bad", or vice versa.
|
| Keep an eye out for it.
| igouy wrote:
| > people sort of chant "microbenchmarks are useless", but they
| aren't useless.
|
| They might be !
|
| (They aren't necessarily useless. It depends. It depends what
| one is looking for. It depends _etc etc_ )
|
| > You can't divide one language's microbenchmark on this test
| by another to get a "Python is 160x slower than C".
|
| Sure you can !
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
|
| -- and --
|
| Table 4, page 139
|
| https://dl.acm.org/doi/pdf/10.1145/3687997.3695638
|
| -- and then one has -- " _[A]_ Python is 160x slower than C "
| not " _[THE]_ Python is 160x slower than C ".
|
| Something multiple and tentative not something singular and
| definitive.
| davidw wrote:
| It seems like it's been a while since I've seen one of these
| language benchmark things.
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
| seems like the latest iteration of what used to be a pretty
| popular one, now with fewer languages and more self-deprecation.
| igouy wrote:
| > fewer languages
|
| Maybe you've only noticed the dozen in-your-face on the home
| page?
|
| The charts have shown ~27 for a decade or so.
|
| There's another half-dozen more in the site map.
| igouy wrote:
| > a fun visualization of each language's performance
|
| The effect is similar to dragging a string past a cat: complete
| distraction -- unable to avoid focusing on the movement -- unable
| to extract any information from the movement.
|
| To understand the measurements, cover the "fun visualization" and
| read the numbers in the single column data table.
|
| (Unfortunately we aren't able to scan down the column of numbers,
| because the language implementation name is shown first.)
|
| Previously: <blink>
|
| https://developer.mozilla.org/en-US/docs/Glossary/blink_elem...
| chikere232 wrote:
| It does visualise how big the difference is though
| igouy wrote:
| Cover up the single column of lang/secs and then try to read
| how big the difference is between java and php from the
| moving circles.
|
| You would have no problem doing that with a [ _typo_
| histogram should say bar chart].
| MeetingsBrowser wrote:
| Cover the labels on the histogram and try to read how big
| the difference is between java and php....
| igouy wrote:
| We can read the relative difference from the length of
| the bars because the bars are stable.
| MeetingsBrowser wrote:
| I can see the relative difference in speed between the
| two balls.
| igouy wrote:
| "The first principle is that you must not fool yourself
| and you are the easiest person to fool."
|
| :-)
| chikere232 wrote:
| PHP looks much slower
| igouy wrote:
| The question is: How much slower?
|
| We could try to count how many times the java circle
| crosses left-to-right and right-to-left, in the time it
| takes for the PHP circle to cross left-to-right once.
|
| That's error prone but should be approximately correct
| after a couple of attempts.
|
| That's work we're forced to do because the "fun
| visualization" is uninformative.
| tiffanyh wrote:
| Dart - I see it mentioned (and perf looks impressive), but is it
| widely adopted?
|
| Also, would have loved to see LuaJIT (interpreted lang) & Crystal
| (static Ruby-like language) included just for comparison's sake.
| igouy wrote:
| fwiw
|
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
| suby wrote:
| It looks like a more complete breakdown is here. Crystal ranks
| just below Dart at 0.5413 (Dart was 0.5295). LuaJIT was 0.8056.
| I'm surprised LuaJIT does worse than Dart. Actually I am
| surprised Dart is beating out languages like C# too.
|
| http://benjdd.com/languages2
| igouy wrote:
| Maybe that dozen lines of code isn't sufficient to
| characterize performance differences?
|
| Nearly 25 years ago, nested loops and fibs.
|
| https://web.archive.org/web/20010424150558/http://www.bagley...
|
| https://web.archive.org/web/20010124092800/http://www.bagley...
|
| It's been a long time since the benchmarks game showed those.
| saurik wrote:
| Dart's VM was designed by the team (I think not just the one
| guy, but maybe I'm wrong on that and it really is just Lars
| Bak) that designed most of the truly notable VMs that have
| ever existed: Self, the Strongtalk Smalltalk VM, Java HotSpot,
| and JavaScript's V8. It also features an ahead-of-time compiler
| mode in addition to a world-class JIT and interpreter,
| allowing for hot reload during development.
|
| https://en.m.wikipedia.org/wiki/Lars_Bak_(computer_programme...
|
| It was stuck with a bad rep for being the language that was
| never going to replace JavaScript in the browser, and then
| was merely a transpiler no one was going to use, before it
| found a new life as the language for Flutter, which has
| driven a lot of its syntax and semantics improvements since,
| with built-in VM support for extremely efficient object
| templating (used by the reactive UI framework).
| lern_too_spel wrote:
| Runtime startup isn't amortized.
| igouy wrote:
| How do you know?
| neonsunset wrote:
| This nested loops microbenchmark only measures in-loop
| integer division optimizations on ARM64 - there are division
| fault handling differences which are ARM64 specific which
| introduce significant variance between compilers of
| comparable capability.
|
| On x86_64 I expect the numbers would have been much closer
| and within measurement error. The top half is within
| 0.5-0.59s - there really isn't much you can do inside such a
| loop, almost nothing happens there.
|
| As Isaac pointed out in a sibling comment - it's best to pick
| specific microbenchmarks, a selection of languages and
| implementations that interest you and dissect those - it will
| tell you much more.
| ModernMech wrote:
| I wonder why C++ isn't in that list but a bunch of languages
| no one uses are.
| Alifatisk wrote:
| Been using pure Dart since last year; it's a lovely language
| that has its quirks. I like it.
|
| It's fast and flexible.
| contagiousflow wrote:
| Have you used it for anything other than Flutter? I recently
| did a Flutter project and I'm interested in using dart more
| now.
| Alifatisk wrote:
| Yes, that's what I meant by pure Dart. I've created CLIs
| with it and a little API-only server.
| coliveira wrote:
| This kind of benchmark doesn't make sense for Python because it
| is measuring the speed of pure code written in the language.
| However, and here is the important point, most Python code
| relies on compiled libraries to run fast. The heavy lifting in
| ML code is done in C, and Python is used only as a glue
| language. Even for web development this is the case: Python is
| only calling a bunch of libraries, many of them written in C.
| igouy wrote:
| _aka_ Python is as fast as C when it is C.
| knowitnone wrote:
| I don't know. Ruby is able to call C too so it's a wash?
| chucke wrote:
| That's not true. Sure, many hot-path functions dealing with
| tensor calculations are done in numpy, but ETL and
| args/results are Python objects and functions. And most web
| development libs are pure Python (Flask, Django, etc.)
| coliveira wrote:
| For performance, hot paths are the only ones that matter.
| IshKebab wrote:
| Sure, but only a small subset of problems _have_ a hot
| path. You can easily offload huge tensor operations to C.
| That's the best possible case. More usually the "hot path"
| is fairly evenly distributed through your entire codebase.
| If you offload the hot path to C you'll end up rewriting
| the whole thing in C.
| int_19h wrote:
| Any language with FFI (which is like all of them, these days)
| has the same exact issue, the only difference being how common
| it is to drop into C or other fast compiled language for parts
| of the code.
|
| And this kind of benchmark is the one that tells you _why_ this
| is different across different languages.
| ModernMech wrote:
| Yes, if you pull out all the optimization tricks for Python, it
| will be faster than vanilla Python. And yet it's _still_ 6x
| slower (by my measurement) than naive code written in a
| compiled language like Rust without any libraries.
| Alifatisk wrote:
| Woah, Ruby has become fast, like really fast. What's even more
| impressive is TruffleRuby, damn!
| knowitnone wrote:
| It's Oracle https://github.com/oracle/truffleruby Double Damn!
| tiffanyh wrote:
| Note that Rails doesn't work on Truffle and from what I
| understand, won't anytime soon.
|
| Which is disappointing, since it has the highest likelihood of
| making the biggest impact on Ruby perf.
| uamgeoalsk wrote:
| Huh, what exactly doesn't work? Their own readme says
| "TruffleRuby runs Rails and is compatible with many gems,
| including C extensions."
| (https://github.com/oracle/truffleruby)
| tiffanyh wrote:
| Truffle: "TruffleRuby is not 100% compatible with MRI 3.2 yet"
|
| Rails: "Rails 8 will require Ruby 3.2.0 or newer"
|
| https://github.com/oracle/truffleruby
|
| https://rubyonrails.org/2024/9/27/this-week-in-rails
| hotpocket777 wrote:
| Is it possible that those two statements taken together
| mean TruffleRuby can run Rails 8?
| Lammy wrote:
| > There was a PR to improve the performance of `Integer#succ` in
| early 2024, which helped me understand why anyone would ever use
| it: "We use `Integer#succ` when we rewrite loop methods in Ruby
| (e.g. `Integer#times` and `Array#each`) because `opt_succ (i =
| i.succ)` is faster to dispatch on the interpreter than `putobject
| 1; opt_plus (i += 1)`."
|
| I find myself using `#succ` most often for readability reasons,
| not just for performance. Here's an example where I use it twice
| in my UUID library's `#bytes` method to keep my brain in "bit
| slicing mode" when reading the code. I need to loop 16 times
| (`0xF.succ`) and then within that loop divide things by 256
| (`0xFF.succ`):
| https://github.com/okeeblow/DistorteD/blob/ba48d10/Globe%20G...
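As a minimal sketch of that bit-slicing idiom (illustrative values, not the library's actual code): `0xFF.succ` is 256, so modulo by it extracts the low byte and integer division shifts right by one byte.

```ruby
# Split a 32-bit integer into its 4 bytes, most significant first,
# staying in "bit slicing mode": 0xFF.succ == 256.
value = 0xDEADBEEF
bytes = []
4.times do
  bytes.unshift(value % 0xFF.succ)  # extract the low byte
  value /= 0xFF.succ                # shift right by 8 bits
end
p bytes.map { |b| format("%02X", b) }  # ["DE", "AD", "BE", "EF"]
```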
| e12e wrote:
| Why do you find 0xF.succ better than 0x10 in this case?
| knowitnone wrote:
| I'm a little surprised that Node is beating Deno. Interesting
| that Java would be faster than Kotlin since both run on jvm.
| entropicdrifter wrote:
| I mean, the JVM's been optimized specifically for Java since
| the Bronze Ages at this point, it's not _that_ surprising
| pjmlp wrote:
| That is one of the differences between a platform's primary
| language and guest languages.
|
| You only have to check the additional bytecode that gets
| generated to work around the features not natively supported.
| Someone wrote:
| FTA: The loop example iterates 1 billion times, utilizing a
| nested loop:
|
|     u = ARGV[0].to_i
|     r = rand(10_000)
|     a = Array.new(10_000, 0)
|     (0...10_000).each do |i|
|       (0...100_000).each do |j|
|         a[i] += j % u
|       end
|       a[i] += r
|     end
|     puts a[r]
|
| Weird benchmark. Hand-optimized, I guess this benchmark will
| spend over 99% of its time in the first two lines.
|
| If you do liveness analysis on array elements you'll discover
| that it is possible to remove the entire outer loop, turning the
| program into:
|
|     u = ARGV[0].to_i
|     r = rand(10_000)
|     a = 0
|     (0...100_000).each do |j|
|       a += j % u
|     end
|     a += r
|     puts a
|
| Are there compilers that do this kind of analysis?
|
| Even though _u_ isn't known at compile time, that inner loop can
| be replaced by a few instructions, too, but that's a more
| standard optimization that, I suspect, the likes of _clang_ may
| be close to making.
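The inner-loop replacement mentioned above can be made concrete: the sum of `j % u` over a fixed range has a closed form. A sketch of that arithmetic (my own derivation, not something any particular compiler is claimed to emit):

```ruby
# Sum of (j % u) for j in 0...n, computed in O(1):
# the residues cycle 0,1,...,u-1, giving n/u full cycles that each
# sum to u*(u-1)/2, plus a partial run 0..(n % u - 1).
def mod_sum(n, u)
  full, part = n.divmod(u)
  full * u * (u - 1) / 2 + part * (part - 1) / 2
end

# Check against the benchmark's inner loop range:
n = 100_000
u = 7
brute = (0...n).sum { |j| j % u }
puts mod_sum(n, u) == brute  # prints "true"
```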
| IshKebab wrote:
| Compilers don't do liveness analysis on individual array
| elements. It's too much data to keep track of and would
| probably only be useful in incorrect code like this.
|
| I used to work on an AI compiler where liveness analysis of
| individual tensor elements actually _would_ have been useful.
| We still didn't do it because the compilation time/memory
| requirements would be insane.
| smileson2 wrote:
| Game changing for my advent of code solutions which look
| surprisingly similar
| coliveira wrote:
| Although slow, Python has a saving grace: it doesn't have a
| huge virtual machine like Java, so it can in many situations
| provide a better experience.
| igouy wrote:
| Does JavaME have a "huge virtual machine" ?
|
| https://www.oracle.com/java/technologies/javameoverview.html
|
| Do you mean CPython or PyPy or MicroPython or ?
| coliveira wrote:
| > Does JavaME have a "huge virtual machine"
|
| Yes, compared to Python.
|
| > Do you mean CPython or PyPy
|
| Python's standard virtual machine is called CPython; just look
| at the official web page.
___________________________________________________________________
(page generated 2024-12-04 23:00 UTC)