[HN Gopher] The intricacies of implementing memoization in Ruby
___________________________________________________________________
The intricacies of implementing memoization in Ruby
Author : thunderbong
Score : 97 points
Date : 2024-12-23 13:37 UTC (9 hours ago)
(HTM) web link (denisdefreyne.com)
(TXT) w3m dump (denisdefreyne.com)
| User23 wrote:
| It's fun how the runtime in seconds for the naive Fibonacci is
| 36: 14930352 (1.2s) 37: 24157817 (2.0s) 38: 39088169
| (3.2s) 39: 63245986 (5.2s) 40: 102334155 (8.3s)
| 41: 165580141 (13.5s) 42: 267914296 (21.8s)
|
| What a familiar pattern in those runtimes!
| izietto wrote:
| Fibonacci never stops surprising!
| JadeNB wrote:
| That's a neat observation! But isn't it for the naive reason
| that, with negligible overhead, the run time of executing
| `fib(n) = fib(n - 1) + fib(n - 2)` is the run time of executing
| `fib(n - 1)`, plus the run time of executing `fib(n - 2)`?
|
| On the other hand, if someone had asked me for an estimate on
| the run time, I probably would have tried breaking out the
| master theorem instead of thinking of this, so I don't want to
| downplay the observation.
| thih9 wrote:
| I know it's pointless but now I'd like to see a fibonacci
| implementation that returns these runtimes as the result.
| kazinator wrote:
| Another kind of fun with Fibonacci (sort of):
|
| https://news.ycombinator.com/item?id=14331627
|
| floor(x + 0.5) wants to be round(x), that function having been
| introduced in C99. Just remnants of old school C coding.
| thih9 wrote:
| > My preferred approach to [thread safety] is to explicitly not
| think about thread safety, and delegate this concern to the
| caller.
|
| Good call IMO. If I was the author then I'd implement this; it
| would mostly work for current use cases, would be a pain to
| maintain and impossible to extend. I learned something today.
| tomchuk wrote:
| Can't mention Fibonacci and memoization in the same sentence
| without me breaking out my favorite Python party trick:
| def fib(n, cache = {0: 0, 1: 1}): if n not in cache:
| cache[n] = fib(n-1) + fib(n-2) return cache[n]
| jaggederest wrote:
| The directly translated ruby version (from stack overflow of
| course) is even shorter: def fib(n,
| cache=Hash.new{ |h,k| h[k] = k < 2 ? k : h[k-1] + h[k-2] })
| cache[n] end
|
| It runs out of stack around 7146 on my machine at least. The
| python one is limited by the recursion depth limit in my test
| but of course that's configurable at your own risk.
| vidarh wrote:
| If you're first going to golf it, endless-def:
| def fib(n, cache=Hash.new{ |h,k| h[k] = k < 2 ? k : h[k-1] +
| h[k-2] }) = cache[n]
| jaggederest wrote:
| It's actually kind of ungolfed. The default version would
| be just fib = Hash.new{ |h,k| h[k] = k <
| 2 ? k : h[k-1] + h[k-2] } fib[7145]
| inopinatus wrote:
| This is the proper Ruby form since it relies on the []
| operator and is therefore substitutable for any other
| callable esp. procs/lambdas, and able to wrap them as
| well.
|
| This isn't just golf, it's an excellent way to pass
| around a lazily computed, self-caching closure. I use it
| often in preference to memoization gems.
|
| Aside from noting the false/nil equivalence concern, the
| original article seems over-engineered to me. Excessive
| metaprogramming is a common trap for Ruby devs.
| nyrikki wrote:
| IMHO, the mamul form is even better for golf, F(n) in O(n
| log n)
| x86x87 wrote:
| RUBY_THREAD_VM_STACK_SIZE for Ruby
| jez wrote:
| The Python version and this Ruby version are not equivalent.
|
| In Ruby, default parameters are allocated for each call to
| the function, which means that they are not shared across
| calls.
|
| In Python, default parameters are allocated once and shared
| across all calls, allowing them to be mutated.
|
| This becomes obvious if we change the Python and Ruby
| versions to print when they're computing something that is
| not yet cached. For back-to-back calls to `fib(10)`, the
| Python version prints only on the first call to `fib(10)`.
| The Ruby version must recompute all values from 2 - 10 again.
| azhenley wrote:
| That's beautiful. Why didn't I think of that?!
| rplnt wrote:
| Probably because not using mutable default arguments is
| python 101. Just something you don't do or think about. But
| this is indeed clever use of that gotcha.
| kccqzy wrote:
| You should see the Mathematica version:
| fibonacci[0] = 0; fibonacci[1] = 1;
| fibonacci[n_] := fibonacci[n] = fibonacci[n-1] + fibonacci[n-2]
|
| It cleverly uses both = and := together. Usually people use =
| for immediate assignment (such as constants) and := for delayed
| assignment (such as functions) but this combines the two.
| kazinator wrote:
| Does that work due to Ruby having Python-like broken semantics
| for evaluating the default value expressions for optional
| arguments? (I have no idea.) Or does it work due to the object
| denoted by {...} being a real literal?
|
| If the default value expression is evaluated on each call to
| the function in which it is required rather than at function
| definition time, and if the { ... } syntax is a constructor
| that creates a fresh object, then this wouldn't work; one of
| those two conditions (or both) must be broken.
| kazinator wrote:
| TXR Lisp version: (defun fib (n : (cache #H(()
| (0 1) (1 1)))) (or [cache n] (set [cache n]
| (+ (fib (pred n)) (fib (ppred n))))))
|
| TXR Lisp does _not_ have broken Python semantics for evaluating
| the default expressions for optional arguments. The expressions
| are freshly evaluated on each call in which they are needed, in
| the lexical scope in in which the prior parameters are already
| visible, as well as the scope surrounding the function
| definition.
|
| Why this works is that the #H hash literal syntax is a true
| literal. Every time that expression is evaluated, it yields the
| same hash table, which is mutable. This wouldn't work in an
| implementation of TXR Lisp in which literals are put into a ROM
| image or mapped into read only virtual memory. We will not
| likely see such a thing any time soon.
|
| If we change #H(...) to, say, (hash-props 0 1 1 1), it won't
| work any more.
| jmull wrote:
| In real code I've only actually seen memoization used as a cache.
| That leaves cache validation to some higher level which is in a
| poor position to do it very well. In my experience, most often it
| means someone has to figure out what things to manually restart,
| and when.
| kazinator wrote:
| You mean invalidation?
|
| Speaking of invalidation, the memoized Fibonacci will work even
| with a cache of 1 item that is replaced each time. (By work, I
| mean slash exponential time to linear.)
|
| If you calculate fib(n - 2) first and cache it, evacuating
| everything else in the process, that is good enough: when f(n -
| 1) needs f(n - 2), it finds it instead of recursing, and so the
| multply-and-surrender logic is disrupted, resulting in time
| proportional to n.
| JadeNB wrote:
| With all that it borrowed from Perl, I'm surprised Ruby didn't
| also borrow the defined-or operator `//=`
| (https://perldoc.perl.org/perlop#Logical-Defined-Or). (Or, if it
| did, I guess the author hadn't encountered it.)
| bradly wrote:
| I don't think Ruby has this natively, but Rails adds it with
| Object#presence.
| https://api.rubyonrails.org/classes/Object.html#method-i-pre...
| tprynn wrote:
| ActiveSupport adds tons of great convenience methods to Ruby
| and you can require it even outside of Rails! blank, present;
| date and time conversions like 1.month.from_now; except;
| enumerable methods like pluck... It's just lovely
| require 'active_support/all'
|
| https://guides.rubyonrails.org/v5.2/active_support_core_exte.
| ..
| JadeNB wrote:
| > I don't think Ruby has this natively, but Rails adds it
| with Object#presence. >
| https://api.rubyonrails.org/classes/Object.html#method-i-
| pre...
|
| It's not immediately obvious to me how one could use that to
| write a Ruby analogue of `a //= b`. If I understand
| correctly, it would still need to be the relatively verbose
| `a = a.present? ? a : b`. (I think even `a = a.presence || b`
| wouldn't work in case `a` was defined but nil, which is
| exactly the edge case the defined-or operator is meant to
| handle.)
| gls2ro wrote:
| Not sure I understand this example from Perl but Ruby has ||=
| that checks for nil but not defined.
|
| You can do
|
| a ||= b
|
| And if a is nil then it will get the value from b else it will
| remain with the current value
| viraptor wrote:
| This really checks for true values, which is fine for nil and
| numbers. But it will fail you if you want to cache `false` or
| any falsey object.
| kazinator wrote:
| That's an anti-pattern. In a properly lexically scoped
| language, an undefined identifier is erroneous, diagnosable
| before you even run the code.
|
| If optional dynamic scoping is supported, it can make sense to
| deal with an undefined variable, but is still something you
| would want to avoid. E.g. Common Lisp programs almost always
| assidiously define dynamic variables via a top-level _defvar_.
|
| Testing for definedness is usually part of some hack that
| doesn't deserve its own syntax as a predefined language
| feature.
|
| The basic idea is that programs should usually know which of
| their variables are defined, at what point in the program. It
| should be statically obvious, or failing that, at least
| dynamically obvious. The remaining situations can be valid, but
| are low-priority hacks.
|
| (You can always have your macro for it, if you want it badly
| enough.)
|
| For the environment var example, we can use the regular _or_
| (which is much like ||), if the lookup yields _nil_ , which is
| false: 1> (or (getenv "HOME") (getenv "LOGDIR")
| (error "You're homeless!")) "/Users/kazinator" 2>
| (or (getenv "XXXHOME") (getenv "LOGDIR") (error "You're
| homeless!")) ** You're homeless! ** during
| evaluation at expr-2:1 of form (error "You're homeless!")
|
| Looks like Perl made a mess of this simple thing, too.
|
| I understand some languages have an "undefined" value for a
| variable that is actually defined, but that's pretty lame too;
| it gives rise to multiple ways of being false. Awk is like
| this. Awk knows every variable in the program before executing
| it. GNU Awk builds a global symbol table of all variables at
| compile time. So every variable is considered defined in the
| sense of being known to the implementation and existing. It
| just has a kind of undefined value initially that serves as an
| empty string or zero, and tests false, but is neither a number
| nor string. It's a pretty cockamamie design from the POV of
| larger programs, but does lead to very short one-liners.
| JadeNB wrote:
| These thoughts on proper typing discipline are a valuable
| perspective on good programming-language design, but I think
| that they probably don't help much for writing code in Ruby,
| which, I believe, doesn't behave this way. (I'm not a Ruby-
| ist, so can't speak authoritatively to it.)
|
| Nonetheless, supposing that we're working in a language that
| behaves as you suggest--if memoization isn't built in, how
| would a user build it in a way that allows caching of the
| return value `false`? (For example, for testing whether a
| positive integer is prime, where getting that `false` value
| could be very expensive.)
| mmahemoff wrote:
| I've come across memoize gems before and thought, "do I really
| need to introduce a dependency when I can save results into a
| hash", but this makes a convincing argument to leverage a gem
| like memo_wise. Clean DSL syntax and the perf gains are much
| nicer than micromanaging it.
| jacobevelyn wrote:
| memo_wise maintainer here. No pressure from me to try it out,
| but if you do use it and have any feedback or suggestions
| please let us know!
| nixpulvis wrote:
| There are only 1 hard problems in computer science: memoization
| and ... shoot what was it again?
| kazinator wrote:
| There are only 2 hard problems in computer science:
|
| 1. Naming things
|
| 2. Memori^H^Hdamn it^H^H^H^H^H^H^Hization.
|
| P.S. I see what you did there. You botched your cache
| invalidation, and so lost the answer.
| drusepth wrote:
| 3. Off-by-1 errors
| mtkd wrote:
| With any Ruby code under load, it's important to have a proper
| understanding of what's happening in memory -- if adding
| memoisation is yielding significant gains, there's likely much
| more potential for optimisation
|
| Many in the Ruby community have taken a 1974 quote from Donald
| Knuth out of context and used it as a justification for minimal
| or zero effort on memory efficiency
|
| The article is a very interesting read, but implementation is
| heavily reliant on metaprogramming magic that can be something of
| a grenade in realworld projects especially if anyone fairly new
| to Ruby has to fix it later
|
| Almost none of the gems I used 10 years ago still exist today --
| taking a small penalty for hand-rolling initially will usually
| save days or more later -- the author even says a similar earlier
| Gem he built is long gone
| bradly wrote:
| > if adding memoization is yielding significant gains, there's
| likely much more potential for optimization
|
| Most often when I see memoization in Ruby code it is when
| fetching data from either a db or external service. Memory
| isn't the concern is these cases.
|
| > Almost none of the gems I used 10 years ago still exist today
| -- taking a small penalty for hand-rolling initially will
| usually save days or more later -- the author even says a
| similar earlier Gem he built is long gone
|
| With a clean DSL changing the gem or removing it for your own
| solution should not be problem.
| mtkd wrote:
| If app is repeatedly asking DB for same data and it's
| immutable for a period -- the persistance choice is wrong
|
| There is potentially a whole lot of memory/callstack overhead
| involved in a request like that even when memoised
|
| It likely should be in a memory structure all the time or
| Redis if sufficiently complex / shared across processes etc.
| as all the other processes will likely be doing same
|
| Only in Ruby world would someone say yeah what we need right
| there is a Gem with some define_method sprinkles
| wutwutwat wrote:
| > If app is repeatedly asking DB for same data and it's
| immutable for a period -- the persistance choice is wrong
|
| Says who? ACID databases are the norm for a reason.
|
| > There is potentially a whole lot of memory/callstack
| overhead involved in a request like that even when memoised
|
| So because there is overhead elsewhere you shouldn't
| optimize here?
|
| > It likely should be in a memory structure all the time or
| Redis if sufficiently complex / shared across processes
| etc. as all the other processes will likely be doing same
|
| Redis doesn't help if the issue is the network io.
| Accessing anything out of process memory will objectively
| be slower than accessing in process memory. Doesn't matter
| if the data is coming from pg, redis, or wherever. If its
| going across the wire, there will be io latency.
|
| > Only in Ruby world would someone say yeah what we need
| right there is a Gem with some define_method sprinkles
|
| Ah, I see this is based on a negative view of ruby in
| particular. Memoization is a common pattern in basically
| every language, and can net significant performance wins
| with very little effort (low hanging fruit). Is it a magic
| bullet? Nope. But it's nice to have in the tool belt when
| dealing with an app that's falling on its face and every
| improvment means you get to sleep better at night.
| ufmace wrote:
| The thing about metaprogramming magic in Ruby is that, it can
| work okay for one thing, but if you have 2 or 3 or more things
| all using it to do different things, it's very likely they'll
| conflict and blow up in a way that's really difficult to
| troubleshoot or solve.
| 15155 wrote:
| I see no mention of object shape optimization and why and how
| excessive rose memoization (or just instance variables in
| general) can be expensive.
| baggy_trough wrote:
| It would be nice if Ruby had a syntax for memoizing a method
| that would handle the nil/false case and update the shape at
| init time.
| jez wrote:
| In a similar line, it would be nice if there were a way to
| predeclare instance variables so that there was a canonical
| place to put type annotations for instance variables. Ruby
| type checkers right now all invent their own ways of
| declaring the types of instance variables.
|
| If Ruby supported something like class A
| def @instance_variable def
| self.@singleton_class_instance_variable def
| @@class_variable end
|
| as a way to pre-declare the object shape at class load time,
| it would be an obvious place to also let type checkers
| declare the types of these instance variables.
|
| (Not saying that _this specific_ syntax is ideal, but rather
| just that having _some_ syntax for declaring instance
| variables at load time, not method run time, would solve both
| the "predetermined object shape" and "predetermined ivar
| types" problems.)
| jez wrote:
| For those curious, this article is a good introduction to the
| topic:
|
| https://railsatscale.com/2023-10-24-memoization-pattern-and-...
| bwilliams wrote:
| Great article, memoization is pretty complex and full of trade-
| offs. We recognized a lot of these same pitfalls as we worked
| through making memoization a consistent and common pattern in our
| (very large, monolithic Rails) codebase via a `memoize def...`
| helper. The `false` and `null` pitfall is _surprisingly_ common,
| enough that it (and other pitfalls) warranted us writing our own
| memoization DSL.
|
| We took the opposite approach of most libraries though (and a
| reason we rolled our own instead of pulling in an existing gem)
| by avoiding handling the complex edge cases that hide trade-offs:
|
| 1. Class methods can't be memoized.
|
| 2. Methods with arguments can't be memoized (so no LRU caches,
| weak refs, tracking arguments, handling default args, hash/deep
| object equality, etc)
|
| The thought at the time was that those more complex scenarios
| deserve more thought than "prefix it with `memoize`". It's been a
| few years since we introduced the helper and after seeing it all
| throughout our codebase with no issues I'm even more confident
| this was the correct decision.
|
| I haven't revisited it in a while, but I'd love to add some extra
| niceties to it, like hooking into `ActiveRecord`s `#reload`,
| although I dislike memoizing in models.
| kazinator wrote:
| In TXR Lisp, I implemented an original invention of mine called
| Paremeter List Macros (Parameter Macros for short).
|
| A parameter macro works in any lambda syntax. Methods, macros.
|
| The documentation for Parameter List Macros shows an example
| implementation of rudimentary memoization.
|
| https://www.nongnu.org/txr/txr-manpage.html#N-C9CAE162
|
| You can use :memo in a method if you like, in a lambda
| expression, defmacro, macrolet, labels, ...: anywhere you have
| a lambda parameter list.
|
| There is a predefined parameter macro called :key which
| implements keyword parameters. (Common-Lisp-like keyword
| parameters are not native to the language run-time at all: you
| have to use this param macro if you want them.)
|
| Another predefined parameter macro called :match allows you to
| express a function with pattern matching. :match will take
| stock of your patterns and implicitly generate a parameter list
| which can take any of those patterns, and a function body which
| then further handles the cases.
| jweir wrote:
| We no longer use memoization except for older code.
|
| New code has anything that needs to be memoized defined in the
| initialize method. This solves a lot of problems and Sorbet
| typing is what encouraged us to go this route.
|
| Now there could be a branching problem - one branch will not need
| to call the expensive thing another would not. In reality I have
| found this never to be the case in our code base. May be
| different for your designs.
___________________________________________________________________
(page generated 2024-12-23 23:00 UTC)