[HN Gopher] The intricacies of implementing memoization in Ruby
       ___________________________________________________________________
        
       The intricacies of implementing memoization in Ruby
        
       Author : thunderbong
       Score  : 97 points
       Date   : 2024-12-23 13:37 UTC (9 hours ago)
        
 (HTM) web link (denisdefreyne.com)
 (TXT) w3m dump (denisdefreyne.com)
        
       | User23 wrote:
       | It's fun how the runtime in seconds for the naive Fibonacci is
       | 36: 14930352 (1.2s)       37: 24157817 (2.0s)       38: 39088169
       | (3.2s)       39: 63245986 (5.2s)       40: 102334155 (8.3s)
       | 41: 165580141 (13.5s)       42: 267914296 (21.8s)
       | 
       | What a familiar pattern in those runtimes!
        
         | izietto wrote:
         | Fibonacci never stops surprising!
        
         | JadeNB wrote:
         | That's a neat observation! But isn't it for the naive reason
         | that, with negligible overhead, the run time of executing
         | `fib(n) = fib(n - 1) + fib(n - 2)` is the run time of executing
         | `fib(n - 1)`, plus the run time of executing `fib(n - 2)`?
         | 
         | On the other hand, if someone had asked me for an estimate on
         | the run time, I probably would have tried breaking out the
         | master theorem instead of thinking of this, so I don't want to
         | downplay the observation.
        
         | thih9 wrote:
         | I know it's pointless but now I'd like to see a fibonacci
         | implementation that returns these runtimes as the result.
        
         | kazinator wrote:
         | Another kind of fun with Fibonacci (sort of):
         | 
         | https://news.ycombinator.com/item?id=14331627
         | 
         | floor(x + 0.5) wants to be round(x), that function having been
         | introduced in C99. Just remnants of old school C coding.
        
       | thih9 wrote:
       | > My preferred approach to [thread safety] is to explicitly not
       | think about thread safety, and delegate this concern to the
       | caller.
       | 
       | Good call IMO. If I was the author then I'd implement this; it
       | would mostly work for current use cases, would be a pain to
       | maintain and impossible to extend. I learned something today.
        
       | tomchuk wrote:
       | Can't mention Fibonacci and memoization in the same sentence
       | without me breaking out my favorite Python party trick:
       | def fib(n, cache = {0: 0, 1: 1}):             if n not in cache:
       | cache[n] = fib(n-1) + fib(n-2)             return cache[n]
        
         | jaggederest wrote:
         | The directly translated ruby version (from stack overflow of
         | course) is even shorter:                   def fib(n,
         | cache=Hash.new{ |h,k| h[k] = k < 2 ? k : h[k-1] + h[k-2] })
         | cache[n]         end
         | 
         | It runs out of stack around 7146 on my machine at least. The
         | python one is limited by the recursion depth limit in my test
         | but of course that's configurable at your own risk.
        
           | vidarh wrote:
           | If you're first going to golf it, endless-def:
           | def fib(n, cache=Hash.new{ |h,k| h[k] = k < 2 ? k : h[k-1] +
           | h[k-2] }) = cache[n]
        
             | jaggederest wrote:
             | It's actually kind of ungolfed. The default version would
             | be just                   fib = Hash.new{ |h,k| h[k] = k <
             | 2 ? k : h[k-1] + h[k-2] }                  fib[7145]
        
               | inopinatus wrote:
               | This is the proper Ruby form since it relies on the []
               | operator and is therefore substitutable for any other
               | callable esp. procs/lambdas, and able to wrap them as
               | well.
               | 
               | This isn't just golf, it's an excellent way to pass
               | around a lazily computed, self-caching closure. I use it
               | often in preference to memoization gems.
               | 
               | Aside from noting the false/nil equivalence concern, the
               | original article seems over-engineered to me. Excessive
               | metaprogramming is a common trap for Ruby devs.
        
             | nyrikki wrote:
             | IMHO, the mamul form is even better for golf, F(n) in O(n
             | log n)
        
           | x86x87 wrote:
           | RUBY_THREAD_VM_STACK_SIZE for Ruby
        
           | jez wrote:
           | The Python version and this Ruby version are not equivalent.
           | 
           | In Ruby, default parameters are allocated for each call to
           | the function, which means that they are not shared across
           | calls.
           | 
           | In Python, default parameters are allocated once and shared
           | across all calls, allowing them to be mutated.
           | 
           | This becomes obvious if we change the Python and Ruby
           | versions to print when they're computing something that is
           | not yet cached. For back-to-back calls to `fib(10)`, the
           | Python version prints only on the first call to `fib(10)`.
           | The Ruby version must recompute all values from 2 - 10 again.
        
         | azhenley wrote:
         | That's beautiful. Why didn't I think of that?!
        
           | rplnt wrote:
           | Probably because not using mutable default arguments is
           | python 101. Just something you don't do or think about. But
           | this is indeed clever use of that gotcha.
        
         | kccqzy wrote:
         | You should see the Mathematica version:
         | fibonacci[0] = 0;         fibonacci[1] = 1;
         | fibonacci[n_] := fibonacci[n] = fibonacci[n-1] + fibonacci[n-2]
         | 
         | It cleverly uses both = and := together. Usually people use =
         | for immediate assignment (such as constants) and := for delayed
         | assignment (such as functions) but this combines the two.
        
         | kazinator wrote:
         | Does that work due to Ruby having Python-like broken semantics
         | for evaluating the default value expressions for optional
         | arguments? (I have no idea.) Or does it work due to the object
         | denoted by {...} being a real literal?
         | 
         | If the default value expression is evaluated on each call to
         | the function in which it is required rather than at function
         | definition time, and if the { ... } syntax is a constructor
         | that creates a fresh object, then this wouldn't work; one of
         | those two conditions (or both) must be broken.
        
         | kazinator wrote:
         | TXR Lisp version:                 (defun fib (n : (cache #H(()
         | (0 1) (1 1))))         (or [cache n]             (set [cache n]
         | (+ (fib (pred n)) (fib (ppred n))))))
         | 
         | TXR Lisp does _not_ have broken Python semantics for evaluating
         | the default expressions for optional arguments. The expressions
         | are freshly evaluated on each call in which they are needed, in
         | the lexical scope in in which the prior parameters are already
         | visible, as well as the scope surrounding the function
         | definition.
         | 
         | Why this works is that the #H hash literal syntax is a true
         | literal. Every time that expression is evaluated, it yields the
         | same hash table, which is mutable. This wouldn't work in an
         | implementation of TXR Lisp in which literals are put into a ROM
         | image or mapped into read only virtual memory. We will not
         | likely see such a thing any time soon.
         | 
         | If we change #H(...) to, say, (hash-props 0 1 1 1), it won't
         | work any more.
        
       | jmull wrote:
       | In real code I've only actually seen memoization used as a cache.
       | That leaves cache validation to some higher level which is in a
       | poor position to do it very well. In my experience, most often it
       | means someone has to figure out what things to manually restart,
       | and when.
        
         | kazinator wrote:
         | You mean invalidation?
         | 
         | Speaking of invalidation, the memoized Fibonacci will work even
         | with a cache of 1 item that is replaced each time. (By work, I
         | mean slash exponential time to linear.)
         | 
         | If you calculate fib(n - 2) first and cache it, evacuating
         | everything else in the process, that is good enough: when f(n -
         | 1) needs f(n - 2), it finds it instead of recursing, and so the
         | multply-and-surrender logic is disrupted, resulting in time
         | proportional to n.
        
       | JadeNB wrote:
       | With all that it borrowed from Perl, I'm surprised Ruby didn't
       | also borrow the defined-or operator `//=`
       | (https://perldoc.perl.org/perlop#Logical-Defined-Or). (Or, if it
       | did, I guess the author hadn't encountered it.)
        
         | bradly wrote:
         | I don't think Ruby has this natively, but Rails adds it with
         | Object#presence.
         | https://api.rubyonrails.org/classes/Object.html#method-i-pre...
        
           | tprynn wrote:
           | ActiveSupport adds tons of great convenience methods to Ruby
           | and you can require it even outside of Rails! blank, present;
           | date and time conversions like 1.month.from_now; except;
           | enumerable methods like pluck... It's just lovely
           | require 'active_support/all'
           | 
           | https://guides.rubyonrails.org/v5.2/active_support_core_exte.
           | ..
        
           | JadeNB wrote:
           | > I don't think Ruby has this natively, but Rails adds it
           | with Object#presence. >
           | https://api.rubyonrails.org/classes/Object.html#method-i-
           | pre...
           | 
           | It's not immediately obvious to me how one could use that to
           | write a Ruby analogue of `a //= b`. If I understand
           | correctly, it would still need to be the relatively verbose
           | `a = a.present? ? a : b`. (I think even `a = a.presence || b`
           | wouldn't work in case `a` was defined but nil, which is
           | exactly the edge case the defined-or operator is meant to
           | handle.)
        
         | gls2ro wrote:
         | Not sure I understand this example from Perl but Ruby has ||=
         | that checks for nil but not defined.
         | 
         | You can do
         | 
         | a ||= b
         | 
         | And if a is nil then it will get the value from b else it will
         | remain with the current value
        
           | viraptor wrote:
           | This really checks for true values, which is fine for nil and
           | numbers. But it will fail you if you want to cache `false` or
           | any falsey object.
        
         | kazinator wrote:
         | That's an anti-pattern. In a properly lexically scoped
         | language, an undefined identifier is erroneous, diagnosable
         | before you even run the code.
         | 
         | If optional dynamic scoping is supported, it can make sense to
         | deal with an undefined variable, but is still something you
         | would want to avoid. E.g. Common Lisp programs almost always
         | assidiously define dynamic variables via a top-level _defvar_.
         | 
         | Testing for definedness is usually part of some hack that
         | doesn't deserve its own syntax as a predefined language
         | feature.
         | 
         | The basic idea is that programs should usually know which of
         | their variables are defined, at what point in the program. It
         | should be statically obvious, or failing that, at least
         | dynamically obvious. The remaining situations can be valid, but
         | are low-priority hacks.
         | 
         | (You can always have your macro for it, if you want it badly
         | enough.)
         | 
         | For the environment var example, we can use the regular _or_
         | (which is much like ||), if the lookup yields _nil_ , which is
         | false:                 1> (or (getenv "HOME") (getenv "LOGDIR")
         | (error "You're homeless!"))       "/Users/kazinator"       2>
         | (or (getenv "XXXHOME") (getenv "LOGDIR") (error "You're
         | homeless!"))       ** You're homeless!       ** during
         | evaluation at expr-2:1 of form (error "You're homeless!")
         | 
         | Looks like Perl made a mess of this simple thing, too.
         | 
         | I understand some languages have an "undefined" value for a
         | variable that is actually defined, but that's pretty lame too;
         | it gives rise to multiple ways of being false. Awk is like
         | this. Awk knows every variable in the program before executing
         | it. GNU Awk builds a global symbol table of all variables at
         | compile time. So every variable is considered defined in the
         | sense of being known to the implementation and existing. It
         | just has a kind of undefined value initially that serves as an
         | empty string or zero, and tests false, but is neither a number
         | nor string. It's a pretty cockamamie design from the POV of
         | larger programs, but does lead to very short one-liners.
        
           | JadeNB wrote:
           | These thoughts on proper typing discipline are a valuable
           | perspective on good programming-language design, but I think
           | that they probably don't help much for writing code in Ruby,
           | which, I believe, doesn't behave this way. (I'm not a Ruby-
           | ist, so can't speak authoritatively to it.)
           | 
           | Nonetheless, supposing that we're working in a language that
           | behaves as you suggest--if memoization isn't built in, how
           | would a user build it in a way that allows caching of the
           | return value `false`? (For example, for testing whether a
           | positive integer is prime, where getting that `false` value
           | could be very expensive.)
        
       | mmahemoff wrote:
       | I've come across memoize gems before and thought, "do I really
       | need to introduce a dependency when I can save results into a
       | hash", but this makes a convincing argument to leverage a gem
       | like memo_wise. Clean DSL syntax and the perf gains are much
       | nicer than micromanaging it.
        
         | jacobevelyn wrote:
         | memo_wise maintainer here. No pressure from me to try it out,
         | but if you do use it and have any feedback or suggestions
         | please let us know!
        
       | nixpulvis wrote:
       | There are only 1 hard problems in computer science: memoization
       | and ... shoot what was it again?
        
         | kazinator wrote:
         | There are only 2 hard problems in computer science:
         | 
         | 1. Naming things
         | 
         | 2. Memori^H^Hdamn it^H^H^H^H^H^H^Hization.
         | 
         | P.S. I see what you did there. You botched your cache
         | invalidation, and so lost the answer.
        
           | drusepth wrote:
           | 3. Off-by-1 errors
        
       | mtkd wrote:
       | With any Ruby code under load, it's important to have a proper
       | understanding of what's happening in memory -- if adding
       | memoisation is yielding significant gains, there's likely much
       | more potential for optimisation
       | 
       | Many in the Ruby community have taken a 1974 quote from Donald
       | Knuth out of context and used it as a justification for minimal
       | or zero effort on memory efficiency
       | 
       | The article is a very interesting read, but implementation is
       | heavily reliant on metaprogramming magic that can be something of
       | a grenade in realworld projects especially if anyone fairly new
       | to Ruby has to fix it later
       | 
       | Almost none of the gems I used 10 years ago still exist today --
       | taking a small penalty for hand-rolling initially will usually
       | save days or more later -- the author even says a similar earlier
       | Gem he built is long gone
        
         | bradly wrote:
         | > if adding memoization is yielding significant gains, there's
         | likely much more potential for optimization
         | 
         | Most often when I see memoization in Ruby code it is when
         | fetching data from either a db or external service. Memory
         | isn't the concern is these cases.
         | 
         | > Almost none of the gems I used 10 years ago still exist today
         | -- taking a small penalty for hand-rolling initially will
         | usually save days or more later -- the author even says a
         | similar earlier Gem he built is long gone
         | 
         | With a clean DSL changing the gem or removing it for your own
         | solution should not be problem.
        
           | mtkd wrote:
           | If app is repeatedly asking DB for same data and it's
           | immutable for a period -- the persistance choice is wrong
           | 
           | There is potentially a whole lot of memory/callstack overhead
           | involved in a request like that even when memoised
           | 
           | It likely should be in a memory structure all the time or
           | Redis if sufficiently complex / shared across processes etc.
           | as all the other processes will likely be doing same
           | 
           | Only in Ruby world would someone say yeah what we need right
           | there is a Gem with some define_method sprinkles
        
             | wutwutwat wrote:
             | > If app is repeatedly asking DB for same data and it's
             | immutable for a period -- the persistance choice is wrong
             | 
             | Says who? ACID databases are the norm for a reason.
             | 
             | > There is potentially a whole lot of memory/callstack
             | overhead involved in a request like that even when memoised
             | 
             | So because there is overhead elsewhere you shouldn't
             | optimize here?
             | 
             | > It likely should be in a memory structure all the time or
             | Redis if sufficiently complex / shared across processes
             | etc. as all the other processes will likely be doing same
             | 
             | Redis doesn't help if the issue is the network io.
             | Accessing anything out of process memory will objectively
             | be slower than accessing in process memory. Doesn't matter
             | if the data is coming from pg, redis, or wherever. If its
             | going across the wire, there will be io latency.
             | 
             | > Only in Ruby world would someone say yeah what we need
             | right there is a Gem with some define_method sprinkles
             | 
             | Ah, I see this is based on a negative view of ruby in
             | particular. Memoization is a common pattern in basically
             | every language, and can net significant performance wins
             | with very little effort (low hanging fruit). Is it a magic
             | bullet? Nope. But it's nice to have in the tool belt when
             | dealing with an app that's falling on its face and every
             | improvment means you get to sleep better at night.
        
         | ufmace wrote:
         | The thing about metaprogramming magic in Ruby is that, it can
         | work okay for one thing, but if you have 2 or 3 or more things
         | all using it to do different things, it's very likely they'll
         | conflict and blow up in a way that's really difficult to
         | troubleshoot or solve.
        
       | 15155 wrote:
       | I see no mention of object shape optimization and why and how
       | excessive rose memoization (or just instance variables in
       | general) can be expensive.
        
         | baggy_trough wrote:
         | It would be nice if Ruby had a syntax for memoizing a method
         | that would handle the nil/false case and update the shape at
         | init time.
        
           | jez wrote:
           | In a similar line, it would be nice if there were a way to
           | predeclare instance variables so that there was a canonical
           | place to put type annotations for instance variables. Ruby
           | type checkers right now all invent their own ways of
           | declaring the types of instance variables.
           | 
           | If Ruby supported something like                   class A
           | def @instance_variable           def
           | self.@singleton_class_instance_variable           def
           | @@class_variable         end
           | 
           | as a way to pre-declare the object shape at class load time,
           | it would be an obvious place to also let type checkers
           | declare the types of these instance variables.
           | 
           | (Not saying that _this specific_ syntax is ideal, but rather
           | just that having _some_ syntax for declaring instance
           | variables at load time, not method run time, would solve both
           | the  "predetermined object shape" and "predetermined ivar
           | types" problems.)
        
         | jez wrote:
         | For those curious, this article is a good introduction to the
         | topic:
         | 
         | https://railsatscale.com/2023-10-24-memoization-pattern-and-...
        
       | bwilliams wrote:
       | Great article, memoization is pretty complex and full of trade-
       | offs. We recognized a lot of these same pitfalls as we worked
       | through making memoization a consistent and common pattern in our
       | (very large, monolithic Rails) codebase via a `memoize def...`
       | helper. The `false` and `null` pitfall is _surprisingly_ common,
       | enough that it (and other pitfalls) warranted us writing our own
       | memoization DSL.
       | 
       | We took the opposite approach of most libraries though (and a
       | reason we rolled our own instead of pulling in an existing gem)
       | by avoiding handling the complex edge cases that hide trade-offs:
       | 
       | 1. Class methods can't be memoized.
       | 
       | 2. Methods with arguments can't be memoized (so no LRU caches,
       | weak refs, tracking arguments, handling default args, hash/deep
       | object equality, etc)
       | 
       | The thought at the time was that those more complex scenarios
       | deserve more thought than "prefix it with `memoize`". It's been a
       | few years since we introduced the helper and after seeing it all
       | throughout our codebase with no issues I'm even more confident
       | this was the correct decision.
       | 
       | I haven't revisited it in a while, but I'd love to add some extra
       | niceties to it, like hooking into `ActiveRecord`s `#reload`,
       | although I dislike memoizing in models.
        
         | kazinator wrote:
         | In TXR Lisp, I implemented an original invention of mine called
         | Paremeter List Macros (Parameter Macros for short).
         | 
         | A parameter macro works in any lambda syntax. Methods, macros.
         | 
         | The documentation for Parameter List Macros shows an example
         | implementation of rudimentary memoization.
         | 
         | https://www.nongnu.org/txr/txr-manpage.html#N-C9CAE162
         | 
         | You can use :memo in a method if you like, in a lambda
         | expression, defmacro, macrolet, labels, ...: anywhere you have
         | a lambda parameter list.
         | 
         | There is a predefined parameter macro called :key which
         | implements keyword parameters. (Common-Lisp-like keyword
         | parameters are not native to the language run-time at all: you
         | have to use this param macro if you want them.)
         | 
         | Another predefined parameter macro called :match allows you to
         | express a function with pattern matching. :match will take
         | stock of your patterns and implicitly generate a parameter list
         | which can take any of those patterns, and a function body which
         | then further handles the cases.
        
       | jweir wrote:
       | We no longer use memoization except for older code.
       | 
       | New code has anything that needs to be memoized defined in the
       | initialize method. This solves a lot of problems and Sorbet
       | typing is what encouraged us to go this route.
       | 
       | Now there could be a branching problem - one branch will not need
       | to call the expensive thing another would not. In reality I have
       | found this never to be the case in our code base. May be
       | different for your designs.
        
       ___________________________________________________________________
       (page generated 2024-12-23 23:00 UTC)