[HN Gopher] Ruby Garbage Collection Deep Dive: Tri-Color Mark an...
___________________________________________________________________
Ruby Garbage Collection Deep Dive: Tri-Color Mark and Sweep
Author : mooreds
Score : 90 points
Date : 2021-02-18 17:32 UTC (5 hours ago)
(HTM) web link (jemma.dev)
(TXT) w3m dump (jemma.dev)
| jemmaissroff wrote:
| Author here, let me know if you have any questions / suggestions
| for future blog posts. Planning a series around Ruby's GC,
| definitely with future posts on incremental marking, generational
| gc and compaction.
|
| Also! Writing a book about managed GC with a focus on Ruby. I'll
| keep posting updates on twitter @jemmaissroff or you can sign up
| for updates at https://buttondown.email/jemmaissroff
| ammanley wrote:
| Thank you for contributing to this. Very excited to see more of
| this.
| eklitzke wrote:
| Does rvalue mean something different here than it normally means?
| Normally rvalues refer to temporary values.
| tenderlove wrote:
| RVALUE is the base type in Ruby's heap. It refers specifically
| to [this struct](https://github.com/ruby/ruby/blob/7bd93293621b
| 85a87e7e117317...). All Ruby objects are instances of an RVALUE
| struct.
|
| Hope that helps!
| WJW wrote:
| In particular in Ruby it derives from "Ruby value" rather
| than the C/C++ rvalues that can appear on the righthand side
| of an assignment.
| snvsn wrote:
| Yes. In the Ruby VM, an RVALUE [0] is a union (a "generic
| type") of structs that represent all the different kinds of
| Ruby object types and a couple of runtime types.
|
| [0] https://sonots.github.io/ruby-
| capi/db/d8e/struct_r_v_a_l_u_e...
| jemmaissroff wrote:
| The first post in the series also explains RVALUES in the
| context of the Ruby Heap if it's helpful
| https://jemma.dev/blog/gc-internal
| dragontamer wrote:
| Rvalue seems to be a 40-byte "Ruby Object". (If an object takes
| up more than 40-bytes, it gets an additional allocation, and
| the first 40-bytes go into the rvalue "slot").
| dragontamer wrote:
| The tri-color invariant is "obvious" in single-threaded stop-the-
| world mark and sweep algorithms. There's no need to track color
| in single-threaded stop-the-world.
|
| ----------
|
| The reason why you need to formalize the tri-color invariant is
| because of __multithreading__ issues. If Thread#2 changes an
| object from "object.blah = object2", and "object" is black
| ("already processed" by the garbage collector), then you MUST set
| object2 to grey or black.
|
| White-objects and grey-objects do NOT need to be processed in
| this manner. (ex: if "object" is white, then the GC will process
| object in the future. Ditto if object is grey)
|
| ----------
|
| That's the real secret of tri-color mark and sweep. Your explicit
| color-tracking allows for parallel garbage collection on
| multicore systems.
|
| Now the method to hold the tri-color invariant depends on a
| number of decisions: maybe there's a "read barrier", or maybe
| there's a "write barrier" (not a barrier in a multithreaded
| sense, but that's the terminology that garbage collector
| textbooks use).
|
| ------------
|
| > There is one more nuance here. As of Ruby 3.0, if auto-
| compaction is enabled, compaction will actually happen as part of
| the sweeping phase. A more in depth explanation of how and why
| this happens will follow in a later post about compaction in this
| Garbage Collection Deep Dive Series.
|
| That means this is not a Mark-and-Sweep algorithm, but instead a
| Mark-and-Compact algorithm.
|
| Mark-and-Compact has more technical issues than Mark-and-Sweep.
| The devil is in the details, so to speak. If you're moving
| pointers around at the lowest level, you need to have either:
|
| 1. A level of indirection on all reads: because the GC may "move"
| and object while you weren't looking.
|
| 2. OR, a way for the GC to set all pointers in all registers,
| variables, and objects, to the "new correct location" while your
| code wasn't looking. (Aka: "Stop the world" methodology)
|
| Compaction is great at reducing fragmentation (which makes future
| "malloc" more efficient... and also causes the code to use less
| memory). But it does take more effort to compact the heap.
| tenderlove wrote:
| > The reason why you need to formalize the tri-color invariant
| is because of __multithreading__ issues.
|
| Not specifically "multithreading" issues, but rather
| concurrency. MRI implements incremental marking and lazy
| sweeping. The GC and the mutator execute in the same thread,
| but they execute concurrently. The tri-color algorithm ensures
| consistency in a single threaded, concurrently executing
| environment. It allows us to "pause" the GC in the middle of
| marking and allow the mutator to continue in the same thread.
|
| > 2. OR, a way for the GC to set all pointers in all registers,
| variables, and objects, to the "new correct location" while
| your code wasn't looking. (Aka: "Stop the world" methodology)
|
| We're also able to do compaction concurrently with the mutator
| via a read barrier.
|
| > But it does take more effort to compact the heap.
|
| Indeed it does!
| ufo wrote:
| This is very important to remember. A lot of the time, the
| incremental GC is still single threaded. The difference is
| that while a two-color "stop the world" algorithm has to
| collect everything in one long GC pause, the three-color
| incremental algorithm can spread the work over a series of
| smaller pauses.
| treis wrote:
| How does it handle the case where a RVALUE is added to an
| already processed black node in between pauses?
| tenderlove wrote:
| MRI implements a write barrier. You can check out the
| write barrier implementation here: https:
| //github.com/ruby/ruby/blob/7bd93293621b85a87e7e117317612
| bb0a84efb7a/gc.c#L7802
|
| The behavior for writes on a black object is defined in
| this function: https://github.com/ruby/ru
| by/blob/7bd93293621b85a87e7e117317612bb0a84efb7a/gc.c#L77
| 66
|
| Basically when a black -> white reference is written
| during incremental marking, the white object will be
| grey'd and added to the mark stack.
|
| Happy Thursday!
| ufo wrote:
| Alternatively, you can also move the gc "backwards",
| changing the black object. In Lua, sometimes the write
| barrier recolors the white child object to grey and
| sometimes it recolors the black parent object to grey.
|
| IIRC, the motivation for going backwards is that if you
| assign a large number of white children, recoloring the
| parent once is faster than recoloring all the children.
| alberth wrote:
| Lots of research on Tri-Color from Mike Pall of LuaJIT fame.
|
| http://wiki.luajit.org/New-Garbage-Collector
___________________________________________________________________
(page generated 2021-02-18 23:00 UTC)