[HN Gopher] The absurd cost of finalizers in Go
       ___________________________________________________________________
        
       The absurd cost of finalizers in Go
        
       Author : jjgreen
       Score  : 23 points
       Date   : 2023-05-19 15:00 UTC (8 hours ago)
        
 (HTM) web link (lemire.me)
 (TXT) w3m dump (lemire.me)
        
       | leononame wrote:
       | Does anyone know: Is there actually a better way in Go?
       | 
       | I've heard a couple of times that C bindings in Go are slow. Is
       | this what is usually referred to?
        
         | silisili wrote:
         | Calling into C is what's slow. The C code itself runs at, well, C
         | speed.
         | 
         | So if you're writing a system in both, it's best to have the C
         | side do a lot in each call, vs having the Go side call a lot of
         | tiny funcs.
        
           | kgeist wrote:
           | As far as I remember, there's a lot of machinery involved to
           | make a C call because C doesn't know anything about Go's
           | goroutines and their resizable stacks. So the runtime has to
           | prepare the stack accordingly every time you make a call -
           | and it's done on a dedicated CGO thread with a traditional
           | stack.
        
         | Patrickmi wrote:
         | It's a misconception that cgo is slow; it isn't, and it keeps
         | improving. With its resizable stacks and its concurrency model,
         | Go is simply very different from other languages. A normal C
         | function call that returns immediately carries little overhead,
         | BUT when the C function takes a while and the runtime is
         | waiting on its return value, the runtime has to line things up
         | with the C side by switching to a more traditional stack. I
         | think there's a proposal for a directive that lets the compiler
         | know the runtime doesn't need to wait.
        
         | assbuttbuttass wrote:
         | The better way is to call free manually, instead of relying on
         | a finalizer.
         | 
         | If you're only calling cgo in a few places, it's not too hard
         | to check each one for memory leaks.
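
The "free manually" advice above can be sketched without cgo. This is only an illustration under stated assumptions: `handle`, `acquire`, `Release`, `useOnce` and the `live` counter are hypothetical stand-ins for a C allocation and `C.free`, not code from the article or from any commenter. It shows that a deferred release runs on every return path, so no finalizer is needed for locally scoped data.

```go
package main

import "fmt"

// live counts handles that were acquired but not yet released.
// Hypothetical bookkeeping that stands in for memory owned by C.
var live int

// handle is a hypothetical wrapper around a resource the Go GC cannot
// see, such as a pointer returned by C.malloc.
type handle struct {
	released bool
}

// acquire stands in for the cgo allocation call.
func acquire() *handle {
	live++
	return &handle{}
}

// Release stands in for C.free. It is idempotent, so calling it twice
// does not double-count.
func (h *handle) Release() {
	if h.released {
		return
	}
	h.released = true
	live--
}

// useOnce acquires a handle, uses it, and releases it before returning.
func useOnce() {
	h := acquire()
	defer h.Release() // runs on every return path; no finalizer involved
	_ = h             // real work with the resource would go here
}

func main() {
	for i := 0; i < 1000; i++ {
		useOnce()
	}
	fmt.Println("live handles after loop:", live)
}
```

With only a few call sites, auditing each `acquire` for a matching deferred `Release` is the leak check the comment describes.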
        
       | sieongioetnio wrote:
       | Effective Java 2nd Edition is 15 years old now. Is the Java
       | information at all accurate?
        
       | tedunangst wrote:
       | There's no space for the finalizer in the object, so it has to be
       | recorded on the side, and it's kinda messy.
        
       | rsc wrote:
       | The benchmarks in the blog post are believable but the profile is
       | not. The benchmarks do roughly the same amount of cgo work; all
       | the extra work in the finalizer benchmark is setting up the
       | finalizer, specifically 'runtime.addspecial'.
       | 
       | We ran the benchmark here and confirmed the relative times, but
       | our profile shows runtime.addspecial as expected. The expectation
       | is that finalizers are used rarely, for objects where the
       | lifetime is unknown but typically long. In contrast, a finalizer
       | benchmark is hammering on finalizers in a way that real apps
       | basically never do.
       | 
       | There are also some O(N) things in finalizer registration, which
       | turn into O(N^2) things for N finalizers for nearby allocations
       | (only up to N=1024), again because we expect them to be very
       | rare. Those could be fixed if this use case of tons of finalizers
       | is realistic in practice.
       | 
       | If you have locally scoped C data you would typically defer
       | C.free(p), and if you do that the cost is basically identical to
       | calling free(p) directly.
       | 
       |     BenchmarkAddition-16             22961580      51.77 ns/op
       |     BenchmarkAllocate-16              7611168     144.3 ns/op
       |     BenchmarkAllocateDeferFree-16     7065505     144.3 ns/op
       |     BenchmarkAllocateFinalizer-16     1028251    1243 ns/op
       | 
       | In response to the "cgo is slow" questions, this shows that cgo
       | calls are about 50ns on my four-year-old x86 MacBook Pro. Is that
       | fast or slow? It depends on what the cgo call is doing. If it's
       | executing a single add instruction, an extra 50ns is slow. If
       | it's doing something more substantial, an extra 50ns may be
       | nothing at all.
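
The point above, that the extra time goes into registering the finalizer rather than into cgo, can be probed with a pure-Go sketch. It is not the article's benchmark: `payload`, `sink`, `benchPlain`, `benchFinalizer` and `measure` are hypothetical names, there is no C allocation involved, and the numbers will vary by machine and Go version. It only compares a plain heap allocation against the same allocation followed by `runtime.SetFinalizer`.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// payload is sized to stay out of the runtime's tiny-object allocator.
type payload struct {
	data [64]byte
}

// sink keeps the compiler from optimizing the allocations away.
var sink *payload

// benchPlain allocates one object per iteration.
func benchPlain(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = new(payload)
	}
}

// benchFinalizer allocates one object per iteration and registers a
// no-op finalizer on it, so the difference is the registration cost.
func benchFinalizer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		p := new(payload)
		runtime.SetFinalizer(p, func(*payload) {})
		sink = p
	}
}

// measure runs both benchmarks outside of "go test".
func measure() (plain, fin testing.BenchmarkResult) {
	testing.Init() // registers testing flags; no effect if already called
	plain = testing.Benchmark(benchPlain)
	fin = testing.Benchmark(benchFinalizer)
	return plain, fin
}

func main() {
	plain, fin := measure()
	fmt.Printf("plain:     %d ns/op\n", plain.NsPerOp())
	fmt.Printf("finalizer: %d ns/op\n", fin.NsPerOp())
}
```

Like the finalizer benchmark discussed above, this loop registers finalizers far more often than a typical application would, so it measures the worst case, not everyday cost.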
        
       ___________________________________________________________________
       (page generated 2023-05-19 23:02 UTC)