[HN Gopher] The absurd cost of finalizers in Go
___________________________________________________________________
The absurd cost of finalizers in Go
Author : jjgreen
Score : 23 points
Date : 2023-05-19 15:00 UTC (8 hours ago)
(HTM) web link (lemire.me)
(TXT) w3m dump (lemire.me)
| leononame wrote:
| Does anyone know: is there actually a better way in Go?
|
| I've heard a couple of times that C bindings in Go are slow. Is
| this what people usually mean?
| silisili wrote:
| The call itself is what's slow. The C code runs at, well, C
| speed.
|
| So if you're writing a system in both, it's best to have the C
| side do a lot in each call, vs having the Go side call a lot of
| tiny funcs.
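A back-of-the-envelope sketch of why batching helps, assuming a fixed cost per Go-to-C transition. The 50 ns value is the figure reported later in this thread for one machine, not a constant of cgo; `perCallOverhead` and `crossingCost` are hypothetical names, and the sketch only does the arithmetic rather than calling C.

```go
package main

import (
	"fmt"
	"time"
)

// perCallOverhead is an ASSUMED fixed cost per Go-to-C transition.
// About 50 ns is the figure reported in this thread for one machine.
const perCallOverhead = 50 * time.Nanosecond

// crossingCost returns the total transition overhead for processing n items
// when each C call handles up to batch items at once.
func crossingCost(n, batch int) time.Duration {
	calls := (n + batch - 1) / batch // ceiling division: number of C calls
	return time.Duration(calls) * perCallOverhead
}

func main() {
	const n = 1_000_000
	for _, batch := range []int{1, 64, 4096} {
		fmt.Printf("batch %5d: %7d calls, overhead %v\n",
			batch, (n+batch-1)/batch, crossingCost(n, batch))
	}
}
```

The fixed cost is paid per call, not per item, so moving the loop to the C side divides the overhead by roughly the batch size.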
| kgeist wrote:
| As far as I remember, there's a lot of machinery involved to
| make a C call because C doesn't know anything about Go's
| goroutines and their resizable stacks. So the runtime has to
| prepare accordingly every time you make a call, switching from
| the goroutine's growable stack to the OS thread's traditional,
| fixed-size system stack.
| Patrickmi wrote:
| It's a misconception that cgo is slow; it's improving, and it's
| not slow. When it comes to stacks, Go, with its resizable stacks
| and its concurrency model, is quite different from other
| languages. A normal C function call that returns immediately has
| little overhead, but when a C function takes a long time and the
| runtime is waiting for its return value, the runtime has to
| accommodate the C code by setting up a more traditional stack. I
| think there's a proposal for a directive that lets the compiler
| know the runtime doesn't need to wait.
| assbuttbuttass wrote:
| The better way is to call free manually, instead of relying on
| a finalizer.
|
| If you're only calling cgo in a few places, it's not too hard
| to check each one for memory leaks.
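A minimal sketch of explicit release with an optional finalizer kept only as a safety net. `Buffer`, `NewBuffer` and the `freed` counter are hypothetical; to keep it runnable without a C toolchain, a Go slice stands in for `C.malloc` memory and a counter stands in for `C.free`. `runtime.SetFinalizer(b, nil)` is the standard way to drop a finalizer once the object has been released by hand.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// freed counts releases of the simulated native memory. Real cgo code would
// call C.free(ptr) instead; the counter stands in for it here.
var freed atomic.Int64

// Buffer is a hypothetical wrapper around memory that C would own.
type Buffer struct {
	data     []byte // stands in for an unsafe.Pointer from C.malloc
	released bool
}

// NewBuffer allocates a Buffer. If safetyNet is true, a finalizer is
// registered so a forgotten Free is eventually caught by the GC.
func NewBuffer(size int, safetyNet bool) *Buffer {
	b := &Buffer{data: make([]byte, size)}
	if safetyNet {
		runtime.SetFinalizer(b, (*Buffer).Free)
	}
	return b
}

// Free releases the memory exactly once and removes any finalizer, so the
// garbage collector has nothing left to run for this object.
func (b *Buffer) Free() {
	if b.released {
		return
	}
	b.released = true
	b.data = nil
	freed.Add(1) // stands in for C.free
	runtime.SetFinalizer(b, nil)
}

func main() {
	b := NewBuffer(64, false) // explicit lifetime: no finalizer registered
	defer b.Free()            // deterministic release when main returns
	b.data[0] = 42
	fmt.Println("buffer in use, length", len(b.data))
}
```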
| sieongioetnio wrote:
| Effective Java 2nd Edition is 15 years old now. Is the Java
| information at all accurate?
| tedunangst wrote:
| There's no space for the finalizer in the object, so it has to be
| recorded on the side, and it's kinda messy.
| rsc wrote:
| The benchmarks in the blog post are believable but the profile is
| not. The benchmarks do roughly the same amount of cgo work; all
| the extra work in the finalizer benchmark is setting up the
| finalizer, specifically 'runtime.addspecial'.
|
| We ran the benchmark here and confirmed the relative times, but
| our profile shows runtime.addspecial as expected. The expectation
| is that finalizers are used rarely, for objects where the
| lifetime is unknown but typically long. In contrast, a finalizer
| benchmark is hammering on finalizers in a way that real apps
| basically never do.
|
| There are also some O(N) things in finalizer registration, which
| turn into O(N^2) things for N finalizers for nearby allocations
| (only up to N=1024), again because we expect them to be very
| rare. Those could be fixed if this use case of tons of finalizers
| is realistic in practice.
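A simplified model of how an O(N) registration step adds up to O(N^2). This is NOT the runtime's code: it assumes, for illustration, that finalizer records for nearby objects live in a per-span singly linked list kept sorted by object offset, and that each registration walks that list to find its insertion point. The `special`, `span`, `add` and `registerAll` names are hypothetical.

```go
package main

import "fmt"

// special is a hypothetical side record: the finalizer is not stored in the
// object, so each record remembers which object (by offset) it belongs to.
type special struct {
	offset int
	next   *special
}

// span models one block of nearby allocations with its own sorted list.
type span struct {
	head *special
}

// add inserts a record, keeping the list sorted by offset, and returns how
// many existing records it walked past: O(N) per registration.
func (s *span) add(offset int) (steps int) {
	iter := &s.head
	for *iter != nil && (*iter).offset < offset {
		iter = &(*iter).next
		steps++
	}
	*iter = &special{offset: offset, next: *iter}
	return steps
}

// registerAll registers n adjacent objects in ascending order (assumed
// 16-byte objects) and returns the total work: n*(n-1)/2 steps, i.e. O(N^2).
func registerAll(n int) int {
	var s span
	total := 0
	for i := 0; i < n; i++ {
		total += s.add(i * 16)
	}
	return total
}

func main() {
	for _, n := range []int{128, 256, 512, 1024} {
		fmt.Printf("N=%4d  steps=%d\n", n, registerAll(n))
	}
}
```

Doubling N roughly quadruples the steps, which is harmless when finalizers are rare but shows up when every allocation gets one.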
|
| If you have locally scoped C data, you would typically defer
| C.free(p), and if you do that the cost is basically identical to
| calling free(p) directly.
|
|     BenchmarkAddition-16               22961580     51.77 ns/op
|     BenchmarkAllocate-16                7611168     144.3 ns/op
|     BenchmarkAllocateDeferFree-16       7065505     144.3 ns/op
|     BenchmarkAllocateFinalizer-16       1028251      1243 ns/op
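The benchmark source is not shown in the thread. A rough pure-Go analogue of its three allocation variants can be run with `testing.Benchmark`, which works outside `go test`. Here a 64-byte Go slice is an assumed stand-in for C memory and "release" just drops the slice, so absolute numbers will not match the cgo results; only the relative cost of per-object finalizers is the point.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// payload is a hypothetical heap object; in the article it wraps C memory.
type payload struct {
	buf []byte
}

// sink keeps each allocation reachable so the compiler cannot elide it.
var sink *payload

func allocate() *payload { return &payload{buf: make([]byte, 64)} }

func benchAllocate(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = allocate()
	}
}

// benchAllocateDeferRelease releases explicitly via defer, standing in for
// the `defer C.free(p)` pattern.
func benchAllocateDeferRelease(b *testing.B) {
	for i := 0; i < b.N; i++ {
		func() {
			p := allocate()
			defer func() { p.buf = nil }()
			sink = p
		}()
	}
}

// benchAllocateFinalizer registers a finalizer on every object, the usage
// pattern the runtime does not expect.
func benchAllocateFinalizer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		p := allocate()
		runtime.SetFinalizer(p, func(q *payload) { q.buf = nil })
		sink = p
	}
}

func main() {
	for _, bm := range []struct {
		name string
		fn   func(*testing.B)
	}{
		{"Allocate", benchAllocate},
		{"AllocateDeferRelease", benchAllocateDeferRelease},
		{"AllocateFinalizer", benchAllocateFinalizer},
	} {
		r := testing.Benchmark(bm.fn)
		fmt.Printf("%-22s %10d %8d ns/op\n", bm.name, r.N, r.NsPerOp())
	}
}
```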
|
| In response to the "cgo is slow" questions, this shows that cgo
| calls are about 50ns on my four-year-old x86 MacBook Pro. Is that
| fast or slow? It depends on what the cgo call is doing. If it's
| executing a single add instruction, an extra 50ns is slow. If
| it's doing something more substantial, an extra 50ns may be
| nothing at all.
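The "fast or slow" question can be put as a ratio: the share of a call's time spent on the fixed cgo cost is overhead / (overhead + work). A minimal sketch, assuming the roughly 50 ns figure quoted above (it is machine-specific); `overheadShare` is a hypothetical helper name.

```go
package main

import "fmt"

// overheadShare returns the fraction of a call's total time spent on a
// fixed per-call cost, given how long the useful work inside the call takes.
func overheadShare(callNs, workNs float64) float64 {
	return callNs / (callNs + workNs)
}

func main() {
	const cgoNs = 50 // assumed per-call cost; varies by machine
	for _, work := range []float64{1, 100, 10_000} {
		fmt.Printf("work %6.0f ns -> overhead %.1f%%\n",
			work, 100*overheadShare(cgoNs, work))
	}
}
```

For a single add the fixed cost dominates almost entirely; for a call doing about 10 µs of work it is around half a percent.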
___________________________________________________________________
(page generated 2023-05-19 23:02 UTC)