[HN Gopher] Rustgo: Calling Rust from Go with near-zero overhead...
       ___________________________________________________________________
        
       Rustgo: Calling Rust from Go with near-zero overhead (2017)
        
       Author : telotortium
       Score  : 178 points
       Date   : 2024-07-31 05:43 UTC (17 hours ago)
        
 (HTM) web link (words.filippo.io)
 (TXT) w3m dump (words.filippo.io)
        
       | gnabgib wrote:
       | (2017) Discussions:
       | 
       | 2017 (282 points, 68 comments)
       | https://news.ycombinator.com/item?id=15017519
       | 
       | 2019 (107 points, 37 comments)
       | https://news.ycombinator.com/item?id=20600178
        
       | zxilly wrote:
       | It has been many years since the article was written. I'm
       | wondering if there is a better solution in 2024.
        
         | ozgrakkurt wrote:
         | Doesn't seem like there is. Probably because Go developers want
         | code to be pure Go if possible.
        
         | mappu wrote:
         | Cgo overhead was made 5-30x faster in Go 1.21, and it was
         | already within noise levels in this article, so I'd use that.
        
           | chabad360 wrote:
           | Can you provide a reference for that? A quick search for me
           | didn't bring up anything. I'm curious to read more.
        
             | lossolo wrote:
             | https://shane.ai/posts/cgo-performance-in-go1.21/
        
               | mananaysiempre wrote:
               | > Cgo calls take about 40ns, about the same time
               | encoding/json takes to parse a single digit integer.
               | 
               | As an aside, that sure feels like a lot of time to parse
               | a single-digit integer. Not catastrophic by any means,
               | but still, 100 to 200 cycles is a sign the program is
               | definitely Doing Something. Perhaps that's the memory
               | allocator?
        
         | benmmurphy wrote:
         | They changed the ABI in 1.17 to pass arguments in registers
         | instead of the stack: https://go.dev/doc/go1.17#compiler so if
         | you used this solution you might not need to do the fixup
         | anymore if the ABI matches.
        
       | sitkack wrote:
       | This is totally sick! I can't wait to go through these repro
       | steps using new versions.
        
       | Havoc wrote:
       | What is the benefit of this?
       | 
       | I was under the impression that go isn't that far off from rust
       | on most speed metrics being both compiled. So this adds
       | complexity for what gain?
        
         | Yoric wrote:
         | In benchmarks I've seen, Go is ~6 times slower.
         | 
         | But generally speaking, the reason I prefer Rust to Go for most
         | developments is its type system. I could see myself using Rust
         | to handle business-specific/safety-critical logic and Go as a
         | fast prototyping language to assemble them.
        
           | dewey wrote:
           | What part of the Go typing system is lacking for "business-
           | specific/safety-critical logic"?
        
             | randomdata wrote:
             | It lacks expressiveness, which means you actually have to
             | put some thought into your design to avoid butting heads
             | with it. You can't throw up garbage and then keep massaging
             | the types until you can make whatever gobbledygook you came
             | up with fit.
             | 
             | Business/safety critical logic tends to get complicated,
             | and, at least in the case of business software, is more
             | than likely venturing into unexplored territory, so when
             | you're not very skilled at programming, finding the right
             | design can become an insurmountable challenge. A more
             | expressive type system offers a bandaid to work around
             | that.
        
               | smodo wrote:
               | A bit of cart-horse-y reasoning.... An expressive type
               | system helps me to model the nuances that are actually
               | present in the business logic. If I simplify or
               | restructure my model more than real life allows for I
               | would say that causes much more trouble than actually
               | modeling what's happening on the ground.
               | 
               | Yes, when I'm behind my desk I always find ways to make
               | the business logic better and simpler and more fitting.
               | But those damn humans never follow my rules.
        
             | kbolino wrote:
             | Go doesn't have any closed or sealed types. An "enum" is
             | just a subtype and subtypes can't restrict the range of
             | their values. It's possible to have something like a closed
             | tagged union using an interface with an unexported method
             | and type-switching, but the language won't understand this
             | for e.g. detecting whether a switch without default is
             | exhaustive.
             | 
             | There's also no way to declare a non-nil pointer nor to
             | work cleanly with nested nil-able struct fields.
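The closed tagged union pattern described above can be sketched in Go roughly as follows. This is a minimal illustration, not code from the article; the names `Shape`, `Circle`, `Rect`, and `Area` are hypothetical. Note that the compiler does not check the type switch for exhaustiveness, which is the limitation the comment points out.

```go
package main

import (
	"fmt"
	"math"
)

// Shape acts as a "closed" union: because isShape is unexported, code in
// other packages cannot define that method on their own types (embedding
// one of the variants below is the one way around this).
type Shape interface {
	isShape()
}

// Circle and Rect are the hypothetical variants of the union.
type Circle struct{ Radius float64 }
type Rect struct{ W, H float64 }

func (Circle) isShape() {}
func (Rect) isShape()   {}

// Area type-switches over the variants. Go does not verify that every
// variant is handled, so the default branch reports anything that was
// forgotten (including a nil Shape) at run time instead.
func Area(s Shape) (float64, error) {
	switch v := s.(type) {
	case Circle:
		return math.Pi * v.Radius * v.Radius, nil
	case Rect:
		return v.W * v.H, nil
	default:
		return 0, fmt.Errorf("unhandled shape %T", s)
	}
}

func main() {
	for _, s := range []Shape{Circle{Radius: 1}, Rect{W: 2, H: 3}} {
		a, err := Area(s)
		fmt.Println(a, err)
	}
}
```

If a third variant is added and `Area` is not updated, the program still compiles; the omission only surfaces through the `default` branch when that variant is actually passed in.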
        
             | Yoric wrote:
             | I wasn't planning to turn this into a language flame war,
             | so I'm not going to give details.
             | 
             | Let's just say that the designers of Rust and Go have
             | prioritized very different feature sets when designing
             | their type systems. If you develop in Rust, it's typically
             | because you want to write "fearless code", i.e. attach
             | sophisticated invariants to types to avoid safety issues
             | later in development/coming from client code. If you
             | develop in Go, it's typically because you want
             | "productivity", i.e. quickly reach a point where your code
             | can run and you can start testing it.
             | 
             | As usual, it's a tradeoff. For most tasks, I happen to
             | click better with the former. YMMV
        
         | null_investor wrote:
         | That's wildly incorrect, Rust and C are like 2-5x faster than
         | Go?
         | 
         | Go has arguably a better concurrency model, so for some
         | usecases it can work much better, but for some realtime
         | programming stuff, Go is a no-go ba-dum-tss.
         | 
         | For webdev I would never choose Rust over Go; each language has
         | its own advantages.
        
           | lionkor wrote:
           | Go's concurrency model lets you just access the same mutable
           | variable from multiple goroutines - that kind of "good model"
           | you can also get with C and some implementation of channels
           | and green threads.
        
             | brightball wrote:
             | It still offers good structures to help you be careful in
             | these situations, but you're right. It's not as complete of
             | a concurrency model as BEAM languages.
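As a rough illustration of the point above: Go will let several goroutines touch the same variable, and it is up to the programmer to reach for a structure such as `sync.Mutex` to make that safe. The sketch below is an assumption-laden example (the `Counter` type and `countConcurrently` helper are hypothetical names), not anything taken from the article.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical shared value. Nothing in the language forces
// callers to go through the mutex; the protection is purely by convention.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc increments the counter while holding the lock.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value reads the counter while holding the lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently starts `workers` goroutines that each call Inc
// `perWorker` times on the same Counter, then waits for all of them.
func countConcurrently(workers, perWorker int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countConcurrently(8, 1000))
}
```

Replacing `c.Inc()` with a bare `c.n++` still compiles, which is the complaint being made; running with `go run -race` is the usual way to catch that kind of mistake.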
        
             | null_investor wrote:
             | Yes, but having coded in both languages you can see how
             | Go's concurrency system works well with it and fits neatly.
             | 
             | Try using async in Rust and you'll see what I mean, it
             | sucks to use it.
             | 
             | The same applies to cpp: you'll need years and years of
             | experience to write somewhat decent cpp for what you can
             | write in Go after a few weeks of learning goroutines, and
             | you'll still be prone to making mistakes. I've done lots of
             | cpp and can affirm that after decades I'm still not sure if
             | my concurrent solution will run as well as my half-assed Go
             | goroutine code.
             | 
             | Goroutines are a good abstraction that fits the language
             | very well, similar to the actor model and the BEAM VM
             | (erlang/elixir).
             | 
             | Use the same abstraction in another language (like JVM or
             | C) and you'll see how the devex makes a huge difference.
        
               | littlestymaar wrote:
               | I'd choose async/await over threads any day, and many
               | people do too, that's why this construct has gained so
               | much popularity in many different languages. Async/await
               | is a rare example in PL design where a new concept is
               | implemented in one language and then adopted by many in
               | the following years.
               | 
               | In fact, the "threads are more ergonomic than
               | async/await" seems to be more of an HN meme than a
               | popular opinion in the wild. (And please don't quote
               | _What color is your function_ since it's mostly a rant
               | about callback-based concurrency which everyone agree it
               | sucks).
        
               | cube2222 wrote:
               | > since it's mostly a rant about callback-based
               | concurrency which everyone agree it sucks
               | 
               | It really isn't, it's a rant about there being two worlds
               | (colors) of functions - async and not async, which don't
               | compose that well together. I believe Rust is exploring
               | the ability to genericize functions over their asyncness,
               | and I wish them luck in that endeavor.
               | 
               | Async/await is, if anything, a tradeoff that favors
               | performance and ffi over usability, since async functions
               | are generally stackless and, in the case of Rust, don't
               | need allocations.
               | 
               | With green threads like in Go you'll generally have to
               | allocate at least a small stack for each goroutine, and
               | you'll have to fix up things for ffi, but it's much more
               | straightforward, all functions have the same color (you
               | just write "blocking" code, but the underlying
               | implementation is async), you don't have an ecosystem
               | split, and you don't have to war with the type system
               | every other day (as many feel like they're doing in Rust
               | when writing async).
               | 
               | It's a pretty good case in point that Java just spent a
               | few years making Project Loom happen, which effectively
               | brings the Go approach to Java.
               | 
               | Async/await has its place, especially in very
               | performance-critical code, but its usability is, IMO,
               | miles behind threads.
               | 
               | Of course, an ideal world would be green threads with
               | rusts compile-time checking of concurrency safety...
        
               | dgroshev wrote:
               | I'm not sure what you mean by "don't compose well".
               | 
               | You can call sync code from an async function perfectly
               | well. Going the other way around is either very simple in
               | throwaway code (just use pollster or tokio's block_on),
               | or requires appropriate care (because async means IO and
               | calling a function doing IO means your function now does
               | IO too).
               | 
               | I believe what most people are struggling with is having
               | to think about doing IO instead of just doing IO
               | wherever. But that's a feature, not a bug, and an
               | overarching theme in Rust: you need to think what you're
               | doing and how everything will work together. Yes you
               | can't just put an async call somewhere down the stack
               | without changing the signatures, but _that's a good
               | thing_.
        
               | LtdJorge wrote:
               | I think you mean block_on. Spawn blocking would be for
               | your first use case, calling sync from async.
        
               | dgroshev wrote:
               | Yes, thank you, my bad. Fixed.
        
               | cube2222 wrote:
               | Rust doesn't model IO in terms of types though, does it?
               | Sure, an async function likely does IO, but maybe it's
               | just receiving from a channel? At the same time 90%
               | (completely made up number) of Rust just does plain
               | blocking IO and you absolutely won't get that in the
               | signature either (other than looking into error types,
               | that is). We're not in Haskell here. You can just make a
               | blocking call in your async code, not notice it does IO
               | (or you're just not proficient in async), and it falls to
               | pieces, your runtime just hung.
               | 
               | While "you can just use pollster" means you now have to
               | wrap every function call to an async-first library in
               | your async-less codebase. Again, this isn't "composing
               | well" in my book, it absolutely does create a divide in
               | the ecosystem.
               | 
               | While in Go all code is written in a blocking fashion,
               | but all IO is done async (either via threadpool or non-
               | blocking syscalls), handled by the runtime, because the
               | threads are green. Of course, that's less performant and
               | falls apart if you start using C libraries that do
               | blocking IO (but it's not like Rust would do better in
               | that latter case).
               | 
               | At the same time, whether modeling IO in the type system
               | is good or not, is I think something where opinions very
               | much differ.
               | 
               | To be clear, again, I'm not bashing Rust async in
               | general, I just think it's a tradeoff and I very much
               | think its usability is worse than just writing
               | straightforward blocking code.
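To make the "blocking style, scheduled by the runtime" point concrete, here is a small Go sketch. It is only an illustration under stated assumptions: `slowLookup` is a hypothetical stand-in for a blocking I/O call and uses `time.Sleep` rather than real network I/O, and `lookupAll` is a made-up helper. The functions carry no async marker, yet the calls overlap because each runs in its own goroutine.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// slowLookup is a hypothetical stand-in for a blocking I/O call. It is
// written as ordinary sequential code; sleeping parks only the calling
// goroutine, not the OS thread underneath it.
func slowLookup(id int) string {
	time.Sleep(50 * time.Millisecond)
	return fmt.Sprintf("item-%d", id)
}

// lookupAll runs slowLookup for every id in its own goroutine and waits
// for all of them. Each goroutine writes to a distinct slice element, so
// no extra locking is needed.
func lookupAll(ids []int) []string {
	out := make([]string, len(ids))
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i, id int) {
			defer wg.Done()
			out[i] = slowLookup(id)
		}(i, id)
	}
	wg.Wait()
	return out
}

func main() {
	start := time.Now()
	res := lookupAll([]int{1, 2, 3, 4})
	// The elapsed time should be close to one lookup, not four.
	fmt.Println(res, time.Since(start).Round(10*time.Millisecond))
}
```

Neither `slowLookup` nor `lookupAll` changes its signature to become concurrent, which is the "all functions have the same color" property being described.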
        
               | dgroshev wrote:
               | I disagree with your "just" in "just receiving from a
               | channel". Receiving from a channel is hardly
               | distinguishable from IO, since it can take an arbitrary
               | amount of time and the receiver needs to handle external
               | resource failures (ie the other side of the channel
               | getting dropped). That's very different from normal sync
               | functions parsing JSON, adding matrices, or doing date
               | math. Also note how functions that can't block are still
               | sync, like tokio::sync::oneshot::Sender::send.
               | 
               | I also can't agree with the argument that if it's
               | possible to hang the runtime there's no point in explicit
               | async. Yes people can still make mistakes, but it's
               | better when it's explicitly a mistake and not just
               | haphazard IO everywhere being the norm.
               | 
               | What is the "async-less" codebase for? Why does it have
               | to be "async-less"? A Go codebase is fully async at all
               | times, so why shouldn't a Rust system extend async up to
               | main() (or at least encapsulate IO-heavy parts in a
               | separate async part)?
               | 
               | Runtime behaviour of async rust is ~equivalent to Go, so
               | coming back to my original point, the difference is that
               | 1) Rust allows you to have non-async, predictable functions;
               | async is opt-in 2) the opt-in must be explicit 3) you
               | have more choice between intra- and inter-future
               | concurrency (select! vs spawn). All those points are
               | important and empowering, they aren't problems to be
               | solved.
               | 
               | I agree that the flip side is that Rust forces explicit
               | decisions upfront, but that's one of the core premises of
               | Rust.
        
               | littlestymaar wrote:
               | > It really isn't, it's a rant about there being two
               | worlds (colors) of functions - async and not async, which
               | don't compose that well together.
               | 
               | Except the composition problem only manifests itself
               | badly when your "red function" is callback-based. Re-read
               | the blog post and you'll see that the fundamental problem
               | is that red functions are unwieldy (which it really is
               | when you have callbacks, but not when you have
               | async/await).
               | 
               | In Rust async/await is exactly as contaminating as
               | `Option` or `Result`, or "error as return values" in Go
               | or C, which creates the same split (functions which can
               | propagate an error and the ones which don't), but it's
               | almost never being discussed in terms of "function color"
               | because it's not in fact such a big deal.
               | 
               | > I believe Rust is exploring the ability to genericize
               | functions over their asyncness
               | 
               | In fact they (well, in practice mostly Yoshua) would like
               | to genericize over _all effects_ including fallibility
               | because they are well aware that the problems are
               | equivalent. The biggest problem in Rust's case is the
               | combinatorial explosion between different effects.
               | 
               | > Async/await has its place, especially in very
               | performance-critical code, but its usability is, IMO,
               | miles behind threads.
               | 
               | > Of course, an ideal world would be green threads with
               | rusts compile-time checking of concurrency safety...
               | 
               | This is entirely subjective: as I said above I would
               | never willingly work with threads if I can use
               | async/await instead. It's not just a performance thing at
               | all. For many people async/await is just a superior user
               | experience.
               | 
               | In fact it's exactly the same as exceptions vs `Result`:
               | (unchecked) exceptions don't make colored functions
               | whereas `Result` does, but exceptions, like threads, hide
               | where something can fail in your code (resp. when your
               | code can get stuck waiting for IO) and this explicitness
               | has value even if it comes with a cost in terms of
               | annotation effort.
               | 
               | Telling someone who prefers async/await that threads are
               | absolutely better is exactly like telling someone that
               | dynamic typing is better than static typing because you
               | have fewer things to type when writing code...
        
               | afdbcreid wrote:
               | > Except the composition problem only manifests itself
               | badly when your "red function" is callback-based. Re-read
               | the blog post and you'll see that the fundamental problem
               | is that red functions are unwieldy (which it really is
               | when you have callbacks, but not when you have
               | async/await).
               | 
               | Not only; the problem also manifests when you
               | interoperate with blue functions, for example because
               | they are provided to you by libraries.
        
               | littlestymaar wrote:
               | But even Go has a red/blue distinction (functions that
               | return an error and the ones that don't), and in practice
               | it's so fine it never occurred to you that it was in fact
               | a red/blue split!
               | 
               | The real _problem_ came in JavaScript with the callback
               | hell, everything else is a minor issue that nobody really
               | notices.
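The "errors as return values are a color too" argument can be seen in a few lines of Go. This is a hedged sketch with hypothetical names (`parsePort`, `buildAddr`, `ErrEmpty`): once the innermost function can fail, every caller above it has to grow an `error` result as well, much as callers of an async function must themselves become async.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// ErrEmpty is a hypothetical sentinel error used for illustration.
var ErrEmpty = errors.New("empty input")

// parsePort can fail, so its signature carries an error.
func parsePort(s string) (int, error) {
	if s == "" {
		return 0, ErrEmpty
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parse port %q: %w", s, err)
	}
	return n, nil
}

// buildAddr calls parsePort, so it has to return an error too: the
// fallibility propagates up the call stack through the signatures.
func buildAddr(host, port string) (string, error) {
	p, err := parsePort(port)
	if err != nil {
		return "", fmt.Errorf("build addr: %w", err)
	}
	return fmt.Sprintf("%s:%d", host, p), nil
}

func main() {
	addr, err := buildAddr("localhost", "8080")
	fmt.Println(addr, err)

	_, err = buildAddr("localhost", "")
	// The wrapped error can still be matched against the sentinel.
	fmt.Println(errors.Is(err, ErrEmpty))
}
```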
        
               | thadt wrote:
               | Not "threads are more ergonomic than async/await", but
               | "threads and _channels_ can be ergonomic".
               | 
               | I've been a big fan and user of async/await in various
               | languages for over a decade now, and it can absolutely
               | improve reading sequential asynchronous logic flow, but
               | let's not pretend it is without tradeoffs. Compared to
               | wrangling synchronization contexts and function colors,
               | channels can be rather straightforward.
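For readers who have not used the threads-and-channels style, a minimal Go sketch might look like the following. The `squares` and `sumSquares` functions are hypothetical examples, not code from the thread: one goroutine produces values, a second transforms them, and the caller consumes the results with an ordinary `for ... range` loop.

```go
package main

import "fmt"

// squares starts a worker goroutine that reads numbers from in, sends
// their squares on the returned channel, and closes it once in is drained.
func squares(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			out <- n * n
		}
	}()
	return out
}

// sumSquares feeds nums into the worker and adds up what comes back.
// Closing each channel is how the next stage learns there is no more data.
func sumSquares(nums []int) int {
	in := make(chan int)
	go func() {
		defer close(in)
		for _, n := range nums {
			in <- n
		}
	}()

	total := 0
	for sq := range squares(in) {
		total += sq
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // 1 + 4 + 9 + 16 = 30
}
```

The synchronization here lives entirely in the channel operations, so there are no contexts or function colors to thread through the signatures.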
        
               | littlestymaar wrote:
               | Errors as return values are a "function color" too (in
               | practice, you need to propagate them up the stack and
               | it "contaminates" your code the same way as async/await).
               | And in fact it interacts better with async/await (like in
               | Rust) than with channels...
        
           | masklinn wrote:
           | > Go has arguably a better concurrency model
           | 
           | Go might have a better concurrency _implementation_, Go's
           | concurrency _model_ is pretty much just threads and telling
           | you to git gud scrub.
        
             | null_investor wrote:
             | That's true, it's more precise to say that it's an
             | implementation. I'm happy you understood what I meant
             | though.
             | 
             | The developer experience on using it is much better than
             | using coroutines in another lang.
        
         | aranw wrote:
         | The benefit isn't necessarily speed. For example, there are
         | some libraries available in Rust that are not in Go. This is
         | something I am exploring at the moment rather than having to
         | totally rewrite the library in Go.
        
         | gkbrk wrote:
         | > go isn't that far off from rust on most speed metrics being
         | both compiled
         | 
         | The only way I see this happening for the same code snippet is
         | if the Rust code was being compiled in debug mode instead of
         | release mode.
        
         | Kinrany wrote:
         | Libraries can be written once and then used in both languages.
        
         | afdbcreid wrote:
         | Allocating is much easier in Go and much harder to eliminate in
         | it, for example.
         | 
         | Also, for some kinds of code (usually not CRUD), LLVM does a
         | better job than the Go compiler.
         | 
         | And if you do concurrency, Rust's async await can be faster
         | than Go's approach (but it is a lot more complicated, and the
         | perf gain usually doesn't matter).
        
       | nickcw wrote:
       | That was a great read. All that linker wrangling is sure to break
       | on the next version of go/rust/linker isn't it?
       | 
       | I wonder if it would have been easier to disassemble the rust
       | binary and turn it into Go assembly and use that.
       | 
       | That would need a fairly complicated program to process the
       | binary back into assembler. Maybe getting the rust compiler to
       | output assembly and processing into Go assembly would be the way.
       | 
       | Using Go assembly would save fighting with the linker, be more
       | likely to survive upgrades and it would be cross platform (well
       | at least on platforms with the same CPU arch).
        
         | neonsunset wrote:
         | Imagine picking a language with terrible FFI overhead and a weak
         | compiler only to fight these two worst aspects of it in an
         | attempt to fix them.
         | 
         | C# with zero-cost FFI, none of the performance penalty and
         | ability to statically link everything together is a strictly
         | better choice. Saner type system and syntax too.
        
           | kgeist wrote:
           | C#'s FFI is kind of zero-cost with blittable types (int,
           | float). You still need to do marshalling for anything more
           | complex (strings were a common issue on Linux, UTF8<=>UTF16),
           | also memory pinning. Last time I checked, it also notifies
           | the GC the thread entered native code (can't be preempted,
           | the stack needs to be scanned conservatively etc.) After
           | exiting a native function IIRC there's a safepoint check. I
           | remember years ago it was common knowledge that P/Invoke is
           | not suitable for calling hot functions in a loop, you had to
           | create native helper functions which make all the calls in
           | one go. Maybe it has changed?
        
             | neonsunset wrote:
             | Memory pinning is practically free and GC does quite a bit
             | of work to further minimize its impact on throughput,
             | common practice is to use stack buffers or natively
             | allocated memory for marshalled or other data anyway
             | (remember - free FFI, so you can always do malloc and free
             | which marshallers do). In practice UTF-16<->UTF-8
             | conversion rarely shows up on flamegraph as it turns out
             | built-in transcoding is very fast and does not leave
             | dangling data that GC needs to clean up. It is also not as
             | frequently needed - you can just get a byte* for free out
             | of "hello, world"u8 and pass it without any extra
             | operations (the binding generator will do that for you).
             | 
             | On top of that, complex blittable structures are first and
             | foremost _expressible_ in C# - you can easily have C binary
             | tree, or an array of arrays with byte**. Performance-
             | sensitive code that does interop heavily exploits that to
             | give actual C-like experience.
             | 
             | Also short-lived FFI calls do not need to notify GC (which
             | is also cheap: it toggles a boolean), libraries that care
             | about every last nanosecond annotate them with
             | `[SuppressGCTransition]` which further streamlines assembly
             | around the FFI call.
             | 
             | For performing FFI in a loop - the cost comes from the same
             | reason non-inlineable cross-compilation-unit calls are
             | expensive within C++. Because we're talking plain call
             | level of overhead, which is by definition as cheap as FFI
             | gets. Of course that can still be a bottleneck if you have
             | a small function beyond FFI boundary within a hot loop. In
             | that case you might as well port it to C# to make it
             | inlineable which is going to be faster, or have an FFI call
             | for a batched operation.
             | 
             | Try `dotnet publish -o . -p:StripSymbols=false`ing this:
             | https://github.com/U8String/U8String/tree/main/Examples/Inte...
             | and then disassembling it with Ghidra. You will be
             | positively impressed with how codegen looks - there will be
             | simple direct calls into Rust with single bool checks for a
             | potential GC poll after them (and compiler will merge those
             | too after multiple consecutive calls).
        
               | aatd86 wrote:
               | So it's actually about the same situation in Go from what
               | I can understand.
        
               | neonsunset wrote:
               | No, Go FFI is so slow it makes Python look fast, and Python
               | has great FFI performance; only being an interpreted
               | language hurts it. Go needs to perform stack switching and
               | worker thread pinning, and for some reason even that is
               | slow. I don't know why.
               | 
               | Note on tinygo as I've been put in the jail heh:
               | 
               | Realistically no one runs TinyGo in production as in
               | back-end workloads or larger user-facing applications.
               | And when you do run it, at most you match pre-existing
               | FFI performance of .NET which ranges at 0.5-2ns at
               | throughput (which I assume is how you are testing it)
               | depending on flags and codegen around specific arguments.
               | Up to 50 times difference with standard Go, that is a
               | lot, isn't it? All custom runtime flavours in Go usually
               | come with significant performance issues or other
               | tradeoffs. Something I never need to deal with when I
               | solve the same problems with .NET, maybe occasionally
               | addressing compatibility with AOT for libraries that have
               | not been updated if it's a desired deployment mode.
        
               | randomdata wrote:
               | _> No, Go FFI is so slow it makes Python look fast_
               | 
               | Which Python and which Go? There are so many different
               | implementations and different versions of those
               | implementations that this broad statement is meaningless.
               | 
               | From what I have installed on my machine, CPython 3.11.6
               | seems to take around 80ns to call a C function, gc 1.22.0
               | takes around 50ns to call the same C function, _and_
               | tinygo 0.32.0 only takes around _1ns_!
               | 
               | Such benchmarking is always fraught with problems, so
               | your mileage may vary, but on my machine under this
               | particular test Go wins in both cases, and tinygo is
               | doing so at about the same speed as C itself.
        
               | darby_nine wrote:
               | > CPython 3.11.6 seems to take around 80ns to call a C
               | function, gc 1.22.0 takes around 50ns to call the same C
               | function
               | 
               | How does this make any sense? C shares a runtime with
               | CPython. What is it doing where it manages to be slower
               | than go?
        
               | aatd86 wrote:
               | Are there some recent measurements? All I could find is
               | somewhat old and I think there has been some work done
               | since.
        
               | gen2brain wrote:
               | Maybe it just looks slow to you, for example, check the
               | raylib bindings benchmarks, Go is usually at the top,
               | together with Rust, C#, etc. and Python is at the very
               | bottom. Perhaps not the best way to benchmark FFI but it
               | shows that Python is not even usable besides playing.
        
               | jerf wrote:
               | In Go, once you've FFI'd, you're likely to be able to use
               | the Go data directly in C. In Python, you generally have
               | to crawl over the Python-based data at Python speeds,
               | converting it into data that your C library can
               | understand, and then when C is done, convert it back into
               | a Python data type at, again, effectively Python speeds.
               | Unless you're willing for it to just be opaque data that
               | Python effectively can route, but not manipulate. (This
               | works, but is distinctly less useful. Of course
               | sometimes, like for image data, it's the best option.)
               | Combined with the sibling comment that actually times
               | calls and finds Go is faster anyhow, even if it is a
               | microbenchmark, I don't think your argument carries
               | water.
               | 
               | It sounds to me like you've latched on to some propaganda
               | that you like and are happy to share but don't have any
               | personal experience with. Go's FFI is relatively slow for
               | a _compiled_ language, but it isn't even remotely
               | _uniquely_ slow. _Many_ languages have a C FFI with some
               | sort of relatively expensive C conversion operation, and
               | as I point out here, that actually includes Python in
               | many, if not most, uses (it's pretty rare to want to
               | write code to bind to C that just ships over two integers
               | and returns another integer or something, usually we're
               | doing something _interesting_). It is the languages like
               | Rust or Zig that can do it for effectively free that are
               | the exception, not the rule. Go's FFI costs are not all
               | that expensive compared to programming languages in
               | general, and if you're worried about the FFI performance,
               | Go also generally needs FFI less in the first place than
               | Python because it's a fast language (not _the fastest_ by
               | any means, but fast), and the higher proportion of Go
               | code will generally smoke Python anyhow. Unless you
               | foolishly write a very tight loop of C FFI code in Go,
               | which is a bad idea in most languages anyhow (again
               | excepting the exceptions like Rust and Zig), Go's going
               | to outpace a Python program that is mostly Python but
               | uses a bit of FFI here and there by a lot anyhow.
               | 
               | The idea that Go is brought to its knees by a single C
               | call that takes 100ms or something reminds me of the
               | people who think that as soon as you use a garbage
               | collected language you've signed up for a guaranteed
               | 250ms stop-the-world pause every three seconds or
               | something. Is it free? No. Is it expensive? In _relative_
               | terms maybe. In absolute terms, not really. Most
               | programs, most of the time you won't notice, and if you
               | are in a situation where Python is even a performance
               | _option_ virtually by definition you're in a situation
               | where it won't be a problem for Go.
               | 
               | This is not rah-rah for Go or slagging on Python. This is
               | just stuff engineers need to know. Python is a very
               | capable language, but you definitely pay for it. It is
               | not free. Every greenfield project, you need to sit down
               | and calculate the costs and pick a good tool, but you're
               | going to make dumb, project-killing decisions if you're
               | using costs that are multiple orders of magnitude off of
               | reality. I've seen it kill projects, it's not just
               | theory.
        
               | neonsunset wrote:
               | Thank you for responding. Indeed, it's not _quite_ as bad
               | as Python in overall performance. But I'm happy it
               | brought attention to the fact that Go is still pretty
               | inadequate at this.
               | 
               | My main point is engineers keep trying to shoehorn it in
               | domains, where, should they not want to use Rust or Zig
               | as you mentioned (which are great), they should have
               | chosen C# (which is also great for FFI, look at Stride3D,
               | Ryujinx or even its Sqlite driver speed, all of which are
               | FFI heavy), but instead they keep attempting to use Go
               | where they have to work hard to counteract its
               | inadequacy, instead of using a platform where their
               | solution would perform great not _despite_ the tool but
               | _because_ of it.
        
               | randomdata wrote:
               | The biggest problem with FFI in Go (gc, at least) isn't
               | in the FFI operation itself, rather FFI functions that
               | block for a long time mess with the scheduler.
               | 
               | It seems C# suffers the same problem as real-world
               | benchmarks often show it to be slower than Go when
               | performing FFI, even if it should be theoretically
               | faster.
               | 
               | As usual, performance can be hard to predict.
               | Benchmarking the actual code you intend to use is the
               | only way to ensure that what you think is true actually
               | is. If you aren't measuring, you aren't engineering.
        
               | neonsunset wrote:
               | Do you have any example of code that can demonstrate how
               | it is possible to meaningfully slow down .NET and its
               | threadpool and GC when performing FFI where Go does not
               | suffer to a significantly greater extent?
               | 
               | (if you fashion a Go example - that'll be enough and I'll
               | make a C# one)
               | 
               | .NET's threadpool is _specifically_ made with the
               | consideration of worker threads being potentially blocked
               | in mind, and has two mechanisms to counteract it - hill-
               | climbing algorithm that grows and shrinks the active
               | number of threads to minimize task wait time in queues,
               | and another mechanism to actively detect blocked threads
               | (like system sleep or blocked by synchronous network
               | read) and inject additional workers without waiting for
               | hill-climbing to kick in. It is a very resilient design.
               | Go's and Tokio's threadpools are comparatively lower effort
               | - both are work-stealing designs but neither has the
               | active scaling mechanism .NET has already had since .NET
               | Framework days.
               | 
               | GC implementation at the same time is pinning-aware and
               | can shuffle around memory in such a way to allow other
               | objects to participate in collection or promotion to
               | older generations while keeping the pinned memory where
               | it is. There have been years of work towards improving
               | this, and there is also an additional pinned memory heap
               | for long-lived pinned allocations on the rare occasion
               | where just performing malloc is not appropriate.
               | 
               | I doubt there is any other high-level language or
               | platform that can compete on FFI with .NET, something
               | that has been considered as a part of its design since
               | the very first version. If you want better experience
               | your main upgrade options are literally C, C++, Zig,
               | Rust, and honorable mention Swift (it is mostly a side-
               | grade, with the heavy lifting done by LLVM).
        
               | randomdata wrote:
               | _> Do you have any example of code_
               | 
               | No better than your own code. Why not put it to the test?
               | In the end you will either know that you made the right
               | choice, or have the better solution ready to swap in. You
               | can't lose.
               | 
               | The key takeaway from the previous comment isn't some
               | pointless C# vs Go comparison, it is that performance can
               | be hard to predict. Someone else's code isn't yours. It
               | won't tell you anything about yours. Measure and find
               | out!
        
               | neonsunset wrote:
               | You did mention there exists an FFI scenario where Go
               | supposedly performs better. It would be interesting to
               | look at it, given the claim.
        
               | randomdata wrote:
               | What, exactly, is interesting about arbitrary benchmarks?
               | It might just be the C# code is slower because the
               | developer accidentally introduced different, less
               | performant, logic. It doesn't tell you anything. Only
               | your own code can tell you something. I am not sure how
               | to state this more clearly.
               | 
               | What would actually be interesting is to see you gain
               | those important nanoseconds of performance that are so
               | critical to your business. We want to see you succeed
               | (even if you don't seem to want the same for yourself?).
        
               | Capricorn2481 wrote:
               | > It might just be the C# code is slower because the
               | developer accidentally introduced different, less
               | performant, logic
               | 
               | That is why the user is asking for specific code. So they
               | can audit whether this is a case when someone claims Go
               | FFI is fine.
               | 
               | As an outsider, all I see are two people saying "no it's
               | not slow," just about different languages. But until I
               | have a production app in either I'll never know.
        
               | randomdata wrote:
               | _> So they can audit whether this is a case when someone
               | claims Go FFI is fine._
               | 
               | Of course nobody would claim such a thing. The cost is
               | real. Whether or not Go is fine will depend entirely on
               | what kind of problem environment you are dealing with and
               | what your own code actually looks like. Someone else's
               | code will never tell you this. There is no shortcut here
               | other than to measure your own code.
               | 
               |  _> As an outsider, all I see are two people saying "no
               | it's not slow," just about different languages._
               | 
               | It is slow, relatively speaking. But does that matter in
               | your particular situation? Random internet benchmarks
               | show that Python is always slower, _way_ slower, yet
               | people find all kinds of productive uses for Python -
               | even in domains where computational performance is very
               | important! And if it does matter for what you're doing,
               | are you sure you actually picked the fastest option?
               | Measure and find out.
               | 
               | It is good to have rough estimates, but all of these
               | languages are operating within the same approximate
               | timescale here. It's not like Go, C#, or any other
               | language is taking minutes to perform FFI. When you
               | really do need to shave those nanoseconds off, guessing
               | isn't going to get you there. Measure!
        
               | Capricorn2481 wrote:
               | I'm in agreement, but it's even simpler than you're
               | making it.
               | 
               | If Go is slow in a certain context, I would want to know
               | what that context is. If there's a certain task that
               | takes a few more ms in goroutines due to some
               | implementation detail, I would know not to use Go if that
               | task needed to be run 100,000 times. Perhaps I need to
               | rethink the task itself, or maybe that's not possible for
               | an organizational reason.
               | 
               | It wouldn't be a "random internet benchmark" unless I
               | didn't understand the context. What's random is saying
               | this
               | 
               | >It seems C# suffers the same problem as real-world
               | benchmarks often show it to be slower than Go when
               | performing FFI, even if it should be theoretically faster
               | 
               | How is this better than asking for code examples?
        
               | randomdata wrote:
               | _> If Go is slow in a certain context, I would want to
               | know what that context is._
               | 
               | You'll know as soon as you measure it. Not exactly rocket
               | science, just plain old engineering. Measuring is what
               | engineers do. You wouldn't build a bridge without first
               | measuring the properties of the materials, and you
               | wouldn't build a program without measuring the properties
               | of its 'materials'.
               | 
               | You make a good point that it is strange we don't get
               | better datasheets from 'material manufacturers' about the
               | base measurements. That wouldn't fly in any other
               | engineering discipline, but I guess that's the nature of
               | software still being young. As unfortunate as that may
               | be, you can't fight the state of affairs, you're just
               | going to have to roll up your sleeves. Such is life.
               | 
               |  _> How is this better than asking for code examples?_
               | 
               | Cunningham's law explains why it is better.
        
               | superb_dev wrote:
               | Could it be that Go has other benefits that outweigh ffi
               | being a little slower?
        
               | darby_nine wrote:
               | > but it isn't even remotely uniquely slow.
               | 
               | Most languages share a runtime and stack model with C. Go
               | is "unique" among popular languages in that it decided to
               | do its own thing, which results in much slower FFI calls.
               | Like sure, so did GHC, but most people don't expect GHC
               | to behave like C.
               | 
               | TBH it's enough to put me off of the language outside of
               | writing servers. There's just too much to draw from in
               | terms of libc-based libraries and the drawbacks in Go are
               | too severe to make its interesting concurrency ideas
               | generally worth it.
               | 
               | EDIT: Not to mention if you have WASI as a target go is a
               | _terrible_ choice for the exact same reasons--it has its
               | unique memory and stack model that don't work well with
               | web assembly.
        
               | jerf wrote:
               | Again, people throw around "much slower" and it comes off
               | like it's 100ms or something. It's not that much slower,
               | it's not even close.
               | 
               | It's only an issue if you're planning on making tens or
               | hundreds of thousands of FFI calls per second, routinely.
               | That describes a non-zero set of software people may want
               | to write. If you are writing one of them, you need to
               | know that. But it doesn't describe anything like a
               | majority of software cases in general.
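               | 
               | A rough worked version of that arithmetic, as a sketch
               | only: the helper overheadFraction is hypothetical, and
               | the 50 ns per-call cost is an assumption borrowed from
               | the gc 1.22.0 measurement quoted earlier in the thread.
               | It multiplies call rate by per-call cost to get the
               | share of one core spent on call overhead alone.

```go
package main

import "fmt"

// overheadFraction returns the share of one CPU-second spent purely on
// call overhead, given a call rate and a per-call cost in nanoseconds.
func overheadFraction(callsPerSecond, nsPerCall float64) float64 {
	return callsPerSecond * nsPerCall / 1e9
}

func main() {
	// 50 ns per call is an assumed figure, not a guarantee for any machine.
	const assumedNsPerCall = 50
	for _, rate := range []float64{1e3, 1e5, 1e7} {
		share := overheadFraction(rate, assumedNsPerCall)
		fmt.Printf("%.0f calls/s: %.4f%% of one core\n", rate, 100*share)
	}
}
```

               | What matters is the product of rate and per-call cost, so
               | plug in a cost measured on your own workload rather than
               | the assumed one.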
               | 
               | It is one of the things you need to know, but you need to
               | have a correct view of the costs to make correct
               | decisions, or, at least, not one that's off by orders of
               | magnitude and leads to people running around claiming Go
               | is "uniquely slow" at FFI and bragging about how much
               | faster Python is when it turns out "uniquely slow" Go is
               | actually faster than Python. Costs aren't a matter of
               | feelings or what reinforces your decisions about what
               | language to use or how much you hate that Go doesn't have
               | sum types. Costs are what they are.
               | 
               | My personal favorite, and bear with me because this is
               | going to be generally a negative for Go, is the number of
               | databases that for some reason are getting written in Go.
               | My rule of thumb is that Go is 2-3x slower than C/C++
               | (and, increasingly in that set, Rust). On the grand
               | landscape of programming languages, this puts it
               | distinctly towards the faster end in a general sense;
               | there are very popular languages clocking in at
               | "generally 40x-50x slower and also can't use more than
               | one CPU at a time". But if you're writing a database,
               | you're going into a market where it's virtually
               | _guaranteed_ that's going to be a problem. Maybe don't
               | do that. But then deciding that you aren't going to use
               | Go to write your command-line app to hit an HTTP API
               | because it's not the fastest language for databases is
               | not a correct engineering conclusion to draw.
        
               | kgeist wrote:
               | The only significant conceptual difference between Go's
               | and C#'s FFI mechanisms when it comes to call overhead
               | that I can think of is the fact that Go has to switch the
               | current stack to a special C stack. In C#, it runs on the
               | same stack.
        
           | fingerlocks wrote:
           | This is _hacker_ news, doing the thing it wasn't designed to
           | do is the point. The harder the challenge, the better.
           | 
           | Yes there are tons of better choices for rust interop. Any
           | LLVM language will work. That's not interesting.
        
         | FiloSottile wrote:
         | That exist(ed)! c2goasm would compile C and then decompile it
         | into Go asm.
         | 
         | https://github.com/minio/c2goasm
        
       | odanalysis wrote:
       | Does this mean you could now use gokrazy to run rust apps on an
       | sbc with better startup speeds than linux??
        
         | sureglymop wrote:
         | If you're calling rust from go, wouldn't you still have the go
         | startup time before you could call the rust code?
        
       | yutijke wrote:
       | https://github.com/petermattis/fastcgo, which is now 7 years old,
       | seems to do something similar without the need to know about obscure
       | CGO FFI configuration. It also seems to be more generally
       | applicable for any language with C interop.
       | 
       | There had been an issue for having something similar in the
       | language itself - https://github.com/golang/go/issues/42469, but
       | the Golang compiler team rejected it. If you have followed
       | similar discussions around this with the Golang compiler team,
       | you will notice a pattern of interaction that strongly indicates
       | that they are very much opposed to ever accepting this into the
       | compiler.
        
         | dontlaugh wrote:
         | fastcgo links to rustgo
        
       | roundup wrote:
       | Also,
       | https://blog.yuchanns.xyz/post/83397808-6849-4bc5-8a09-18765...
        
       | ethegwo wrote:
       | it makes me think about
       | https://news.ycombinator.com/item?id=41117749 it is the mirror of
       | this project: efficient way to call Go from Rust
        
       | hardwaregeek wrote:
       | lol I remember looking at this when doing our major port from Go
       | to Rust. I noped out once I saw the raw assembly portion. Great
       | for a side project but probably not the move for a cross platform
       | binary run by thousands of people. Still a very cool post and the
       | literature for Go/Rust interop is def lacking
        
       ___________________________________________________________________
       (page generated 2024-07-31 23:01 UTC)