[HN Gopher] Crossing the impossible FFI boundary, and my gradual...
       ___________________________________________________________________
        
       Crossing the impossible FFI boundary, and my gradual descent into
       madness
        
       Author : signa11
       Score  : 88 points
       Date   : 2024-06-17 15:17 UTC (7 hours ago)
        
 (HTM) web link (verdagon.dev)
 (TXT) w3m dump (verdagon.dev)
        
       | jauntywundrkind wrote:
       | There's a huge amount of doom & gloom, prophecies of failure
       | against wasm's component-model, a latent expectation that trying
       | to solve FFI is impossible & destined to fail. But what if?
       | 
       | It'd be so neat for language creators to be able to use &
       | leverage other works. Getting there wouldn't be easy, but there'd
       | be a standard path to getting the hard-fought capability here.
        
       | kodablah wrote:
       | It seems like the struggle here is trying to use Rust
       | transparently/automatically from another language instead of just
       | making bindings easier. I have found that trying to auto-FFI
       | existing Rust types is not the best for languages because there
       | is often an impedance mismatch between how the language treats
       | things and how Rust does. Therefore trying an always-works
       | transparent binding may inevitably end up with people asking for
       | more flexibility to fit the language better (e.g. controlling
       | lifetime semantics, type mappings/copying, etc).
       | 
       | I think it's clearer to take the approach that Neon, PyO3 and
       | other FFI-to-lang helpers take, where you just make it easy/safe
       | to write these Vale functions in Rust.
        
         | DSMan195276 wrote:
         | I agree with you, but it's always hard to ignore the allure of
         | not needing to write all the bindings manually. If nobody is
         | willing to write the bulk of the initial bindings then the
         | chance of someone using it seems low, and in theory writing a
         | transparent layer between the two takes less time/effort (in
         | practice I agree that the incompatibilities will make it messy
         | long term).
         | 
         | Rust has the same problem with C APIs: in the past I've gone to
         | use something and found that the binding was not there. For a
         | couple functions it's no big deal, but if say half or more of
         | the ones I needed weren't there already then I wouldn't have
         | bothered trying to use it at all.
        
       | toyg wrote:
       | _> Anyone trying to make a new mainstream language is completely
       | insane, unless they're backed by a huge corporation. There are
       | only two exceptions in the last 25 years that come close: Scala
       | and Kotlin_
       | 
       | Kotlin was designed and backed by JetBrains from the start. Maybe
       | not a "huge" corporation but a pretty big company still (by
       | revenue).
        
         | iudqnolq wrote:
         | I don't know the story of how the Android team went Kotlin-
         | first. If that wasn't a deliberate plan they got quite lucky.
         | Could Kotlin arguably be backed by Google?
        
           | izacus wrote:
           | No, the Android community adopted Kotlin before Google added
           | any support for it from their side.
        
             | kernal wrote:
             | I don't know when the first Kotlin Android app was
             | published, but Kotlin 1.0 was released in 2016 and then
             | announced as a first class language at Google I/O in 2017.
        
           | kernal wrote:
           | Android Studio is based on IntelliJ and there's a lot of
           | collaboration between both teams. The adoption of Kotlin was
           | a logical next step, considering a lot of IntelliJ is written
           | in Kotlin.
        
       | throwawaymaths wrote:
       | Rustler crosses the rust/Erlang barrier relatively well, though
       | its error messages when you try to cross it wrong are somewhat
       | unhelpful.
        
       | move-on-by wrote:
       | I've not used Rust, and quite frankly I think a lot of the post
       | is over my head, but I enjoyed the read nonetheless.
       | 
       | > I don't have any specific plans to turn this C proof-of-concept
       | into a production-quality tool that would enable calling Rust
       | from C, but if anyone wants to take it from here, I'd be happy to
       | assist!
       | 
       | I laughed at this, I'd bet my bottom dollar it's an attempted
       | nerd snipe!
        
       | ingve wrote:
       | This could be great for scripting with Neptune! [0]
       | 
       | [0] https://github.com/Srinivasa314/neptune-lang
        
         | yobananaboy wrote:
         | I'm gonna find some reason to use this for my Battleship game
         | too!
        
       | marklar423 wrote:
       | With all this effort required (as the author points out), I start
       | to wonder if a better solution is to communicate via RPC over
       | local sockets.
       | 
       | There will be some overhead, but it might be a wash considering
       | calling over an FFI often involves similar overhead to marshal /
       | unmarshal objects. And the simplicity gains would be massive.
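A minimal sketch of the "RPC over local sockets" idea from the comment above, using only the Rust standard library. The wire format (two little-endian i32 values in, one i32 out) and the names `serve_one`, `rpc_add` and `add_over_socket` are assumptions made for illustration. Loopback TCP stands in for whatever local socket type a real setup would use, and both ends run in one process here only to keep the example self-contained.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Server side: read two little-endian i32 values, reply with their sum.
fn serve_one(listener: TcpListener) -> std::io::Result<()> {
    let (mut stream, _) = listener.accept()?;
    let mut req = [0u8; 8];
    stream.read_exact(&mut req)?;
    let a = i32::from_le_bytes([req[0], req[1], req[2], req[3]]);
    let b = i32::from_le_bytes([req[4], req[5], req[6], req[7]]);
    stream.write_all(&a.wrapping_add(b).to_le_bytes())
}

/// Client side: marshal the arguments, send them, unmarshal the reply.
fn rpc_add(addr: SocketAddr, a: i32, b: i32) -> std::io::Result<i32> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(&a.to_le_bytes())?;
    stream.write_all(&b.to_le_bytes())?;
    let mut resp = [0u8; 4];
    stream.read_exact(&mut resp)?;
    Ok(i32::from_le_bytes(resp))
}

/// Start a one-shot server on an ephemeral loopback port and call it once.
fn add_over_socket(a: i32, b: i32) -> std::io::Result<i32> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || serve_one(listener));
    let sum = rpc_add(addr, a, b)?;
    server.join().expect("server thread panicked")?;
    Ok(sum)
}

fn main() -> std::io::Result<()> {
    println!("2 + 40 = {}", add_over_socket(2, 40)?);
    Ok(())
}
```

In a real deployment the server half would live in the other language's process. Neither side needs to know anything about the other's type layout, only the byte format.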
        
         | layer8 wrote:
         | Why over a socket? You could perform the same protocol more
         | efficiently with normal functions in-process. Maybe we need a
         | standard serializing LPC protocol just using the platform ABI.
         | Or maybe this comes down to something like ZeroMQ in-process.
        
           | marklar423 wrote:
           | Mostly because sockets are supported by everything today, and
           | they're easy to understand. What you're describing would
           | certainly work but it looks similar to what the OP did in the
           | blog post, with all the complexity it comes with.
        
             | layer8 wrote:
             | The OP doesn't serialize. My proposal would still serialize
             | as with RPC, but instead of passing the data over a socket,
             | just pass the data as a binary blob over a regular function
             | call.
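A rough sketch of what layer8 describes: the same serialize-then-call protocol, but with the bytes handed to an ordinary in-process function instead of a socket. The request format, the function name `add_blob` and the helper `call_add` are hypothetical. A function really exported to another language would also need an unmangled symbol name, which is left out here.

```rust
/// Hypothetical request format: two little-endian i32 values (8 bytes).
/// Response format: one little-endian i32 (4 bytes).
/// Returns the number of bytes written, or 0 on malformed input.
///
/// # Safety
/// `req` must point to `req_len` readable bytes and `out` to `out_cap`
/// writable bytes.
unsafe extern "C" fn add_blob(
    req: *const u8,
    req_len: usize,
    out: *mut u8,
    out_cap: usize,
) -> usize {
    if req.is_null() || out.is_null() || req_len != 8 || out_cap < 4 {
        return 0;
    }
    // SAFETY: the caller upholds the pointer/length contract documented above.
    let req = unsafe { std::slice::from_raw_parts(req, req_len) };
    let out = unsafe { std::slice::from_raw_parts_mut(out, out_cap) };
    let a = i32::from_le_bytes([req[0], req[1], req[2], req[3]]);
    let b = i32::from_le_bytes([req[4], req[5], req[6], req[7]]);
    out[..4].copy_from_slice(&a.wrapping_add(b).to_le_bytes());
    4
}

/// Caller side: serialize, make a plain function call, deserialize.
fn call_add(a: i32, b: i32) -> Option<i32> {
    let mut req = Vec::with_capacity(8);
    req.extend_from_slice(&a.to_le_bytes());
    req.extend_from_slice(&b.to_le_bytes());
    let mut out = [0u8; 4];
    // SAFETY: both pointers come from live buffers with the stated lengths.
    let n = unsafe { add_blob(req.as_ptr(), req.len(), out.as_mut_ptr(), out.len()) };
    (n == 4).then(|| i32::from_le_bytes(out))
}

fn main() {
    println!("{:?}", call_add(2, 40));
}
```

The only thing crossing the boundary is a pointer and a length, so the platform C ABI is all both sides have to agree on.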
        
               | spongebobstoes wrote:
               | The main thing on my mind is that the build system would
               | become more bespoke when doing it that way, compared to
               | running a few processes that interact with each other.
               | 
               | The overhead of socket read+write is typically much less
               | than the serialization overhead, although both can be
               | optimized to the point of irrelevance for many
               | applications.
               | 
               | It's also interesting because it ends up looking like a
               | microservices architecture, except all on one machine
               | (even all in one process tree).
        
           | marklar423 wrote:
           | https://zeromq.org/ -> TIL really cool, thanks for the
           | pointer.
        
         | masfuerte wrote:
         | COM [1] was a solution to these problems thirty years ago.
         | 
         | In-process it's just function calls. Cross-process COM has
         | automatic marshalling for standard types ("automation types")
         | or you can define custom marshalling that does whatever you
         | want.
         | 
         | WinRT [2] is a more modern version. It builds on COM and (among
         | other things) provides the basis for the latest UI frameworks
         | in Windows.
         | 
         | [1]: https://en.wikipedia.org/wiki/Component_Object_Model
         | 
         | [2]: https://en.wikipedia.org/wiki/Windows_Runtime
        
           | nsguy wrote:
           | A long time ago I worked on a project where we needed to
           | distribute an in-process COM object, so we moved it to DCOM,
           | instantiated multiple instances, and that worked! All in all
           | COM was a fairly pleasant technology. Not really that
           | different than gRPC (e.g. idl vs. proto).
        
       | alexvitkov wrote:
       | If you want to interop well with Rust code, it feels to me like
       | your language has to inherit so many Rust semantics that I find
       | myself asking why I would use it over Rust.
       | 
       | If you're making a new language, just have good interop with C.
       | Most libraries worth using are written in C. Calling into C is
       | trivial* and enforces almost no limitations on what you can do
       | language-design wise.
       | 
       | * trivial, with the somewhat sizable asterisk that you have to
       | rewrite the header files in your language.
        
         | verdagon wrote:
         | I've been looking into this, and I _suspect_ that one actually
         | needs surprisingly little to interoperate safely with Rust.
         | 
         | TL;DR: The lowest common denominator between Rust and any other
         | memory-safe language is a borrow-less affine type.
         | 
         | The key insight is that Rust is actually several different
         | mechanisms stacked on top of each other.
         | 
         | To illustrate, imagine a program in a Rust-like language.
         | 
         | Now, refactor it so you don't have any & references, only &mut.
         | It actually works, if you're willing to refactor a bit: you'll
         | be storing a lot of things in collections and referring to them
         | by index, and cloning even more, but nothing too bad.
         | 
         | Now, go even further and refactor the program to not have any
         | &mut either. This requires some acrobatics: you'll be
         | temporarily removing things from those collections and moving
         | things into and out of functions like in [2], but it's still
         | possible.
         | 
         | You're left with something I refer to as "borrowless affine
         | style" in [1] or "move-only programming" in [0].
         | 
         | I believe that's the bare minimum needed to interoperate with
         | Rust in a memory safe way: unreference-able moveable types.
         | 
         | The big question then becomes: if our language has only these
         | moveable types, and we want to call a Rust function that
         | accepts a reference, what then?
         | 
         | I'd say: make the language move the type in as an argument,
         | take a temporary reference just for Rust, and then move-return
         | the type back to the caller. The rest of our language doesn't
         | need to know about borrowing, it's just a private
         | implementation detail of the FFI.
         | 
         | These weird moveable types are, of course, _extremely
         | unergonomic,_ but they serve as a foundation. A language could
         | use these only for Rust interop, or it could go further: it
         | could add other mechanisms on top such as & (hard), or &mut
         | (easy), or both (like Rust), or a lot of cloning (like [3]), or
         | generational references (like Vale), or some sort of RefCell/Rc
         | blend, or linear types + garbage collection (like Haskell) and
         | so on.
         | 
         | (This is actually the topic of the next post, you can tell I've
         | been thinking about it a lot, lol)
         | 
         | [0] "Move-only programming" in
         | https://verdagon.dev/grimoire/grimoire#the-list
         | 
         | [1] "Borrowless affine style" in
         | https://verdagon.dev/blog/vale-memory-safe-cpp
         | 
         | [2] https://verdagon.dev/blog/linear-types-borrowing
         | 
         | [3]
         | https://web.archive.org/web/20230617045201/https://degaz.io/...
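A small sketch of the two ideas in verdagon's comment, written in Rust itself since the move-only language is hypothetical. `len_shim` plays the role of the FFI wrapper: the value moves in, a reference exists only inside the shim, and the value moves back out. `demo` shows the "temporarily remove it from a collection" style. All names here are made up for illustration, and Rust's own `Option::take` borrows internally, where a truly borrowless language would offer that as a primitive.

```rust
/// Stand-in for an existing Rust API that accepts a reference.
fn rust_len(s: &str) -> usize {
    s.len()
}

/// The shim a move-only caller would see: no reference in the signature.
/// The borrow is a private detail that ends before the value is returned.
fn len_shim(s: String) -> (String, usize) {
    let n = rust_len(&s);
    (s, n)
}

/// A move-only "update": consume the value and hand back a new one.
fn shout(s: String) -> String {
    s.to_uppercase()
}

fn demo() -> (Vec<Option<String>>, usize) {
    let mut slots = vec![Some(String::from("vale")), Some(String::from("rust"))];
    // Move the element out of its slot, leaving a hole behind.
    let taken = slots[0].take().expect("slot 0 is occupied");
    // Move it through the shim and get it back along with the result.
    let (taken, n) = len_shim(taken);
    // Move it through another function, then back into its slot.
    slots[0] = Some(shout(taken));
    (slots, n)
}

fn main() {
    let (slots, n) = demo();
    println!("{:?} {}", slots, n);
}
```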
        
           | rng_civ wrote:
           | Have you taken a look at the paper "Foreign Function Typing:
           | Semantic Type Soundness for FFIs" [0]?
           | 
           | > We wish to establish type soundness in such a setting,
           | where there are two languages making foreign calls to one
           | another. In particular, we want a notion of convertibility,
           | that a type tA from language A is convertible to a type tB
           | from language B, which we will write tA ~ tB , such that
           | conversions between these types maintain type soundness
           | (dynamically or statically) of the overall system
           | 
           | > ...the languages will be translated to a common target. We
           | do this using a realizability model, that is, by setting up a
           | logical relation indexed by source types but inhabited by target
           | terms that behave as dictated by source types. The
           | conversions tA ~ tB that should be allowed, are the ones
           | implemented by target-level translations that convert terms
           | that semantically behave like tA to terms that semantically
           | behave like tB (and vice versa)
           | 
           | I've toyed with this approach to formalize the FFI for
           | TypeScript and Pyret and it seemed to work pretty well. It
           | might get messier with Rust because you would probably need
           | to integrate the Stacked/Tree Borrows model into the common
           | target.
           | 
           | But if you can restrict the exposed FFI as a Rust-sublanguage
           | without borrows, maybe you wouldn't need to.
           | 
           | [0] (PDF Warning):
           | https://wgt20.irif.fr/wgt20-final23-acmpaginated.pdf
        
           | alexvitkov wrote:
           | Thanks for the write-up. My biggest fear is not references,
           | overloads or memory management, but rather just the layout of
           | their structures.
           | 
           | We have this:
           |     sizeof(String) == 24
           |     sizeof(Option<String>) == 24
           | 
           | Which is cool. But Option<T> is defined like this:
           |     enum Option<T> {
           |         Some(T),
           |         None,
           |     }
           | 
           | I didn't find any "template specialization" tricks that you
           | would see in C++; as far as I can see, the compiler figures
           | out some trick to squeeze Option<String> into 24 bytes.
           | Whatever those tricks are, unless rustc has an option to
           | export the layout of a type, you will need to implement
           | them yourself.
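The size observation above can be checked from inside Rust with `std::mem::size_of`. This is only a sketch: the equality for `String` reflects what current compilers do and is not a documented guarantee, whereas `Option<Box<T>>` being pointer-sized is documented by the standard library. The helper name `sizes` is an assumption for illustration.

```rust
use std::mem::size_of;

/// Sizes of String and Option<String> on the current target.
fn sizes() -> (usize, usize) {
    (size_of::<String>(), size_of::<Option<String>>())
}

fn main() {
    let (plain, wrapped) = sizes();
    println!("String: {plain} bytes, Option<String>: {wrapped} bytes");

    // Observed behavior on current rustc, not a stable guarantee.
    assert_eq!(plain, wrapped);

    // Documented: Option<Box<T>> has the same size as a plain pointer.
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<*const u8>());

    // No spare bit pattern in u64, so Option<u64> needs extra room for a tag.
    assert!(size_of::<Option<u64>>() > size_of::<u64>());
}
```

A foreign language can run a probe like this at build time to learn sizes, but it still learns nothing about which bit pattern encodes `None`.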
        
             | vlovich123 wrote:
             | You don't need to determine the internal representation as
             | long as you're dealing with opaque types and invoking rust
             | functions on it.
             | 
             | As for the tricks used to make both 24 bytes, it's NonNull
             | within String that Option then detects and knows it can
             | represent transparently without any enum tags. For what
             | it's worth you can do similar tricks in C++ using zero-
             | sized types and tags to declare nullable state (in fact
             | std::optional already knows to do this for pointer types if I
             | recall correctly)
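A sketch of the opaque-type approach vlovich123 mentions: the foreign side holds only a pointer it never looks inside, and every operation goes through a Rust function. The type `Names` and the `names_*` functions are hypothetical. Functions actually exported to another language would additionally need unmangled symbol names, omitted here so the example stays a plain runnable program.

```rust
/// Opaque to the foreign caller, which only ever sees `*mut Names`.
pub struct Names {
    inner: Vec<String>,
}

/// Allocate a handle. Ownership passes to the caller until `names_free`.
extern "C" fn names_new() -> *mut Names {
    Box::into_raw(Box::new(Names { inner: Vec::new() }))
}

/// # Safety
/// `h` must come from `names_new` and not be freed yet; `bytes` must point
/// to `len` readable bytes.
unsafe extern "C" fn names_push(h: *mut Names, bytes: *const u8, len: usize) -> bool {
    if h.is_null() || bytes.is_null() {
        return false;
    }
    // SAFETY: guaranteed by the caller contract above.
    let names = unsafe { &mut *h };
    let slice = unsafe { std::slice::from_raw_parts(bytes, len) };
    match std::str::from_utf8(slice) {
        Ok(s) => {
            names.inner.push(s.to_owned());
            true
        }
        Err(_) => false,
    }
}

/// # Safety
/// `h` must be null or a live handle from `names_new`.
unsafe extern "C" fn names_len(h: *const Names) -> usize {
    if h.is_null() {
        return 0;
    }
    // SAFETY: non-null handles are live per the caller contract.
    let names = unsafe { &*h };
    names.inner.len()
}

/// # Safety
/// `h` must be null or a live handle, and must not be used afterwards.
unsafe extern "C" fn names_free(h: *mut Names) {
    if !h.is_null() {
        // SAFETY: the handle was created by Box::into_raw in names_new.
        drop(unsafe { Box::from_raw(h) });
    }
}

/// Drive the API the way a foreign caller would.
fn demo() -> usize {
    let h = names_new();
    let word = "vale";
    // SAFETY: h is live, the byte pointers are valid, h is freed exactly once.
    unsafe {
        names_push(h, word.as_ptr(), word.len());
        names_push(h, word.as_ptr(), word.len());
        let n = names_len(h);
        names_free(h);
        n
    }
}

fn main() {
    println!("{}", demo());
}
```

Because the caller never reads through the pointer, the layout of `Names` (and of the `String`s inside it) can change freely between compiler versions.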
        
             | ithkuil wrote:
             | Yeah currently "niche optimization" is performed when the
             | compiler can infer that some values of the structure are
             | illegal.
             | 
             | This can be currently done when a type declares the range
             | of an integer to not be complete with the
             | rustc_layout_scalar_valid_range_start or _end attribute
             | (requires #![feature(rustc_attrs)])
             | 
             | In your example it works for String, because String
             | contains a Vec<u8> which inside contains a capacity field
             | of type struct Cap(usize) but the usize is effectively
             | constrained to contain values from 0..=max_isize
             | 
             | The only way for you to know that is to effectively be the
             | rustc compiler or be able to consume its output
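The same niche mechanism is available on stable Rust through the standard `NonZero` integer types, which avoids the unstable attribute. In this sketch the wrapper `Id` and the helper `none_bits` are hypothetical names. The standard library documents that `Option<NonZeroU32>` has the size of `u32` and that its `None` is the all-zero bit pattern, which is what the transmute relies on.

```rust
use std::mem::{size_of, transmute};
use std::num::NonZeroU32;

/// A user type whose only field can never be zero.
/// repr(transparent) gives it the same layout as NonZeroU32.
#[allow(dead_code)]
#[repr(transparent)]
struct Id(NonZeroU32);

/// Read back the bits the compiler picked to represent None.
fn none_bits() -> u32 {
    // SAFETY: Option<NonZeroU32> is documented to have the same size as u32,
    // and every bit pattern is a valid u32.
    unsafe { transmute::<Option<NonZeroU32>, u32>(None) }
}

fn main() {
    // The forbidden value 0 is reused as the None tag, so no extra space.
    assert_eq!(size_of::<Option<Id>>(), size_of::<u32>());
    // A plain u32 has no forbidden value, so the Option has to grow.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
    assert_eq!(none_bits(), 0);
    println!("None is stored as {}", none_bits());
}
```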
        
         | jlarocco wrote:
         | I wish Rust would standardize their ABI already. I started a
         | project to call Rust from Common Lisp, but haven't got very
         | far. It's a lot of work, and they can break compatibility at
         | any time.
         | 
         | If they really want to replace C and C++ then they really need
         | to support being called from third party languages.
        
           | guipsp wrote:
           | https://github.com/rust-lang/rust/issues/111423
        
       | revskill wrote:
       | Why not WASI?
        
         | Retr0id wrote:
         | How could WASI solve (or be involved in solving) this problem?
        
           | Findecanor wrote:
           | I suspect confusion with the WebAssembly Component Model --
           | whose development is somewhat intertwined with that of
           | WASI.
           | 
           | It defines a function call ABI between sandboxes. No object
           | is in shared memory: parameters are passed by value or by
           | handle. Has its own IDL and ABI that languages' ABIs need to
           | have adaptors to, if they don't conform.
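To make "passed by handle" concrete, here is a toy handle table in Rust. It only illustrates the general idea of giving callers small integers instead of pointers into shared memory. It is not the Component Model's actual ABI, and the `Table` type and its methods are assumptions invented for this sketch.

```rust
/// Owns the real objects. Callers on the other side only hold u32 handles.
struct Table {
    slots: Vec<Option<String>>,
}

impl Table {
    fn new() -> Self {
        Table { slots: Vec::new() }
    }

    /// Store a value and return the handle that names it.
    fn insert(&mut self, value: String) -> u32 {
        self.slots.push(Some(value));
        (self.slots.len() - 1) as u32
    }

    /// Look a handle up. Unknown or dropped handles yield None.
    fn get(&self, handle: u32) -> Option<&str> {
        self.slots.get(handle as usize)?.as_deref()
    }

    /// Release the object behind a handle. Returns false if it was not live.
    fn drop_handle(&mut self, handle: u32) -> bool {
        match self.slots.get_mut(handle as usize) {
            Some(slot) => slot.take().is_some(),
            None => false,
        }
    }
}

fn main() {
    let mut table = Table::new();
    let h = table.insert(String::from("hello"));
    println!("{:?}", table.get(h));
    table.drop_handle(h);
    println!("{:?}", table.get(h));
}
```

A stale or forged handle fails a lookup instead of dereferencing freed memory, which is the property that makes handles attractive across a sandbox boundary.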
        
       | jamilbk wrote:
       | The article provides a very detailed exploration of all of the
       | fun challenges you can face designing FFIs with Rust, but there's
       | a good chance you can "get away" with simpler approaches if you
       | think ahead a bit.
       | 
       | In our case, we call into Rust from Kotlin using JNI [0] and
       | Swift using swift-bridge [1]. Thankfully our use case for the FFI
       | [2] is for non-performance-critical calls and the data structures
       | are fairly simple, so we just serialize objects with JSON.
       | 
       | No major issues so far.
       | 
       | One thing I am surprised hasn't been mentioned so far is
       | Mozilla's UniFFI [3] which seems to solve some of the issues
       | brought up in the article. We plan to switch to that once our FFI
       | requirements become more complex.
       | 
       | [0] https://docs.rs/jni/latest/jni/
       | 
       | [1] https://github.com/chinedufn/swift-bridge
       | 
       | [2] https://www.firezone.dev/kb/architecture/tech-stack#client-a...
       | 
       | [3] https://github.com/mozilla/uniffi-rs
        
       | ar7hur wrote:
       | > Anyone trying to make a new mainstream language is completely
       | insane, unless they're backed by a huge corporation. There are
       | only two exceptions in the last 25 years that come close: Scala
       | and Kotlin.
       | 
       | And Clojure! (also a JVM language)
        
         | munchler wrote:
         | I would also add Zig to the list. I certainly hear about it
         | often enough on HN.
        
           | zem wrote:
           | elixir and gleam in the erlang world
        
       | blaise-pabon wrote:
       | I'm a novice on this topic, but I'm surprised that no one has
       | mentioned Python. Is that because it is a solved problem, thanks
       | to https://github.com/PyO3/pyo3 and is no longer a challenge?
        
       | forrestthewoods wrote:
       | C APIs are the best APIs. I do a lot of mixed language work and I
       | would never attempt anything like this. Just write a C API and provide
       | trivial FFI bindings for your favorite language.
       | 
       | That said, I thoroughly enjoyed the article and the author's
       | admission of its insanity! Great read. But do the simple thing
       | and call it a day.
        
       ___________________________________________________________________
       (page generated 2024-06-17 23:01 UTC)