[HN Gopher] jank is C++
___________________________________________________________________
jank is C++
Author : Jeaye
Score : 159 points
Date : 2025-07-11 17:22 UTC (5 hours ago)
(HTM) web link (jank-lang.org)
(TXT) w3m dump (jank-lang.org)
| almostgotcaught wrote:
| i commented on reddit (and got promptly downvoted) but since i
| think jank's author is around here (and hopefully is receptive to
| constructive criticism): the CppInterOp approach to cpp interop
| is completely janky (no pun intended). the approach literally
| string munges cpp and then parses/interprets it to emit ABI
| compliant calls. there's no reason to do this _except_ that
| libclang currently doesn't support any other way. that's not
| jank's fault but it could be "fixed" in libclang. at a minimum
| you could use
| https://github.com/llvm/llvm-project/blob/main/clang/lib/Cod...
| to emit the code based on clang ast. at a maximum would be to use
| something like
|
| https://github.com/Mr-Anyone/abi
|
| _or_ this if/when it comes to fruition
|
| https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-...
|
| to generate ABI compliant calls/etc for cpp libs.
|
| note, i say all this with maximum love in my heart for a language
| that would have first class cpp interop - i would immediately
| become jank's biggest proponent/user if its cpp interop were
| robust.
|
| EDIT: for people wanting/needing receipts, you can skim through
| https://github.com/compiler-research/CppInterOp/blob/main/li...
| wk_end wrote:
| > the CppInterOp approach to cpp interop is completely janky
| (no pun intended). the approach literally string munges cpp and
| then parses/interprets it to emit ABI compliant calls.
|
| So, I agree that this sounds janky as heck. My question is:
| besides sounding janky as heck, is there something wrong with
| this? Is it slow/unreliable?
| almostgotcaught wrote:
| i mean it's as prone to error as any other thing that relies
| on string munging. it's probably not that much slower than
| the alternative i proposed - because the trampolines/wrappers
| are jitted and then reused - but it's just not robust enough
| that i would ever imagine building a prod system on top of it
| (eg using cppyy in prod) _let alone baking it into my
| language/runtime_.
| refulgentis wrote:
| The delta between the title and the content gave me extreme
| pause, thanks for sharing that there's, uh, worse problems.
|
| I'm a bit surprised I've seen two articles about jank here
| the last 2 days if these are exemplars of the technical
| approach and communication style. Seems like that wouldn't
| be enough to get on people's radars.
| actionfromafar wrote:
| Given how the world works, that might mean we will all
| sit and curse Jank instead of cursing Node. :)
| Jeaye wrote:
| Which particular delta between the title and the content
| gave you extreme pause?
| refulgentis wrote:
| It said "jank is C++", which I assumed would be
| explaining that jank compiles down to C++ or something
| similar, i.e. there is a layer of abstraction between
| jank and C++, but it effectively "works like" C++.
|
| On re-read, I recognize where it is used in the article:
|
| "jank _is_ C++. There is no runtime reflection, no guess
| work, and no hints. If the compiler can't find a member,
| or a function, or a particular overload, you will get a
| compiler error."
|
| I assume other interop scenarios don't pull this off*,
| thus it is distinctive. Additionally, I'm not at all
| familiar with Clojure, sadly, but it also sounds like
| there's some special qualities there ("I think that this
| is an interesting way to start thinking about jank,
| Clojure, and static types")
|
| Now I'll riff and just write out the first 3-5 titles
| that come to mind with that limited understanding:
|
| - Implementing compile-time verifiable C++ interop in
| jank
|
| - Sparks of C++ interop: jank, Clojure, & verifying
| interop before runtime
|
| - jank's progress on C++ interop
|
| - Safe C++ interop lessons from jank
|
| * for example, I write a lot of Dart day to day and rely
| on Dart's "FFI" implementation to call C++, which, now
| that I'm thinking about it, only works because there's a
| code generator that creates "Dart headers" (my term) for
| the C++ libraries. I could totally footgun and call
| arbitrary functions that don't exist.
| Jeaye wrote:
| My reasoning is this:
|
| jank is written in C++. Its compiler and runtime are both
| in C++. jank can compile to C++ directly (or LLVM IR).
| jank can reach into C++ seamlessly, which includes
| reaching into its own compiler/runtime. Thus, the
| boundary between what is C++ and what is Clojure is gone,
| which leaves jank as being both Clojure and C++.
|
| Achieving this singularity is a milestone for jank and, I
| think, is worthy of the title.
| Jeaye wrote:
| > i mean it's as prone to error as any other thing that
| relies on string munging.
|
| This is misleading. Having done a great deal of both (as
| jank also supports C++ codegen as an alternative to IR), if
| the input is a fully analyzed AST, generating IR is
| significantly more error prone than generating C++. Why?
| Well, C++ is statically typed and one can enable warnings
| and errors for all sorts of issues. LLVM IR has a verifier,
| but it doesn't check that much. Handling references,
| pointers, closures, ABI issues, and so many more things
| ends up being a huge effort for IR.
|
| For example, want to access the `foo.bar` member of a
| struct? In IR, you'll need to access foo, which may require
| loading it if it's a reference. You'll need to calculate
| the offset to `bar`, using GEP. You'll need to then
| determine if you're returning a reference to `bar` or if a
| copy is happening. Referencing will require storing a
| pointer, whereas copying may involve a lot more code. If
| we're generating C++, though, we just take `foo` and add a
| `.bar`. The C++ compiler handles the rest and will tell us
| if we messed anything up.
|
| If you're going to hand wave and say anything that's
| building strings is error prone and unsafe, regardless of
| how richly typed and thoroughly analyzed the input is, the
| stance feels much less genuine.
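To make the `foo.bar` comparison above concrete, here is a minimal C++ sketch. It is not jank's or LLVM's code: `Foo` and the three helper functions are hypothetical names chosen for illustration. The first helper spells out by hand the address-plus-offset work that IR generation has to get right (the part GEP covers), while the other two show what emitted C++ source would say and leave to the compiler.

```cpp
#include <cstddef>   // offsetof
#include <cstring>   // std::memcpy

// Hypothetical struct standing in for the `foo` discussed above.
struct Foo {
    int id;
    double bar;
};

// Roughly what hand-written IR must spell out: take the object's address,
// add the byte offset of the member, then load a copy of the value.
inline double read_bar_by_offset(const Foo& foo) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&foo);
    double out;
    std::memcpy(&out, base + offsetof(Foo, bar), sizeof out);
    return out;
}

// What generated C++ source can say instead. The compiler computes the
// offset and rejects the code if `bar` does not exist.
inline double read_bar_copy(const Foo& foo) { return foo.bar; }

// Returning a reference instead of a copy is a small change in C++;
// in IR it means handing back a pointer rather than a loaded value.
inline double& read_bar_ref(Foo& foo) { return foo.bar; }
```

All three agree on a given object, for example `read_bar_by_offset(f) == read_bar_copy(f)`; the difference is how much the generator has to know to produce each one.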
| Jeaye wrote:
| Hey! I'm here and receptive.
|
| I completely agree that Clang could solve this by actually
| supporting my use case. Unfortunately, Clang is very much
| designed for standalone AOT compilation, not intertwined with
| another IR generating mechanism. Furthermore, Clang struggles
| to handle some errors gracefully which can get it into a bad
| state.
|
| I have grown jank's fork of CppInterOp quite significantly, in
| the past quarter, with the full change list being here:
| https://gist.github.com/jeaye/f6517e52f1b2331d294caed70119f1...
| Hoping to get all of this upstreamed, but it's a lot of work
| that is not high priority for me right now.
|
| I think, based on my experience in the guts of CppInterOp, that
| the largest issue is not the C++ code generation. Basically any
| code generation is some form of string building. You linked to
| a part of CppInterOp which is constructing C++ functions.
| What's _actually_ wrong with that, in terms of robustness? The
| strings are generated not based on arbitrary user input, but
| based on Clang QualTypes and Decls. i.e. you need valid Clang
| values to actually get there anyway. Given that the ABI
| situation is an absolute mess, and that jank is already using
| Clang's JIT C++ compiler, I think this is a very viable
| solution.
|
| However, in terms of robustness, I go back to Clang's error
| handling, lack of grace, and poor tooling for use cases like
| this. Based on my experience, _that_ is what will cause
| robustness issues.
|
| Please don't take my response as unreceptive or defensive. I
| really do appreciate the discussion and if I'm saying something
| wrong, or if you want to explain further, please do. For
| alternatives, you linked to https://github.com/Mr-Anyone/abi
| which is 3 months old and has 0 stars (and so I assume 0 users
| and 0 years of battle testing). You also linked to
| https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-...
| which I agree would be great, _if/when it becomes available_.
|
| So, out of all of the options, I'll ask clearly and sincerely:
| is there really a _better_ option which exists today?
|
| CppInterOp is an implementation detail of jank. If we can
| replace C++ string generation with more IR generation and a
| portable ABI mechanism, _and_ if Clang can provide the
| sufficient libraries to make it so that I don't need to rely on
| C++ strings to be certain that my template specializations get
| the correct instantiation, I am definitely open to replacing
| CppInterOp. From all I've seen, we're not there yet.
| almostgotcaught wrote:
| > which is 3 months old and has 0 stars (and so I assume 0
| users and 0 years of battle testing)
|
| ah my bad i meant to link to this one
| https://github.com/scrossuk/llvm-abi
|
| which inspired the gsoc.
|
| > is there really a _better_ option which exists today?
|
| today the "best in class" approach is swift's which fully
| (well tries to) model cpp AST and do what i suggested
| (emitting code directly):
|
| https://github.com/swiftlang/swift/blob/c09135b8f30c0cec8f5f...
| Jeaye wrote:
| There are upsides to this approach. Coupling Swift's AST
| with Clang's AST will allow for the best codegen, for sure.
|
| However, the huge downside to this approach, which cannot
| be overlooked, is that Clang (not libclang) is not designed
| to be a library. It doesn't have the backward compatibility
| of a library. Swift (i.e. Apple) is already deep into
| developing Clang, and so I'm sure they can afford the cost
| of keeping up with the breaking changes that happen on
| every Clang release. For a solo dev, I'm not yet sure this
| is actually viable, but I will give it more consideration.
|
| However, I think that raising alarms at C++ codegen is
| unwarranted. As I said before, basically any query builder
| or codegen takes some form of string generation. The way we
| make those safe is to add types in front of them, so we're
| not just formatting user strings into other strings. That's
| exactly what CppInterOp does, where the types added are
| Clang QualTypes and Decls.
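The "types in front of string generation" idea can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `TypeRef`, `FunctionRef`, and `emit_wrapper` are hypothetical stand-ins and are not CppInterOp's API, which works from real Clang QualTypes and Decls rather than plain strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for a compiler's typed values. They only mimic the
// idea of generating source from already-analyzed declarations.
struct TypeRef {
    std::string spelling;  // e.g. "int"
};

struct FunctionRef {
    std::string qualified_name;  // e.g. "math::add"
    TypeRef result;
    std::vector<TypeRef> params;
};

// Build the source of an extern "C" wrapper that forwards to `fn`.
// Every fragment comes from a field of a typed value, not from raw user text.
inline std::string emit_wrapper(const FunctionRef& fn, const std::string& wrapper_name) {
    std::string params;
    std::string args;
    for (std::size_t i = 0; i < fn.params.size(); ++i) {
        const std::string arg = "a" + std::to_string(i);
        if (i != 0) {
            params += ", ";
            args += ", ";
        }
        params += fn.params[i].spelling + " " + arg;
        args += arg;
    }
    return "extern \"C\" " + fn.result.spelling + " " + wrapper_name + "(" + params +
           ") { return " + fn.qualified_name + "(" + args + "); }";
}
```

For a `math::add(int, int)` description and the wrapper name `wrap_add`, this yields `extern "C" int wrap_add(int a0, int a1) { return math::add(a0, a1); }`, which a JIT C++ compiler would then type-check like any other source.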
| almostgotcaught wrote:
| > For a solo dev, I'm not yet sure this is actually
| viable, but I will give it more consideration.
|
| look i'm not trying to shit on your project - i promise -
| i know calling you out like this publicly almost
| requires a political kind of response (i probably
| shouldn't have done it). i agree with you that as a solo
| dev you can't (shouldn't) solve this problem - you have
| enough on your plate making jank great for your core
| users (who probably don't really care about cpp).
|
| > As I said before, basically any query builder or
| codegen takes some form of string generation.
|
| i mean this is a tautology on the level of "everything
| can be represented as strings". yes that's true but types
| (as you mention) are important and all i'm arguing is
| that it's much more robust to start with types and end
| with types instead of starting with strings and ending
| with types.
|
| anyway you don't need to keep addressing my complaints -
| you have enough on your plate.
| rjsw wrote:
| I think that some packages that generate Python bindings for
| C++ use Clang to do it as well.
| xxr wrote:
| These recursive initialism PL names are getting out of hand /s
| Jeaye wrote:
| I've pondered this for a while and I have no idea how jank is a
| recursive acronym. What're you seeing that I'm not?
| eurleif wrote:
| Jank's A Native Klojure? :)
| xxr wrote:
| It's a joke (hence the "/s") on the "[PL name] is [words
| beginning with the rest of the letters of the PL name]"
| snowclone. However as time approaches infinity I'm sure it
| will get a recursive backronym.
| johnnyjeans wrote:
| I'm not surprised to see that Jank's solution to this is to embed
| LLVM into their runtime. I really wish there was a better way to
| do this.
|
| There are a lot of things I don't like about C++, and close to
| the top of the list is the lack of standardization for
| name-mangling, or even a way to mangle or de-mangle names at
| compile-time. Sepples is a royal pain in the ass to target for a dynamic
| FFI because of that. It would be really nice to have some way to
| get symbol names and calling semantics as constexpr const char*
| and not have to deal with generating (or writing) a ton of
| boilerplate and extern "C" blocks.
|
| It's absolutely possible, but it's not low-hanging fruit so the
| standards committee will never put it in. Just like they'll never
| add a standardized equivalent for alloca/VLAs. We're not allowed
| to have basic, useful things. Only more ways to abuse type
| deduction. Will C++26 finally give us constexpr dynamic
| allocations? Will compilers ever actually implement one of the
| three (3) compile-time reflection standards? Stay tuned to find
| out!
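For readers who have not written this boilerplate, here is a minimal sketch of the `extern "C"` shims the comment above is complaining about. The `demo` namespace and the shim names are hypothetical; the point is only that each overload needs its own hand-written, uniquely named wrapper before a dynamic FFI can find it by a plain symbol name.

```cpp
// A small C++ API with overloads inside a namespace. Each function gets a
// compiler-specific mangled symbol name.
namespace demo {
int add(int a, int b) { return a + b; }
double add(double a, double b) { return a + b; }
}  // namespace demo

// Hand-written C shims: one unmangled, uniquely named symbol per overload.
// Overloading is not available across extern "C", hence the suffixes.
extern "C" int demo_add_int(int a, int b) { return demo::add(a, b); }
extern "C" double demo_add_double(double a, double b) { return demo::add(a, b); }
```

A foreign runtime could then look up `demo_add_int` by that exact name (for example with `dlsym` on POSIX systems) without knowing anything about the C++ compiler's mangling scheme. The cost is one shim per function, per overload, per template instantiation.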
| almostgotcaught wrote:
| > LLVM into their runtime
|
| they're not embedding LLVM - they're embedding clang. if you
| look at my comment below, you'll see LLVM is not currently
| sufficient.
|
| > [C++] is a royal pain in the ass to target for a dynamic FFI
| because of that
|
| name mangling is by far the easiest part of cpp FFI - the hard part
| is the rest of the ABI. anyone curious can start here
|
| https://github.com/rust-lang/rust-bindgen/issues/778
| Jeaye wrote:
| To be fair, jank embeds both Clang and LLVM. We use Clang for
| C++ interop and JIT C++ compilation. We use LLVM for IR
| generation and jank's compiler back-end.
| johnnyjeans wrote:
| > they're not embedding LLVM - they're embedding clang
|
| They're embedding both, according to the article. But it's
| also just sloppy semantics on my part; when I say LLVM, I
| don't make a distinction of the frontend or any other part of
| it. I'm fully relying on context to include all relevant bits
| of software being used. In the same way I might use "Windows"
| to refer to any part of the Windows operating system like
| dwm.exe, explorer.exe, command.com, ps.exe, etc. LLVM is a
| generic catch-all for me: I don't say "LLI", I say "the LLVM
| VM", for example. I can't really consider clang to be
| distinct from that ecosystem, though I know it's a discrete
| piece of software.
|
| > name mangling is by far the easiest part of cpp FFI
|
| And it still requires a lot of work, and increases in effort
| when you have multiple compilers, and if you're on a tiny
| code team that's already understaffed, it's not really
| something you can worry about.
|
| https://en.m.wikiversity.org/wiki/Visual_C%2B%2B_name_mangli...
|
| You're right, writing platform specific code to handle this
| is more than possible. But it takes manhours that might just
| be better spent elsewhere. And that's before we get to the
| part where embedding a C++ compiler is extremely
| inappropriate when you just want a symbol name and an ABI.
|
| But this is beside the point: The fact that it's not a
| problem solved by the gargantuan standard is awful. I also
| consider the ABI to be the exact same issue, that being
| absolutely awful support of runtime code loading, linking and
| interoperation. There's also no real reason for it, other
| than the standards committee being incompetent.
| benreesman wrote:
| Carmack did very much almost exactly the same with the Trinity
| / Quake3 Engine: IIRC it was LCC, maybe tcc, one of the C
| compilers you can actually understand totally as an individual.
|
| He compiled C with some builtins for syscalls, and then
| translated that to his own stack machine. But, he _also_ had a
| target for native DLLs, so same safe syscall interface, but
| they can segv so you have to trust them.
|
| Crazy to think that in one computer program (that still reads
| better than high-concept FAANG C++ from elite legends, truly
| unique) this wasn't even the most dramatic innovation. It was
| the _third_ most dramatic revolution in _one program_.
|
| If you're into this stuff, call in sick and read the plan files
| all day. Gives me goosebumps.
| johnnyjeans wrote:
| Any particular year?
| wging wrote:
| Quake III Arena was released in 1999. It was open-sourced
| in 2005.
|
| https://github.com/id-Software/Quake-III-Arena
|
| https://en.wikipedia.org/wiki/Id_Tech_3
|
| (from the source release you can see benreesman remembered
| right: it was lcc)
| no_wizard wrote:
| Carmack actually deserves the moniker of 10x engineer. Truly
| his work in his domain has reached far outside it because of
| the quality of his ideas and methodologies.
| MangoToupe wrote:
| Linking directly to C++ is truly hell just considering symbol
| mangling. The syntax <-> semantics relationship is ghastly. I
| haven't seen a single project tackle the C++ interface in its
| entirety (outside of clang). It nearly seems impossible.
|
| There's a reason Carmack tackled the C abi and not whatever
| the C++ equivalent is.
| PaulDavisThe1st wrote:
| There is no C ABI (Windows compilers do things quite
| differently from Linux ones, etc.) and there is certainly
| no C++ equivalent.
| o11c wrote:
| > the lack of standardization for name-mangling, or even a way
| to mangle or de-mangle names at compile-time.
|
| Like many things, this isn't a C++ problem. There is a standard
| and almost every target uses it ... and then there's what
| Microsoft does. Only if you have to deal with the latter is
| there a problem.
|
| Now, standards _do_ evolve, and this does give room for
| different system libraries/tools to have a different view of
| what is acceptable/correct (I still have nightmares of trying
| to work through `I...E` vs `J...E` errors) ... but all the
| functionality _does_ exist and work well if you aren't on the
| bleeding edge (fortunately, C++11 provided the bits that are
| truly _essential_; everything since has been merely
| nice-to-have).
| mort96 wrote:
| Like many things people claim "isn't a C++ problem but an
| implementation problem"... This is a C++ problem. Anything
| that's not nailed down by the standard should be expected to
| vary between implementations.
|
| The fact that the standard doesn't specify a name mangling
| scheme leads to the completely predictable result that
| different implementations use different name mangling
| schemes.
|
| The fact that the standard doesn't specify a mechanism to
| mangle and demangle names (be it at runtime or at compile
| time) leads to the completely predictable result that
| different implementations provide different mechanisms to
| mangle and demangle names, and that some implementations
| don't provide such a mechanism.
|
| These issues could, and should, have been fixed in the only
| place they _can_ be fixed -- the standard. ISO is the
| mechanism through which different implementation vendors
| collaborate and find common solutions to problems.
| vlovich123 wrote:
| > Anything that's not nailed down by the standard should be
| expected to vary between implementations.
|
| When you have one implementation you have a standard. When
| you have two implementations and a standard you don't
| actually have a standard in practice. You just have two
| implementations that kind of work similarly in most cases.
|
| While the major compilers do a fantastic job they still
| frequently disagree about even "well defined" behavior
| because the standard was interpreted differently or
| different decisions were made.
| SideQuark wrote:
| > When you have two implementations and a standard you
| don't actually have a standard in practice
|
| This simply isn't true. Plenty of standardized things are
| interchangeable, from internet RFCs followed by zillions
| of players and implementations of various RFCs, medical
| device standards, encryption standards, weights and
| measures, currency codes, country codes, time zones, date
| and time formats, tons of file formats, compression
| standards, the ISO 9000 series, ASCII, testing standards,
| and on and on.
|
| The poster above you is absolutely correct - if something
| is not in the standard, it can vary.
| jdiff wrote:
| If you have an area where the standard is ambiguous about
| its requirements, then you have a bug in the standard.
| And hopefully you also have a report to send along to
| help it communicate itself more clearly.
| o11c wrote:
| This is like getting mad at ISO 8601 because it doesn't
| define the metric system.
|
| _No_ standard stands alone in its own universe;
| complementary standards must necessarily always exist.
|
| Besides, even if the C++ standard suddenly _did_
| incorporate ABI standards by reference, Microsoft would
| just refuse to follow them, and nothing would actually be
| improved.
| no_wizard wrote:
| A better situation than today would be only having to
| deal with Microsoft and Not Microsoft, rather than
| multiple different ways of handling the problem that can
| differ unexpectedly
| josefx wrote:
| > The fact that the standard doesn't specify a name
| mangling scheme leads to the completely predictable result
| that different implementations use different name mangling
| schemes.
|
| The ABI mess predates the standard by years and if we look
| that far back the Annotated C++ Reference Manual included a
| scheme in its description of the language. Many compiler
| writers back then made the intentional choice to ignore it.
| The modern day ISO standard would not fare any better at
| pushing that onto unwilling compiler writers than it fared
| with the c++03 export feature.
| plq wrote:
| > the lack of standardization for name-mangling
|
| I don't see the point of standardizing name mangling. Imagine
| there is a standard, now you need to standardize the memory
| layout of every single class found in the standard library.
| Without that, instead of failing at link-time, your
| hypothetical program would break in ugly ways while running
| because e.g. two functions that invoke one another have differing
| opinions about where exactly the length of a std::string can be
| found in memory.
| johnnyjeans wrote:
| The naive way wouldn't be any different than what it's like
| to dynamically load sepples binaries right now.
|
| The real way, and the way befitting the role of the standards
| committee is actually putting effort into standardizing a way
| to talk to and understand the interfaces and structure of a
| C++ binary at load-time. That's exactly what linking is for.
| It should be the responsibility of the software using the FFI
| to move its own code around and adjust it to conform with
| information provided by the main program as part of the
| dynamic linking/loading process... which is already what it's
| doing. You can mitigate a lot of the edge cases by making
| interaction outside of this standard interface undefined
| behavior.
|
| The canonical way to do your example is to get the address of
| std::string::length() and ask how to appropriately call it
| (to pass "this", for example).
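Within a single C++ program, "get the address of the function and pass `this`" looks roughly like the sketch below. It uses a hypothetical user-defined `Text` class rather than `std::string`, because the standard does not guarantee that the address of a standard library member function can be taken. It also says nothing about doing this across a binary boundary, which is the hard part the thread is arguing about.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical user-defined type standing in for std::string.
class Text {
public:
    explicit Text(std::string s) : data_(std::move(s)) {}
    std::size_t length() const { return data_.size(); }

private:
    std::string data_;
};

// Type of "a const member function of Text returning std::size_t".
using LengthFn = std::size_t (Text::*)() const;

// The caller never looks inside Text's layout. It only holds the function's
// address and supplies `this` through the .* call syntax.
inline std::size_t call_length(const Text& t, LengthFn fn) { return (t.*fn)(); }
```

`call_length(Text("jank"), &Text::length)` returns 4 without the caller knowing where the length lives in memory, which is the property the comment above is after.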
| Jeaye wrote:
| I hear you when it comes to C++ portability, ABI, and
| standards. I'm not sure what you would imagine jank using if
| not for LLVM, though.
|
| Clojure uses the JVM, jank uses LLVM. I imagine we'd need
| _something_ to handle the JIT runtime, as well as jank's
| compiler back-end (for IR optimization and target codegen). If
| it's not LLVM, jank would embed something else.
|
| Having to build both of these things myself would make an
| already gargantuan project insurmountable.
| kccqzy wrote:
| > de-mangle names at compile-time
|
| Far from being standardized but it's possible today on GCC and
| Clang. You just abuse __PRETTY_FUNCTION__.
| dataflow wrote:
| That's not demangling a mangled name, it's retrieving the
| unmangled name of a symbol.
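The distinction drawn in the last two comments can be shown side by side. This is a sketch that assumes GCC or Clang on a platform using the Itanium C++ ABI: `__PRETTY_FUNCTION__` is a compiler extension, and `abi::__cxa_demangle` comes from `<cxxabi.h>`; neither is standard C++, and neither is available in this form on MSVC. The helper names `pretty_signature` and `demangle` are made up for the example.

```cpp
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// The extension string names the enclosing function, template arguments
// included, so the unmangled spelling of T is available at compile time.
// The exact text differs between GCC and Clang.
template <typename T>
constexpr const char* pretty_signature() {
    return __PRETTY_FUNCTION__;
}

// Runtime demangling of an Itanium-ABI symbol name.
// Returns an empty string if the name cannot be demangled.
inline std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return std::string();
    }
    std::string result(out);
    std::free(out);  // the buffer was allocated with malloc
    return result;
}
```

`pretty_signature<double>()` gives a string containing `double`, but never touches a mangled name. `demangle("_Z3fooi")` starts from a mangled symbol and gives `foo(int)`, but only at runtime and only for this ABI.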
| dmoy wrote:
| From two days ago: https://news.ycombinator.com/item?id=44482273
| papichulo2023 wrote:
| Recently I tried D lang and was surprised by the nice interop
| with C++ (the language in general feels pretty good). Carbon is
| nowhere to be seen, and I haven't tried Swift's yet. I hope this
| is a good one.
| actionfromafar wrote:
| Shedskin lang has excellent integration with C++.
| almostgotcaught wrote:
| shedskin isn't actively developed ... or at least it wasn't
| for like 10 years
| https://github.com/shedskin/shedskin/graphs/contributors
| actionfromafar wrote:
| It suddenly sprang to life and gained Python 3
| compatibility, which makes it much more interesting than
| before despite its ten-year hiatus in Python 2 land.
| Imustaskforhelp wrote:
| Now ofc jank already has gotten C++ support, but if I may ask:
| if it had, let's say, gotten D lang support instead, would that
| have been easier/more doable/practical?
| Mathnerd314 wrote:
| Ok so jank is Clojure but with C++/LLVM runtime rather than JVM.
| So already all of its types are C++ types, that presumably makes
| things a lot easier. Basically it just uses libclang / CppInterOp
| to get the corresponding LLVM types and then emits a function
| call.
| https://github.com/jank-lang/jank/blob/interop/compiler%2Bru...
| Jach wrote:
| Neat project, I can only marvel at your ability to deal with such
| madness. But it would be nice to have better C++ interop in
| higher level languages, there's some _useful_ C++ code out there.
| I also appreciate the brief mention of Clasp, as I was
| immediately thinking of it as I was reading through.
| netbioserror wrote:
| I used Clojure back in the day and use Nim at work these days.
| Linking in to C is trivially easy in Nim. Happy to see this
| working for jank, but C++ is...such a nightmare target.
|
| Any chance of Jank eventually settling on reference counting? It
| checks so many boxes in my book: Simple, predictable, few edge
| cases, fast. I guess it really just depends on how much jank
| programs thrash memory, I remember Clojure having a lot of
| background churn.
| Jeaye wrote:
| I started with reference counting, but the amount of garbage
| Clojure programs churn out ends up bogging everything down
| unless a GC is used. jank's GC will change, going forward, and
| I want jank to grow to support optional affine typing, but the
| Clojure base is likely always going to be garbage collected.
| YuriNiyazov wrote:
| A long long time ago, at ClojureConj 2014, I asked Rich Hickey
| whether a cpp-based clojure was possible, and his answer was
| "well, the primary impediment there is a lack of a garbage
| collector". There were a lot of conversations going on at the
| same time, so I didn't get an opportunity to "delve" into it,
| but:
|
| 1. Does that objection make sense?
| 2. How does jank approach that hurdle?
| bertmuthalaly wrote:
| It's the first section in the article -
|
| "I have implemented manual memory management via cpp/new and
| cpp/delete. This uses jank's GC allocator (currently bdwgc),
| rather than malloc, so using cpp/delete isn't generally needed.
| However, if cpp/delete is used then memory collection can be
| eager and more deterministic.
|
| The implementation has full bdwgc support for destructors as
| well, so both manual deletion and automatic collection will
| trigger non-trivial destructors."
___________________________________________________________________
(page generated 2025-07-11 23:00 UTC)