[HN Gopher] ICPP - Run C++ anywhere like a script
       ___________________________________________________________________
        
       ICPP - Run C++ anywhere like a script
        
       Author : davikr
       Score  : 77 points
       Date   : 2024-08-04 03:36 UTC (4 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | celrod wrote:
       | How feasible would it be for something like gdb to be able to use
       | a C++ interpreter (whether icpp, or even a souped up `constexpr`
       | interpreter from the compiler) to help with "optimized out"
       | functions?
       | 
       | gdb also doesn't handle overloaded functions well, e.g. `x[i]`.
        
         | mananaysiempre wrote:
         | It does though? Just compiled a small program that creates a
         | vector, and GDB is perfectly happy accessing it using this
         | syntax. It will even print std::string's correctly if you cast
         | them to const char* by hand. (Linux x86-64, GDB 14.2.)
        
           | scintill76 wrote:
           | > It will even print std::string's correctly if you cast them
           | to const char* by hand
           | 
           | What does that mean? I think `print str.c_str()` has worked
           | for me in GDB before, but sounds like you did something
           | different.
        
             | mananaysiempre wrote:
             | I was observing that `p (const char *)str` also worked in
             | my experiment, but I'm far from a C++ expert and upon
             | double-checking this seems to have been more of an accident
             | than intended behaviour, because there is no operator
             | const_pointer in basic_string that I can find. Definitely
             | use `p str.c_str()`.
        
               | twoodfin wrote:
               | If your std::string was using a short string
               | optimization, that would explain the "accident".
               | 
               | Some implementations even put char[0] at the first byte
               | in the optimized form.
        
               | scintill76 wrote:
               | That explanation doesn't work IMO, unless `str` is a
               | std::string pointer, which is contrary to the syntax GP
               | suggested with `str.c_str()`.
               | 
               | It doesn't seem possible in actual C++ that the cast from
               | non-pointer to pointer would work at all (even if a small
               | string happens to be inlined at offset 0.) Like GP, I
               | looked for a conversion operator, and I don't think it's
               | there. Maybe it is a feature of the gdb parser.
        
               | twoodfin wrote:
               | Good point, but if it's a long string, 2/3 of the most
               | common implementations would make the first word the
               | c_str()-equivalent pointer:
               | 
               | https://devblogs.microsoft.com/oldnewthing/20240510-00/?p
               | =10...
        
               | fluoridation wrote:
               | So it's actually printing *(const char **)&s?
        
               | twoodfin wrote:
               | The first pointer-sized chunk of the string structure is
               | a pointer to the C-string representation. So the cast
               | works as written.
        
               | fluoridation wrote:
               | Well, no, because (const char *)str is nonsense, if str
               | is an std::string.
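                | 
                | For illustration (implementation-dependent, nothing
                | the standard guarantees): the direct cast doesn't
                | compile, while reinterpreting the first word of the
                | object happens to work where the data pointer is
                | stored first (e.g. libstdc++ long strings):
                | 
                |     #include <cstdio>
                |     #include <string>
                | 
                |     int main() {
                |         // long enough to defeat SSO
                |         std::string str(64, 'x');
                |         // ill-formed, no such conversion:
                |         // const char* p = (const char*)str;
                |         // implementation-dependent, but "works":
                |         const char* p = *(const char**)&str;
                |         std::printf("%.8s...\n", p);
                |     }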
        
           | celrod wrote:
           | I've defined a few pretty printers, but `operator[]` doesn't
           | work for my user-defined types. Knowing it works for vectors,
           | I'll try and experiment to see if there's something that'll
            | make it work.
            | 
            |     (gdb) p unrolls_[0]
            |     Could not find operator[].
            |     (gdb) p unrolls_[(long)0]
            |     Could not find operator[].
            |     (gdb) p unrolls_.data_.mem[0]
            |     $2 = {
           | 
           | `unrolls_[i]` works within C++. This `operator[]` method
           | isn't even templated (although the container type is); the
           | index is hard-coded to be of type `ptrdiff_t`, which is
           | `long` on my platform.
           | 
           | I'm on Linux, gdb 15.1.
        
             | mananaysiempre wrote:
             | > This `operator[]` method isn't even templated (although
             | the container type is)
             | 
             | That might be it. If that operator isn't actually ever
             | emitted out of line, then GDB will (naturally) have nothing
              | to call. If it helps, with the following program
              | 
              |     template<typename T>
              |     struct Foo {
              |         int operator[](long i) { return i * 3; }
              |     };
              |     Foo<bool> bar;
              |     template int Foo<bool>::operator[](long); // [*]
              | 
              |     int main(void) {
              |         Foo<int> foo;
              |         __asm__("int3");
              |         return foo[19];
              |     }
             | 
             | compiled at -g -O0 I can both `p foo[19]` and `p bar[19]`,
             | but if I comment out the explicit instantiation marked [*],
             | the latter no longer works. At -g -O2, the former does not
             | work because `foo` no longer actually exists, but the
             | latter does, provided the instantiation is left in.
        
         | tester756 wrote:
         | this "optimized out" thing is bullshit as hell
        
         | Conscat wrote:
         | GDB does have hooks for interpreters to be executed within it,
         | but I haven't managed to make this work. https://sourceware.org
         | /gdb/current/onlinedocs/gdb.html/JIT-I....
        
       | Klasiaster wrote:
       | Related is cargo script for Rust: https://doc.rust-
       | lang.org/cargo/reference/unstable.html#scri... (nightly only)
        
         | ComputerGuru wrote:
         | You can do it out of the box with rust, no need for any tools,
         | because you can strategically mix shell and rust in the same
         | code: https://neosmart.net/blog/self-compiling-rust-code/
        
       | bsenftner wrote:
       | Why interpret at all? Back in the mid to early 90's I started
       | embedding C++ compilers into the game engines I wrote, where the
       | "game scripting language" was just #define macros hiding the C++
       | syntax so the game level developers, who worked in this script,
       | could be basically anyone that could code. Their "script" would
       | compile to a DLL that was hot loaded. What they were doing in
       | their scripts would compile in under 5 seconds, and they were
       | good to go. If they ran into problems, one of the game engine
       | developers would just run their "script" in the IDE debugger.
       | 
       | Borrowed this idea from Nothing Real, the developers of Shake,
       | the video/film compositing system.
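        | 
        | To give a flavour of the idea, a minimal sketch (hypothetical
        | macro and function names, not the actual engine): the "script"
        | is ordinary C++ hidden behind #defines, built as a shared
        | library the engine hot-loads.
        | 
        |     // level_intro.cpp -- what a level designer would write
        |     #include <cstdio>
        | 
        |     // engine-provided call the scripters never see directly
        |     inline void game_log(const char* msg) {
        |         std::printf("[game] %s\n", msg);
        |     }
        | 
        |     // the "scripting language": macros hiding C++ syntax
        |     #define SCRIPT_BEGIN(name) extern "C" void name() {
        |     #define SCRIPT_END         }
        |     #define SAY(msg)           game_log(msg)
        | 
        |     SCRIPT_BEGIN(level_intro)
        |         SAY("Welcome to level 1");
        |     SCRIPT_END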
        
         | jokoon wrote:
          | I don't suppose a modern OS would let you do that today; it
          | sounds like a security nightmare.
        
           | shortrounddev2 wrote:
            | This is just JIT'ing. It's commonly used in Python and Lua.
        
             | pjmlp wrote:
             | I wish, JIT and Python still isn't something to brag about.
        
               | greenavocado wrote:
               | Python 3.13 has an experimental JIT compiler
               | https://peps.python.org/pep-0744/
        
               | pjmlp wrote:
               | As I said, not something to brag about.
        
           | fluoridation wrote:
           | Yes, any modern OS lets any process load into its memory
           | space binaries from anywhere the user has permissions, even
           | if those are binaries it generated just now. It can be a
           | security problem if the binaries are generated from untrusted
           | sources (e.g. you download some, say, Haskell, compile it and
           | run it fully automatically).
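            | 
            | A hedged sketch of that compile-then-load step on Linux
            | (assumes g++ and the hypothetical level_intro.cpp "script"
            | from the sketch up-thread; error handling trimmed):
            | 
            |     #include <dlfcn.h>
            |     #include <cstdlib>
            | 
            |     int main() {
            |         // build the "script" into a shared object
            |         std::system("g++ -shared -fPIC "
            |                     "level_intro.cpp -o level_intro.so");
            |         // hot-load it into the running process
            |         void* h = dlopen("./level_intro.so", RTLD_NOW);
            |         if (!h) return 1;
            |         auto fn = reinterpret_cast<void (*)()>(
            |             dlsym(h, "level_intro"));
            |         if (fn) fn();
            |         dlclose(h);
            |     }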
        
             | shortrounddev2 wrote:
              | Windows has a private heap system where code execution from
              | the pages allocated out of a `HeapCreate` heap is disabled
              | unless you set this flag:
             | 
             | https://learn.microsoft.com/en-
             | us/windows/win32/api/heapapi/...
             | 
             | > HEAP_CREATE_ENABLE_EXECUTE
             | 
             | > 0x00040000
             | 
             | > All memory blocks that are allocated from this heap allow
             | code execution, if the hardware enforces data execution
             | prevention. Use this flag heap in applications that run
             | code from the heap. If HEAP_CREATE_ENABLE_EXECUTE is not
             | specified and an application attempts to run code from a
             | protected page, the application receives an exception with
             | the status code STATUS_ACCESS_VIOLATION.
             | 
             | I think POSIX has equivalent memory protection calls, but
             | no equivalent to HeapCreate
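              | 
              | A minimal sketch of what that looks like (Windows only;
              | error handling trimmed):
              | 
              |     #include <windows.h>
              |     #include <cstring>
              | 
              |     int main() {
              |         // no HEAP_CREATE_ENABLE_EXECUTE
              |         HANDLE heap = HeapCreate(0, 0, 0);
              |         void* buf = HeapAlloc(heap, 0, 4096);
              |         // 0xC3 is x86 "ret", but calling into buf
              |         // raises STATUS_ACCESS_VIOLATION under DEP,
              |         // because the pages aren't executable:
              |         std::memset(buf, 0xC3, 16);
              |         // reinterpret_cast<void (*)()>(buf)();
              |         HeapFree(heap, 0, buf);
              |         HeapDestroy(heap);
              |     }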
        
               | fluoridation wrote:
               | But you can still call VirtualAlloc(), VirtualProtect(),
               | and LoadLibrary(), so this isn't really a security
               | mechanism, but more of a safety mechanism.
               | 
               | I don't think Windows provides a mechanism to disable
               | creating any further executable pages, although I've seen
               | Chrome do it by hooking those functions (and I know it
               | because I've had to bypass it :)).
        
               | pjmlp wrote:
                | App Sandboxing and VBS enclaves go in that direction.
        
               | shortrounddev2 wrote:
               | I wouldn't expect windows to prevent creating further
               | executable pages; there are legitimate use cases for
               | creating dynamically allocated executable memory. It just
               | means that whatever foreign data you load into those
               | pages can't execute, which _is_ a security mechanism (for
               | example, game save data can be loaded into these heaps so
               | that you can load all game state but without the save
               | file potentially running foreign code)
        
               | fluoridation wrote:
               | There are legitimate uses, but the point would be that
               | the process could ask the system to lock it down with
               | whatever executable code is already present. This could
               | be used to prevent already running code from tampering
               | with the process' behavior by loading new code, or to
               | thwart code injection.
               | 
               | >It just means that whatever foreign data you load into
               | those pages can't execute, which is a security mechanism
               | (for example, game save data can be loaded into these
               | heaps so that you can load all game state but without the
               | save file potentially running foreign code)
               | 
               | But malloc() and all the other standard memory allocation
               | functions already return pointers into non-executable
               | pages, anyway. Perhaps those functions call into this one
                | internally, but using it over whatever your language
                | offers by default adds no additional protection.
        
               | adzm wrote:
               | All of this calls VirtualAlloc behind the scenes, and you
               | can do that yourself as well for manual page allocation.
               | Each page can have options set with VirtualProtect to
               | allow or disallow execution of code within the pages as
               | well.
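                | 
                | A hedged sketch of that manual route (Windows,
                | x86-64; the usual write-then-make-executable
                | pattern):
                | 
                |     #include <windows.h>
                |     #include <cstring>
                | 
                |     int main() {
                |         // mov eax, 42 ; ret
                |         unsigned char code[] =
                |             {0xB8, 0x2A, 0, 0, 0, 0xC3};
                |         void* mem = VirtualAlloc(nullptr,
                |             sizeof code, MEM_COMMIT | MEM_RESERVE,
                |             PAGE_READWRITE);
                |         std::memcpy(mem, code, sizeof code);
                |         DWORD old;
                |         VirtualProtect(mem, sizeof code,
                |             PAGE_EXECUTE_READ, &old);
                |         int r =
                |             reinterpret_cast<int (*)()>(mem)();
                |         VirtualFree(mem, 0, MEM_RELEASE);
                |         return r == 42 ? 0 : 1;
                |     }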
        
             | IshKebab wrote:
             | iOS won't let you do that. Or at least Apple won't let you
             | do that on iOS.
        
               | fluoridation wrote:
               | What's actually happening is that the SDK doesn't expose
               | the system calls necessary to do it, but I can guarantee
               | that if you can get a native binary to run on the device,
               | you can have it do whatever you want. If that wasn't the
               | case, the few apps that do support JITting wouldn't work.
        
         | externedguy wrote:
         | > Why interpret at all?
         | 
          | A few of the points that come to mind:
         | 
         | - education
         | 
         | - iteration on writing simple functionality
         | 
         | - loading and trying out several APIs to see what's possible (I
         | use it frequently with Elixir / Erlang for example)
         | 
         | It makes life easier for newcomers to wrap their head around
         | something and produce a good solution rather than a "working"
         | one
        
           | fluoridation wrote:
            | But the "interpretation" is in name only. When you run your
            | program there's still a compilation step; the interpreter
            | just merges it with the run step, and it does so every time
            | you run the program. I'm with the GP: I don't understand the
            | advantage of this approach over traditional AOT compilation
            | (actually this isn't even JIT, it's just deferred AOT).
        
         | corysama wrote:
         | And, then there's
         | 
         | https://liveplusplus.tech/
         | 
         | https://github.com/RuntimeCompiledCPlusPlus/RuntimeCompiledC...
         | 
         | https://learn.microsoft.com/en-us/visualstudio/debugger/hot-...
        
         | pragma_x wrote:
         | Totally valid take. The answer is: it depends.
         | 
         | The advantage of a lot of scripting tech is some form of REPL,
         | which is really just a super-fast code-compile-run loop. In
         | your example, "why?" boils down to how useful/painful those
          | seconds per change are. Maybe that adds up and slows the
         | coder down, or maybe it's no big deal. It all kind of depends
         | on the workflow and how fast you need to be able to iterate on
         | code changes. Moving to a scripted interpreter would eliminate
         | that wait period at the cost of runtime performance, which
         | might be a valuable business tradeoff.
         | 
         | FWIW, that "script" solution sounds awesome for the time. I'll
         | add that five seconds to build a hot-loaded DLL in the 90's is
         | really, really good performance for that solution, regardless
         | of its role as a scripting alternative. Today, that would
         | probably be mere milliseconds to compile - impossible to
          | distinguish from an embedded Lua or JS solution.
        
       | bobajeff wrote:
        | I wonder if I can use this to learn a large C++ codebase like
        | Chromium. One of the issues I had trying to learn Chromium was
        | that in order to play and experiment with its classes and
        | functions, I needed to spend several minutes linking my little
        | test code with its static libraries just to see if my
        | understanding of them was correct. That is just too long for
        | such experiments, so I gave up.
        
         | hackit2 wrote:
          | Last time I checked out the Chromium code base it was about
          | 300-400 megs of uncompressed .cpp files. Let's also not forget
          | that you needed to run a code generator script that produced
          | another 200 megs of DOM files, or interface files. At that
          | point I gave up, went to sleep, and never touched it again.
        
           | lioeters wrote:
            | I really hope Ladybird is able to stay relatively small and
            | approachable; it would be wonderful to have a truly
            | customizable open-source browser that isn't a massive
            | codebase that takes forever to compile and develop.
        
             | hackit2 wrote:
              | I do think there are some geniuses working on the Chromium
              | code base, and I would imagine there are really good
              | reasons for doing it their way. I would also imagine that
              | Ladybird will face the same problems over time and come up
              | with solutions similar to the Chromium team's.
             | 
              | All I know is that most large-scale C/C++ code bases
              | eventually become giant monoliths that require some really
              | specialised software tools to compile and link.
        
               | pjmlp wrote:
                | Not only C and C++; I have seen this at scale with most
                | Fortune 500 projects I have been involved in.
        
         | pragma_x wrote:
          | I agree on all fronts. This parallels the last time I looked
          | at, and gave up on, building a backend for LLVM. And that was
          | after giving up on doing the same for GCC. Those codebases are
         | _impenetrable_.
         | 
         | It's clear as mud how one would hook a jumbo codebase into the
         | REPL. If it's possible, that would be a game changer.
        
           | jcelerier wrote:
           | I added LLVM JIT support to https://ossia.io a few years ago,
           | it's not too bad, but a big issue is that the JIT does not
           | support all the necessary features used by the frontend in
           | terms of relocations, etc. So it happens relatively often
           | that C++ code will compile to LLVM IR without issue, but then
           | fail at the JIT step because some relocation is not supported
           | by the JIT engine yet.
           | 
           | Most of the code is here : https://github.com/ossia/score/tre
           | e/master/src/plugins/score... with the actual LLVM API
           | interoperation contained there : https://github.com/ossia/sco
           | re/tree/master/src/plugins/score...
           | 
           | It's been used for fun projects, for instance for this paper
           | about data sonification : https://www.researchgate.net/profil
           | e/Maxime_Poret/publicatio...
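            | 
            | For anyone curious, the bare-bones shape of the ORC side
            | (just an outline, not the ossia code linked above; the
            | interesting and painful parts happen in the modules you
            | add):
            | 
            |     #include "llvm/ExecutionEngine/Orc/LLJIT.h"
            |     #include "llvm/Support/TargetSelect.h"
            | 
            |     int main() {
            |         llvm::InitializeNativeTarget();
            |         llvm::InitializeNativeTargetAsmPrinter();
            |         auto jit = llvm::orc::LLJITBuilder().create();
            |         if (!jit) {
            |             llvm::consumeError(jit.takeError());
            |             return 1;
            |         }
            |         // IR from the C++ frontend gets added via
            |         // (*jit)->addIRModule(...); unsupported
            |         // relocations surface as errors here.
            |     }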
        
       | ranger_danger wrote:
       | This is really cool...
       | 
       | https://github.com/vpand/icpp-qt
       | 
       | Ok _now_ things are getting interesting. I think this could be
        | used to add easily shareable/hackable plugins to existing C++
       | projects.
        
         | fluoridation wrote:
         | I've done it by embedding libclang into an executable. You
         | still have to be really careful to keep ABI compatibility
         | between the host and the JITed plugin, if you want to send and
         | receive complex C++ objects. Most likely you'll need to set up
         | a simple C ABI and reconstruct the objects on either side of
         | the interface. The last thing you want is to send std::string
         | across a DLL boundary.
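          | 
          | A hedged sketch of that kind of C ABI seam (hypothetical
          | names): strings cross as pointer + length, and each side
          | rebuilds its own std::string.
          | 
          |     // shared header, seen by host and plugin
          |     #include <cstddef>
          |     extern "C" std::size_t plugin_greet(
          |         const char* name, std::size_t name_len,
          |         char* out, std::size_t cap);
          | 
          |     // plugin side
          |     #include <string>
          |     extern "C" std::size_t plugin_greet(
          |         const char* name, std::size_t name_len,
          |         char* out, std::size_t cap) {
          |         std::string r =
          |             "hello, " + std::string(name, name_len);
          |         std::size_t n =
          |             r.size() < cap ? r.size() : cap;
          |         r.copy(out, n);
          |         return n;
          |     }
          | 
          | The host resolves plugin_greet with dlsym / GetProcAddress
          | and wraps the buffer back into its own std::string on its
          | side of the boundary.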
        
       | Jeaye wrote:
       | Along the lines of scripting is interactive programming. I'm
       | working on a native Clojure dialect on LLVM with C++ interop,
        | called jank. It can JIT-compile C++ code, can be embedded into
        | any C++-compatible application, and is a full Clojure dialect
        | which doesn't hide any of its C++ runtime. So you can do inline
        | C++, compile C++ sources alongside your jank, and require them
        | like a normal Clojure namespace. Worth a look if you're using C++
       | but you're craving something more interactive. https://jank-
       | lang.org/
        
       | mindblah wrote:
       | Folks who like this kind of thing should definitely check out
       | CERN's Root framework. I've been using its C++ interpreter in a
       | Jupyter notebook environment to learn C++. It's probably also
       | quite a bit more mature than this project. https://root.cern/
        
       ___________________________________________________________________
       (page generated 2024-08-08 23:01 UTC)