[HN Gopher] ICPP - Run C++ anywhere like a script
___________________________________________________________________
ICPP - Run C++ anywhere like a script
Author : davikr
Score : 77 points
Date : 2024-08-04 03:36 UTC (4 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| celrod wrote:
| How feasible would it be for something like gdb to be able to use
| a C++ interpreter (whether icpp, or even a souped up `constexpr`
| interpreter from the compiler) to help with "optimized out"
| functions?
|
| gdb also doesn't handle overloaded functions well, e.g. `x[i]`.
| mananaysiempre wrote:
| It does though? Just compiled a small program that creates a
| vector, and GDB is perfectly happy accessing it using this
| syntax. It will even print std::string's correctly if you cast
| them to const char* by hand. (Linux x86-64, GDB 14.2.)
| scintill76 wrote:
| > It will even print std::string's correctly if you cast them
| to const char* by hand
|
| What does that mean? I think `print str.c_str()` has worked
| for me in GDB before, but sounds like you did something
| different.
| mananaysiempre wrote:
| I was observing that `p (const char *)str` also worked in
| my experiment, but I'm far from a C++ expert and upon
| double-checking this seems to have been more of an accident
| than intended behaviour, because there is no operator
| const_pointer in basic_string that I can find. Definitely
| use `p str.c_str()`.
| twoodfin wrote:
| If your std::string was using a short string
| optimization, that would explain the "accident".
|
| Some implementations even put char[0] at the first byte
| in the optimized form.
| scintill76 wrote:
| That explanation doesn't work IMO, unless `str` is a
| std::string pointer, which is contrary to the syntax GP
| suggested with `str.c_str()`.
|
| It doesn't seem possible in actual C++ that the cast from
| non-pointer to pointer would work at all (even if a small
| string happens to be inlined at offset 0.) Like GP, I
| looked for a conversion operator, and I don't think it's
| there. Maybe it is a feature of the gdb parser.
| twoodfin wrote:
| Good point, but if it's a long string, 2/3 of the most
| common implementations would make the first word the
| c_str()-equivalent pointer:
|
| https://devblogs.microsoft.com/oldnewthing/20240510-00/?p
| =10...
| fluoridation wrote:
| So it's actually printing *(const char **)&s?
| twoodfin wrote:
| The first pointer-sized chunk of the string structure is
| a pointer to the C-string representation. So the cast
| works as written.
| fluoridation wrote:
| Well, no, because (const char *)str is nonsense, if str
| is an std::string.
| celrod wrote:
| I've defined a few pretty printers, but `operator[]` doesn't
| work for my user-defined types. Knowing it works for vectors,
| I'll try and experiment to see if there's something that'll
| make it work.
|
|     (gdb) p unrolls_[0]
|     Could not find operator[].
|     (gdb) p unrolls_[(long)0]
|     Could not find operator[].
|     (gdb) p unrolls_.data_.mem[0]
|     $2 = {
|
| `unrolls_[i]` works within C++. This `operator[]` method
| isn't even templated (although the container type is); the
| index is hard-coded to be of type `ptrdiff_t`, which is
| `long` on my platform.
|
| I'm on Linux, gdb 15.1.
| mananaysiempre wrote:
| > This `operator[]` method isn't even templated (although
| the container type is)
|
| That might be it. If that operator isn't actually ever
| emitted out of line, then GDB will (naturally) have nothing
| to call. If it helps, with the following program
|
|     template<typename T>
|     struct Foo {
|         int operator[](long i) { return i * 3; }
|     };
|
|     Foo<bool> bar;
|     template int Foo<bool>::operator[](long);  // [*]
|
|     int main(void) {
|         Foo<int> foo;
|         __asm__("int3");
|         return foo[19];
|     }
|
| compiled at -g -O0 I can both `p foo[19]` and `p bar[19]`,
| but if I comment out the explicit instantiation marked [*],
| the latter no longer works. At -g -O2, the former does not
| work because `foo` no longer actually exists, but the
| latter does, provided the instantiation is left in.
| tester756 wrote:
| this "optimized out" thing is bullshit as hell
| Conscat wrote:
| GDB does have hooks for interpreters to be executed within it,
| but I haven't managed to make this work. https://sourceware.org
| /gdb/current/onlinedocs/gdb.html/JIT-I....
| Klasiaster wrote:
| Related is cargo script for Rust: https://doc.rust-
| lang.org/cargo/reference/unstable.html#scri... (nightly only)
| ComputerGuru wrote:
| You can do it out of the box with rust, no need for any tools,
| because you can strategically mix shell and rust in the same
| code: https://neosmart.net/blog/self-compiling-rust-code/
| bsenftner wrote:
| Why interpret at all? Back in the early-to-mid '90s I started
| embedding C++ compilers into the game engines I wrote, where the
| "game scripting language" was just #define macros hiding the C++
| syntax so the game level developers, who worked in this script,
| could be basically anyone that could code. Their "script" would
| compile to a DLL that was hot loaded. What they were doing in
| their scripts would compile in under 5 seconds, and they were
| good to go. If they ran into problems, one of the game engine
| developers would just run their "script" in the IDE debugger.
|
| Borrowed this idea from Nothing Real, the developers of Shake,
| the video/film compositing system.
| jokoon wrote:
| I don't suppose modern os would let you do that today, sounds
| like a security nightmare
| shortrounddev2 wrote:
| This is just JIT'ing. It's commonly used in Python and Lua.
| pjmlp wrote:
| I wish, JIT and Python still isn't something to brag about.
| greenavocado wrote:
| Python 3.13 has an experimental JIT compiler
| https://peps.python.org/pep-0744/
| pjmlp wrote:
| As I said, not something to brag about.
| fluoridation wrote:
| Yes, any modern OS lets any process load into its memory
| space binaries from anywhere the user has permissions, even
| if those are binaries it generated just now. It can be a
| security problem if the binaries are generated from untrusted
| sources (e.g. you download some, say, Haskell, compile it and
| run it fully automatically).
| shortrounddev2 wrote:
| Windows has a private heap API where code execution from
| pages allocated out of a `HeapCreate` heap is disabled
| unless you set this flag:
|
| https://learn.microsoft.com/en-
| us/windows/win32/api/heapapi/...
|
| > HEAP_CREATE_ENABLE_EXECUTE
|
| > 0x00040000
|
| > All memory blocks that are allocated from this heap allow
| code execution, if the hardware enforces data execution
| prevention. Use this flag heap in applications that run
| code from the heap. If HEAP_CREATE_ENABLE_EXECUTE is not
| specified and an application attempts to run code from a
| protected page, the application receives an exception with
| the status code STATUS_ACCESS_VIOLATION.
|
| I think POSIX has equivalent memory protection calls, but
| no equivalent to HeapCreate
| fluoridation wrote:
| But you can still call VirtualAlloc(), VirtualProtect(),
| and LoadLibrary(), so this isn't really a security
| mechanism, but more of a safety mechanism.
|
| I don't think Windows provides a mechanism to disable
| creating any further executable pages, although I've seen
| Chrome do it by hooking those functions (and I know it
| because I've had to bypass it :)).
| pjmlp wrote:
| App Sandboxing and VBS enclaves go onto that direction.
| shortrounddev2 wrote:
| I wouldn't expect windows to prevent creating further
| executable pages; there are legitimate use cases for
| creating dynamically allocated executable memory. It just
| means that whatever foreign data you load into those
| pages can't execute, which _is_ a security mechanism (for
| example, game save data can be loaded into these heaps so
| that you can load all game state but without the save
| file potentially running foreign code)
| fluoridation wrote:
| There are legitimate uses, but the point would be that
| the process could ask the system to lock it down with
| whatever executable code is already present. This could
| be used to prevent already running code from tampering
| with the process' behavior by loading new code, or to
| thwart code injection.
|
| >It just means that whatever foreign data you load into
| those pages can't execute, which is a security mechanism
| (for example, game save data can be loaded into these
| heaps so that you can load all game state but without the
| save file potentially running foreign code)
|
| But malloc() and all the other standard memory allocation
| functions already return pointers into non-executable
| pages, anyway. Perhaps those functions call into this one
| internally, but using this over whatever your language
| offers by default provides no additional protection.
| adzm wrote:
| All of this calls VirtualAlloc behind the scenes, and you
| can do that yourself as well for manual page allocation.
| Each page can have options set with VirtualProtect to
| allow or disallow execution of code within the pages as
| well.
| IshKebab wrote:
| iOS won't let you do that. Or at least Apple won't let you
| do that on iOS.
| fluoridation wrote:
| What's actually happening is that the SDK doesn't expose
| the system calls necessary to do it, but I can guarantee
| that if you can get a native binary to run on the device,
| you can have it do whatever you want. If that wasn't the
| case, the few apps that do support JITting wouldn't work.
| externedguy wrote:
| > Why interpret at all?
|
| a few of the reasons that come to mind:
|
| - education
|
| - iteration on writing simple functionality
|
| - loading and trying out several APIs to see what's possible (I
| use it frequently with Elixir / Erlang for example)
|
| It makes it easier for newcomers to wrap their heads around
| something and produce a good solution rather than merely a
| "working" one.
| fluoridation wrote:
| But the "interpretation" is kind of in name only. When you
| run your program there's still going to be a compilation
| step, it's just the interpreter will merge it with the run
| step, and it will do it every time you run the program. I'm
| with the GP, I don't understand the advantage of this
| approach over traditional AOT compilation (actually this
| isn't even JIT, it's just deferred AOT).
| corysama wrote:
| And, then there's
|
| https://liveplusplus.tech/
|
| https://github.com/RuntimeCompiledCPlusPlus/RuntimeCompiledC...
|
| https://learn.microsoft.com/en-us/visualstudio/debugger/hot-...
| pragma_x wrote:
| Totally valid take. The answer is: it depends.
|
| The advantage of a lot of scripting tech is some form of REPL,
| which is really just a super-fast code-compile-run loop. In
| your example, "why?" boils down to how useful/painful those
| five seconds-per-change are. Maybe that adds up and slows the
| coder down, or maybe it's no big deal. It all kind of depends
| on the workflow and how fast you need to be able to iterate on
| code changes. Moving to a scripted interpreter would eliminate
| that wait period at the cost of runtime performance, which
| might be a valuable business tradeoff.
|
| FWIW, that "script" solution sounds awesome for the time. I'll
| add that five seconds to build a hot-loaded DLL in the 90's is
| really, really good performance for that solution, regardless
| of its role as a scripting alternative. Today, that would
| probably be mere milliseconds to compile - impossible to
| distinguish from an embedded Lua or JS solution.
| bobajeff wrote:
| I wonder if I can use this to learn a large c++ codebase like
| Chromium. One of the issues I had trying to learn chromium was
| that in order to play and experiment with their classes/functions
| I needed to spend several minutes to link my little test code
| with their static libraries just to be able to see if my
| understanding of them was correct, which is just too long
| for such experiments, so I gave up.
| hackit2 wrote:
| Last time I checked out Chromium code base it was about 300-400
| Megs of uncompressed cpp files. Let's also not forget that
| you needed to run some code generator script that
| generated another 200 Megs of DOM files, or interface files. At
| that point in time I gave up and went to sleep and never
| touched it again.
| lioeters wrote:
| I really hope Ladybird is able to stay relatively small and
| approachable; it would be wonderful to have a truly
| customizable open-source browser that's not a massive
| codebase that takes forever to compile and develop.
| hackit2 wrote:
| I do think there are some geniuses working on the Chromium
| code base, and I would imagine there are really good
| reasons for doing it their way. I would also imagine that
| Ladybird will over time face the same problems, and come up
| with similar solutions as the Chromium team.
|
| All I know is most of the large scale C/C++ code bases
| eventually become these monolithic giant code bases that
| require some really specialised software tools to compile
| and link.
| pjmlp wrote:
| Not only C and C++; I have seen this at scale with most
| Fortune 500 projects I have been involved in.
| pragma_x wrote:
| I agree on all fronts. This parallels the last time I looked
| at, and gave up on, building a backend for LLVM. And that was
| after giving up on doing the same for GCC. Those codebases are
| _impenetrable_.
|
| It's clear as mud how one would hook a jumbo codebase into the
| REPL. If it's possible, that would be a game changer.
| jcelerier wrote:
| I added LLVM JIT support to https://ossia.io a few years ago,
| it's not too bad, but a big issue is that the JIT does not
| support all the necessary features used by the frontend in
| terms of relocations, etc. So it happens relatively often
| that C++ code will compile to LLVM IR without issue, but then
| fail at the JIT step because some relocation is not supported
| by the JIT engine yet.
|
| Most of the code is here : https://github.com/ossia/score/tre
| e/master/src/plugins/score... with the actual LLVM API
| interoperation contained there : https://github.com/ossia/sco
| re/tree/master/src/plugins/score...
|
| It's been used for fun projects, for instance for this paper
| about data sonification : https://www.researchgate.net/profil
| e/Maxime_Poret/publicatio...
| ranger_danger wrote:
| This is really cool...
|
| https://github.com/vpand/icpp-qt
|
| Ok _now_ things are getting interesting. I think this could be
| used to add easily shareable/hackable plugins to existing C++
| projects.
| fluoridation wrote:
| I've done it by embedding libclang into an executable. You
| still have to be really careful to keep ABI compatibility
| between the host and the JITed plugin, if you want to send and
| receive complex C++ objects. Most likely you'll need to set up
| a simple C ABI and reconstruct the objects on either side of
| the interface. The last thing you want is to send std::string
| across a DLL boundary.
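A sketch of that flat-C-ABI pattern (the names `plugin_greet` and `call_plugin` are invented for illustration; in a real plugin the extern "C" function would live in the hot-loaded library):

```cpp
#include <cstring>
#include <string>

// The plugin exports plain C functions and raw buffers; each side
// rebuilds its own std::string, so no C++ object (with its allocator
// and layout assumptions) ever crosses the boundary.
extern "C" int plugin_greet(const char* name, char* out, int out_cap) {
    std::string s = std::string("hello, ") + name;  // C++ stays inside
    int n = static_cast<int>(s.size());
    if (n >= out_cap) return -1;           // caller's buffer too small
    std::memcpy(out, s.c_str(), n + 1);    // copy including NUL
    return n;
}

// Host side: call through the C ABI and reconstruct a std::string.
std::string call_plugin(const std::string& name) {
    char buf[256];
    int n = plugin_greet(name.c_str(), buf, sizeof buf);
    return n < 0 ? std::string() : std::string(buf, n);
}
```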
| Jeaye wrote:
| Along the lines of scripting is interactive programming. I'm
| working on a native Clojure dialect on LLVM with C++ interop,
| called jank. It can JIT-compile C++ code, can be embedded into
| any C++-compatible application, and is a full Clojure dialect
| which doesn't hide any of its C++ runtime. So you can do inline
| C++, compile C++ sources alongside your jank, and require them
| like a normal Clojure namespace. Worth a look if you're using C++
| but you're craving something more interactive. https://jank-
| lang.org/
| mindblah wrote:
| Folks who like this kind of thing should definitely check out
| CERN's ROOT framework. I've been using its C++ interpreter in a
| Jupyter notebook environment to learn C++. It's probably also
| quite a bit more mature than this project. https://root.cern/
___________________________________________________________________
(page generated 2024-08-08 23:01 UTC)