[HN Gopher] Native Reflection in Rust
___________________________________________________________________
Native Reflection in Rust
Author : jswrenn
Score : 199 points
Date : 2022-12-15 15:54 UTC (7 hours ago)
(HTM) web link (jack.wrenn.fyi)
(TXT) w3m dump (jack.wrenn.fyi)
| unconed wrote:
| My version of Greenspun's Tenth [1] is that any sufficiently
| complex static language contains an adhoc, informally specified,
| bug ridden and slow version of a dynamic "any" type.
|
| Thx OP for providing an example.
|
| [1] https://en.wikipedia.org/wiki/Greenspun's_tenth_rule
| kibwen wrote:
| Rust has a dynamic any type, `std::any::Any`.
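|
| A minimal sketch of what `std::any::Any` offers, using only the
| standard library (the `describe` helper is hypothetical). The
| caller has to name each candidate type up front, so this is
| downcasting rather than open-ended reflection.

```rust
use std::any::Any;

/// Hypothetical helper: report what a type-erased value holds.
fn describe(value: &dyn Any) -> String {
    if let Some(n) = value.downcast_ref::<i32>() {
        format!("i32: {n}")
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("String: {s}")
    } else {
        // Any type not listed above cannot be inspected this way.
        "unknown type".to_string()
    }
}

fn main() {
    println!("{}", describe(&42i32));
    println!("{}", describe(&String::from("hi")));
    println!("{}", describe(&1.5f64));
}
```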
| 8jy89hui wrote:
| This is a beautiful (hacky) demo of something that I didn't think
| was possible in Rust (yet). I hope other applications don't
| accidentally start using it just to discover that it doesn't work
| in release mode.
|
| Very impressive work!
| jswrenn wrote:
| Oh, I should add a note about that. Fortunately, it's quite
| easy to tell Rust to generate debuginfo even in release mode.
| bouk wrote:
| It would be really cool if it was possible to natively inspect
| the state of a Rust generator in a type-safe way
| Animats wrote:
| _" When you call .reflect on a dyn Reflect value, deflect figures
| out its concrete type in four steps:"_
|
| * _invokes local_type_id to get the memory address of your
| value's static implementation of local_type_id_
|
| * _maps that memory address to an offset in your application's
| binary_
|
| * _searches your application's debug info for the entry
| describing the function at that offset_
|
| * _parses that debugging information entry (DIE) to determine the
| type of local_type_id's &self parameter_.
|
| This is a rather strange thing to bolt onto a language. I could
| see this as an external tool. The use case seems to be programs
| which used "async" so much they can't figure out the resulting
| state machine. External debug tools to view and examine the async
| state machine might be helpful.
|
| My experience with Rust has been that debugging of safe code is
| just not a problem. Print statements and logging are enough.
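|
| A rough sketch of only the first quoted step, and not deflect's
| actual code: a blanket trait implementation gives every type its
| own monomorphized copy of a method, and that copy can report its
| own address through a trait object. The trait and method names
| mirror the quote; everything else is an assumption. Mapping the
| address to debug info (steps two to four) is not shown.

```rust
/// Assumed shape of the trait; the real crate may differ.
trait Reflect {
    fn local_type_id(&self) -> usize;
}

impl<T> Reflect for T {
    #[inline(never)]
    fn local_type_id(&self) -> usize {
        // Take a pointer to this type's own instantiation of the
        // method and return its address as an integer.
        let f: fn(&T) -> usize = <T as Reflect>::local_type_id;
        f as usize
    }
}

fn main() {
    // The concrete types (u8 and &str) are erased behind `dyn Reflect`.
    let values: [&dyn Reflect; 2] = [&1u8, &"text"];
    for v in values {
        println!("{:#x}", v.local_type_id());
    }
}
```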
| pcwalton wrote:
| > This is a rather strange thing to bolt onto a language. I
| could see this as an external tool.
|
| It _is_ an external tool. This is a crate, not a part of the
| compiler.
| loeg wrote:
| > This is a rather strange thing to bolt onto a language.
|
| It can just be an extremely fun and cute demo, without
| practical application.
| jerf wrote:
| It can also be something that looks cool and doesn't
| necessarily ever get past "kinda works", but piques the
| interest of the core dev team and they take steps to make it
| work even better, resulting in the ultimate "deprecation" of
| this sort of thing by virtue of it being even better
| integrated into the core.
|
| I don't have the context to judge the probability of that in
| this specific case (lots of technical nitty-gritty comes in
| to this sort of thing), but I've certainly seen similar
| things happen in other communities.
| More-nitors wrote:
| how about adding this to debuggers for better object-views?
| (could it be possible to provide near-js/python/java level of
| obj view?)
| gpderetta wrote:
| This is already using DWARF debug info. Using this for
| debugging would be a long way around to arrive where you
| started.
|
| You can already script gdb to provide rich views of any
| data structure.
| olvy0 wrote:
| I've used a very similar method, at work, to provide C++
| "reflection" between my own system and a system from another
| team.
|
| Basically, the other system is a dynamic library which sends and
| receives C structures from my application. Those structures are
| then mapped into a buffer that is supposed to have the same size
| and there are pointers with metadata pointing into the buffer
| that are supposed to be exactly like the struct elements. Those
| structures can have arbitrary complexity, and are passed around
| through type erasure (essentially char*).
|
| I wrote some "reflection" code for the other team, which runs when
| they register the struct instance to be sent, checks if there's a
| matching PDB [0] around, reads it, and outputs a json including
| the metadata needed, which can then be used to define the
| structures' metadata on our side correctly.
|
| This is all in C/C++ since in some contexts we have soft real-
| time requirements, else I would have used any of the many RPC
| frameworks available.
|
| This has been working for several years now.
|
| This is not a generic solution but it's good enough for in-house
| communication between 2 systems that are maintained by different
| parts of the organization, where the API between them, that like
| I said is based on passing around char* buffers, has been more or
| less set in stone a long time ago. Conway's law [1] and all that.
| Sigh.
|
| [0] We are a Windows shop although the same thing should work
| with DWARF info, same as the OP library works. In fact he says
| "It may never work on Windows, which does not use DWARF to encode
| debug info" but I can say that the same approach does work on
| Windows, for C++ at least. The PDB format might be a tad
| undocumented, but its documentation has been improved in the last
| decade or so since I started working on my library. Writing some
| small test programs is enough to understand how to access it, if
| all you need is meta info on C-style structures. Other stuff is
| more... challenging. But it wasn't necessary for my use-case.
|
| [1] https://en.wikipedia.org/wiki/Conway%27s_law
| kp995 wrote:
| Can't we rely more on Rust's pattern matching and its strong
| type system?
|
| Reflection seems more helpful when the programming language is
| a little unsound.
| jswrenn wrote:
| Absolutely! That's the approach that frunk [0] takes. Frunk
| (and other reflection libraries like it) are suitable for most
| use cases, and make better use of Rust's affordances.
|
| My crate is suitable for cases where you cannot know (or
| control) the set of types you might need to reflect on in
| advance. Its primary use-cases are related to debugging.
|
| [0]: https://docs.rs/frunk
| halfmatthalfcat wrote:
| Is Frunk Rust's Shapeless (from Scala)?
| jswrenn wrote:
| Yep!
| Thaxll wrote:
| Today I learned that Rust does not have reflection.
| estebank wrote:
| Reflection is usually not available in AoT compiled languages.
| The prevalent Rust coding styles rely heavily on monomorphic
| data types and functions, meaning there's nothing _left_ to
| reflect at runtime. But if you want to deal with trait objects
| and need to access the underlying type, you need to use
| Any::downcast or rely on annotations on every type you want to
| reflect on. Or now, leverage DWARF info on Linux with deflect.
| omginternets wrote:
| What are monomorphic data types? What should be my first read
| on the subject?
| estebank wrote:
| It's a fancy way of saying "every time this type is used,
| replace all the generic type params with what was used and
| generate code for it". It's how generics are implemented in
| Rust. If you have
|
|     struct Foo<T>(T);
|
| And you create Foo(42i32) and Foo(0.0f64), the compiler
| will create the equivalent to
|
|     struct Fooi32(i32);
|     struct Foof64(f64);
|
| In other languages like Java, generics are implemented the
| way that Rust does "trait objects" (&dyn Trait).
|
| Rust is not the only language that does this, to be clear.
|
| If you're interested in a quick intro on the _compiler_
| side of this, you can read
| https://rustc-dev-guide.rust-lang.org/backend/monomorph.html
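|
| A small sketch of the effect, with a hypothetical `Describe`
| trait added for contrast: each instantiation of `Foo<T>` is its
| own concrete type with its own layout, while a `&dyn Describe`
| hides the type behind a single call dispatched at run time.

```rust
use std::mem::{size_of, size_of_val};

struct Foo<T>(T);

/// Hypothetical trait, used only to compare the two dispatch styles.
trait Describe {
    fn size(&self) -> usize;
}

impl<T> Describe for Foo<T> {
    fn size(&self) -> usize {
        size_of_val(&self.0)
    }
}

/// Static dispatch: compiled once per concrete `D`.
fn size_static<D: Describe>(d: &D) -> usize {
    d.size()
}

/// Dynamic dispatch: compiled once, method looked up at run time.
fn size_dynamic(d: &dyn Describe) -> usize {
    d.size()
}

fn main() {
    // Two distinct types, as if written by hand as Fooi32 and Foof64.
    assert_eq!(size_of::<Foo<i32>>(), 4);
    assert_eq!(size_of::<Foo<f64>>(), 8);
    assert_eq!(size_static(&Foo(42i32)), 4);
    assert_eq!(size_dynamic(&Foo(0.0f64)), 8);
}
```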
| shpongled wrote:
| Nice examples - you can also have languages (like SML)
| where monomorphization is simply an implementation
| detail. Some compilers (e.g., MLton) perform
| monomorphization and others don't.
| yakubin wrote:
| That depends on what you mean. SML has "polymorphism"
| boiling down to being able to plug an arbitrary type in
| some places, which is denoted like _'a_. But when people
| talk about generics, they more often talk about C++
| templates, Java generics, Rust traits, etc. whose SML
| equivalent are signatures, structs and functors.
| Signatures are a bit like Rust traits, structs are a bit
| like Rust implementations of traits, whereas functors are
| like Rust's "templates", i.e. wherever you swap angle
| brackets to parametrise something with types constrained
| by traits, or values constrained by types. Except in Rust
| this parametrisation can be slapped on a bunch of things.
| It can be on structs, on functions, on traits, on
| implementations of traits etc. In SML you need to group
| all the "parametrised" things into a struct (and a
| corresponding signature), which is going to be returned
| by a functor.
|
| And now the thing is: with transparent signature
| ascriptions, functors are monomorphised in SML, instead
| of everything being hidden behind signatures (as is in
| the case of Rust with traits when you use _dyn_ ), which
| has semantic consequences. E.g. a struct returned by a
| functor may contain a type. You can't perform proper
| type-checking without monomorphising, because you don't
| know what the exact type is. E.g. in the following
| program, the final line couldn't be type-checked without
| monomorphisation:
|
|     signature ITERABLE =
|     sig
|       type ElemT
|       type SrcT
|       val new_iter: SrcT -> unit -> ElemT option
|     end
|
|     signature LIST_ELEM_TYPE =
|     sig
|       type T
|     end
|
|     functor ListIterFun (ListElemType: LIST_ELEM_TYPE): ITERABLE =
|     struct
|       type ElemT = ListElemType.T
|       type SrcT = ElemT list
|
|       fun new_iter l =
|         let
|           val lr = ref l
|         in
|           fn () =>
|             case !lr of
|               nil => NONE
|             | (x::xs) => (lr := xs; SOME x)
|         end
|     end
|
|     structure IntElemType: LIST_ELEM_TYPE =
|     struct
|       type T = int
|     end
|
|     structure IntListIter = ListIterFun(IntElemType)
|
|     val next = IntListIter.new_iter [1, 2, 3, 4, 5]
|
| If I change the signature ascription on ListIterFun to an
| opaque ascription ( _:> ITERABLE_ ), the final line won't
| type-check, because it's not obvious from the signature
| that ElemT is int. So transparent signature ascriptions
| require monomorphisation (Rust traits without _dyn_ ), and
| opaque signature ascriptions free the compiler from
| having to do monomorphisation (Rust traits with _dyn_ ).
|
| There was a lot of discussion of this issue when Go was
| settling on a design for its generics, under the phrase
| "reified generics".
| codeflo wrote:
| I only recently realized that certain type system
| features, like polymorphic recursion, make
| monomorphization impossible in the general case. In
| Haskell for example, it's by necessity only an
| optimization that's used where applicable, and not the
| general strategy.
| gloryjulio wrote:
| I think cpp does this too
| estebank wrote:
| It indeed does. The only difference is that Rust has
| traits (similar to C++'s concepts) which require explicit
| mention of what interface the type parameters have inside
| the function, whereas C++'s templates will have a compile
| error _after_ instantiation if you passed something that
| didn't meet the expected contract. This is closer to
| Rust's macros in operation.
|
| Given
|
|     fn foo<T>(a: T, b: T) -> T { a + b }
|
| The compiler will complain that you should have been
| explicit on how T is going to be used:
|
|     error[E0369]: cannot add `T` to `T`
|      --> src/lib.rs:1:32
|       |
|     1 | fn foo<T>(a: T, b: T) -> T { a + b }
|       |                              - ^ - T
|       |                              |
|       |                              T
|       |
|     help: consider restricting type parameter `T`
|       |
|     1 | fn foo<T: Add<Output = T>>(a: T, b: T) -> T { a + b }
|       |         +++++++++++++++++
|
| whereas in C++ this would have been accepted _until_ you
| called foo with two things that couldn't be added
| together, like a Rust macro[1].
|
| [1]: https://play.rust-
| lang.org/?version=nightly&mode=debug&editi...
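|
| The bounded version suggested by the compiler, next to a macro
| that defers the check until expansion, can be sketched as
| follows. The `add!` macro is a hypothetical stand-in; it is not
| taken from the truncated playground link.

```rust
use std::ops::Add;

/// The bound states up front how `T` will be used.
fn foo<T: Add<Output = T>>(a: T, b: T) -> T {
    a + b
}

/// A macro body is only type-checked after expansion at each call
/// site, much like a C++ template after instantiation.
macro_rules! add {
    ($a:expr, $b:expr) => {
        $a + $b
    };
}

fn main() {
    println!("{}", foo(1, 2));
    println!("{}", add!(1.5, 2.0));
    // add!("a", "b") would be rejected only at that expansion site.
}
```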
| codeflo wrote:
| To add to this, even the Foo-wrapper is gone, just the
| i32 remains. Rust values are amorphous data blobs at
| runtime.
| CryZe wrote:
| ABI wise that is not true though. structs have struct
| ABI, even just a newtype struct around an integer will
| not use integer ABI unless annotated with
| #[repr(transparent)].
| estebank wrote:
| Yes, that's true but that is an implementation detail
| that only comes into play when dealing with ABI, and
| _then_ you should be using #[repr(transparent)] to ensure
| that the compiler won't do something else :)
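|
| A short sketch of that annotation on a hypothetical `Meters`
| newtype: `#[repr(transparent)]` asks for the same layout and ABI
| as the single wrapped field, which the size and alignment checks
| below reflect.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical newtype around a float.
#[repr(transparent)]
struct Meters(f64);

fn main() {
    // Same size and alignment as the wrapped field.
    assert_eq!(size_of::<Meters>(), size_of::<f64>());
    assert_eq!(align_of::<Meters>(), align_of::<f64>());

    let m = Meters(2.5);
    println!("{}", m.0);
}
```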
| codeflo wrote:
| Sure, it's good to point out the difference between "the
| behavior of a typical optimizing compiler" and "things
| actually guaranteed by the language". The context of the
| discussion was the former, I think. I'm not even that
| certain that monomorphization is actually required in
| theory.
| estebank wrote:
| Yes, monomorphization isn't _needed_ in theory, as long
| as the user-visible behavior remains the same, and in
| practice the team is exploring options[1] to identify
| cases where the currently manual practice of writing
|
|     pub fn foo<T: AsRef<X>>(x: T) {
|         inner_foo(x.as_ref());
|     }
|
|     fn inner_foo(_: &X) {
|         todo!()
|     }
|
| can be instead done by the compiler automatically
| (turning monomorphized code back into polymorphic code,
| hence the polymorphization name).
|
| [1]: https://rustc-dev-guide.rust-
| lang.org/backend/monomorph.html...
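|
| A runnable instance of that manual pattern, with `str` standing
| in for `X` and hypothetical function names: only the thin
| generic shim is duplicated per caller type, while the real body
| is compiled once.

```rust
/// Thin generic shim: one small copy per caller type.
pub fn count_words<T: AsRef<str>>(text: T) -> usize {
    count_words_inner(text.as_ref())
}

/// Non-generic body: compiled a single time.
fn count_words_inner(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    // Two different argument types share the same inner function.
    println!("{}", count_words("a b c"));
    println!("{}", count_words(String::from("one two")));
}
```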
| estebank wrote:
| Expanding on trait objects: these are implemented as
| "V-Tables", structs holding pointers to the trait's
| methods and to the underlying type. This means that if
| you _need_ to know what the underlying type is, you have to
| do something fancy, usually referred to as "reflection".
| Also, invocation of generic functions that use V-Tables
| requires "chasing pointers", which makes cache locality
| worse (because data might not be in the same cache read
| as the v-table itself), but makes the generated binary
| smaller (because if you have something like Foo<T> used
| with 1000 types, with monomorphization you end up with
| 2000 generated types in the binary, instead of 1001 with
| trait objects).
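|
| One way to observe the extra pointer, as a sketch with a
| hypothetical `Shape` trait: with current rustc, a reference to
| a trait object is two machine words (data pointer plus vtable
| pointer), while a reference to a concrete type is one.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

fn main() {
    // Thin pointer to a known type versus fat pointer to a trait object.
    assert_eq!(size_of::<&Square>(), size_of::<usize>());
    assert_eq!(size_of::<&dyn Shape>(), 2 * size_of::<usize>());

    // The call below goes through the vtable.
    let shape: &dyn Shape = &Square(3.0);
    println!("{}", shape.area());
}
```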
| Joker_vD wrote:
| Pretty sure that some usage patterns of polymorphic types
| cannot be completely monomorphized. Here's an example in
| Golang:
|
|     package main
|
|     import (
|         "fmt"
|     )
|
|     type wrapper[T any] struct {
|         Value T
|     }
|
|     func (w wrapper[T]) String() string {
|         return fmt.Sprintf("{%v}", w.Value)
|     }
|
|     func stringWrapped[T any](n int, v T) string {
|         if n == 0 {
|             return fmt.Sprintf("%v", v)
|         }
|         return stringWrapped(n-1, wrapper[T]{Value: v})
|     }
|
|     func main() {
|         n := 0
|         fmt.Scanf("%d", &n)
|         result := stringWrapped(n, "test")
|         fmt.Println(result)
|     }
|
| Go refuses to compile because it can't possibly generate
| all instances of wrapper[T] that this program may use:
| wrapper[string], wrapper[wrapper[string]],
| wrapper[wrapper[wrapper[string]]], etc.
| estebank wrote:
| Rust will complain about a recursion limit being reached
| during instantiation[1]. The solution in Rust is to use
| &dyn Trait or Box<dyn Trait> instead.[2]
|
| [1]: https://play.rust-
| lang.org/?version=stable&mode=debug&editio...
|
| [2]: https://play.rust-
| lang.org/?version=stable&mode=debug&editio...
|
| ^ This blows the stack because it keeps calling itself
| with no break condition, but shows how the type system
| accepted the code.
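|
| A terminating sketch of the `Box<dyn Trait>` approach. This is
| not the code behind the truncated playground links; the names
| are adapted from the Go example. Erasing the type at each step
| means the compiler needs only one instantiation of `Wrapper`.

```rust
use std::fmt::{self, Display};

struct Wrapper<T>(T);

impl<T: Display> Display for Wrapper<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // "{{" and "}}" are escaped literal braces.
        write!(f, "{{{}}}", self.0)
    }
}

/// Wraps `v` in `n` layers, erasing the type at every level so the
/// recursion always calls the same, single function instance.
fn string_wrapped(n: u32, v: Box<dyn Display>) -> String {
    if n == 0 {
        return v.to_string();
    }
    string_wrapped(n - 1, Box::new(Wrapper(v)))
}

fn main() {
    println!("{}", string_wrapped(2, Box::new("test")));
}
```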
| gpderetta wrote:
| I think this is called polymorphic recursion in Haskell
| circles.
|
| In C++ you can monomorphize as long as you can somehow
| prove the recursion terminates at compile time (for
| example by threading a static recursion counter).
| dgb23 wrote:
| Not exactly the same thing but JITs can turn dynamic
| objects into structs if the structure is consistent. JS
| runtimes and Julia do this as far as I know.
| adgjlsfhk1 wrote:
| Julia doesn't do this. It just has structs in the first
| place.
| mmis1000 wrote:
| Firefox's JS runtime also does tricks like generating multiple
| copies of an optimized function when the function has multiple
| call sites, instead of making one copy with lots of if/else
| branches. So a function that frequently gets different types
| of parameters from different call sites no longer suffers
| from poor performance.
|
| It's probably exactly how templates work, except the
| details are invisible to users.
|
| https://hacks.mozilla.org/2020/11/warp-improved-js-
| performan...
| estebank wrote:
| Yes! Java as well. And this is how those languages can
| show impressive benchmarks for consistent workloads. In
| theory they can even surpass AoT languages. In practice
| it depends on the specifics.
| [deleted]
| planede wrote:
| That's runtime reflection.
|
| Compile time reflection AFAIK is available in D and Zig, and
| is planned for C++.
| elcritch wrote:
| That's right. Nim does as well. It's amazing. Once you get
| used to having CTTI and being able to use it, it's hard to
| program without it. Bonus points if you can do basic
| dependent types too.
|
| In C++ with SFINAE you can effectively do CTTI-style
| programming in C++. C++ has long had runtime type
| reflection as well (RTTI), though it needs to be compiled
| in. Looks like there's a boost library for CTTI.
| Conscat wrote:
| The C++ reflection improves a lot in C++20, but it's
| still very limited compared to that aspect of Nim, or
| even Zig. The std::meta::info and "splices" based on
| Haskell for C++26 are incredibly exciting to me. I have
| many use cases in mind. Splices in combination with
| std::embed will make C++ basically just a bad Racket (but
| one with inline assembly!).
| yakubin wrote:
| Yup. I consider runtime reflection an antifeature, which
| has negative performance effects, is unsafe (see e.g.
| log4j) and leads to fragile code.
|
| I would however welcome static reflection with open arms.
| In Rust in particular, I'd prefer it if derive was
| implemented using static reflection, rather than proc
| macros.
| nestorD wrote:
| The usual argument is that between having macros and focusing on
| a strong type system, there are very few legitimate use cases for
| reflection left in Rust.
| snordgren wrote:
| Rust has very little influence from reflection-heavy languages
| like Java and C#. On their list of influences
| (https://doc.rust-lang.org/reference/influences.html), Java is
| not even mentioned, and C# is only mentioned for its
| attributes. There is very little overlap between the design
| philosophies that influenced Rust and Java/C#.
|
| Rust does not support inheritance either. But I have never
| missed either feature in a Rust program.
| Tuna-Fish wrote:
| Reflection is typically provided by a runtime, and languages
| that don't have runtimes usually don't have it. You shouldn't
| expect a low-level systems language to have reflection. There
| is no zero-cost way of implementing it.
| spacechild1 wrote:
| This is of course only true for runtime reflection. And which
| language does not have a runtime?
| Joker_vD wrote:
| Except Rust has a runtime: [0]. And so, usually, does C (in
| hosted implementations).
|
| [0] https://doc.rust-lang.org/reference/runtime.html
| pornel wrote:
| These are a couple of functions executables can call at run
| time, but they're more like an extra standard library. It's
| not a runtime in the same sense as a runtime in dynamic or
| GC languages that manages all objects and is able to know
| types of arbitrary objects and inspect/trace them.
|
| Rust has no run-time type information except limited
| downcasts via `dyn Any` or explicitly derived traits on
| per-type basis, and these features compile to type-specific
| monomorphic code rather than calling some run-time
| reflection.
| throwaway894345 wrote:
| Pretty sure you don't need a runtime to track runtime
| type info. What we think of as a "runtime" in GC
| languages is usually several distinct things (a
| scheduler, a GC, and maybe some other stuff in the case
| of Java/.Net).
| [deleted]
| armchairhacker wrote:
| Does this still work if the application is compiled in release
| mode or with optimizations?
|
| Even if not, this is still very useful for debugging
| jswrenn wrote:
| It only works if DWARF is generated. By default, the `release`
| profile of Cargo sets `debug = false` [0]. But, it's quite easy
| to override this setting, and have a build that is both
| optimized and includes debuginfo.
|
| [0]: https://doc.rust-
| lang.org/cargo/reference/profiles.html#rele...
| jeroenhd wrote:
| Does using DWARF info imply that this will break when you strip
| the resulting executable? I often strip my Rust binaries because
| it practically halves the application size, which can become
| quite a lot in a language where you're statically linking
| everything.
|
| Regardless, quite an ingenious use of standard ELF features, I
| didn't think this would be possible in Rust without adding some
| kind of VM around reflection code.
| jswrenn wrote:
| Yes, unfortunately that's a tradeoff here. Rust does support
| splitting debug info into other files, but Deflect doesn't
| support loading split debuginfo _yet_.
| HideousKojima wrote:
| C# has similar issues where they have to be conservative about
| what they trim from binaries for AoT in case it is used for
| reflection, so I imagine you'd run into the same issues for
| almost any compiled language you want to implement reflection
| for.
| davidhyde wrote:
| Great writeup! The defmt logging crate uses a linker script to
| extract debug symbols so that you get nicely formatted stack
| traces on embedded systems. It works on linux, macos and windows.
| I wonder if the same technique can be applied to this project. It
| needs a runner though so may not be the right approach.
|
| https://github.com/knurling-rs/defmt
___________________________________________________________________
(page generated 2022-12-15 23:00 UTC)