[HN Gopher] Untapped potential in Rust's type system
___________________________________________________________________
Untapped potential in Rust's type system
Author : lukastyrychtr
Score : 187 points
Date   : 2021-06-13 07:37 UTC (1 day ago)
(HTM) web link (www.jakobmeier.ch)
| willeh wrote:
| Interesting article, but I think the key to writing idiomatic
| Rust is not to stretch what the type system can do but rather to
| be happy with what it can express and to avoid unnecessary
| abstraction. The compile-time guarantees that we have to prove in
| Rust also serve as a hint for when not to abstract.
| nine_k wrote:
| I think pushing the boundaries of what is idiomatic can
| sometimes be valuable. Look how much idiomatic C++ has changed
| since 1995. Or what is idiomatic in JavaScript now.
| fnord123 wrote:
| "sometimes" is doing a lot of heavy lifting here.
| klyrs wrote:
| This article is about curiosity, not best practices.
| marshray wrote:
| Sometimes life hands you lemons, and sometimes it hands you
| _Box <dyn Fruit>_.
| dshpala wrote:
| I've always wondered if it's possible to use Rust's ownership
| chains to attribute memory usage? I.e. calculate total bytes
| retained by an object?
| ashtonkem wrote:
| Shouldn't that be possible with any memory management system? I
| believe Java has a tree behind the scenes in memory for
| management.
| squiggleblaz wrote:
| The Java model ties one object to another. The Rust model
| ties consumed memory to a function. How about a debug log
| output that shows how much memory a function call required,
| in the same way it shows how much time it used?
| mhh__ wrote:
| It didn't use ownership, and I don't think that's the paradigm,
| but I did something like this in D by tagging allocating
| members with a user defined attribute which counted in bytes
| the allocated memory, then summed it for the whole tree
| (structure).
|
| Not worth the effort, but it can be done in not much code.
| Hywan wrote:
| At my work, we have developed `loupe`, a Rust crate that does
| precisely that, https://github.com/wasmerio/loupe/. I'm the
| main author of it.
|
| `loupe` provides the `MemoryUsage` trait; it reports the
| size of a value in bytes, recursively. So it traverses most
| types and their fields or variants as deep as possible.
| Fortunately, it tracks already visited values so that it doesn't
| enter an infinite loupe loop.
|
| We are using it inside Wasmer, a WebAssembly runtime.
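|
| A minimal sketch of the "track already visited values" idea. This
| is not `loupe`'s actual API: the `Tracker` type and the `rc_bytes`
| function are hypothetical names, and the `Rc` reference counters
| are ignored for brevity. The heap bytes behind a shared `Rc` are
| counted only the first time that allocation is seen.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical helper: remembers which shared allocations were already counted.
#[derive(Default)]
struct Tracker {
    seen: HashSet<*const u8>,
}

impl Tracker {
    /// Returns true the first time an address is seen, false afterwards.
    fn first_visit(&mut self, addr: *const u8) -> bool {
        self.seen.insert(addr)
    }
}

/// Bytes attributed to one `Rc<Vec<u8>>` handle: the handle itself, plus the
/// shared vector (its header and its buffer) only on the first visit.
/// The `Rc` reference counters are deliberately left out of this sketch.
fn rc_bytes(rc: &Rc<Vec<u8>>, tracker: &mut Tracker) -> usize {
    let handle = std::mem::size_of::<Rc<Vec<u8>>>();
    if tracker.first_visit(Rc::as_ptr(rc) as *const u8) {
        handle + std::mem::size_of::<Vec<u8>>() + rc.capacity()
    } else {
        handle
    }
}
```

| Calling `rc_bytes` on two clones of the same `Rc` with one shared
| `Tracker` charges the buffer to the first clone only; the second
| clone costs just the size of the handle.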
| simias wrote:
| I don't think ownership really lets you do anything more than
| what "sizeof" tells you in C. In particular:
|
| - Some types have dynamic memory allocation (Vec, HashMaps
| etc...), so those would have to be computed at runtime and can
| change at any moment.
|
| - Some types have shared ownership (Rc/Arc), so it's unclear
| how you would measure memory usage then.
|
| - Some types, especially in foreign interfaces, will
| effectively just hold a pointer to some black box data, you'd
| need a special API to figure out how much memory it hides. For
| instance what's the memory usage of a database handle or a JPEG
| compression library context?
|
| - When you care about memory usage things like fragmentation
| are usually very important, and the amount of memory used by a
| given object can be misleading. If you have a string that takes
| up 12 bytes but it's the only object left in the middle of a
| 4KiB page, then just counting "12 bytes" for this object is
| misleading because you have a huge fragmentation overhead.
|
| The only advantage of the borrow checker is that safe Rust
| forces you to make ownership relationships explicit but not all
| Rust code is safe and there are many escape hatches that muddy
| the water (like Rc/Arc mentioned above, but also threads and a
| few other things).
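|
| The first point can be illustrated with a short sketch (the
| function names `inline_bytes` and `heap_bytes` are made up for
| this example): the "sizeof"-style size of a `Vec` never changes,
| while the heap buffer behind it has to be queried at runtime.

```rust
use std::mem::size_of_val;

/// Size of the `Vec` handle itself (pointer, length, capacity).
/// This is what a C-style "sizeof" would report, whatever the contents.
fn inline_bytes(v: &Vec<u64>) -> usize {
    size_of_val(v)
}

/// Heap bytes currently reserved by the vector's buffer.
/// This can change every time the vector grows or shrinks.
fn heap_bytes(v: &Vec<u64>) -> usize {
    v.capacity() * std::mem::size_of::<u64>()
}
```

| An empty `Vec::new()` reports zero heap bytes; after pushing a
| thousand elements `heap_bytes` grows, but `inline_bytes` stays the
| same. Neither number says anything about allocator fragmentation.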
| squiggleblaz wrote:
| > - Some types have dynamic memory allocation (Vec, HashMaps
| etc...), so those would have to be computed at runtime and
| can change at any moment.
|
| I thought that's exactly the question that was being asked?
| Something like an htop for a Rust program. This could be
| helpful in improving code efficiency.
|
| > - Some types have shared ownership (Rc/Arc), so it's
| unclear how you would measure memory usage then.
|
| Same way we measure the memory usage of programs running on
| systems that support shared libraries and mmapped data.
| Seeking perfection here is the obstacle; the goal is to
| attribute memory usage to an object (and preferably, a
| function call) so that we can improve its characteristics.
|
| > - Some types, especially in foreign interfaces, will
| effectively just hold a pointer to some black box data, you'd
| need a special API to figure out how much memory it hides.
| For instance what's the memory usage of a database handle or
| a JPEG compression library context?
|
| It's true that if you integrate with external systems, you
| don't get the benefits of the system you chose to make your
| home. This doesn't mean you shouldn't try to get as much
| benefit as possible.
|
| > - When you care about memory usage things like
| fragmentation are usually very important, and the amount of
| memory used by a given object can be misleading. If you have
| a string that takes up 12 bytes but it's the only object left
| in the middle of a 4KiB page, then just counting "12 bytes"
| for this object is misleading because you have a huge
| fragmentation overhead.
|
| This is interesting to me, and I don't know much about it. It
| sounds like it's not an actual limitation to the utility of
| the tool, but it should certainly guide how it is built and
| how its results are interpreted.
|
| > The only advantage of the borrow checker is that safe Rust
| forces you to make ownership relationships explicit but not
| all Rust code is safe and there are many escape hatches that
| muddy the water (like Rc/Arc mentioned above, but also
| threads and a few other things).
|
| In conclusion, I don't think that renders the activity
| pointless. The fact that hazy information is hazy and
| requires careful interpretation doesn't make it useless.
| People find test cases and static type checks useful even
| though they don't answer the question "is my code correct".
| And we rely all the time on half measures that answer only
| parts of the question (I once developed a program and
| used the load averages from uptime to tell me if it was
| performant enough :/). The pursuit of perfection is the enemy
| of improvement. It might be that there is some fatal flaw,
| but I don't think you've mentioned any here.
| josephg wrote:
| I don't think the ownership system gives you that directly. But
| I think you could make a trait with a #[derive] macro which
| could add up the owned memory of an object. And for complex
| objects, recursively calculate the owned memory for all of an
| object's fields.
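|
| A rough sketch of what such a trait could look like, with the
| impl for `User` written by hand where a `#[derive]` macro would
| generate it. The trait name `OwnedBytes` and the `User` struct are
| assumptions made for this example, and shared ownership (Rc/Arc)
| is not handled.

```rust
use std::mem::size_of;

/// Hypothetical trait; a `#[derive]` macro could generate the struct impl below.
trait OwnedBytes {
    /// Heap bytes owned by `self`, excluding `size_of::<Self>()`.
    fn heap_bytes(&self) -> usize;

    /// Inline size plus owned heap bytes.
    fn total_bytes(&self) -> usize
    where
        Self: Sized,
    {
        size_of::<Self>() + self.heap_bytes()
    }
}

impl OwnedBytes for u32 {
    fn heap_bytes(&self) -> usize {
        0 // plain integers own nothing on the heap
    }
}

impl OwnedBytes for String {
    fn heap_bytes(&self) -> usize {
        self.capacity()
    }
}

impl<T: OwnedBytes> OwnedBytes for Vec<T> {
    fn heap_bytes(&self) -> usize {
        // The buffer itself, plus whatever each element owns on the heap.
        self.capacity() * size_of::<T>()
            + self.iter().map(|item| item.heap_bytes()).sum::<usize>()
    }
}

struct User {
    id: u32,
    name: String,
    tags: Vec<String>,
}

// What a derive macro would plausibly expand to: the sum over all fields.
impl OwnedBytes for User {
    fn heap_bytes(&self) -> usize {
        self.id.heap_bytes() + self.name.heap_bytes() + self.tags.heap_bytes()
    }
}
```

| The recursion happens through the field impls: `User` asks
| `Vec<String>`, which in turn asks each `String`.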
| fanf2 wrote:
| I think the plan is to support this kind of thing with
| per-collection allocators:
| https://github.com/rust-lang/wg-allocators
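|
| As far as I know, per-collection allocators still need the
| nightly-only `allocator_api` feature. A different technique that
| works on stable Rust is a counting global allocator, sketched
| below. It attributes memory to a region of code rather than to a
| collection. The names `Counting`, `REQUESTED` and
| `bytes_requested_by` are made up for this example.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Total bytes ever requested from the allocator (monotonic, never decreases).
static REQUESTED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper that forwards to the system allocator and counts bytes.
struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        REQUESTED.fetch_add(layout.size(), Ordering::Relaxed);
        // SAFETY: the caller upholds the `GlobalAlloc::alloc` contract,
        // which is forwarded unchanged to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with this layout.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOCATOR: Counting = Counting;

/// Bytes requested while `f` runs. Allocations made concurrently by other
/// threads are included, so treat the number as an upper bound.
fn bytes_requested_by<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = REQUESTED.load(Ordering::Relaxed);
    let result = f();
    (result, REQUESTED.load(Ordering::Relaxed) - before)
}
```

| Wrapping a call such as `Vec::<u8>::with_capacity(4096)` in
| `bytes_requested_by` reports at least 4096 bytes, provided the
| optimizer does not remove the allocation altogether.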
| eptcyka wrote:
| An interesting, yet somewhat unrelated discussion could be had
| about cases where one might want the type ID to not change when
| the type changes structurally. There are many structures in many
| ABI interfaces that are structurally different between releases,
| yet still binary compatible. This particular characteristic has
| always been defined ad hoc, and I do wonder what a more formal
| system might mean. This almost feels like the issue of identity
| in distributed systems.
| lasagnaphil wrote:
| Code generation probably works the best here, although it makes
| the build system more complex. You would have the typeid
| mappings as an actually existing file that you can commit into
| your git repo.
|
| Or if your language supports metaprogramming (something like
| Nim, Zig, or Jai), you could write compile-time code that maps
| each of your annotated types into deterministically-determined
| ids. (An example of this in Nim:
| https://gist.github.com/PhilipWitte/dd6c670fca3baf573490)
| zozbot234 wrote:
| > There are many structures in many ABI interfaces that are
| structurally different between releases, yet still binary
| compatible.
|
| These should be defined in the API as simple newtype wrappers
| over some basic binary-level type (generally either unsigned
| binary word or u8 array) with conversions to and from safer
| higher-level types defined in Rust code, leaving it to the
| compiler to optimize these to no-ops whenever possible. This
| ensures that "binary compatible" types also keep the same
| structural identity, and conversely, that binary incompatible
| types are automatically detected as well.
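|
| A small sketch of that newtype pattern. The types `RawVersion` and
| `Version` and the little-endian layout are assumptions for
| illustration, not taken from any real ABI.

```rust
/// Hypothetical wire-level type: an opaque 4-byte field in some ABI struct.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RawVersion([u8; 4]);

/// Safer high-level view used by Rust code (field names are illustrative).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Version {
    major: u16,
    minor: u16,
}

impl From<RawVersion> for Version {
    fn from(raw: RawVersion) -> Self {
        // Assumed layout: two little-endian u16 values, major first.
        let [a, b, c, d] = raw.0;
        Version {
            major: u16::from_le_bytes([a, b]),
            minor: u16::from_le_bytes([c, d]),
        }
    }
}

impl From<Version> for RawVersion {
    fn from(v: Version) -> Self {
        let [a, b] = v.major.to_le_bytes();
        let [c, d] = v.minor.to_le_bytes();
        RawVersion([a, b, c, d])
    }
}
```

| The binary-level identity lives in `RawVersion`, so `Version` can
| gain or rename fields between releases as long as both conversions
| keep producing the same four bytes.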
| demurgos wrote:
| I am currently working on a configuration manager (to get rid of
| Ansible for my needs). To support running code as different
| users, I spawn multiple processes. I want to be able to call
| functions across processes without limitations, and be
| forward-compatible with a system where the processes could
| actually be on different machines.
|
| I basically had similar goals and implemented it with a similar
| design as the one described in the last part. I had the same
| questions about what should be captured by a universal type id.
| In my own code, I decided to punt on the question and require
| the user code to pick a globally unique name. I planned
| initially on using Typetag [0], a library
| achieving the same result as `UniversalId` from the article. The
| reason why I couldn't use `Typetag` in the end is the lack of
| support for generics. Generics are a tough problem to deal with
| when deriving an id because it is not clear how they should
| influence it.
|
| In my case I have a common pattern where structs are defined as:
| struct Message<S: AsRef<str>> { message: S, }
|
| This allows me to define methods on both borrowed (`Message<&str>`)
| and owned (`Message<String>`) versions without issues. When
| remaining inside the process, the borrowed version is passed
| around and when sent across the process boundaries I can
| deserialize it to an owned version. This pattern prevents me from
| using Typetag and I am still not sure how it should be solved.
|
| A related problem is also how the registry is built on the
| receiving end. In my project the registry is built manually,
| similarly to the example in the article. There are also crates
| such as Inventory [1] and Linkme [2], which allow types to be
| marked at their definition point and then collected in order to
| register them.
|
| [0]: https://github.com/dtolnay/typetag
|
| [1]: https://github.com/dtolnay/inventory
|
| [2]: https://github.com/dtolnay/linkme
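|
| A minimal sketch of the "user picks a globally unique name, the
| registry is built manually" approach. None of this is Typetag's,
| Inventory's or Linkme's API: `Named`, `Registry`, `Ping` and the
| name string are hypothetical, and a real decoder would call into a
| serialization library instead of reading a single byte.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical trait: each type picks its own globally unique name by hand.
trait Named {
    const NAME: &'static str;
}

/// Decoder from raw bytes to a type-erased value.
type Decoder = fn(&[u8]) -> Option<Box<dyn Any>>;

/// Registry built manually on the receiving side.
#[derive(Default)]
struct Registry {
    decoders: HashMap<&'static str, Decoder>,
}

impl Registry {
    /// Associates the name chosen by `T` with a decoder for `T`.
    fn register<T: Named>(&mut self, decode: Decoder) {
        self.decoders.insert(T::NAME, decode);
    }

    /// Looks up the decoder by name; `None` for unknown names or bad input.
    fn decode(&self, name: &str, bytes: &[u8]) -> Option<Box<dyn Any>> {
        self.decoders.get(name).and_then(|decode| decode(bytes))
    }
}

/// Example message type with a one-byte payload.
struct Ping(u8);

impl Named for Ping {
    const NAME: &'static str = "example.org/Ping";
}

/// Toy decoder: the first byte is the payload.
fn decode_ping(bytes: &[u8]) -> Option<Box<dyn Any>> {
    let first = *bytes.first()?;
    Some(Box::new(Ping(first)))
}
```

| The receiver calls `register::<Ping>(decode_ping)` once at startup
| and later uses `downcast_ref::<Ping>()` on the decoded value. A
| generic type would need one name per instantiation, which is the
| open question described above.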
| lpghatguy wrote:
| I've written a lot of structs that have "maybe borrowed"
| contents. I usually reach for std::borrow::Cow for this. It's
| not exactly the same: we need to branch at runtime instead of
| having multiple generated codepaths.
|
| struct Message<'a> { message: Cow<'a, str>, }
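|
| A short sketch of how the `Cow` version can cover both the
| borrowed and the owned case. The method names `borrowed`,
| `into_owned` and `text` are illustrative additions, not part of
| the original snippet.

```rust
use std::borrow::Cow;

struct Message<'a> {
    message: Cow<'a, str>,
}

impl<'a> Message<'a> {
    /// Borrowing constructor: no allocation, tied to the input's lifetime.
    fn borrowed(text: &'a str) -> Self {
        Message { message: Cow::Borrowed(text) }
    }

    /// Detaches from the borrow, copying the text only if it is still borrowed.
    fn into_owned(self) -> Message<'static> {
        Message { message: Cow::Owned(self.message.into_owned()) }
    }

    /// Read access works the same way for both variants.
    fn text(&self) -> &str {
        &self.message
    }
}
```

| Inside a process the borrowed form can be passed around cheaply,
| and `into_owned` gives a `Message<'static>` that no longer depends
| on the original buffer, at the cost of a runtime branch on the
| `Cow` variant.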
| demurgos wrote:
| I agree that in this case I can move to branching at runtime
| (the runtime cost is irrelevant for my use case), but I still
| feel that it illustrates how generics complicate deriving a
| universal type id.
|
| The more general issue is that removing generics requires the
| lib to pick the implementation and removes choice from the
| consumer. Sometimes it's not that important, sometimes it
| matters more. Another example I have in my project is that I
| have a trait describing a `User` (with e.g. `get_name`); it
| may be implemented as a `LinuxUser` or `WindowsUser`, each
| providing extra fields. How do you generate a universal id
| for a struct generic over a `User`?
| weltensturm wrote:
| Heh, this sounds just like the global event system I made for my
| D window manager:
| https://github.com/weltensturm/flatman/blob/master/common/ev...
| (which has some extra jingles with method annotations and event
| filters for convenience)
| richardwhiuk wrote:
| Note, you have to be quite careful with typeid - you can end up
| with types not matching because you have two different versions
| of a crate.
|
| Normally this would be a compile time error, but using typeid can
| turn this into a runtime error.
|
| https://docs.rs/assert-type-eq/0.1.0/assert_type_eq/ allows you
| to assert that two types are the same, forcing a compile time
| error for this scenario.
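|
| To show the difference between the two kinds of check, here is a
| hand-rolled sketch. It is not the `assert-type-eq` macro; the
| names `SameAs`, `assert_same_type` and `same_type_at_runtime` are
| made up for this example.

```rust
use std::any::TypeId;

/// Marker trait that is implemented only when both types are identical.
trait SameAs<T> {}
impl<T> SameAs<T> for T {}

/// Compiles only if `A` and `B` are the same type.
fn assert_same_type<A: SameAs<B>, B>() {}

/// Runtime comparison: a mismatch only shows up when this is executed.
fn same_type_at_runtime<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}
```

| `assert_same_type::<u32, u32>()` compiles, whereas
| `assert_same_type::<u32, i32>()` is rejected by the compiler
| because `u32: SameAs<i32>` does not hold. The `TypeId` version
| accepts any pair of types and just returns `false` at runtime.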
| simiones wrote:
| Off-topic mostly, but is this sort of "single line"
| function/macro typical for Rust crates? This seems like
| something I would expect from a Node library, not a serious
| programming language community that cares about robust
| software.
| kzrdude wrote:
| It is relatively common to have narrow scope crates in Rust.
| SpaceNugget wrote:
| That isn't a single line macro. Did you mean a single macro
| crate? Most crates in Rust have more than a single
| macro/function, but some have just one. Regardless, it's not at
| all related to how "robust" or not the language ecosystem is.
| JeremyBanks wrote:
| Take a look at the code:
| https://github.com/Metaswitch/assert-type-eq-rs/blob/master/...
|
| I'd rather not write that myself.
| roblabla wrote:
| Macros can be quite hard to understand, so having a simple
| crate wrapping it with extensive documentation and tests can
| be quite beneficial. And yes, this also applies to node. The
| attack is entirely unnecessary.
| [deleted]
| nemetroid wrote:
| I think this is a valid question. C++ provides
| std::is_same<U, T>; why do Rust users need an external
| dependency (especially given the fact that the type system is
| one of Rust's major strengths)?
|
| My best guess, going by a comment in the source code, is that
| this macro will not be necessary in the future, and it therefore
| makes less sense to put it in the standard library:
|
| > Until RFC 1977 (public dependencies) is accepted, the
| situation where multiple different versions of the same crate
| are present is possible.
|
| https://github.com/rust-lang/rfcs/pull/1977
| steveklabnik wrote:
| Rust is pretty conservative with the standard library, and
| macros have a really high bar historically, because they
| were not namespaced.
___________________________________________________________________
(page generated 2021-06-14 23:02 UTC)