[HN Gopher] Untapped potential in Rust's type system
       ___________________________________________________________________
        
       Untapped potential in Rust's type system
        
       Author : lukastyrychtr
       Score  : 187 points
       Date   : 2021-06-13 07:37 UTC (1 days ago)
        
 (HTM) web link (www.jakobmeier.ch)
 (TXT) w3m dump (www.jakobmeier.ch)
        
       | willeh wrote:
       | Interesting article, but I think the key to writing idiomatic
       | Rust is not to stretch what the type system can do but rather to
       | be happy with what it can express and avoid unnecessary
       | abstraction. The compile-time guarantees that we have to prove in
       | Rust also serve as a hint for when not to abstract.
        
         | nine_k wrote:
         | I think pushing the boundaries of what is idiomatic can
         | sometimes be valuable. Look at how much idiomatic C++ has changed
         | since 1995, or at what is idiomatic in JavaScript now.
        
           | fnord123 wrote:
           | "sometimes" is doing a lot of heavy lifting here.
        
         | klyrs wrote:
         | This article is about curiosity, not best practices.
        
         | marshray wrote:
         | Sometimes life hands you lemons, and sometimes it hands you
         | _Box<dyn Fruit>_.
        
       | dshpala wrote:
       | I've always wondered whether it's possible to use Rust's ownership
       | chains to attribute memory usage, i.e. to calculate the total
       | bytes retained by an object.
        
         | ashtonkem wrote:
         | Shouldn't that be possible with any memory management system? I
         | believe Java keeps a graph of object references behind the
         | scenes for memory management.
        
           | squiggleblaz wrote:
           | The Java model ties one object to another. The Rust model
           | ties consumed memory to a function. What about a debug log
           | output that shows how much memory a function call required, in
           | the same way it shows how much time it takes?
        
         | mhh__ wrote:
         | It didn't use ownership, and I don't think that's the paradigm,
         | but I did something like this in D by tagging allocating
         | members with a user-defined attribute, counting the allocated
         | memory in bytes and then summing it over the whole tree
         | (structure).
         | 
         | Not worth the effort, but it can be done in not much code.
        
         | Hywan wrote:
         | At work, we have developed `loupe`, a Rust crate that does
         | precisely that: https://github.com/wasmerio/loupe/. I'm its
         | main author.
         | 
         | `loupe` provides the `MemoryUsage` trait, which reports the size
         | of a value in bytes, recursively. It traverses most types and
         | their fields or variants as deep as possible. Fortunately, it
         | tracks already visited values so that it doesn't enter an
         | infinite loupe loop.
         | 
         | We are using it inside Wasmer, a WebAssembly runtime.
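A minimal sketch of the recursive-size idea, assuming a hypothetical `DeepSize` trait (this is not `loupe`'s actual API). Shared `Rc` allocations are recorded by address in a `seen` set, so a value reachable through several pointers is counted once; the same bookkeeping is what keeps a traversal from looping forever on a cycle. It ignores allocator overhead and the `Rc` reference counts.

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::rc::Rc;

/// Hypothetical trait (not loupe's API): inline size plus owned heap bytes.
pub trait DeepSize {
    /// `seen` holds the addresses of shared allocations already counted.
    fn deep_size(&self, seen: &mut HashSet<usize>) -> usize;
}

impl DeepSize for u64 {
    fn deep_size(&self, _seen: &mut HashSet<usize>) -> usize {
        size_of::<u64>()
    }
}

impl<T: DeepSize> DeepSize for Vec<T> {
    fn deep_size(&self, seen: &mut HashSet<usize>) -> usize {
        // Header (ptr, cap, len) + unused slots + each element, recursively.
        let spare = (self.capacity() - self.len()) * size_of::<T>();
        size_of::<Vec<T>>() + spare + self.iter().map(|x| x.deep_size(seen)).sum::<usize>()
    }
}

impl<T: DeepSize> DeepSize for Rc<T> {
    fn deep_size(&self, seen: &mut HashSet<usize>) -> usize {
        let addr = Rc::as_ptr(self) as usize;
        if seen.insert(addr) {
            // First visit: the pointer plus the shared value it points to.
            size_of::<Rc<T>>() + (**self).deep_size(seen)
        } else {
            // Already counted elsewhere: only the pointer itself.
            size_of::<Rc<T>>()
        }
    }
}
```

With two clones of the same `Rc<Vec<u64>>` inside a vector, the inner buffer is counted once and the second clone contributes only its pointer.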
        
         | simias wrote:
         | I don't think ownership really lets you do anything more than
         | what "sizeof" tells you in C. In particular:
         | 
         | - Some types have dynamic memory allocation (Vec, HashMaps
         | etc...), so those would have to be computed at runtime and can
         | change at any moment.
         | 
         | - Some types have shared ownership (Rc/Arc), so it's unclear
         | how you would measure memory usage then.
         | 
         | - Some types, especially in foreign interfaces, will
         | effectively just hold a pointer to some black box data, you'd
         | need a special API to figure out how much memory it hides. For
         | instance what's the memory usage of a database handle or a JPEG
         | compression library context?
         | 
         | - When you care about memory usage things like fragmentation
         | are usually very important, and the amount of memory used by a
         | given object can be misleading. If you have a string that takes
         | up 12 bytes but it's the only object left in the middle of a
         | 4KiB page, then just counting "12 bytes" for this object is
         | misleading because you have a huge fragmentation overhead.
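To make the first bullet concrete: `std::mem::size_of_val`, the closest Rust analogue of `sizeof`, reports only the fixed-size `Vec` header, while the heap buffer has to be read at runtime from `capacity()`. A minimal sketch; the function name `inline_and_heap` is illustrative:

```rust
use std::mem::size_of_val;

/// Returns (inline bytes, heap bytes requested) for a byte vector.
/// `size_of_val` only sees the header (pointer, capacity, length),
/// never the heap buffer, so it is the same for every `Vec<u8>`.
pub fn inline_and_heap(v: &Vec<u8>) -> (usize, usize) {
    (size_of_val(v), v.capacity())
}
```

Note that `capacity()` is what was requested from the allocator, not what the allocator actually reserved, so the fragmentation caveat in the last bullet still applies.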
         | 
         | The only advantage of the borrow checker is that safe Rust
         | forces you to make ownership relationships explicit but not all
         | Rust code is safe and there are many escape hatches that muddy
         | the water (like Rc/Arc mentioned above, but also threads and a
         | few other things).
        
           | squiggleblaz wrote:
           | > - Some types have dynamic memory allocation (Vec, HashMaps
           | etc...), so those would have to be computed at runtime and
           | can change at any moment.
           | 
           | I thought that's exactly the question that was being asked?
           | Something like an htop for a Rust program. This could be
           | helpful in improving code efficiency.
           | 
           | > - Some types have shared ownership (Rc/Arc), so it's
           | unclear how you would measure memory usage then.
           | 
           | Same way we measure the memory usage of programs running on
           | systems that support shared libraries and mmapped data.
           | Seeking perfection here is the obstacle; the goal is to
           | attribute memory usage to an object (and preferably, a
           | function call) so that we can improve its characteristics.
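One way to apply the shared-library analogy is proportional attribution, similar in spirit to the proportional set size (PSS) that Linux reports for shared pages: each of the `n` owners of an `Rc` is charged `1/n` of the shared buffer. A minimal sketch, assuming an `Rc<Vec<u64>>` and counting only the element buffer; the function name is hypothetical:

```rust
use std::mem::size_of;
use std::rc::Rc;

/// Hypothetical proportional attribution: the shared buffer's heap bytes,
/// divided evenly among all current strong owners of the `Rc`.
pub fn proportional_bytes(handle: &Rc<Vec<u64>>) -> f64 {
    let heap = (handle.capacity() * size_of::<u64>()) as f64;
    heap / Rc::strong_count(handle) as f64
}
```

The figure charged to each owner changes whenever a clone is created or dropped, which is the kind of hazy-but-useful number argued for here.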
           | 
           | > - Some types, especially in foreign interfaces, will
           | effectively just hold a pointer to some black box data, you'd
           | need a special API to figure out how much memory it hides.
           | For instance what's the memory usage of a database handle or
           | a JPEG compression library context?
           | 
           | It's true that if you integrate with external systems, you
           | don't get the benefits of the system you chose to make your
           | home. That doesn't mean you shouldn't try to get as much
           | benefit as possible.
           | 
           | > - When you care about memory usage things like
           | fragmentation are usually very important, and the amount of
           | memory used by a given object can be misleading. If you have
           | a string that takes up 12 bytes but it's the only object left
           | in the middle of a 4KiB page, then just counting "12 bytes"
           | for this object is misleading because you have a huge
           | fragmentation overhead.
           | 
           | This is interesting to me, and I don't know much about it. It
           | sounds like it's not an actual limitation to the utility of
           | the tool, but it should certainly guide how it is built and
           | how its results are interpreted.
           | 
           | > The only advantage of the borrow checker is that safe Rust
           | forces you to make ownership relationships explicit but not
           | all Rust code is safe and there are many escape hatches that
           | muddy the water (like Rc/Arc mentioned above, but also
           | threads and a few other things).
           | 
           | In conclusion, I don't think that renders the activity
           | pointless. The fact that hazy information is hazy and
           | requires careful interpretation doesn't make it useless.
           | People find test cases and static type checks useful even
           | though they don't answer the question "is my code correct".
           | And we're all the time relying on half measures that answer
           | parts of the questions here (I once developed a program and
           | used the load averages from uptime to tell me if it was
           | performant enough :/). The pursuit of perfection is the enemy
           | of improvement. It might be that there is some fatal flaw,
           | but I don't think you've mentioned any here.
        
         | josephg wrote:
         | I don't think the ownership system gives you that directly. But
         | I think you could make a trait with a #[derive] macro which
         | could add up the owned memory of an object. And for complex
         | objects, recursively calculate the owned memory for all of an
         | object's fields.
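What such a derive might expand to, sketched by hand for a hypothetical `OwnedBytes` trait and `Record` struct (no existing crate is implied). Leaf types report the heap memory they own, containers add their buffer plus their elements' contributions, and a struct just sums its fields:

```rust
use std::mem::size_of;

/// Hypothetical trait a `#[derive(OwnedBytes)]` macro could implement:
/// heap bytes owned by a value, not counting its own inline size.
pub trait OwnedBytes {
    fn owned_bytes(&self) -> usize;
}

impl OwnedBytes for u32 {
    // Plain integers live inline and own no heap memory.
    fn owned_bytes(&self) -> usize {
        0
    }
}

impl OwnedBytes for String {
    // The string's heap buffer, including unused capacity.
    fn owned_bytes(&self) -> usize {
        self.capacity()
    }
}

impl<T: OwnedBytes> OwnedBytes for Vec<T> {
    // The element buffer, plus whatever each element owns in turn.
    fn owned_bytes(&self) -> usize {
        self.capacity() * size_of::<T>() + self.iter().map(|x| x.owned_bytes()).sum::<usize>()
    }
}

pub struct Record {
    pub id: u32,
    pub name: String,
    pub tags: Vec<String>,
}

// Roughly what a derive would generate: one call per field, summed.
impl OwnedBytes for Record {
    fn owned_bytes(&self) -> usize {
        self.id.owned_bytes() + self.name.owned_bytes() + self.tags.owned_bytes()
    }
}
```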
        
         | fanf2 wrote:
         | I think the plan is to support this kind of thing with
         | per-collection allocators: https://github.com/rust-lang/wg-allocators
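Per-collection allocators still require the unstable `allocator_api` feature. As a stable-Rust approximation (a different, coarser technique: process-wide rather than per-collection), one can wrap `std::alloc::System` in a `#[global_allocator]` that keeps a running count of live bytes. A minimal sketch; `Counting` and `LIVE_BYTES` are illustrative names:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Live heap bytes as requested by callers (allocator overhead not included).
pub static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Forwards every request to the system allocator and updates the counter.
pub struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;
```

Because the default `realloc` and `alloc_zeroed` implementations go through `alloc` and `dealloc`, the counter stays consistent without overriding them.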
        
       | eptcyka wrote:
       | An interesting, if somewhat unrelated, discussion could be had
       | about cases where one might want the type ID not to change when
       | the type changes structurally. There are many structures in many
       | ABI interfaces that are structurally different between releases,
       | yet still binary compatible. This particular characteristic has
       | always been defined ad hoc, and I do wonder what a more formal
       | system might mean. This almost feels like the issue of identity
       | in distributed systems.
        
         | lasagnaphil wrote:
         | Code generation probably works best here, although it makes
         | the build system more complex. You would have the typeid
         | mappings as an actually existing file that you can commit into
         | your git repo.
         | 
         | Or if your language supports metaprogramming (something like
         | Nim, Zig, or Jai), you could write compile-time code that maps
         | each of your annotated types to deterministically derived
         | ids. (An example of this in Nim:
         | https://gist.github.com/PhilipWitte/dd6c670fca3baf573490)
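A Rust analogue of that compile-time mapping, sketched with a hypothetical `StableId` trait: each type opts in with a hand-chosen name, and a `const fn` FNV-1a hash turns the name into an id at compile time. Unlike `std::any::TypeId`, whose values may change between Rust releases, the id depends only on the chosen name:

```rust
/// 64-bit FNV-1a, usable in const contexts, so ids are computed at compile time.
pub const fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        i += 1;
    }
    hash
}

/// Hypothetical opt-in trait: the id is derived only from `NAME`.
pub trait StableId {
    const NAME: &'static str;
    const ID: u64 = fnv1a(Self::NAME.as_bytes());
}

pub struct Ping;
pub struct Pong;

impl StableId for Ping {
    const NAME: &'static str = "myapp::Ping";
}
impl StableId for Pong {
    const NAME: &'static str = "myapp::Pong";
}
```

Renaming a type in code does not change its id as long as `NAME` stays the same; the trade-off is that uniqueness of the names is a convention rather than something the compiler checks.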
        
         | zozbot234 wrote:
         | > There are many structures in many ABI interfaces that are
         | structurally different between releases, yet still binary
         | compatible.
         | 
         | These should be defined in the API as simple newtype wrappers
         | over some basic binary-level type (generally either unsigned
         | binary word or u8 array) with conversions to and from safer
         | higher-level types defined in Rust code, leaving it to the
         | compiler to optimize these to no-ops whenever possible. This
         | ensures that "binary compatible" types also keep the same
         | structural identity, and conversely, that binary incompatible
         | types are automatically detected as well.
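A minimal sketch of that pattern, with hypothetical `RawMode`/`Mode` types: the ABI-facing newtype is `#[repr(transparent)]` over a plain `u32`, and a fallible conversion produces the richer Rust type. Unknown values are handed back rather than silently mapped:

```rust
use std::convert::TryFrom;

/// Wire-level type: same layout as `u32`, so its ABI identity never changes.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RawMode(pub u32);

/// Higher-level Rust view, free to evolve without touching the ABI.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Read,
    Write,
}

impl TryFrom<RawMode> for Mode {
    type Error = RawMode; // return the unrecognized raw value to the caller
    fn try_from(raw: RawMode) -> Result<Self, Self::Error> {
        match raw.0 {
            0 => Ok(Mode::Read),
            1 => Ok(Mode::Write),
            _ => Err(raw),
        }
    }
}

impl From<Mode> for RawMode {
    fn from(mode: Mode) -> Self {
        RawMode(match mode {
            Mode::Read => 0,
            Mode::Write => 1,
        })
    }
}
```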
        
       | demurgos wrote:
       | I am currently working on a configuration manager (to get rid of
       | Ansible for my needs). To support running code as different
       | users, I spawn multiple processes. I want to be able to call
       | functions across processes without limitations, and be forward-
       | compatible with a system where the processes could actually be on
       | different machines.
       | 
       | I had similar goals and implemented them with a design similar
       | to the one described in the last part. I had the same questions
       | about what should be captured by a universal type id. In my own
       | code, I decided to punt on the question and require the user
       | code to pick a globally unique name. I initially planned to use
       | Typetag [0], a library achieving the same result as
       | `UniversalId` from the article. The reason I couldn't use
       | Typetag in the end is its lack of support for generics. Generics
       | are a tough problem when deriving an id because it is not clear
       | how they should influence it.
       | 
       | In my case I have a common pattern where structs are defined as:
       | 
       |     struct Message<S: AsRef<str>> {
       |         message: S,
       |     }
       | 
       | This allows me to define methods on both borrowed
       | (`Message<&str>`) and owned (`Message<String>`) versions without
       | issues. When remaining inside the process, the borrowed version
       | is passed around, and when sent across process boundaries I can
       | deserialize it to an owned version. This pattern prevents me from
       | using Typetag, and I am still not sure how it should be solved.
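The sketch below (hypothetical names throughout) shows why generics make this hard: the compiler treats `Message<&str>` and `Message<String>` as distinct types with distinct `TypeId`s, so any universal id has to pick a policy. Here the policy is to give every instantiation the same hand-picked wire name, which is only sound if all of them serialize identically:

```rust
use std::any::TypeId;

pub struct Message<S: AsRef<str>> {
    pub message: S,
}

impl<S: AsRef<str>> Message<S> {
    // One method body works for both borrowed and owned instantiations.
    pub fn text(&self) -> &str {
        self.message.as_ref()
    }
}

/// Hypothetical user-chosen wire name; every `S` shares the same one.
pub trait WireName {
    const WIRE_NAME: &'static str;
}

impl<S: AsRef<str>> WireName for Message<S> {
    const WIRE_NAME: &'static str = "demo.Message";
}

/// The compiler's view: each instantiation is a different type.
pub fn type_ids_differ() -> bool {
    TypeId::of::<Message<&'static str>>() != TypeId::of::<Message<String>>()
}
```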
       | 
       | A related problem is how the registry is built on the
       | receiving end. In my project the registry is built manually,
       | similarly to the example in the article. There are also crates
       | such as Inventory [1] and Linkme [2], which allow marking types
       | at their definition point and then collecting them in order to
       | register them.
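For reference, a manually built registry can be as small as a map from names to type-erased decoder functions. A minimal sketch with hypothetical names, using UTF-8 text in place of a real serialization format; crates such as Inventory and Linkme automate the step of collecting the entries:

```rust
use std::any::Any;
use std::collections::HashMap;

/// A decoder turns raw bytes into some concrete value, type-erased.
pub type Decoder = fn(&[u8]) -> Option<Box<dyn Any>>;

#[derive(Debug, PartialEq)]
pub struct Greeting(pub String);

// Stand-in for real deserialization: accept any valid UTF-8 payload.
fn decode_greeting(bytes: &[u8]) -> Option<Box<dyn Any>> {
    let text = std::str::from_utf8(bytes).ok()?;
    Some(Box::new(Greeting(text.to_owned())))
}

/// Manually built registry: the receiving side lists every type it accepts.
pub fn build_registry() -> HashMap<&'static str, Decoder> {
    let mut registry: HashMap<&'static str, Decoder> = HashMap::new();
    registry.insert("demo.Greeting", decode_greeting);
    registry
}
```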
       | 
       | [0]: https://github.com/dtolnay/typetag
       | 
       | [1]: https://github.com/dtolnay/inventory
       | 
       | [2]: https://github.com/dtolnay/linkme
        
         | lpghatguy wrote:
         | I've written a lot of structs that have "maybe borrowed"
         | contents. I usually reach for std::borrow::Cow for this. It's
         | not exactly the same: we need to branch at runtime instead of
         | having multiple generated codepaths.
         | 
         |     struct Message<'a> {
         |         message: Cow<'a, str>,
         |     }
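A sketch of how the `Cow` version might be used (method names are illustrative): one method body handles both variants, and `into_owned` detaches the value from borrowed data before it crosses a boundary:

```rust
use std::borrow::Cow;

pub struct Message<'a> {
    pub message: Cow<'a, str>,
}

impl<'a> Message<'a> {
    /// Cheap in-process constructor: borrows, no allocation.
    pub fn borrowed(text: &'a str) -> Self {
        Message { message: Cow::Borrowed(text) }
    }

    /// Detach from borrowed data, e.g. before sending to another process.
    pub fn into_owned(self) -> Message<'static> {
        Message { message: Cow::Owned(self.message.into_owned()) }
    }

    /// Works for either variant; the Borrowed/Owned branch happens at runtime.
    pub fn shout(&self) -> String {
        self.message.to_uppercase()
    }
}
```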
        
           | demurgos wrote:
           | I agree that in this case I can move to branching at runtime
           | (the runtime cost is irrelevant for my use case), but I still
           | feel that it illustrates how generics complicate deriving a
           | universal type id.
           | 
           | The more general issue is that removing generics requires the
           | library to pick the implementation, which removes choice from
           | the consumer. Sometimes it's not that important, sometimes it
           | matters more. Another example in my project is a trait
           | describing a `User` (with e.g. `get_name`); it may be
           | implemented as a `LinuxUser` or a `WindowsUser`, each
           | providing extra fields. How do you generate a universal id
           | for a struct generic over a `User`?
        
       | weltensturm wrote:
       | Heh, this sounds just like the global event system I made for my
       | D window manager:
       | https://github.com/weltensturm/flatman/blob/master/common/ev...
       | (which has some extra jingles with method annotations and event
       | filters for convenience)
        
       | richardwhiuk wrote:
       | Note: you have to be quite careful with TypeId. You can end up
       | with types not matching because you have two different versions
       | of a crate.
       | 
       | Normally this would be a compile-time error, but using TypeId can
       | turn it into a runtime error.
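A self-contained stand-in for that failure mode: two modules play the role of two versions of one crate. Passing a `dep_v1::Config` where a `dep_v2::Config` is expected would be rejected by the type checker, but once the value goes through `dyn Any` the mismatch only shows up as a failed downcast at runtime. Module and function names are hypothetical:

```rust
use std::any::Any;

// Stand-ins for two versions of the same crate: same name, distinct types.
pub mod dep_v1 {
    pub struct Config;
}
pub mod dep_v2 {
    pub struct Config;
}

/// True only if the type-erased value is the `Config` this code was built against.
pub fn is_expected_config(value: &dyn Any) -> bool {
    value.downcast_ref::<dep_v2::Config>().is_some()
}
```

When calling this with a boxed value, pass `&*boxed` rather than `&boxed`; otherwise the `Box` itself is coerced to `dyn Any` and the check always fails.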
       | 
       | https://docs.rs/assert-type-eq/0.1.0/assert_type_eq/ allows you
       | to assert that two types are the same, forcing a compile time
       | error for this scenario.
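Without pulling in that crate, the same kind of compile-time check can be written with a blanket trait impl that only exists when both types are identical. This is a hand-rolled sketch, not the crate's macro, and the names are illustrative:

```rust
/// Implemented only when `Self` and `T` are the same type.
pub trait SameAs<T> {}
impl<T> SameAs<T> for T {}

/// Compiles only if `A` and `B` are identical; otherwise type checking fails.
pub fn assert_type_eq<A: SameAs<B>, B>() {}

pub type Port = u16;
```

A call such as `assert_type_eq::<u16, u32>()` fails to compile, which is the compile-time error described above.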
        
         | simiones wrote:
         | Off-topic mostly, but is this sort of "single line"
         | function/macro typical for Rust crates? This seems like
         | something I would expect from a Node library, not a serious
         | programming language community that cares about robust
         | software.
        
           | kzrdude wrote:
           | It is relatively common to have narrow-scope crates in Rust.
        
           | SpaceNugget wrote:
           | That isn't a single line macro. Did you mean a single macro
           | crate? Most crates in rust have more than a single
           | macro/function, but some do. Regardless, it's not at all
           | related to how "robust" or not the language ecosystem is.
        
           | JeremyBanks wrote:
           | Take a look at the code:
           | https://github.com/Metaswitch/assert-type-eq-rs/blob/master/...
           | 
           | I'd rather not write that myself.
        
           | roblabla wrote:
           | Macros can be quite hard to understand, so having a simple
           | crate wrapping it with extensive documentation and tests can
           | be quite beneficial. And yes, this also applies to node. The
           | attack is entirely unnecessary.
        
           | nemetroid wrote:
           | I think this is a valid question. C++ provides
           | std::is_same<U, T>, why do Rust users need an external
           | dependency (especially given the fact that the type system is
           | one of Rust's major strengths)?
           | 
           | My best guess, going by a comment in the source code, is that
           | this macro will not be necessary in the future, and therefore
           | makes less sense to put in the standard library:
           | 
           | > Until RFC 1977 (public dependencies) is accepted, the
           | situation where multiple different versions of the same crate
           | are present is possible.
           | 
           | https://github.com/rust-lang/rfcs/pull/1977
        
             | steveklabnik wrote:
             | Rust is pretty conservative with the standard library, and
             | macros have a really high bar historically, because they
             | were not namespaced.
        
       ___________________________________________________________________
       (page generated 2021-06-14 23:02 UTC)