[HN Gopher] The weird of function-local types in Rust
       ___________________________________________________________________
        
       The weird of function-local types in Rust
        
       Author : lukastyrychtr
       Score  : 101 points
       Date   : 2024-08-17 08:54 UTC (4 days ago)
        
 (HTM) web link (elastio.github.io)
 (TXT) w3m dump (elastio.github.io)
        
       | jerf wrote:
       | I have found, across several languages I've used, that types
       | embedded into functions are generally a bad idea, and I think the
       | general principle is that types generally end up needing to be
       | exposed to any code that will also test that code. So, for
       | instance, it's fine to confine types to some particular module,
       | as long as those types are internal-only, but confining them
       | within functions generally becomes a bad idea.
       | 
       | I know the complaints many of you are gearing up to type, but my
       | statement is a bit more complicated than you may have realized on
       | first read; the key is the word "becomes": I'm looking at
       | the lifetime of the code and not a snapshot. The problem with
       | embedding types into those smaller scopes is that while it may
       | work at first... of course it does, it compiles, right?... they
       | become an impediment to a number of operations over time. First,
       | as I mentioned, testing is very likely _at some point over the
       | evolution of the module_ to want to either provide input or
       | examine output, intermediate or otherwise, that exists in those
       | types. Second, as the code grows, you want to be able to refactor
       | things freely, and types embedded in functions form a barrier to
       | refactoring because to refactor you'll have to do something to
       | expose that type now to multiple functions. You do not want
       | barriers to refactoring. Barriers to refactoring are a bigger
       | expense over the long term than any small local gain from putting
       | a type _here_ instead of _there_, especially when anyone should
       | have "Jump to Definition" readily available in this post-LSP era.
       | 
       | Considered over time, over the evolution of the code base, I've
       | just never had any super-local types like this "survive". Every
       | time I think I've found an exception, I've either had my test
       | code or the desire to refactor force me to lift it to the module
       | level. So I just start there now.
       | 
       | To the extent there is an exception, it may be testing-only code.
       | Testing-only code has very different constraints than production
       | code anyhow. Even then, though, I still find that refactoring
       | problem arises, and test code needs to be refactorable too.
       | 
       | On the plus side, while I label them "a bad idea", they are not a
       | "bad idea" that destroys your code base or anything. On the grand
       | scale of "bad ideas" in code, this is down in the "inconvenience"
       | part of the scale. It is almost self-evidently not some sort of
       | disaster and I am not claiming it is. You can always lift it out
       | and move on. But it is one of the many little hygiene habits that
       | add up and help keep code fluid and refactoring always
       | available to me at a minimum activation-energy cost, because that
       | is really important.
       | 
       | (This applies specifically to types that you explicitly define.
       | You can in Haskell, for instance, bash a new type together
       | anywhere simply by creating a tuple (x, y). But this doesn't
       | trigger what I'm talking about because any other bit of the code
       | can bash the exact same type together simply by creating another
       | tuple of the same type, and they'll unify just fine without
       | having to share a type definition in common. No impediment of any
       | kind is created by a new tuple type in that language.)
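A minimal Rust sketch of the tuple point above (jerf's example is Haskell, but Rust tuple types are structural in the same way). `from_parts` and `parse_record` are hypothetical names chosen for illustration: neither shares a type definition with the other, yet their results unify.

```rust
// Two independent functions "bash together" the same tuple type without
// sharing any definition; tuple types are structural, so results compare directly.
fn from_parts(id: u32, name: &str) -> (u32, String) {
    (id, name.to_owned())
}

fn parse_record(line: &str) -> Option<(u32, String)> {
    let (id, name) = line.split_once(',')?;
    Some((id.trim().parse::<u32>().ok()?, name.trim().to_owned()))
}

// By contrast, if each function declared its own local
// `struct Record { id: u32, name: String }`, the two would be distinct,
// incompatible types, and one would have to be lifted out to share it.
```

For example, `parse_record("7, bob") == Some(from_parts(7, "bob"))` type-checks with no shared declaration.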
        
         | chowells wrote:
         | Sometimes I think I want local types in Haskell. I'm creating
         | some new type and some instances for it solely for this
         | function, why does it need global scope?
         | 
         | Then I get around to remembering module scope is the actual
         | important thing in Haskell, make sure that type isn't exported
         | from the primary API modules, and get on with life.
        
           | saghm wrote:
           | Yeah, I think it's pretty rare to actually need a "local"
           | type as opposed to just making it private to a module in Rust
           | as well. The use case the article gives is one of the few
           | where I could see it being useful; if you're using a macro
           | in the body of a function and want it to generate a type,
           | there's not anywhere else you can define it (without an
           | additional macro invocation outside the function, which often
           | defeats the purpose of trying to wrap up all the boilerplate
           | into one concise place).
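A minimal sketch of that case, assuming a hypothetical `options_struct!` macro written with `macro_rules!` (not the article's macro): because the macro expands to a struct definition, invoking it inside a function body gives a type that exists only in that body.

```rust
// Hypothetical macro that expands to a struct definition.
macro_rules! options_struct {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        struct $name {
            $($field: $ty),*
        }
    };
}

fn total_wait_ms(host: &str) -> u64 {
    // The generated `RetryOpts` type is local to this function body;
    // no extra macro invocation at module level is needed.
    options_struct!(RetryOpts { retries: u64, timeout_ms: u64 });

    let opts = RetryOpts {
        retries: 3,
        timeout_ms: if host == "localhost" { 100 } else { 1_000 },
    };
    opts.retries * opts.timeout_ms
}
```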
        
         | Joker_vD wrote:
         | One of the somewhat useful (but still mind-boggling) uses of
         | local types I've encountered was in Go, writing a custom
         | unmarshaller:                   func (s *MyLovelyStruct)
         | UnmarshalJSON(b []byte) error {             err :=
         | json.Unmarshal(b, s)             if err != nil {
         | return err             }                  return
         | validateAndMassageMyLovelyStruct(s)         }
         | 
         | I want to re-use the default "dump JSON key/values into the
         | struct's fields" logic, then add something on top of it. But as
         | written, this method will blow up with a stack overflow because
         | json.Unmarshal(b, s) will call s.UnmarshalJSON(b), if it can.
         | So what you can do is this:
         | 
         |     func (s *MyLovelyStruct) UnmarshalJSON(b []byte) error {
         |         type IncognitoStruct MyLovelyStruct
         |         tmp := IncognitoStruct{}
         | 
         |         err := json.Unmarshal(b, &tmp)
         |         if err != nil {
         |             return err
         |         }
         | 
         |         *s = MyLovelyStruct(tmp)
         |         return validateAndMassageMyLovelyStruct(s)
         |     }
         | 
         | The IncognitoStruct, even if it has exact same fields as
         | MyLovelyStruct (and is castable to it), does not have any of
         | its methods, so json.Unmarshal(b, &tmp) does not recursively
         | call this UnmarshalJSON() method.
         | 
         | But even that uses type _aliases_, not the proper, data-holding
         | types themselves. I never found any motivation to use those; the
         | package-local types are quite enough.
        
           | nkozyra wrote:
           | Wouldn't generics in the validateAndMassageMyLovelyStruct()
           | function avoid this kind of workaround?
        
             | jerf wrote:
             | No, because the method infinitely recurses before then.
        
           | jerf wrote:
           | Yeah, that's one in my bag-of-JSON-tricks. (Hopefully my bag-
           | of-JSON-tricks gets at least a little less populated with
           | json v2.) This is also useful if you want to selectively
           | override a particular Unmarshal for any other reason, which
           | is what I needed it for. But then I needed to customize the
           | Unmarshal and back to a top-level type it went. :)
           | 
           | But this sort of thing is why I tried to emphasize at the end
           | that I'm not trying to "ring the alarm bell" or anything.
           | When it works, it works, and it's not like I would call what
           | you have there Bad Code or anything. I personally would have
           | pulled it to a top level type immediately and that's just a
           | preference, not something I'd go to bat over in a code review
           | or anything.
        
           | oefrha wrote:
           | Sometimes you just want to JSON-encode a []struct{<ad hoc
           | stuff>} or something like that, so it's entirely reasonable
           | to use a func-local named type rather than repeating the
           | anonymous struct.
           | 
           | And to gp: if your func local type ends up observable and
           | even testable, of course it shouldn't be func local.
           | Otherwise you're describing testing implementation rather
           | than behavior, indicating you're writing bad tests.
        
             | jerf wrote:
             | "Otherwise you're describing testing implementation rather
             | than behavior, indicating you're writing bad tests."
             | 
             | Yeah, people have been threatening me for decades with the
             | claim that if I write tests to test internals I'll have to
             | refactor like crazy someday. I'm still waiting for someday
             | to come. Meanwhile, it has caught a lot of bugs.
             | 
             | I'm open to the possibility that there's something
             | different about the way I write code that causes me to not
             | have this problem. Stay tuned to my blog over the next
             | couple of months if that intrigues you. In the meantime, as
             | reality fails to correspond to theory, I go with reality.
        
           | masklinn wrote:
           | I don't know how common it is in the wild, but deserializing
           | to a function-local type is routinely used by Serde's
           | documentation for examples, e.g.
           | https://serde.rs/deserialize-struct.html
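A std-only sketch of the same shape, assuming a hypothetical `Endpoint` type parsed by hand from `host:port` rather than deserialized with Serde: the raw, unvalidated form lives in a function-local type, and only the validated type is visible to the rest of the module.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct Endpoint {
    host: String,
    port: u16,
}

impl FromStr for Endpoint {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Function-local "raw" shape: fields exactly as they appear in the
        // input, before validation. Nothing outside this fn needs to know it.
        struct Raw<'a> {
            host: &'a str,
            port: &'a str,
        }

        let (host, port) = s.split_once(':').ok_or("expected host:port")?;
        let raw = Raw { host: host.trim(), port: port.trim() };

        // Validate and convert into the public type.
        if raw.host.is_empty() {
            return Err("empty host".to_string());
        }
        let port = raw.port.parse::<u16>().map_err(|e| format!("bad port: {e}"))?;
        Ok(Endpoint { host: raw.host.to_string(), port })
    }
}
```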
        
         | lesuorac wrote:
         | In general sure.
         | 
         | However, in a specific case, I find local types useful for
         | code that makes JSON requests to another service. You could
         | care that the request has a certain intermediate structure, but
         | serialization is fairly deterministic, so I don't see an
         | advantage of that over just testing that you send an HTTP
         | request with a valid (String) body.
         | 
         | If you control that endpoint then it would make sense to share
         | that type between them, but if you don't then you might as well
         | make the type scoped to just that method.
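A sketch of lesuorac's case under stated assumptions: the endpoint, its field names, and the 100-item cap are all hypothetical, and the JSON is hand-written so that only the standard library is needed. The test checks the `String` body, not the local type.

```rust
use std::fmt;

/// Hypothetical client helper for an endpoint we don't control, so the
/// payload type stays local to this function.
fn search_request_body(user_id: u64, limit: u32) -> String {
    struct SearchRequest {
        user_id: u64,
        limit: u32,
    }

    // Hand-rolled JSON to keep the sketch std-only; a real client would
    // more likely use a serializer such as serde_json.
    impl fmt::Display for SearchRequest {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, r#"{{"user_id":{},"limit":{}}}"#, self.user_id, self.limit)
        }
    }

    SearchRequest { user_id, limit: limit.min(100) }.to_string()
}
```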
        
         | edflsafoiewq wrote:
         | I usually see them used for drop guards in Rust, i.e., an
         | alternative to try-finally.
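A minimal drop-guard sketch, assuming a hypothetical `process` function that tracks nesting depth in a shared `Cell`. The guard type is local to the function because nothing else needs it.

```rust
use std::cell::Cell;

/// Hypothetical operation that tracks nesting depth in a shared counter.
fn process(depth: &Cell<u32>, fail: bool) -> Result<u32, String> {
    // Function-local guard: its Drop runs on every exit path (including
    // early returns, `?`, and unwinding panics), doing the job of `finally`.
    struct DepthGuard<'a>(&'a Cell<u32>);

    impl Drop for DepthGuard<'_> {
        fn drop(&mut self) {
            self.0.set(self.0.get() - 1);
        }
    }

    depth.set(depth.get() + 1);
    // Bind to a named variable: `let _ = ...` would drop the guard immediately.
    let _guard = DepthGuard(depth);

    if fail {
        return Err("early exit".to_string()); // the guard still decrements here
    }
    Ok(depth.get()) // depth as seen while the guard is alive
}
```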
        
         | monocasa wrote:
         | I have used function local enums for the states of a function
         | local state machine pretty successfully.
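A small sketch of a function-local state machine, assuming a hypothetical task (counting complete double-quoted segments). The enum never leaves the function, and tests go through inputs and outputs only.

```rust
/// Counts complete double-quoted segments in `input`.
fn count_quoted_segments(input: &str) -> usize {
    // These states exist only for this function's loop.
    #[derive(Clone, Copy)]
    enum State {
        Outside,
        InQuote,
    }

    let mut state = State::Outside;
    let mut count = 0;
    for c in input.chars() {
        state = match (state, c) {
            (State::Outside, '"') => State::InQuote,
            (State::InQuote, '"') => {
                count += 1;
                State::Outside
            }
            (s, _) => s, // any other character leaves the state unchanged
        };
    }
    count
}
```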
        
           | jerf wrote:
           | Once it gets past trivial, that would definitely pass into the
           | realm of things I'd want to expose to my tests. "Starting from
           | this state and given this set of inputs do I get to this
           | state?" is a pretty basic test, and maybe there's a bunch of
           | people way smarter than me who can naturally run complex
           | state machines in their head, but I find myself frequently
           | surprised by at least _one_ thing they do and I do not
           | generally just splat them down into the code correctly on the
           | first try.
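A sketch of the kind of test jerf describes, using a hypothetical door state machine whose states live at module level, so a test can start from any state and replay a list of events.

```rust
/// Module-level (not function-local) states, so tests can start anywhere.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Door {
    Closed,
    Open,
    Locked,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    Push,
    Pull,
    Lock,
    Unlock,
}

fn step(state: Door, event: Event) -> Door {
    match (state, event) {
        (Door::Closed, Event::Push) => Door::Open,
        (Door::Open, Event::Pull) => Door::Closed,
        (Door::Closed, Event::Lock) => Door::Locked,
        (Door::Locked, Event::Unlock) => Door::Closed,
        (s, _) => s, // events that don't apply leave the state unchanged
    }
}

/// "Starting from this state and given these inputs, where do I end up?"
fn run(start: Door, events: &[Event]) -> Door {
    events.iter().fold(start, |s, &e| step(s, e))
}
```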
        
             | monocasa wrote:
             | In the cases where I do this, it's when I try really hard
             | not to actually expose the state machine to tests anyway
             | because there's a better way to hit them in tests with
             | their expected inputs and outputs rather than an
             | implementation detail of the state machine itself.
        
         | tazu wrote:
         | > First, as I mentioned, testing is very likely at some point
         | over the evolution of the module to want to either provide
         | input or examine output, intermediate or otherwise, that exists
         | in those types.
         | 
         | I have not found this to be true at all. I frequently have very
         | long (300+ line) pure functions that _do one thing_, and the
         | tests are designed to be oblivious to whatever intermediate
         | representations are used. In fact, I think it's an anti-pattern
         | to pull types out just for testing: tests should not be so
         | granular that they affect how you design functions.
         | 
         | For example, a function that takes a JSON string containing
         | multiple objects and returns an SQL string for a batched INSERT
         | operation. I can easily achieve 100% coverage with a table-
         | driven test just checking inputs and outputs.
         | 
         | I frequently use function-local types and function-local
         | functions that are reused within the function. Testing has
         | never been a problem.
         | 
         | > Second, as the code grows, you want to be able to refactor
         | things freely, and types embedded in functions form a barrier
         | to refactoring because to refactor you'll have to do something
         | to expose that type now to multiple functions.
         | 
         | This hasn't been my experience either. When refactoring, I'm
         | usually doing _more_ encapsulation, not less. On a first pass,
         | I write everything to have access to everything else. Only
         | after I have a clearer idea of boundaries do I refactor.
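A sketch of tazu's example with one stated substitution: to stay within the standard library, the input is simplified to `name,age` lines instead of JSON. `batch_insert_sql` and the `people` table are hypothetical. The row type and the quoting helper are both function-local, and the tests only compare input strings against output strings.

```rust
/// Hypothetical: turns `name,age` lines into one batched INSERT statement.
/// Only single quotes in values are escaped (by doubling); the table name is
/// trusted. Not a substitute for parameterized queries.
fn batch_insert_sql(table: &str, input: &str) -> Result<String, String> {
    // Function-local intermediate representation; tests never see it.
    struct Row {
        name: String,
        age: u32,
    }

    // Function-local helper, reused for every row.
    fn quote(s: &str) -> String {
        format!("'{}'", s.replace('\'', "''"))
    }

    let rows = input
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| -> Result<Row, String> {
            let (name, age) = line
                .split_once(',')
                .ok_or_else(|| format!("missing comma: {line:?}"))?;
            let age = age
                .trim()
                .parse::<u32>()
                .map_err(|e| format!("bad age in {line:?}: {e}"))?;
            Ok(Row { name: name.trim().to_string(), age })
        })
        .collect::<Result<Vec<Row>, String>>()?;

    if rows.is_empty() {
        return Err("no rows".to_string());
    }

    let values: Vec<String> = rows
        .iter()
        .map(|row| format!("({}, {})", quote(&row.name), row.age))
        .collect();
    Ok(format!("INSERT INTO {table} (name, age) VALUES {};", values.join(", ")))
}
```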
        
         | epage wrote:
         | > To the extent there is an exception, testing-only code may
         | be. Testing-only code has very different constraints than
         | production code anyhow. Even then, though, I still find that
         | refactoring problem arises, and test code needs to be
         | refactorable too.
         | 
         | For me, I avoid defining anything within a function _except_
         | when that thing being defined is what is being tested in a
         | test, e.g. https://github.com/clap-rs/clap/blob/87647d268c8c27e3298b2c0...
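A sketch of that pattern, with a hypothetical derive-like `macro_rules!` macro standing in for clap's derive (`FieldNames` and `field_names_struct!` are made up for illustration, not clap APIs).

```rust
/// Hypothetical trait that a derive-like macro implements.
trait FieldNames {
    fn field_names() -> &'static [&'static str];
}

/// Hypothetical derive-like macro: defines a struct and implements `FieldNames`.
macro_rules! field_names_struct {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[allow(dead_code)]
        struct $name {
            $($field: $ty),*
        }

        impl FieldNames for $name {
            fn field_names() -> &'static [&'static str] {
                &[$(stringify!($field)),*]
            }
        }
    };
}

#[test]
fn field_names_follow_declaration_order() {
    // The type under test is defined inside the test itself, because what is
    // being tested is exactly what the macro generates for it.
    field_names_struct!(Args { verbose: bool, path: String });
    assert_eq!(Args::field_names(), &["verbose", "path"]);
}
```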
        
       | dathinab wrote:
       | > So there is just no way to refer to the User struct outside of
       | the function scope, right?...
       | 
       | No matter what tricks you come up with, treat it as exactly that
       | (and if it ends up associated with some type, treat it as an
       | anonymous type that was accidentally exposed).
       | 
       | Also, please _never_ place a module in a function. For various
       | subtle reasons it's technically possible, but you really, really
       | should not do it.
       | 
       | I mean, in general, restrict the items (types, impl blocks) you
       | place in a function to very few cases. If you have a type complex
       | enough that it needs a builder defined in a function, you are
       | definitely doing something wrong, I think.
       | 
       | > Does this mean generating child modules for privacy in macros
       | is generally a bad idea? It depends...
       | 
       | IMHO, if we look at derive-like macro usage, yes, it's always a
       | bad idea.
       | 
       | Derive-like things should mainly generate impl blocks; types only
       | if it really, really is necessary; and modules only if there
       | really is no other way.
       | 
       | Furthermore, they should, if possible, not introduce any of this
       | into the surrounding scope. E.g., it's a not-uncommon pattern to
       | place all generated code in `const _: () = { /* here */ };`, which
       | is basically a trick/hack to create a new scope, similar to a
       | function scope, into which you can place items (functions,
       | imports, types, impl blocks) without polluting the parent scope
       | (and yes, that doesn't work for modules; they are always scoped by
       | other modules).
       | 
       | So does that mean the builder derive does it all wrong?
       | 
       | I don't think so; sometimes you need to make bad decisions because
       | there are no good solutions.
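A sketch of the `const _` trick, assuming a hypothetical `Config` type: the helper type, constant, and import inside the anonymous const block stay out of the module's namespace, while the trait impls still apply everywhere, since impls are not scoped to the block.

```rust
pub struct Config {
    pub port: u16,
}

// Roughly what derive-like generated code might look like: everything lives
// inside an anonymous const block, so nothing leaks into this module's scope.
const _: () = {
    use std::fmt; // this import doesn't leak either

    struct Defaults {
        port: u16,
    }
    const DEFAULTS: Defaults = Defaults { port: 8080 };

    impl Default for Config {
        fn default() -> Self {
            Config { port: DEFAULTS.port }
        }
    }

    impl fmt::Display for Config {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "Config(port={})", self.port)
        }
    }
};
```

Code elsewhere in the module can call `Config::default()`, but cannot name `Defaults` or `DEFAULTS`.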
        
       | noelwelsh wrote:
       | This seems like an oversight in the design of Rust. I would think
       | that each function call should create a distinct function-local
       | type, so the trick they use to extract the type from the function
       | shouldn't work. I think what's needed is path-dependent types [1]
       | as found in Scala.
       | 
       | [1]: http://lampwww.epfl.ch/~amin/dot/fpdt.pdf
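One way a function-local type can become nameable from outside, assuming a hypothetical `TypeChannel` helper trait (a sketch, not necessarily the exact trick the article uses). Impl blocks are not lexically scoped, so an impl written inside the body can publish the local type as an associated type.

```rust
// Hypothetical helper trait used as a "channel" to publish a type.
trait TypeChannel {
    type Type;
}

fn make_user(name: &str) -> <() as TypeChannel>::Type {
    #[derive(Debug, PartialEq)]
    struct User {
        name: String,
    }

    // This impl is visible crate-wide even though it is written here, so
    // `User` becomes nameable through the associated type. (Recent compilers
    // may emit a `non_local_definitions` warning for this impl.)
    impl TypeChannel for () {
        type Type = User;
    }

    User { name: name.to_owned() }
}

// Names the function-local `User` from outside the function.
type EscapedUser = <() as TypeChannel>::Type;
```

Every call yields a value of this one type, which matches the per-definition reading rather than a fresh type per call.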
        
         | GrantMoyer wrote:
         | Rust generally uses lexical scoping, and each function/closure
         | has a unique (possibly anonymous) type per definition, not a
         | type per call. I would therefore expect local types to be per
         | definition too, so the behavior seems fine to me.
        
           | noelwelsh wrote:
           | I disagree. If I define two modules
           | 
           | mod one { struct Cat { name: String } }
           | 
           | and
           | 
           | mod two { struct Cat { name: String } }
           | 
           | I have two distinct types called Cat which are not
           | equivalent.
           | 
           | one::Cat != two::Cat // doesn't compile but illustrates the
           | point, I hope
           | 
           | Similarly, when I _call_ a function I create a new
           | environment (think, stack frame) for each call which contains
           | values that are distinct from all other calls. I would expect
           | the same to hold for types defined within a function.
        
             | masklinn wrote:
             | > Similarly, when I call a function I create a new
             | environment (think, stack frame) for each call which
             | contains values that are distinct from all other calls. I
             | would expect the same to hold for types defined within a
             | function.
             | 
             | Rust types are not runtime objects.
             | 
             | Also just because a function call creates a new environment
             | doesn't mean everything is part of that environment.
             | `static` items are singletons, even if defined within a
             | function (which is a common case when the function should
             | be the only thing directly interacting with the static).
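A minimal sketch of masklinn's `static` point, assuming a hypothetical `next_id` function: the static is declared inside the function body, but there is exactly one instance for the whole program, not one per call.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

fn next_id() -> u32 {
    // Only this function can name COUNTER, but it is a single program-wide
    // value, not something created in each call's stack frame.
    static COUNTER: AtomicU32 = AtomicU32::new(0);
    COUNTER.fetch_add(1, Ordering::Relaxed)
}
```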
        
             | GrantMoyer wrote:
             | I don't think the analogy to modules is quite right. I
             | think that maps better to:
             | 
             |     fn foo() {struct Cat;}
             |     fn bar() {struct Cat;}
             | 
             | and foo::Cat != bar::Cat. Whereas a single function with a
             | local type maps better to:
             | 
             |     mod foo {pub struct Cat;}
             |     mod bar {pub use crate::foo::Cat;}
             |     mod baz {pub use crate::foo::Cat;}
             | 
             | and bar::Cat does equal baz::Cat.
             | 
             | But maybe I only think that construct maps better because
             | I'm predisposed to the interpretation I described. I do see
             | what you're saying, and agree that Rust could work that way;
             | I'm just not convinced it's a bug that it doesn't.
             | 
             | The behavior you describe would be more surprising to me
             | than the existing behavior, but clearly that's not a
             | universal sentiment, and I'm not sure which behavior would
             | be less surprising to most people.
        
               | noelwelsh wrote:
               | Seems reasonable to me. I was thinking more of closures,
               | which capture their environment, but those are of course
               | a distinct type in Rust.
        
             | ulbu wrote:
             | a stack frame is a runtime object, while a type exists only
             | to the compiler. the suggestion to create it in a call just
             | makes no sense. a type is a definition, not an instance.
        
           | tialaramex wrote:
           | Why "possibly anonymous" ? I don't think we can ever name any
           | of these types. Rust's Existential types exist so that we can
           | say we return such a thing, without being able to name it.
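A minimal sketch of returning an unnameable closure type through `impl Trait`, with a hypothetical `make_adder` function. It also shows the per-definition behavior discussed above: every call returns the same opaque type, so the results fit in one `Vec` without boxing.

```rust
// The closure's concrete type cannot be written down; `impl Fn(i32) -> i32`
// (an opaque, "existential" return type) promises a capability instead.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

fn adders_applied_to(x: i32) -> Vec<i32> {
    // One opaque type per definition of `make_adder`, not per call.
    let adders = vec![make_adder(1), make_adder(2), make_adder(3)];
    adders.iter().map(|add| add(x)).collect()
}
```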
        
             | GrantMoyer wrote:
             | You're right, and I was being imprecise. All closure and
             | "function item" types are unnameable; only function pointer
             | types can be named, for example `fn(i32) -> i32`[1].
             | 
             | [1]: https://doc.rust-lang.org/reference/types/function-item.html
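A minimal sketch of the distinction, with hypothetical `double` and `triple` functions: each has its own unnameable, zero-sized function item type, and coercing both to the nameable pointer type `fn(i32) -> i32` lets them share one array.

```rust
fn double(x: i32) -> i32 {
    x * 2
}

fn triple(x: i32) -> i32 {
    x * 3
}

fn apply_all(x: i32) -> Vec<i32> {
    // `double` and `triple` have distinct function item types; the array's
    // annotated element type forces both to coerce to one fn pointer type.
    let ops: [fn(i32) -> i32; 2] = [double, triple];
    ops.iter().map(|f| f(x)).collect()
}
```

Before coercion, `std::mem::size_of_val(&double)` is 0, because a function item type carries no data; the pointer type does.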
        
       ___________________________________________________________________
       (page generated 2024-08-21 23:01 UTC)