[HN Gopher] The weird of function-local types in Rust
___________________________________________________________________
Author : lukastyrychtr
Score : 101 points
Date : 2024-08-17 08:54 UTC (4 days ago)
(HTM) web link (elastio.github.io)
| jerf wrote:
| I have found, across several languages I've used, that types
| embedded into functions are generally a bad idea, and I think the
| general principle is that types generally end up needing to be
| exposed to any code that will also test that code. So, for
| instance, it's fine to confine types to some particular module,
| as long as those types are internal-only, but confining them
| within functions generally becomes a bad idea.
|
| I know the complaints many of you are gearing up to type, but my
| statement is a bit more complicated than you may have realized on
| first read; the key is the word "becomes": I'm looking at
| the lifetime of the code and not a snapshot. The problem with
| embedding types into those smaller scopes is that while it may
| work at first... of course it does, it compiles, right?... they
| become an impediment to a number of operations over time. First,
| as I mentioned, testing is very likely _at some point over the
| evolution of the module_ to want to either provide input or
| examine output, intermediate or otherwise, that exists in those
| types. Second, as the code grows, you want to be able to refactor
| things freely, and types embedded in functions form a barrier to
| refactoring because to refactor you'll have to do something to
| expose that type now to multiple functions. You do not want
| barriers to refactoring. Barriers to refactoring are a bigger
| expense over the long term than any small local gain from putting
| a type _here_ instead of _there_, especially when anyone should
| have "Jump to Definition" readily available in this post-LSP era.
|
| Considered over time, over the evolution of the code base, I've
| just never had any super-local types like this "survive". Every
| time I think I've found an exception, I've either had my test
| code or the desire to refactor force me to lift it to the module
| level. So I just start there now.
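A minimal Rust sketch of the "start at module level" habit described above, assuming a hypothetical `pricing` module: the intermediate type is private to the module rather than to one function, so sibling functions and a `#[cfg(test)]` child module can reach it while code outside the module cannot.

```rust
mod pricing {
    // Private to `pricing`: every function in this module can use it,
    // and so can the child `tests` module, but code outside cannot.
    #[derive(Debug, PartialEq)]
    struct Breakdown {
        subtotal: u32,
        tax: u32,
    }

    // Intermediate step that a test may later want to inspect directly.
    fn breakdown(cents: u32) -> Breakdown {
        Breakdown { subtotal: cents, tax: cents / 10 }
    }

    pub fn total(cents: u32) -> u32 {
        let b = breakdown(cents);
        b.subtotal + b.tax
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn intermediate_breakdown_is_inspectable() {
            assert_eq!(breakdown(100), Breakdown { subtotal: 100, tax: 10 });
        }
    }
}
```

If `Breakdown` had been declared inside `total`, neither `breakdown` nor the test could name it; lifting it out costs nothing in privacy outside the module.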
|
| If there is an exception, it may be testing-only code.
| Testing-only code has very different constraints than production
| code anyhow. Even then, though, I still find that refactoring
| problem arises, and test code needs to be refactorable too.
|
| On the plus side, while I label them "a bad idea", they are not a
| "bad idea" that destroys your code base or anything. On the grand
| scale of "bad ideas" in code, this is down in the "inconvenience"
| part of the scale. It is almost self-evidently not some sort of
| disaster and I am not claiming it is. You can always lift it out
| and move on. But it is one of the many little hygiene habits that
| add up to keep code fluid and keep refactoring always available
| to me at a minimal activation-energy cost, which is really
| important.
|
| (This applies specifically to types that you explicitly define.
| You can in Haskell, for instance, bash a new type together
| anywhere simply by creating a tuple (x, y). But this doesn't
| trigger what I'm talking about because any other bit of the code
| can bash the exact same type together simply by creating another
| tuple of the same type, and they'll unify just fine without
| having to share a type definition in common. No impediment of any
| kind is created by a new tuple type in that language.)
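Rust tuples are structural in the same way as the Haskell tuples mentioned above: two functions can agree on `(i32, &str)` without sharing any named definition. A minimal sketch, with hypothetical function names:

```rust
// Tuples are structural: both functions below agree on `(i32, &str)`
// without sharing any named type definition.
fn parse_pair(s: &str) -> Option<(i32, &str)> {
    let (num, rest) = s.split_once(':')?;
    Some((num.trim().parse::<i32>().ok()?, rest.trim()))
}

fn describe(pair: (i32, &str)) -> String {
    format!("{} => {}", pair.0, pair.1)
}

// By contrast, a `struct Pair` declared inside `parse_pair` could not
// be named in `describe`'s signature.
```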
| chowells wrote:
| Sometimes I think I want local types in Haskell. I'm creating
| some new type and some instances for it solely for this
| function, why does it need global scope?
|
| Then I get around to remembering module scope is the actual
| important thing in Haskell, make sure that type isn't exported
| from the primary API modules, and get on with life.
| saghm wrote:
| Yeah, I think it's pretty rare to actually need a "local"
| type as opposed to just making it private to a module in Rust
| as well. The use case the article gives is one of the few
| where I could see it being useful; if you're using a macro
| in the body of a function and want it to generate a type,
| there's not anywhere else you can define it (without an
| additional macro invocation outside the function, which often
| defeats the purpose of trying to wrap up all the boilerplate
| into one concise place).
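A minimal sketch of the macro case, using a hypothetical `sum_point!` macro rather than anything from the article: the expansion is a block containing a struct, an impl, and an expression, so when it is invoked inside a function body the generated type is local to that function and no extra module-level invocation is needed.

```rust
// Hypothetical macro: expands to a struct definition, an impl, and an
// expression inside one block, so invoking it in a function body makes
// the generated type local to that function.
macro_rules! sum_point {
    ($name:ident, $x:expr, $y:expr) => {{
        struct $name {
            x: i32,
            y: i32,
        }

        impl $name {
            fn sum(&self) -> i32 {
                self.x + self.y
            }
        }

        $name { x: $x, y: $y }.sum()
    }};
}

fn compute() -> i32 {
    // `Point` exists only inside `compute`.
    sum_point!(Point, 2, 3)
}
```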
| Joker_vD wrote:
| One of the somewhat useful (but still mind-boggling) uses of
| local types I've encountered was in Go, writing a custom
| unmarshaller:
|
|     func (s *MyLovelyStruct) UnmarshalJSON(b []byte) error {
|         err := json.Unmarshal(b, s)
|         if err != nil {
|             return err
|         }
|         return validateAndMassageMyLovelyStruct(s)
|     }
|
| I want to re-use the default "dump JSON key/values into the
| struct's fields" logic, then add something on top of it. But as
| written, this method will blow up with a stack overflow because
| json.Unmarshal(b, s) will call s.UnmarshalJSON(b), if it can.
| So what you can do is this:
|
|     func (s *MyLovelyStruct) UnmarshalJSON(b []byte) error {
|         type IncognitoStruct MyLovelyStruct
|         tmp := IncognitoStruct{}
|         err := json.Unmarshal(b, &tmp)
|         if err != nil {
|             return err
|         }
|         *s = MyLovelyStruct(tmp)
|         return validateAndMassageMyLovelyStruct(s)
|     }
|
| The IncognitoStruct, even though it has exactly the same fields
| as MyLovelyStruct (and is convertible to it), does not have any
| of its methods, so json.Unmarshal(b, &tmp) does not recursively
| call this UnmarshalJSON() method.
|
| But even that only defines a new named type from an existing one
| (strictly a type definition, not a Go type alias: an alias such
| as type IncognitoStruct = MyLovelyStruct would keep the methods
| and recurse), not a proper, data-holding type declared from
| scratch. I never found any motivation to use those; the
| package-local types are quite enough.
| nkozyra wrote:
| Wouldn't generics in the validateAndMassageMyLovelyStruct()
| function avoid this kind of workaround?
| jerf wrote:
| No, because the method infinitely recurses before then.
| jerf wrote:
| Yeah, that's one in my bag-of-JSON-tricks. (Hopefully my bag-
| of-JSON-tricks gets at least a little less populated with
| json v2.) This is also useful if you want to selectively
| override a particular Unmarshal for any other reason, which
| is what I needed it for. But then I needed to customize the
| Unmarshal and back to a top-level type it went. :)
|
| But this sort of thing is why I tried to emphasize at the end
| that I'm not trying to "ring the alarm bell" or anything.
| When it works, it works, and it's not like I would call what
| you have there Bad Code or anything. I personally would have
| pulled it to a top level type immediately and that's just a
| preference, not something I'd go to bat over in a code review
| or anything.
| oefrha wrote:
| Sometimes you just want to JSON-encode a []struct{<ad hoc
| stuff>} or something like that, so it's entirely reasonable
| to use a func-local named type rather than repeating the
| anonymous struct.
|
| And to gp: if your func local type ends up observable and
| even testable, of course it shouldn't be func local.
| Otherwise you're describing testing implementation rather
| than behavior, indicating you're writing bad tests.
| jerf wrote:
| "Otherwise you're describing testing implementation rather
| than behavior, indicating you're writing bad tests."
|
| Yeah, people have been threatening me for decades with the
| claim that if I write tests to test internals I'll have to
| refactor like crazy someday. I'm still waiting for someday
| to come. Meanwhile, it has caught a lot of bugs.
|
| I'm open to the possibility that there's something
| different about the way I write code that causes me to not
| have this problem. Stay tuned to my blog over the next
| couple of months if that intrigues you. In the meantime, as
| reality fails to correspond to theory, I go with reality.
| masklinn wrote:
| I don't know how common it is in the wild, but deserializing
| to a function-local type is routinely used in Serde's
| documentation examples, e.g.
| https://serde.rs/deserialize-struct.html
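A std-only sketch of the same shape, without Serde: a field-name enum and its trait impl live inside the parsing function. The input format (`"secs=1, nanos=5"`) and the function name are hypothetical; the layout only loosely echoes the Serde docs example, which uses Serde's own traits instead of `std::str::FromStr`.

```rust
use std::str::FromStr;

// Hypothetical parser for input like "secs=1, nanos=5".
fn parse_duration(input: &str) -> Result<(u64, u32), String> {
    // Function-local field-name type with its own trait impl.
    enum Field {
        Secs,
        Nanos,
    }

    impl FromStr for Field {
        type Err = String;

        fn from_str(s: &str) -> Result<Self, Self::Err> {
            match s {
                "secs" => Ok(Field::Secs),
                "nanos" => Ok(Field::Nanos),
                other => Err(format!("unknown field `{}`", other)),
            }
        }
    }

    let mut secs = None;
    let mut nanos = None;
    for pair in input.split(',') {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got `{}`", pair))?;
        match key.trim().parse::<Field>()? {
            Field::Secs => secs = Some(value.trim().parse::<u64>().map_err(|e| e.to_string())?),
            Field::Nanos => nanos = Some(value.trim().parse::<u32>().map_err(|e| e.to_string())?),
        }
    }
    // `&str` errors convert into `String` through `?`.
    Ok((
        secs.ok_or("missing field `secs`")?,
        nanos.ok_or("missing field `nanos`")?,
    ))
}
```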
| lesuorac wrote:
| In general sure.
|
| However, in one specific case I do find local types useful: code
| that makes JSON requests to another service. You could care
| about whether the request has a certain intermediate structure,
| but serialization is fairly deterministic, so I don't see an
| advantage of that over just testing that you send an HTTP
| request with a valid (String) body.
|
| If you control that endpoint, then it would make sense to share
| that type between both sides, but if you don't, you might as well
| make the type scoped to just that method.
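A minimal sketch of that pattern, assuming a hypothetical `create_user_body` helper: the request shape is a function-local type, and tests only check the resulting string. To stay std-only it hand-formats JSON, which is a simplification; a real client would use a JSON library.

```rust
use std::fmt;

// Hypothetical request-body builder. The request shape is local to the
// function; callers and tests only ever see the resulting String.
fn create_user_body(name: &str, age: u32) -> String {
    struct CreateUser<'a> {
        name: &'a str,
        age: u32,
    }

    impl fmt::Display for CreateUser<'_> {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // Simplification: `{:?}` adds quotes and Rust-style escapes,
            // which match JSON only for simple strings.
            write!(f, "{{\"name\":{:?},\"age\":{}}}", self.name, self.age)
        }
    }

    CreateUser { name, age }.to_string()
}
```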
| edflsafoiewq wrote:
| I usually see them used for drop guards in Rust, i.e. an
| alternative to try-finally.
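A minimal drop-guard sketch (the `process` function and its log are hypothetical): the guard type is local to the function, and its `Drop` impl runs on every exit path, including the early `return`, much like a `finally` block.

```rust
use std::cell::RefCell;

fn process(log: &RefCell<Vec<&'static str>>, fail: bool) -> Result<(), &'static str> {
    // Function-local guard: `drop` runs however the function exits.
    struct Guard<'a>(&'a RefCell<Vec<&'static str>>);

    impl Drop for Guard<'_> {
        fn drop(&mut self) {
            self.0.borrow_mut().push("cleanup");
        }
    }

    // Bind to a named variable: `let _ = ...` would drop it immediately.
    let _guard = Guard(log);
    log.borrow_mut().push("start");
    if fail {
        return Err("failed");
    }
    log.borrow_mut().push("done");
    Ok(())
}
```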
| monocasa wrote:
| I have used function local enums for the states of a function
| local state machine pretty successfully.
| jerf wrote:
| Once it gets past trivial, that would definitely pass into the
| realm of things I'd want to expose to my tests. "Starting from
| this state and given this set of inputs do I get to this
| state?" is a pretty basic test, and maybe there's a bunch of
| people way smarter than me who can naturally run complex
| state machines in their head, but I find myself frequently
| surprised by at least _one_ thing they do and I do not
| generally just splat them down into the code correctly on the
| first try.
| monocasa wrote:
| In the cases where I do this, I try really hard not to
| expose the state machine to tests anyway, because there's a
| better way to hit those cases in tests: through their
| expected inputs and outputs rather than through an
| implementation detail of the state machine itself.
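A small sketch of the function-local state machine approach, using a hypothetical word counter: the state enum never leaves the function, and the tests exercise it only through inputs and outputs, as suggested above.

```rust
// Hypothetical example: counts words with a tiny state machine whose
// state type is local to the function.
fn count_words(text: &str) -> usize {
    #[derive(Clone, Copy)]
    enum State {
        BetweenWords,
        InWord,
    }

    let mut state = State::BetweenWords;
    let mut count = 0;
    for c in text.chars() {
        state = match (state, c.is_whitespace()) {
            // Entering a word: count it.
            (State::BetweenWords, false) => {
                count += 1;
                State::InWord
            }
            // Leaving a word.
            (State::InWord, true) => State::BetweenWords,
            // Otherwise stay in the current state.
            (same, _) => same,
        };
    }
    count
}
```

Whether this should stay local once the machine grows is exactly the disagreement in this subthread; the sketch only shows the mechanics.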
| tazu wrote:
| > First, as I mentioned, testing is very likely at some point
| over the evolution of the module to want to either provide
| input or examine output, intermediate or otherwise, that exists
| in those types.
|
| I have not found this to be true at all. I frequently have very
| long (300+ line) pure functions that _do one thing_, and the
| tests are designed to be oblivious to whatever intermediate
| representations are used. In fact, I think it's an anti-pattern
| to pull types out just for testing: tests should not be so
| granular that they affect how you design functions.
|
| For example, a function that takes a JSON string containing
| multiple objects and returns an SQL string for a batched INSERT
| operation. I can easily achieve 100% coverage with a table-
| driven test just checking inputs and outputs.
|
| I frequently use function-local types and function-local
| functions that are reused within the function. Testing has
| never been a problem.
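A simplified, hypothetical stand-in for the JSON-to-SQL function described above, assuming rows arrive as `(name, age)` pairs instead of a JSON string: both the row type and the quoting helper are local to the function, and a table-driven test only checks inputs against outputs.

```rust
// Hypothetical, simplified batch-INSERT builder. `table` is assumed to
// be a trusted identifier; only string values are quoted.
fn batch_insert(table: &str, rows: &[(&str, u32)]) -> Option<String> {
    struct Row<'a> {
        name: &'a str,
        age: u32,
    }

    // Standard SQL string literal: wrap in single quotes and double any
    // embedded single quote.
    fn quote(s: &str) -> String {
        format!("'{}'", s.replace('\'', "''"))
    }

    if rows.is_empty() {
        return None;
    }
    let values: Vec<String> = rows
        .iter()
        .map(|&(name, age)| Row { name, age })
        .map(|row| format!("({}, {})", quote(row.name), row.age))
        .collect();
    Some(format!(
        "INSERT INTO {} (name, age) VALUES {};",
        table,
        values.join(", ")
    ))
}
```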
|
| > Second, as the code grows, you want to be able to refactor
| things freely, and types embedded in functions form a barrier
| to refactoring because to refactor you'll have to do something
| to expose that type now to multiple functions.
|
| This hasn't been my experience either. When refactoring, I'm
| usually doing _more_ encapsulation, not less. On a first pass,
| I write everything to have access to everything else. Only
| after I have a clearer idea of boundaries do I refactor.
| epage wrote:
| > To the extent there is an exception, testing-only code may
| be. Testing-only code has very different constraints than
| production code anyhow. Even then, though, I still find that
| refactoring problem arises, and test code needs to be
| refactorable too.
|
| For me, I avoid defining anything within a function _except_
| when that thing being defined is what is being tested in a
| test, e.g. https://github.com/clap-rs/clap/blob/87647d268c8c27e3298b2c0...
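A minimal sketch of that exception, with a hypothetical generic `bracketed` function under test: the probe type exists only to exercise the function, so it is defined inside the test that uses it.

```rust
use std::fmt::Display;

// Hypothetical generic function under test.
pub fn bracketed<T: Display>(value: T) -> String {
    format!("[{}]", value)
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::fmt;

    #[test]
    fn works_with_any_display_type() {
        // The thing being defined is the test input itself, so it lives
        // inside the test function.
        struct Probe;

        impl fmt::Display for Probe {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                f.write_str("probe")
            }
        }

        assert_eq!(bracketed(Probe), "[probe]");
    }
}
```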
| dathinab wrote:
| > So there is just no way to refer to the User struct outside of
| the function scope, right?...
|
| No matter what tricks you come up with, treat it as exactly that
| (and if it ends up associated with a type, treat it as an
| anonymous type that was accidentally exposed).
|
| Also, please _never_ place a module in a function: it's
| technically possible, but for various subtle reasons you really,
| really should not do it.
|
| In general, limit the items (types, impl blocks) you place in a
| function to very few cases. If you have a type complex enough
| that you need a builder defined in a function, I think you are
| definitely doing something wrong.
|
| > Does this mean generating child modules for privacy in macros
| is generally a bad idea? It depends...
|
| IMHO, if we look at derive-like macro usage, yes, it's always a
| bad idea.
|
| Derive-like things should mainly generate impl blocks; types only
| if really, really necessary; and modules only if there really is
| no other way.
|
| Furthermore, if possible, they should not introduce any of this
| into the surrounding scope. E.g. a not-uncommon pattern is to
| place all generated code in `const _: () = { /* here */ };`,
| which is basically a trick/hack to create a new scope, similar to
| a function scope, into which you can place items (functions,
| imports, types, impl blocks) without polluting the parent scope
| (and yes, that doesn't work for modules; they are always scoped by
| other modules).
|
| So does that mean the builder derive does it all wrong?
|
| I don't think so; sometimes you need to make bad decisions
| because there are no good solutions.
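A minimal sketch of the `const _: () = { ... };` pattern described above, showing what a derive-like macro might emit for a hypothetical `Config` type: the helper type stays inside the anonymous const block and does not leak into the surrounding module, while the trait impl still applies everywhere.

```rust
use std::fmt;

pub struct Config {
    pub name: String,
}

// Hypothetical macro output: everything lives in an anonymous const
// block, so `Helper` cannot be named outside it.
const _: () = {
    struct Helper<'a>(&'a str);

    impl fmt::Display for Helper<'_> {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "<{}>", self.0)
        }
    }

    // Trait impls are not scoped, so this one is visible crate-wide.
    impl fmt::Display for Config {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "Config {}", Helper(self.name.as_str()))
        }
    }
};
```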
| noelwelsh wrote:
| This seems like an oversight in the design of Rust. I would think
| that each function call should create a distinct function-local
| type, so the trick they use to extract the type from the function
| shouldn't work. I think what's needed is path-dependent types [1]
| as found in Scala.
|
| [1]: http://lampwww.epfl.ch/~amin/dot/fpdt.pdf
| GrantMoyer wrote:
| Rust generally uses lexical scoping, and each function/closure
| has a unique (possibly anonymous) type per definition, not a
| type per call. I would therefore expect local types to be per
| definition too, so the behavior seems fine to me.
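A small sketch of the per-definition behavior described above (function names are hypothetical): every call to `make_local` yields the same type, while a textually identical struct declared in a different function is a distinct type. `Any::type_id` is used only to observe type identity.

```rust
use std::any::Any;

// Each call runs the same definition of `Local`, so every call returns
// a value of one and the same type.
fn make_local() -> impl Any {
    struct Local;
    Local
}

// A second, textually identical definition in another function is a
// different type.
fn make_other_local() -> impl Any {
    struct Local;
    Local
}

fn same_type_across_calls() -> bool {
    make_local().type_id() == make_local().type_id()
}

fn same_type_across_definitions() -> bool {
    make_local().type_id() == make_other_local().type_id()
}
```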
| noelwelsh wrote:
| I disagree. If I define two modules
|
| mod one { struct Cat { name: String } }
|
| and
|
| mod two { struct Cat { name: String } }
|
| I have two distinct types called Cat which are not
| equivalent.
|
| one::Cat != two::Cat // doesn't compile but illustrates the
| point, I hope
|
| Similarly, when I _call_ a function I create a new
| environment (think, stack frame) for each call which contains
| values that are distinct from all other calls. I would expect
| the same to hold for types defined within a function.
| masklinn wrote:
| > Similarly, when I call a function I create a new
| environment (think, stack frame) for each call which
| contains values that are distinct from all other calls. I
| would expect the same to hold for types defined within a
| function.
|
| Rust types are not runtime objects.
|
| Also just because a function call creates a new environment
| doesn't mean everything is part of that environment.
| `static` items are singletons, even if defined within a
| function (which is a common case when the function should
| be the only thing directly interacting with the static).
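A minimal sketch of a function-local `static`, using a hypothetical `next_id` counter: only `next_id` can name `COUNTER`, but there is exactly one instance shared by every call rather than one per call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn next_id() -> usize {
    // Declared inside the function, yet a single process-wide instance.
    static COUNTER: AtomicUsize = AtomicUsize::new(0);
    COUNTER.fetch_add(1, Ordering::Relaxed)
}
```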
| GrantMoyer wrote:
| I don't think the analogy to modules is quite right. I
| think that maps better to:
|
|     fn foo() { struct Cat; }
|     fn bar() { struct Cat; }
|
| and foo::Cat != bar::Cat. Whereas a single function with a
| local type maps better to:
|
|     mod foo { pub struct Cat; }
|     mod bar { pub use crate::foo::Cat; }
|     mod baz { pub use crate::foo::Cat; }
|
| and bar::Cat does equal baz::Cat.
|
| But maybe I only think that construct maps better because
| I'm predisposed to the interpretation I described. I do see
| what you're saying, and agree that Rust could work that way;
| I'm just not convinced it's a bug that it doesn't.
|
| The behavior you describe would be more surprising to me
| than the existing behavior, but clearly that's not a
| universal sentiment, and I'm not sure which behavior would
| be less surprising to most people.
| noelwelsh wrote:
| Seems reasonable to me. I was thinking more of closures,
| which capture their environment, but those are of course
| a distinct type in Rust.
| ulbu wrote:
| a stack frame is a runtime object, while a type exists only
| to the compiler. the suggestion to create it in a call just
| makes no sense. a type is a definition, not an instance.
| tialaramex wrote:
| Why "possibly anonymous"? I don't think we can ever name any
| of these types. Rust's existential types exist so that we can
| say we return such a thing, without being able to name it.
| GrantMoyer wrote:
| You're right, and I was being imprecise. All closure and
| "function item" types are unnameable; only function pointer
| types can be named, for example `fn(i32) -> i32`[1].
|
| [1]: https://doc.rust-lang.org/reference/types/function-item.html
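A short sketch of the distinction above (names are hypothetical): a function item has its own unnameable type but coerces to the nameable pointer type `fn(i32) -> i32`, as does a non-capturing closure, while a capturing closure is returned through return-position `impl Fn`, the existential-type feature mentioned earlier.

```rust
fn double(x: i32) -> i32 {
    x * 2
}

// Accepts the nameable function pointer type.
fn apply(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

// The closure's type cannot be named; `impl Fn` promises "some type
// implementing Fn(i32) -> i32" without naming it.
fn adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}
```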
___________________________________________________________________
(page generated 2024-08-21 23:01 UTC)