[HN Gopher] Approximating sum types in Python with Pydantic
___________________________________________________________________
Author : woodruffw
Score : 111 points
Date : 2024-08-12 13:27 UTC (2 days ago)
(HTM) web link (blog.yossarian.net)
(TXT) w3m dump (blog.yossarian.net)
| blorenz wrote:
| Discriminated unions are also a wonderful part of the zod
| library. I use them to overload endpoints for multiple relevant
| operations.
| roshankhan28 wrote:
| +1 for the discriminated unions
| myvoiceismypass wrote:
| I used to use io-ts heavily but zod is my go to now - and it's
| so ergonomic and easy for typescript newbies to pick up and
| grasp.
| wizerno wrote:
| A slightly related discussion on Type Unions in C# from a week
| ago: https://news.ycombinator.com/item?id=41183240
| reubenmorais wrote:
| One caveat to the tip in the "Deduplicating shared variant state"
| section (including an underspecified discriminator field in the
| base class) is that it doesn't play well if you're using
| Literals instead of Enums as the discriminator type. Python type
| checkers such as pyright do not allow you to narrow a literal
| type of a field in a subclass,
| so the following doesn't type check:
|
|     from typing import Literal
|
|     class _FrobulatedBase:
|         kind: Literal['foo', 'bar']
|         value: str
|
|     class Foo(_FrobulatedBase):
|         kind: Literal['foo'] = 'foo'
|         foo_specific: int
|
|     class Bar(_FrobulatedBase):
|         kind: Literal['bar'] = 'bar'
|         bar_specific: bool
|
| Pyright's error:
|
|     "kind" overrides symbol of same name in class "_FrobulatedBase"
|       Variable is mutable so its type is invariant
|         Override type "Literal['foo']" is not the same as base type "Literal['foo', 'bar']"
|
| https://pyright-play.net/?code=GYJw9gtgBALgngBwJYDsDmUkQWEMo...
| yorwba wrote:
| > it doesn't play well if you're using Literals instead of
| Enums as the discriminator type
|
| The original example code with Enums doesn't type-check either,
| and for the same reason:
|
| If the type checker allowed that, someone could take an object
| of type Foo, assign it to a variable of type _FrobulatedBase,
| then use that variable to modify the kind field to 'bar' and
| now you have an illegal Foo with kind 'bar'.
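| The hole described above can be shown at runtime with plain
| dataclasses, used here as a stand-in for Pydantic models (which
| also do not validate attribute assignment by default). A minimal
| sketch; `relabel` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class _FrobulatedBase:
    kind: Literal["foo", "bar"]


@dataclass
class Foo(_FrobulatedBase):
    # Narrowing the inherited field: this is the override pyright rejects.
    kind: Literal["foo"] = "foo"


def relabel(obj: _FrobulatedBase) -> None:
    # Type-correct with respect to the base class annotation...
    obj.kind = "bar"


foo = Foo()
relabel(foo)  # ...but a Foo is a valid argument, so it now carries kind == "bar".
```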
| woodruffw wrote:
| mypy typechecks that just fine[1].
|
| However, I think that's possibly a bug :-) -- I agree that
| narrowing a literal via subclassing is unsound. That's why the
| example in the blog used `str` for the superclass, not the
| closure of all `Literal` variants.
|
| (I use this pattern pretty extensively in Python codebases that
| are typechecked with mypy, and I haven't run into many issues
| with mypy failing to understand the variant shapes -- the
| exception to this so far has been with `RootModel`, where mypy
| has needed Pydantic's mypy plugin[2] to understand the
| relationship between the "root" type and its underlying union.
| But it's possible that this is essentially unsound as well.)
|
| [1]: https://mypy-play.net/?mypy=latest&python=3.12&gist=f35da62e...
|
| [2]: https://docs.pydantic.dev/latest/integrations/mypy/
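| A minimal sketch of the `RootModel` shape mentioned above,
| assuming Pydantic v2 and hypothetical `Foo`/`Bar` variants: the
| model's root is the discriminated union itself, and the parsed
| variant is reached through `.root`. How well a given type checker
| follows this without Pydantic's mypy plugin may vary.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, RootModel


class Foo(BaseModel):
    kind: Literal["foo"]
    foo_specific: int


class Bar(BaseModel):
    kind: Literal["bar"]
    bar_specific: bool


class Frobulated(RootModel):
    # The root type *is* the union; "kind" tells Pydantic which variant to build.
    root: Annotated[Union[Foo, Bar], Field(discriminator="kind")]


parsed = Frobulated.model_validate({"kind": "bar", "bar_specific": True})
variant = parsed.root  # the concrete Bar instance
```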
| reubenmorais wrote:
| Using str in the superclass is equally unsound and also doesn't
| type check. There's no good way to do it, as the
| discriminator type is by definition disjoint between all
| kinds.
| carderne wrote:
| I think it would be useful to differentiate more clearly between
| what is offered by Python's type system, and what is offered by
| Pydantic.
|
| That is, you can approximate Rust's enum (sum type) with pure
| Python using whatever combination of Literal, Enum, Union and
| dataclasses. For example (more here[1]):
|
|     @dataclass
|     class Foo: ...
|
|     @dataclass
|     class Bar: ...
|
|     Frobulated = Foo | Bar
|
| Pydantic adds de/ser, but if you're not doing that then you can
| get very far without it. (And even if you are, there are
| lighter-weight options that work with dataclasses, like cattrs,
| pyserde, and dataclasses-json.)
|
| [1] https://threeofwands.com/algebraic-data-types-in-python/
| intalentive wrote:
| Pydantic offers runtime checks.
|
| Also I'd add msgspec to your list at the end. Lightweight and
| fast, handles validation during decoding.
| mejutoco wrote:
| Typeguard too. The @typechecked decorator on any function or
| method will raise an error at runtime if the types do not
| match.
| carderne wrote:
| Good point, but that's not always desirable. If you have
| strict type-checking and _aren't_ doing ser/de, it's likely
| not necessary (e.g. Rust doesn't do runtime checks).
| woodruffw wrote:
| Yep, someone brought this up on another discussion forum. The
| post was intended to be explicitly about accomplishing the
| ser/de half as well, hence the emphasis on Pydantic :-)
|
| (Python's annotated types are very powerful, and you can do
| this and more with them if you don't immediately need ser/de!
| But they also have limitations, e.g. I believe Union wasn't
| allowed in isinstance checks or matching until a recent
| version.)
| carderne wrote:
| Yeah, and in practice most people will probably be using
| Pydantic anyway. Just wanted to point out it's not strictly
| necessary. :)
| nsagent wrote:
| If you're looking for serialization/deserialization, you
| might consider confactory [1]. I created it to be a factory for
| objects defined in configs. It actually builds the Python
| objects without much effort on the user's part. It simply makes use
| of type annotations (though you can define your own
| serializers and deserializers).
|
| It also supports complex structures like union types, lists,
| etc. I used it to create cresset [2], a package that allows
| building PyTorch models directly from config files.
|
| [1]: https://pypi.org/project/confactory/
|
| [2]: https://pypi.org/project/cresset/
| jghn wrote:
| Meta comment.
|
| Something I've wondered of late. I keep seeing these articles pop
| up and they're trying to recreate ADTs for Python in the manner
| of Rust. But there's a long history of ADTs in other languages.
| For instance we don't see threads on recreating Haskell's ADT
| structures in Python.
|
| Is this an artifact of Rust hype right now, especially on HN?
| As in the typical reader is more familiar with Rust than Haskell,
| and thus "I want to do what I'm used to in Rust in Python" is
| more likely to resonate than "I want to do what I'm used to in
| Haskell in Python"?
|
| At the end of the day it doesn't *really* matter as the
| underlying construct being modeled is the same. It's the
| translation layer that I'm wondering about.
| steveklabnik wrote:
| I think "hype" has some connotations that I wouldn't
| necessarily agree with, and I don't think it's as much "on HN"
| as "people who write Python," but I would agree that I would
| expect at this point more Python folks to be familiar with Rust
| than Haskell, and so that to be the reason, yes.
| jghn wrote:
| The reason I said hype is that it's a cycle here. If you go
| back 10 years every example *would* have been in Haskell. Or
| perhaps Scala. They were the cool languages of the era. And
| the topics here painted a picture that their use in the
| broader world was more common than they really were. And I
| say that as someone who used both Haskell & Scala in my day
| job at the time. HN would have you believe that I was the
| norm, but I very much was not.
|
| That's not to say it's bad, or a problem. If it gets more
| people into these concepts that's great.
| woodruffw wrote:
| (Author of the post.)
|
| I think so, in the sense that Rust has successfully translated
| ADTs and other PLT-laden concepts from SML/Haskell into syntax
| that a large base of engineers finds intuitive. Whether or not
| that's hype is a value judgement, but that _is_ the reason I
| picked it for the example snippet: I figured more people would
| "get" it with less explanation required :-)
| jghn wrote:
| Got it. It makes sense and was what I figured is the case. I
| find it interesting as it's a sign of the times watching the
| evolution of what the "base language" is in threads like this
| over time. I mentioned in another comment that several years
| ago it'd have been Haskell or Scala. If one went back further
| (before my time!) it'd probably have been in OCaml or
| something.
| mattarm wrote:
| In my experience learning a bit of OCaml after Rust, and
| then looking at Haskell, the three aren't all that
| different in terms of the basics of how ADTs are declared
| and used, especially for the simpler cases.
| jghn wrote:
| Agreed. As a concept they're all the same thing.
|
| Another way of phrasing my query is that given these are
| all basically ML-style constructs, why would the examples
| not be ML? And I was assuming the answer to that is "the
| sorts of people reading these blogs in 2024 are more
| familiar with Rust"
| runeblaze wrote:
| I think a second reason might be that translating
| OCaml/Haskell concepts to Python has that academic
| connotation to it. Rust also (thanks to PyO3) has more
| affinity to Python than the ML languages. I guess it
| isn't a surprise that this post has Python, C++, and
| Rust, all "commonly" used for Python libraries.
| darby_nine wrote:
| Is there any reason why you've singled out Rust as particularly
| notable here and not any of the many other languages with them?
| OCaml, Elm, F#, Scala, I think more recent versions of Java,
| Kotlin, Nim, TypeScript, and Swift all support ADTs. Python
| _already_ supports them, albeit with very little runtime
| support. Rust doesn't particularly stand out in such a broad
| field of languages. They're so useful a language needs a good
| reason these days to _not_ support them.
| jghn wrote:
| You're making the exact point that I was raising.
| darby_nine wrote:
| I'm sorry, I'm still completely confused about where Rust came
| from, or what particular relevance it has to the
| conversation beyond the short segment in the article.
|
| My point being: you see articles about ADTs involving
| non-Rust languages all the time. Why single Rust out?
| pjmlp wrote:
| It is quite common to see people in Rust circles describing
| Rust as innovative for some feature XYZ that first appeared in
| an ML variant, Ada, Eiffel, ...
|
| I would say it comes from familiarity, and a lack of exposure
| to programming languages in general.
| woodruffw wrote:
| Nowhere in this post or in any Rust community post I'm aware
| of does anybody claim that sum types (or product types, or
| affine/linear types, etc.) are a Rust novelty.
|
| As a stretch, I've seen Rust content where people claim that
| Rust has successfully _popularized_ a handful of relatively
| obscure PLT concepts. But this is a much, much weaker claim
| than Rust innovating or inventing them outright, and it's
| one that's largely supported by the size of the Rust
| community versus the size of Haskell or even the largest ML
| variant communities.
|
| (I say this as someone who wrote OCaml for a handful of years
| before I touched Rust.)
| pjmlp wrote:
| Where did I specifically say it was this post, and not
| in general?
|
| Here is another common one: _"It would be great a Rust
| like but with GC"_.
| woodruffw wrote:
| > Here is another common one, "It would be great a Rust
| like but with GC".
|
| What in this phrase suggests or implies that Rust has
| innovated something that an earlier FP language actually
| did? Something that resembles Go's managed runtime but
| with Rust's sum types seems like a very reasonable thing
| to want, and doesn't exist _per se_ without buying into a
| very foreign syntax and thus a much smaller community and
| library ecosystem.
|
| (Or as another phrasing: what is _actually_ wrong with
| someone saying this? Insufficient credit given to other
| languages? Do people apply this standard to C with BCPL
| and ALGOL? I haven't seen them do so.)
| jghn wrote:
| > Something that resembles Go's managed runtime but with
| Rust's sum types ..... Or as another phrasing: what is
| actually wrong with someone saying this?
|
| I don't think there's anything wrong per se. Although I
| do think it contributes to the sentiment that people may
| be ascribing things as being novel to Rust, even when not
| intended as in this case. To be fair, that's what sent me
| down the mental path earlier that prompted this
| subthread. And that's when I figured it was more a matter
| of being the implementation most likely to resonate with
| the audience.
|
| And I don't think it's a matter of needing to give credit
| to other languages. But phrasing it like "Something with
| a managed runtime, but with sum types" is generic enough,
| unless there's something specific about either of those.
| For instance, the phrasing I gave does exist in plenty of
| places, but "Something that resembles Go's managed runtime
| with sum types" perhaps does not. I don't
| know enough about Go to say that.
|
| In other words, is there something specific about
| *Rust*'s sum types that one is after in this example? Or
| just the concept of sum types.
| woodruffw wrote:
| > In other words, is there something specific about
| _Rust_'s sum types that one is after in this example? Or
| just the concept of sum types.
|
| I think, concretely, it's the fact that Rust's syntax is
| more intuitive to the average engineer than ML or
| Haskell. Maybe that's a failure of SWE education! But
| generally speaking, it's easier to explain what Rust does
| to someone who has taken a year or two of Java, C, or C++
| than to explain ML to them.
| jghn wrote:
| I agree and think you're right, to a point. But I would
| posit that a much higher percentage of devs than the
| typical HNer would expect would find the Rust syntax to
| be pretty arcane. Although I grant that they'd find
| Haskell to be *more* arcane for sure.
|
| And that stopping point I think is where the perception
| of Rust's popularity on sites like HN is much higher than
| in the general public. And by that I mean people who at
| least grok, if not use, Rust and not people who like the
| idea of Rust.
|
| For instance, keep in mind that even during the heyday of
| Scala here on HN the rest of the JVM world was
| complaining that Scala syntax was too arcane.
| woodruffw wrote:
| No particular disagreement there!
| pjmlp wrote:
| It implies a complete lack of knowledge that something
| like that already exists, predating Rust by a few
| decades.
|
| In the contexts where it pops up, it reads as if such a
| language were yet to come.
|
| Speaking of C and BCPL, indeed we do, because many
| wrongly believe the urban myth that before them there
| were no high-level systems programming languages, even
| though JOVIAL came to be in 1958, followed by ALGOL and
| PL dialects. Bootstrap CPL was never meant to be used
| beyond that purpose, and there was rich research outside
| Bell Labs in systems programming in high-level languages.
|
| Instead we got stuck with something that, 50 years later,
| we are still trying to fix, with Rust being part of the
| solution.
| woodruffw wrote:
| > It implies completely lack of knowledge that something
| like that already exists, predating Rust by a few
| decades.
|
| I don't understand why you think this: we explain things
| all the time without presuming that the particular choice
| of explanation implies ignorance of a preceding concept.
| In high school physics, for example, you wouldn't assume
| that your teacher doesn't know who Ptolemy is because
| they start with Newton.
|
| The value of an explanation is in its effectiveness, not
| a pedantic lineage of the underlying concept. The latter
| is interesting, at least to me, but I'm not going to bore
| my readers by walking them through 65 years of language
| evolution just to get back to the same basic concept that
| they're able to intuit immediately from a ~6 line code
| snippet.
|
| (It's also condescending to do so: there's no evidence
| whatsoever that Rust's creators, maintainers, community,
| etc. _aren't_ familiar with the history of PL
| development.)
| myvoiceismypass wrote:
| FWIW I seem to often find myself reaching for Haskell-isms when
| writing TypeScript or Scala. And I've never actually written
| production Haskell code! But so many concepts like this just
| map nicely: "parse, don't validate", "make illegal states
| unrepresentable", etc., all those patterns.
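| A minimal stdlib-only sketch of "parse, don't validate" in
| Python. The `EmailAddress` type, its deliberately crude check, and
| `send_welcome` are hypothetical, chosen only to show the shape of
| the idea: the boundary function returns a more precise type
| instead of a bool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    """By convention, only constructed via parse_email_address()."""
    value: str


def parse_email_address(raw: object) -> EmailAddress:
    # Rather than returning True/False and passing the raw value along,
    # return a new type that records the check, or fail loudly here.
    if not isinstance(raw, str):
        raise TypeError(f"expected str, got {type(raw).__name__}")
    local, sep, domain = raw.partition("@")
    # Deliberately crude check, for illustration only.
    if not sep or not local or "." not in domain:
        raise ValueError(f"not an email address: {raw!r}")
    return EmailAddress(raw)


def send_welcome(to: EmailAddress) -> str:
    # Downstream code takes the parsed type, so it never re-checks the string.
    return f"sending welcome mail to {to.value}"
```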
| __mharrison__ wrote:
| A pretty good article. Would be a great article if they used
| real-world examples instead of made-up "formulated" ones.
| jamincan wrote:
| I know that Foo and Frobulator and so on have history in code
| examples, but I personally find examples with them require more
| careful reading than examples built on real concepts.
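| For comparison, the same discriminated-union pattern with less
| abstract names: a hypothetical order payload whose payment can be
| one of two methods. This is a sketch assuming Pydantic v2; all
| model and field names are invented for illustration.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class CardPayment(BaseModel):
    method: Literal["card"]
    last4: str = Field(min_length=4, max_length=4)


class BankTransfer(BaseModel):
    method: Literal["bank_transfer"]
    iban: str


class Order(BaseModel):
    order_id: int
    amount_cents: int
    # "method" decides which payment model the nested object is parsed as.
    payment: Annotated[Union[CardPayment, BankTransfer], Field(discriminator="method")]
```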
| adamc wrote:
| The problem I see with it is this: Now, instead of understanding
| Python, which is straightforward, you have to understand a bunch
| about Pydantic and type unions. In a large shop of Python
| programmers, I would expect many would not follow most of this.
|
| Essentially, if this is a feature you must have, Python seems
| like the wrong language. Maybe if you only need it in spots this
| makes sense...
| Spivak wrote:
| Pydantic is truly a godsend to the Python ecosystem. It is a
| full implementation of "parse don't validate" and does so using
| Python's existing type declarations. It uses the same forms as
| dataclasses, SQLAlchemy, and Django that have been part of
| Python forever so most Python programmers are familiar with it.
| And the reason you reach for it is that it eliminates whole
| classes of errors when the boundary between your program and
| the outside world is only via .model_validate() and
| .model_dump(). The outside world including 3rd-party API calls.
| The data either comes back to you _exactly_ like you expect it
| to, or it errs. It's hundreds of tests that you simply don't
| have to write.
|
| In the same way that SQLite bills itself as the better
| alternative to fopen(), Pydantic is the better alternative to
| json.loads()/json.dumps().
| adamc wrote:
| I don't think you are wrong and I have at times missed having
| such an option. But... I saw Java go down this path of cool
| features that you needed to learn, on top of the basic
| language, and eventually it took Java to an environment where
| learning the toolset and environment was complex, and vastly
| changed the calculus of how approachable the language was. In
| my mind, anyway, it went from being a useful if incomplete
| tool to being a more complete language that was not really
| worth messing with unless you were going to make a big
| commitment.
|
| Every step that takes Python in that direction is a mistake,
| because if we need to make a huge commitment, Python probably
| isn't the right language. A large part of the appeal of
| Python is that it is easy to learn, easy to bring devs up to
| speed on if they don't know it, easy to debug and understand.
| That's why people use it despite its performance
| shortcomings, despite its concurrency issues, etc. (That and
| the benefit of a large and fairly high quality library.)
| Spivak wrote:
| I think you're right but I take a different view of it and
| think it's great. Python is changing so that the language
| you switch to when you need more performance or type safety
| is... Python. At some level you have to meet users where
| they are and large complex applications are already written
| in Python. And it's those kinds of developers who are more
| invested in the future direction of the language.
|
| I think Python's journey is very similar to Go's in this
| regard: as the language matures and more people start
| using it for large applications you start having to
| compromise on the ease of on-boarding in favor of the users
| who are trying to get work done. Both Python and Go added
| generics around the same time.
| hnthrowaway6543 wrote:
| > instead of understanding Python, which is straightforward,
| you have to understand a bunch about Pydantic and type unions.
|
| This is like saying "instead of understanding Python, you have to
| understand a bunch about SQLAlchemy and ORMs" or "instead of
| understanding Python, you need to understand GRPC and data
| streaming."
|
| Ultimately every library you add to a project is cognitive
| overhead. Major frameworks or tools like sqlalchemy,
| Flask/Django, Pandas, etc. have a _lot_ of cognitive overhead.
| The engineering decision is whether that cognitive overhead is
| worth what the library provides.
|
| The measurement of worth is really dependent on your use case.
| If your use for Python is data scientists doing iterative,
| interactive work in Jupyter notebooks, Pydantic is probably not
| worth it. If you're building a robust data pipeline or backend
| web app with high availability requirements but dealing with
| suspect data parsing, Pydantic might be worth it.
| adamc wrote:
| You're not wrong, but the distinction here that I was
| responding to was the idea of needing to use Pydantic
| routinely for typechecking. Libraries that you _have_ to know
| might as well be language features.
|
| The phrasing of "The engineering decision" in your reply is
| telling -- you are coming from it as an engineer. But I'm
| looking at the population of Python programmers, which
| extends far beyond software engineers. The more such people
| have to learn, the more problematic the language becomes.
| Python succeeded despite _not_ being a statically compiled
| language with clear typechecking because there is an audience
| for which those aren't the critical factors.
|
| As I said in another response, it reminds me of what happened
| to Java. Maybe that's just my own quirk, but none of these
| changes are free.
| hnthrowaway6543 wrote:
| > Libraries that you have to know might as well be language
| features.
|
| What you _have_ to know depends on where you're working
| and what you're doing. You don't _have_ to know GRPC Python
| libraries, unless it's a company that uses GRPC for
| internal communication. You don't _have_ to know Flask
| unless you're building a REST API using Flask. You don't
| _have_ to know beautifulsoup unless you're building a web
| scraper. You don't _have_ to know Pydantic unless you're
| working on a project that uses Pydantic for data
| validation.
|
| > The phrasing of "The engineering decision" in your reply
| is telling -- you are coming from it as an engineer. But
| I'm looking at the population of Python programmers, which
| extends far beyond software engineers.
|
| You don't have to _be_ a software engineer to make an
| engineering decision. When a data scientist uses conda
| because they don't want to manage their Python environment
| manually, but runs into performance issues on production
| because their Docker containers are multiple gigabytes
| larger than they should be -- that's the result of an
| engineering decision. When a business analyst writes a
| Python script and manually installs packages without a
| requirements file, then tries to get it running on a new
| computer 8 months later but can't because they forgot which
| package versions they used -- that's the result of an
| engineering decision. So when you deploy your code without
| any data validation that runs fine now, but breaks in
| unexpected ways next week because the result of an external
| REST API you're calling changed unexpectedly...
|
| > The more such people have to learn, the more problematic
| the language becomes. Python succeeded despite not being a
| statically compiled language with clear typechecking
| because there is an audience for which those aren't the
| critical factors. ... none of these changes are free.
|
| I agree with all this, which is why I said that the
| engineering decision is deciding whether or not the cost is
| worth it. Different projects, companies, and people will
| have different needs.
|
| Your original assertion was that "if this is a feature you
| must have, Python seems like the wrong language" -- but
| this contradicts what you're saying. The overhead of
| learning a single Python library is far, far less than,
| say, introducing Rust into a company that only uses Python
| for everything else.
| adamc wrote:
| Yes, at that point in time you wouldn't switch from
| Python. Hence the comments about Java, as an example of
| where escalating complexity can take you.
|
| I think my assertion, less pithily, was "if having the
| best type-checking system was critical to you, probably
| you wouldn't pick Python". And I think that's correct.
| People pick it for other features.
|
| I didn't say I hated having the option. I expressed
| reservations, which I still have.
| hnthrowaway6543 wrote:
| > if having the best type-checking system was critical to
| you, probably you wouldn't pick Python
|
| Agree, but the only situation where as a developer you
| can pick a language/ecosystem on its own merits,
| independently of anything else, is on personal projects.
| Even if you're a startup CTO building a greenfield app,
| you have to account for hiring and training developers. It's
| perfectly sensible that you would want to use Python +
| mypy/pyright/pydantic/etc for extra robustness since it's
| easy to find Python devs, with a relatively small
| learning curve if they haven't used those tools, vs going
| full-on Rust or Haskell, which would require much more
| rare + expensive people and/or a much longer training
| period.
| mejutoco wrote:
| I think you are underestimating Python developers. When
| Python became popular, the popular languages did not have such
| expressive type systems. Java and Perl were popular then.
|
| Also, here it is claimed the library should be part of the
| language, and at the same time it is assumed it is too
| complicated for the users to understand. It seems like the
| feature being a library solves this, if we let go of the
| self-imposed requirement of it being part of the language.
| adamc wrote:
| Usually syntax makes things easier, certainly for types.
| That's why we have syntax.
|
| I don't claim Python developers cannot understand it. But
| every additional thing adds to the cognitive burden.
| woodruffw wrote:
| I think an important piece of context here is that this is
| _not_ useful for non-ser/de patterns in Python: if all you
| have is pure Python types that don't need to cross
| serialization boundaries, then you can do all of this in pure
| Python (and refine it with Python's very mature type
| annotations).
|
| In practice, however, Pydantic is one of the most popular
| packages/frameworks in Python because people _do_ in fact need
| this kind of complexity. In particular, it makes wrangling
| complicated object hierarchies that come from REST APIs much
| easier and less error-prone.
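| A sketch of the kind of nested hierarchy meant here, assuming
| Pydantic v2 and hypothetical models: a discriminated union sits
| inside lists inside other models, and a single `model_validate`
| call checks the whole tree.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class TextBlock(BaseModel):
    kind: Literal["text"]
    body: str


class ImageBlock(BaseModel):
    kind: Literal["image"]
    url: str
    alt: str = ""


# Each element of a "blocks" list is parsed as whichever variant "kind" names.
Block = Annotated[Union[TextBlock, ImageBlock], Field(discriminator="kind")]


class Section(BaseModel):
    heading: str
    blocks: list[Block]


class Document(BaseModel):
    title: str
    sections: list[Section]
```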
| rocgf wrote:
| As it just so happens, I was struggling with this in Python
| recently and this post describes a better solution than what I
| came up with.
|
| > Essentially, if this is a feature you must have, Python seems
| like the wrong language.
|
| While I don't disagree in the absolute sense, there are
| constraints. You can't just switch language or change the
| problem you're solving. If you have the need for more type
| safety, then this is a price worth paying.
| jfyusldfwjasdf wrote:
| Completely disagree. Pydantic, FastAPI, Type Hints, mypy,
| pyright have all made python much more enjoyable to use and
| less error prone.
| mejutoco wrote:
| I think it is normal to know popular libraries of a language.
| For Python, Django, DRF, FastAPI, Pydantic, or Jinja are very
| common.
|
| There are some people resisting type checks in Python but I
| think fewer and fewer. I don't think people refusing to learn
| basic concepts and libraries are a reason to not use something.
|
| Also, I am not a big fan of not doing something useful because
| we need to do a bit of learning. It seems like a variant of "we
| have always done it this way". Plus it is a strawman attributed
| to Python developers, IMO.
| ks2048 wrote:
| It's a shame there's so many different names for a set of very
| related (or identical?) concepts. For example wikipedia says
| "tagged union" is also known as "variant, variant record, choice
| type, discriminated union, disjoint union, sum type, or
| coproduct". [https://en.wikipedia.org/wiki/Tagged_union]
___________________________________________________________________
(page generated 2024-08-14 23:01 UTC)