[HN Gopher] Pydantic V2 leverages Rust's Superpowers [video]
___________________________________________________________________
Pydantic V2 leverages Rust's Superpowers [video]
Author : BerislavLopac
Score : 91 points
Date : 2023-04-23 15:35 UTC (7 hours ago)
(HTM) web link (fosdem.org)
(TXT) w3m dump (fosdem.org)
| keithasaurus wrote:
| As someone who built a pure python validation library[0] that's
| much faster than pydantic (~1.5x - 12x depending on the
| benchmark), I have to say that this whole focus on Rust seems
| premature. There's clearly a lot of room for pydantic to optimize
| its Python implementation.
|
| Beyond that, rust seems like a great fit for tooling (i.e. ruff),
| but as a library used at runtime, it seems a little odd to make a
| validation library (which can expect to receive any kind of legal
| python data) to also be constrained by a separate set of data
| types which are legal in rust.
|
| [0]: https://github.com/keithasaurus/koda-validate
| [deleted]
| Attummm wrote:
| Personally, I think it's great to have many projects solving
| the same problem and pushing each other further. Although the
| differences between the faster validations are small, the older
| ones were quite slow. This will save unnecessary CPU cycles,
| making it eco-friendly. And now the bar will be even higher
| with a Rust version, which is really great.
|
| [0]Maat is 2.5 times faster than Pydantic on their own
| benchmark, as stated in their readme.
|
| [0]https://github.com/Attumm/Maat
| jammycrisp wrote:
| While I agree that there are ways to write a faster validation
| library in python, there are also benefits to moving the logic
| to native code.
|
| msgspec[1] is another parsing/validation library, written in C.
| It's on average 50-80x faster than pydantic for parsing and
| validating JSON [2]. This speedup is only possible because we
| make use of native code, letting us parse JSON directly and
| efficiently into the proper python types, removing any
| unnecessary allocations.
|
| It's my understanding that pydantic V2 currently doesn't do
| this (they still have some unnecessary intermediate allocations
| during parsing), but having the validation logic already in
| compiled code makes integrating this with the parser
| theoretically possible later on. With the logic in python this
| efficiency gain wouldn't be possible.
|
| [1]: https://github.com/jcrist/msgspec
|
| [2]: https://jcristharif.com/msgspec/benchmarks.html#benchmark-
| sc...
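
To illustrate the "intermediate allocations" point above, here is a minimal pure-Python sketch of the two-pass pattern being contrasted: the JSON parser first builds a generic dict, and validation then walks that dict to build the typed object. `User` and `parse_user` are hypothetical names chosen for illustration; this is not how pydantic or msgspec is implemented, only the general shape of the approach.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


def parse_user(raw: bytes) -> User:
    # Pass 1: the JSON parser allocates a generic dict of Python objects.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    # Pass 2: validation walks that intermediate dict field by field.
    name = data.get("name")
    age = data.get("age")
    if not isinstance(name, str):
        raise TypeError("name must be a string")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise TypeError("age must be an integer")
    return User(name=name, age=age)


user = parse_user(b'{"name": "Ada", "age": 36}')
```

A parser that knows the target type up front can skip the intermediate dict and write each value straight into the final object, which is the saving described in the comment.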
| keithasaurus wrote:
| Definitely true. I've just soured on the POV that native code
| is the first thing one should reach for. I was surprised that
| it only took a few days of optimizations to convert my
| validation library to being significantly faster than
| pydantic, when pydantic was already largely compiled via
| cython.
|
| If you're interested in both efficiency and maintainability,
| I think you need to start by optimizing the language of
| origin. It seems to me that with pydantic, the choice has
| consistently been to jump to compilation (cython, now rust)
| without much attempt at optimizing within Python.
|
| I'm not super-familiar with how things are being done on an
| issue-to-issue / line-to-line basis, but I see this rust
| effort taking something like a year+, when my intuition is
| some simpler speedups in python could have been made in a matter
| of days or weeks (which is not to say they would be of the
| same magnitude of performance gains).
| scolvin wrote:
| I agree that pydantic could have been faster while still being
| written in Python.
|
| The argument for Rust: 1. If I'm going to rewrite - why not go
| the whole hog and do it in rust - and thereby get a 20x
| improvement, not 2x. 2. By using Rust we can add more
| customisation with virtually no performance impact, with Python
| that's not the case.
|
| Of course we could make Pydantic faster by removing features,
| but that would be very disappointing for existing users.
|
| As mentioned by other commenters, your comment about
| "constrained" does not apply.
| iudqnolq wrote:
| > to also be constrained by a separate set of data types which
| are legal in rust.
|
| This isn't really how writing rust/python interop works. You
| tend to have opaque handles you call python methods on. Here's
| a decent example I found skimming the code.
|
| https://github.com/pydantic/pydantic-core/blob/main/src/inpu...
| masklinn wrote:
| > it seems a little odd to make a validation library (which can
| expect to receive any kind of legal python data) to also be
| constrained by a separate set of data types which are legal in
| rust.
|
| That... makes no sense? Rust can interact with Python objects,
| there is no "constrained".
| keithasaurus wrote:
| In the sense of using escape hatches back to python, that's
| true. Main point is that from a complexity standpoint, why do
| python -> rust -> python, when there's still a lot of room to
| run in just python?
| iudqnolq wrote:
| Because it's not python -> rust -> python, it's python ->
| rust -> python c api.
| LtWorf wrote:
| I also wrote a pure python validation library [0] that is much
| faster than pydantic. It also handles unions correctly (unlike
| pydantic).
|
| Pydantic2 is indeed much faster than any pure python
| implementation I've seen, but it also introduces some bugs. And
| on pypy it is as slow as it ever was, because it falls back to
| python code.
|
| I wrote mine because nothing else existed at the time, but
| whenever I've had to use pydantic I've found it to be quirky
| and to have strange opinions about types, that are not shared
| by type validators. Using it with mypy (despite the extension)
| is not so easy nor useful.
|
| [0]: https://ltworf.github.io/typedload/performance.html
| satvikpendem wrote:
| Rust is the future of tooling [0]. While [0] is about JS tooling
| specifically, we're seeing the same effects in other languages as
| well. Turns out, you probably don't want to write infrastructure
| tooling in slow, dynamically typed languages when faster, safer
| languages exist. Python knows this already, with many of the
| scientific computing libraries being just wrappers over the core
| C++ codebases. JS is beginning to catch up as well, with swc
| (speedy web compiler), stc (speedy type checker), Turbopack
| (Webpack successor) and so on, with Vercel leading the charge
| mainly.
|
| [0] https://leerob.io/blog/rust#the-future-of-javascript-tooling
| nine_k wrote:
| I'd say that you don't want to write _the second crop_ of the
| tooling in a language like JS or Python.
|
| This is because the first crop, like mypy or babel or jslint
| existed, and has shown the general direction. But for the first
| crop, a slow-running but fast-turnaround language was
| essential, to my mind. The first iteration had to move fast,
| and _change the direction_ fast, because it wasn't yet clear
| what direction was going to be right.
| bratao wrote:
| Someone recommended here msgspec as a Pydantic alternative for
| serialization/validation and wow. It is fantastic. I really
| recommend it https://github.com/jcrist/msgspec
| jammycrisp wrote:
| Thanks, glad you like it!
| jonatron wrote:
| Subtitled Video:
| https://jonatron.github.io/fosdem2023whisper/files/rust_how_...
|
| Transcript:
| https://jonatron.github.io/fosdem2023whisper/files/rust_how_...
|
| (transcribed with whisper)
| AdilZtn wrote:
| PyO3 is the key actor in this rust-binding trend.
| the__alchemist wrote:
| Code style in Pydantic user code reminds me of Rust's Serde
| serialization lib. Not directly relevant to the point of this
| video, but it makes me curious if it was inspired by Serde.
| nightski wrote:
| I'm rather amazed at the sheer number of human hours invested
| (wasted?) into making a terrible language(s) better just so
| things are slightly easier for beginners.
| aobdev wrote:
| I'm someone who spent a lot of time focusing on finding the
| most efficient tools, writing the most elegant code, using the
| most "pure" frameworks, etc. and I realized that this did very
| little to help me obtain my goals.
|
| Unless your objectives are not commercial or will never scale
| beyond the ability of a single person, the overhead of human
| communication and collaboration will be enormous compared to the
| inefficiencies of shuffling electrons and flipping bits with
| imperfect instructions. Sometimes this means using technically-
| inferior tools better than idealistic tools improperly to get
| the job done.
| mjr00 wrote:
| Assuming you're talking about Python, it's a product of its
| time. At the time it was released, it was very much following
| the newest "best practices" from "experts". You had _lots_ of
| well-known people in the industry going to conferences and
| talking about how static typing was Actually Bad and unit
| testing was the one true way. (I won't bother explaining the
| retrospective idiocy of that.)
|
| It's not a great language in a vacuum... _but_ it is the most
| popular backend language in the world, by most estimations;
| there's a library for literally everything, and the entire
| data science ecosystem lives in the Python world. If you're on
| the management side, hiring a Python developer is orders of
| magnitude easier than finding a Rust developer.
|
| So you've really only got a few options... 1) throw everything
| out to rewrite it in Rust (et al); 2) accept that things suck
| and do nothing about it; 3) accept that things suck, but make
| some tooling to make it suck less. The rewrite strategy is a
| nonstarter for a lot of massive legacy codebases, so 3 is the
| only option that makes sense.
|
| So it's really _not_ that amazing how much time people invest
| in improving Python, when you think about it. Similar situation
| with PHP and Javascript.
| nine_k wrote:
| Python is not a terrible language; neither is JS, or Ruby, or
| Lisp, despite being highly dynamic.
|
| Python is indispensable for interactive experimentation (see
| Jupyter), and is a good glue language (see everything from
| Pytorch to Blender). It's also indispensable for _rapid_
| prototyping.
|
| Python is not a good high-performance language. If performance
| is something that limits you, you should write performance-
| critical parts in a language optimized for that. Rust is a fine
| choice, but many other options exist, from Java to Haskell, and
| Python-integrated solutions like Cython also help. Note that
| usually 80-90% of your code is not performance-critical.
|
| Same applies to systems where you want to formally prove
| certain correctness properties; both Python and C would be
| terrible choices.
|
| By the same token, Rust is not a terrible language, despite its
| complicated syntax, the constant struggle with lifetimes and
| the borrow checker, and long compilation times. It shines when
| you need performance and correctness. If you need easy
| experimentation in a REPL, use a different language.
|
| Use the right tool for the job.
| megaman821 wrote:
| There is no shortage of programming languages that one can choose
| from. Python seems to make the right tradeoffs between
| accessibility and power. The market has spoken.
| jerrygenser wrote:
| It's not just about the market speaking. There's path
| dependence between it being easy to use for scripting and
| scientific things as well as numpy, which made it natural to
| supplant things like Matlab in teaching. And then basic
| knowledge translates to industry jobs.
|
| It's not popular purely based on its merits as a language.
| _zamorano_ wrote:
| You're being downvoted, but I sort of agree...
|
| How in the world can I stop worrying about errors like using a
| nonexistent class member and having a runtime exception? How
| can I refactor?
|
| Ok, the core devs add annotations, then type hints to the
| language.
|
| Some other people create MyPy, the JetBrains people use some
| other linting thing that does well, but does not match MyPy.
| Then Pydantic appears and some other people say it's awesome, but
| yet again, different from the others...
|
| And all that to enforce things we already had 20 years ago in
| Java 1.0
|
| How about using dynamic typing for small scripts and prototyping
| and using better tools for bigger projects?
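
To make the complaint above concrete, here is a small sketch of what plain type hints do and do not catch. `Point` is a hypothetical class used only for illustration. The annotations are not enforced when the code runs, and a misspelled attribute fails only when that line executes.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


# Annotations are not checked at runtime: this constructs without error.
p = Point(x="not an int", y=2)

# A nonexistent attribute only fails when the line actually executes.
try:
    p.z
except AttributeError as exc:
    error = str(exc)
```

A static checker such as mypy would report both problems before the program runs, while a runtime validator such as pydantic would reject the bad `x` value when the object is constructed. These are two different kinds of checking, which is part of why the tools in the comment do not line up with each other.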
| hotfixguru wrote:
| Samuel also presented a slightly altered talk at PyCon US
| yesterday[0], which was awesome!
|
| GitHub link to pydantic[1], and pydantic-core[2].
|
| [0]: https://twitter.com/samuel_colvin/status/1649928041462915072
| (slides in comments)
|
| [1]: https://github.com/pydantic/pydantic
|
| [2]: https://github.com/pydantic/pydantic-core
| fbdab103 wrote:
| Have not watched the video, but I did find the slides from a
| different version[0].
|
| [0] https://slides.com/samuelcolvin/how-pydantic-v2-leverages-
| ru...
|
| Edit: Realized that the linked slides were from a different
| iteration of the ~same content.
| zikohh wrote:
| This is the one from the video
| https://slides.com/samuelcolvin/deck-0e6306
___________________________________________________________________
(page generated 2023-04-23 23:01 UTC)