[HN Gopher] Pydantic V2 leverages Rust's Superpowers [video]
       ___________________________________________________________________
        
       Pydantic V2 leverages Rust's Superpowers [video]
        
       Author : BerislavLopac
       Score  : 91 points
       Date   : 2023-04-23 15:35 UTC (7 hours ago)
        
 (HTM) web link (fosdem.org)
 (TXT) w3m dump (fosdem.org)
        
       | keithasaurus wrote:
       | As someone who built a pure python validation library[0] that's
       | much faster than pydantic (~1.5x - 12x depending on the
       | benchmark), I have to say that this whole focus on Rust seems
       | premature. There's clearly a lot of room for pydantic to optimize
       | its Python implementation.
       | 
       | Beyond that, rust seems like a great fit for tooling (i.e. ruff),
       | but as a library used at runtime, it seems a little odd to make a
       | validation library (which can expect to receive any kind of legal
       | python data) to also be constrained by a separate set of data
       | types which are legal in rust.
       | 
       | [0]: https://github.com/keithasaurus/koda-validate
        
         | [deleted]
        
         | Attummm wrote:
         | Personally, I think it's great to have many projects solving
         | the same problem and pushing each other further. Although the
         | differences between the faster validations are small, the older
         | ones were quite slow. This will save unnecessary CPU cycles,
         | making it eco-friendly. And now the bar will be even higher
         | with a Rust version, which is really great.
         | 
         | [0]Maat is 2.5 times faster than Pydantic on their own
         | benchmark, as stated in their readme.
         | 
         | [0]https://github.com/Attumm/Maat
        
         | jammycrisp wrote:
         | While I agree that there are ways to write a faster validation
         | library in python, there are also benefits to moving the logic
         | to native code.
         | 
         | msgspec[1] is another parsing/validation library, written in C.
         | It's on average 50-80x faster than pydantic for parsing and
         | validating JSON [2]. This speedup is only possible because we
         | make use of native code, letting us parse JSON directly and
         | efficiently into the proper python types, removing any
         | unnecessary allocations.
         | 
         | It's my understanding that pydantic V2 currently doesn't do
         | this (they still have some unnecessary intermediate allocations
         | during parsing), but having the validation logic already in
         | compiled code makes integrating this with the parser
         | theoretically possible later on. With the logic in python this
         | efficiency gain wouldn't be possible.
         | 
         | [1]: https://github.com/jcrist/msgspec
         | 
         | [2]: https://jcristharif.com/msgspec/benchmarks.html#benchmark-
         | sc...
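
A minimal sketch of the two-step flow described in the comment above, using only the Python standard library. The names `User`, `validate_user`, and `parse_user` are hypothetical and are not taken from pydantic or msgspec. It only illustrates where the intermediate allocation comes from: the JSON text is first decoded into a generic dict, and the typed object is built from that dict afterwards.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


def validate_user(data: object) -> User:
    """Check a decoded JSON value and build a User from it."""
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    name = data.get("name")
    age = data.get("age")
    if not isinstance(name, str):
        raise TypeError("name must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        raise TypeError("age must be an integer")
    return User(name=name, age=age)


def parse_user(raw: str) -> User:
    # Step 1: json.loads allocates a generic dict (the intermediate value).
    intermediate = json.loads(raw)
    # Step 2: validation reads that dict and builds the typed object.
    return validate_user(intermediate)
```

A parser that validates while decoding could skip the dict in step 1, which is the kind of saving the comment attributes to native code.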
        
           | keithasaurus wrote:
           | Definitely true. I've just soured on the POV that native code
           | is the first thing one should reach for. I was surprised that
           | it only took a few days of optimizations to convert my
           | validation library to being significantly faster than
           | pydantic, when pydantic was already largely compiled via
           | cython.
           | 
           | If you're interested in both efficiency and maintainability,
           | I think you need to start by optimizing the language of
           | origin. It seems to me that with pydantic, the choice has
           | consistently been to jump to compilation (cython, now rust)
           | without much attempt at optimizing within Python.
           | 
           | I'm not super-familiar with how things are being done on an
           | issue-to-issue / line-to-line basis, but I see this rust
           | effort taking something like a year+, when my intuition is
           | some simpler speedups in python could have been made in a matter
           | of days or weeks (which is not to say they would be of the
           | same magnitude of performance gains).
        
         | scolvin wrote:
         | I agree that pydantic could have been faster while still being
         | written in Python.
         | 
         | The argument for Rust: 1. If I'm going to rewrite - why not go
         | the whole hog and do it in rust - and thereby get a 20x
         | improvement, not 2x. 2. By using Rust we can add more
         | customisation with virtually no performance impact, with Python
         | that's not the case.
         | 
         | Of course we could make Pydantic faster by removing features,
         | but that would be very disappointing for existing users.
         | 
         | As mentioned by other commenters, your comment about
         | "constrained" does not apply.
        
         | iudqnolq wrote:
         | > to also be constrained by a separate set of data types which
         | are legal in rust.
         | 
         | This isn't really how writing rust/python interop works. You
         | tend to have opaque handles you call python methods on. Here's
         | a decent example I found skimming the code.
         | 
         | https://github.com/pydantic/pydantic-core/blob/main/src/inpu...
        
         | masklinn wrote:
         | > it seems a little odd to make a validation library (which can
         | expect to receive any kind of legal python data) to also be
         | constrained by a separate set of data types which are legal in
         | rust.
         | 
         | That... makes no sense? Rust can interact with Python objects,
         | there is no "constrained".
        
           | keithasaurus wrote:
           | In the sense of using escape hatches back to python, that's
           | true. Main point is that from a complexity standpoint, why do
           | python -> rust -> python, when there's still a lot of room to
           | run in just python?
        
             | iudqnolq wrote:
             | Because it's not python -> rust -> python, it's python ->
             | rust -> python c api.
        
         | LtWorf wrote:
         | I also wrote a pure python validation library [0] that is much
         | faster than pydantic. It also handles unions correctly (unlike
         | pydantic).
         | 
         | Pydantic2 is indeed much faster than any pure python
         | implementation I've seen, but it also introduces some bugs. And
         | on pypy it is as slow as it ever was, because it falls back to
         | python code.
         | 
         | I wrote mine because nothing else existed at the time, but
         | whenever I've had to use pydantic I've found it to be quirky
         | and to have strange opinions about types, that are not shared
         | by type validators. Using it with mypy (despite the extension)
         | is not so easy nor useful.
         | 
         | [0]: https://ltworf.github.io/typedload/performance.html
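
The comment above does not say what "handles unions correctly" means. One possible reading, stated here as an assumption, is that a value should be matched against the union member it already is, with no coercion into whichever member happens to be listed first. The sketch below shows that idea with the standard `typing` helpers. `load_exact` is a hypothetical name, and this is not how typedload or pydantic is implemented.

```python
from typing import Any, Union, get_args, get_origin


def load_exact(value: Any, expected: Any) -> Any:
    """Return value unchanged if its exact type is allowed by expected."""
    # Union[int, str] has origin Union and args (int, str).
    if get_origin(expected) is Union:
        allowed = get_args(expected)
    else:
        allowed = (expected,)
    for member in allowed:
        # Compare exact types: no coercion, so member order does not matter.
        if type(value) is member:
            return value
    raise TypeError(f"{value!r} does not match {expected!r}")
```

With this approach `load_exact("1", Union[int, str])` keeps the string as a string, and a float is rejected, not converted.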
        
       | satvikpendem wrote:
       | Rust is the future of tooling [0]. While [0] is about JS tooling
       | specifically, we're seeing the same effects in other languages as
       | well. Turns out, you probably don't want to write infrastructure
       | tooling in slow, dynamically typed languages when faster, more
       | safe languages exist. Python knows this already, with much of the
       | scientific computing libraries being just wrappers over the core
       | C++ codebases. JS is beginning to catch up as well, with swc
       | (speedy web compiler), stc (speedy type checker), Turbopack
       | (Webpack successor) and so on, with Vercel leading the charge
       | mainly.
       | 
       | [0] https://leerob.io/blog/rust#the-future-of-javascript-tooling
        
         | nine_k wrote:
         | I'd say that you don't want to write _the second crop_ of the
         | tooling in a language like JS or Python.
         | 
         | This is because the first crop, like mypy or babel or jslint
         | existed, and has shown the general direction. But for the first
         | crop, a slow-running but fast-turnaround language was
         | essential, to my mind. The first iteration had to move fast,
         | and _change the direction_ fast, because it wasn't yet clear
         | what direction was going to be right.
        
       | bratao wrote:
       | Someone recommended here msgspec as a Pydantic alternative for
       | serialization/validation and wow. It is fantastic. I really
       | recommend it https://github.com/jcrist/msgspec
        
         | jammycrisp wrote:
         | Thanks, glad you like it!
        
       | jonatron wrote:
       | Subtitled Video:
       | https://jonatron.github.io/fosdem2023whisper/files/rust_how_...
       | 
       | Transcript:
       | https://jonatron.github.io/fosdem2023whisper/files/rust_how_...
       | 
       | (transcribed with whisper)
        
       | AdilZtn wrote:
       | PyO3 is the key actor in this rust-binding trend.
        
       | the__alchemist wrote:
       | Code style in Pydantic user code reminds me of Rust's Serde
       | serialization lib. Not directly relevant to the point of this
       | video, but it makes me curious if it was inspired by Serde.
        
       | nightski wrote:
       | I'm rather amazed at the sheer number of human hours invested
       | (wasted?) into making a terrible language(s) better just so
       | things are slightly easier for beginners.
        
         | aobdev wrote:
         | I'm someone who spent a lot of time focusing on finding the
         | most efficient tools, writing the most elegant code, using the
         | most "pure" frameworks, etc. and I realized that this did very
         | little to help me obtain my goals.
         | 
         | Unless your objectives are not commercial or will never scale
         | beyond the ability of a single person, the overhead of human
         | communication and collaboration will be enormous compared to the
         | inefficiencies of shuffling electrons and flipping bits with
         | imperfect instructions. Sometimes this means using technically-
         | inferior tools better than idealistic tools improperly to get
         | the job done.
        
         | mjr00 wrote:
         | Assuming you're talking about Python, it's a product of its
         | time. At the time it was released, it was very much following
         | the newest "best practices" from "experts". You had _lots_ of
         | well-known people in the industry going to conferences and
         | talking about how static typing was Actually Bad and unit
         | testing was the one true way. (I won't bother explaining the
         | retrospective idiocy of that.)
         | 
         | It's not a great language in a vacuum... _but_ it is the most
         | popular backend language in the world, by most estimations;
         | there's a library for literally everything, and the entire
         | data science ecosystem lives in the Python world. If you're on
         | the management side, hiring a Python developer is orders of
         | magnitude easier than finding a Rust developer.
         | 
         | So you've really only got a few options... 1) throw everything
         | out to rewrite it in Rust (et al); 2) accept that things suck
         | and do nothing about it; 3) accept that things suck, but make
         | some tooling to make it suck less. The rewrite strategy is a
         | nonstarter for a lot of massive legacy codebases, so 3 is the
         | only option that makes sense.
         | 
         | So it's really _not_ that amazing how much time people invest
         | in improving Python, when you think about it. Similar situation
         | with PHP and Javascript.
        
         | nine_k wrote:
         | Python is not a terrible language; neither is JS, or Ruby, or
         | Lisp, despite being highly dynamic.
         | 
         | Python is indispensable for interactive experimentation (see
         | Jupyter), and is a good glue language (see everything from
         | Pytorch to Blender). It's also indispensable for _rapid_
         | prototyping.
         | 
         | Python is not a good high-performance language. If performance
         | is something that limits you, you should write performance-
         | critical parts in a language optimized for that. Rust is a fine
         | choice, but many other options exist, from Java to Haskell, and
         | Python-integrated solutions like Cython also help. Note that
         | usually 80-90% of your code is not performance-critical.
         | 
         | Same applies to systems where you want to formally prove
         | certain correctness properties; both Python and C would be
         | terrible choices.
         | 
         | By the same token, Rust is not a terrible language, despite its
         | complicated syntax, the constant struggle with lifetimes and
         | the borrow checker, and long compilation times. It shines when
         | you need performance and correctness. If you need easy
         | experimentation in a REPL, use a different language.
         | 
         | Use the right tool for the job.
        
         | megaman821 wrote:
         | There is no shortage of programming languages that one can choose
         | from. Python seems to make the right tradeoffs between
         | accessibility and power. The market has spoken.
        
           | jerrygenser wrote:
           | It's not just about the market speaking. There's path
           | dependence between it being easy to use for scripting and
           | scientific things as well as numpy, which made it natural to
           | supplant things like Matlab in teaching. And then basic
           | knowledge translates to industry jobs.
           | 
           | It's not popular purely based on its merits as a language.
        
         | _zamorano_ wrote:
         | You're being downvoted, but I sort of agree...
         | 
         | How in the world can I stop worrying about errors like using a
         | nonexistent class member and having a runtime exception? How
         | can I refactor?
         | 
         | Ok, the core devs add annotations, then type hints to the
         | language.
         | 
         | Some other people create MyPy, the JetBrains people use some
         | other linting thing that does well, but does not match MyPy.
         | Then Pydantic appears and some other people say it's awesome, but
         | yet again, different to the others...
         | 
         | And all that to enforce things we already had 20 years ago in
         | Java 1.0
         | 
         | How about using dynamic typing for small scripts and prototyping
         | and using better tools for bigger projects?
        
       | hotfixguru wrote:
       | Samuel also presented a slightly altered talk at PyCon US
       | yesterday[0], which was awesome!
       | 
       | GitHub link to pydantic[1], and pydantic-core[2].
       | 
       | [0]: https://twitter.com/samuel_colvin/status/1649928041462915072
       | (slides in comments)
       | 
       | [1]: https://github.com/pydantic/pydantic
       | 
       | [2]: https://github.com/pydantic/pydantic-core
        
       | fbdab103 wrote:
       | Have not watched the video, but I did find the slides from a
       | different version[0].
       | 
       | [0] https://slides.com/samuelcolvin/how-pydantic-v2-leverages-
       | ru...
       | 
       | Edit: Realized that the linked slides were from a different
       | iteration of the ~same content.
        
         | zikohh wrote:
         | This is the one from the video
         | https://slides.com/samuelcolvin/deck-0e6306
        
       ___________________________________________________________________
       (page generated 2023-04-23 23:01 UTC)