[HN Gopher] Making Python fast - Adventures with mypyc
___________________________________________________________________
Making Python fast - Adventures with mypyc
Author : meadsteve
Score : 177 points
Date : 2022-09-27 12:42 UTC (10 hours ago)
(HTM) web link (blog.meadsteve.dev)
(TXT) w3m dump (blog.meadsteve.dev)
| meadsteve wrote:
| I recently experimented with using mypyc to make some of my
| python a little faster. I was pleasantly surprised with how well
| it worked for very little code change so I thought I'd share my
| experiences.
|
| The blog post wanders around a little because I had to add
| setuptools and wheel building as my project had previously
| skipped this.
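For readers who have not wired mypyc into a build before, here is a minimal sketch of the setuptools side. `mypycify` is mypyc's documented build helper; the package name `example_pkg`, the module paths, and the `EXAMPLE_SKIP_MYPYC` environment variable are hypothetical and not taken from the blog post.

```python
"""Sketch of a setup.py helper that compiles selected modules with mypyc."""
import os
from typing import Any, List

# Hypothetical list of modules to compile; everything else stays interpreted.
COMPILED_MODULES = ["example_pkg/core.py", "example_pkg/resolver.py"]


def optional_ext_modules(paths: List[str], compile_native: bool = True) -> List[Any]:
    """Return mypyc-built extensions, or [] to fall back to pure Python."""
    if not compile_native:
        return []
    try:
        from mypyc.build import mypycify  # requires mypy (with mypyc) to be installed
    except ImportError:
        return []
    return mypycify(paths)


def run_setup() -> None:
    """Call this from setup.py; it is deliberately not invoked here."""
    from setuptools import setup

    skip = os.environ.get("EXAMPLE_SKIP_MYPYC") == "1"  # hypothetical switch
    setup(
        name="example-pkg",
        version="0.0.1",
        packages=["example_pkg"],
        ext_modules=optional_ext_modules(COMPILED_MODULES, compile_native=not skip),
    )
```

Falling back to an empty extension list keeps the package installable as pure Python when no C compiler or mypyc is available. That is a design choice of this sketch, not necessarily what the blog post does.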
| 4140tm wrote:
| I just found out about Lagom from this blog post and it's
| exactly what I have been looking for.
|
| All other Python options I've seen feel too involved or leak
| too much into your code. Lagom seems to balance everything just
| right.
|
| Thank you!
| cinntaile wrote:
| Haha, I can imagine Steve is quite pleased with this comment.
| You should look up the meaning of the (Swedish) word lagom.
| meadsteve wrote:
| Thanks 4140tm and thanks cinntaile. I was very pleased.
| That was very much the intention of the name
| raphaelrk wrote:
| I recently benchmarked "numpy vs js" matrix multiplication
| performance, and was surprised to find js significantly
| outperforming numpy. For multiplying two 512x512 matrices:
|   python
|     numpy:            ~3.30ms
|     numpy with numba: ~2.90ms
|   node
|     tfjs:             ~1.00ms
|     gpu.js:           ~4.00ms
|     ndarray:          ~118.00ms
|     vanilla loop:     ~138.00ms
|     mathjs:           ~1876.00ms
|   browser
|     tfjs webgpu:      ~.16ms
|     tfjs webgl:       ~.76ms
|     tfjs wasm:        ~2.51ms
|     gpu.js:           ~6.00ms
|     tfjs cpu:         ~244.65ms
|     mathjs:           ~3469.00ms
|   c
|     accelerate:       ~.06ms
|
| Source here: https://github.com/raphaelrk/matrix-mul-test
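The "vanilla loop" entry in the numbers above is a JavaScript measurement. For comparison, a Python analogue might look like the sketch below. This is not code from the linked repository; the function names and the timing harness are assumptions made for illustration.

```python
"""Pure-Python "vanilla loop" matrix multiply with a small timing harness."""
import random
import timeit
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices with three nested loops."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        row_a, row_out = a[i], out[i]
        for k in range(inner):
            a_ik, row_b = row_a[k], b[k]
            for j in range(cols):
                row_out[j] += a_ik * row_b[j]
    return out


def random_matrix(n: int, seed: int) -> Matrix:
    """Build an n x n matrix of pseudo-random floats, reproducible via seed."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def time_matmul(n: int, repeats: int = 3) -> float:
    """Return the best wall-clock time in milliseconds for one n x n multiply."""
    a, b = random_matrix(n, 1), random_matrix(n, 2)
    best = min(timeit.repeat(lambda: matmul(a, b), number=1, repeat=repeats))
    return best * 1000.0
```

Calling `time_matmul(64)` gives a quick reading. The 512x512 case from the comment performs roughly 134 million multiply-adds, so expect it to take far longer in interpreted Python.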
| bobsmooth wrote:
| What's with this fascination with making python fast? It's not
| supposed to be fast, it's supposed to be simple. If you want
| speed use a compiled language. Trying to make python fast is like
| trying to strap a turbocharger to a tricycle.
| alanwreath wrote:
| I agree with you -- but I also don't say no to free food.
|
| I mean regardless of whether mypy was going to make my code run
| faster I would have used it for the sheer confidence it gives
| wrt my code correctness. The fact that I can use that same
| code (untouched) to speed it up... that just means I get to
| have my cake and eat it too :P
| meadsteve wrote:
| Yeah this is exactly it for me. I already had type
| annotations and ran mypy to help with correctness. And I
| tried this out because it felt like a nice thing to get for
| free.
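To make "already annotated, so compiling is nearly free" concrete, here is a small sketch of the kind of module involved. The functions are invented for illustration and have nothing to do with Lagom's code. The same file runs unchanged under CPython, can be checked with `mypy`, and could be handed to the `mypyc` command.

```python
"""A fully annotated module: checkable by mypy and a candidate for mypyc."""
from typing import Dict, List


def word_counts(lines: List[str]) -> Dict[str, int]:
    """Count case-insensitive word occurrences across all lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def top_words(counts: Dict[str, int], limit: int) -> List[str]:
    """Return up to `limit` words, most frequent first, ties broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]
```

How much faster a compiled build of code like this runs depends on the workload, so no speedup is claimed here.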
| intrepidhero wrote:
| I like the concept of using mypyc to leverage type hints to
| compile python. But I was pretty frustrated recently when I got
| bit by a bug in mypyc[1] while trying to use black. Especially
| since I wasn't using mypyc myself and so didn't realize it was
| even in my dependency tree. Beware adding "alpha" quality
| software as a dependency to your supposedly production ready
| tool.
|
| [1] https://github.com/psf/black/issues/2846
| BiteCode_dev wrote:
| Mind you, it still requires a C compiler, which has to be
| installed separately. That's very easy on Linux, but it means an
| Xcode install on Mac, and it can be fiddly on Windows.
|
| Still nice, but not like golang or rust where you have a
| standalone solution.
|
| It's an alternative to nuitka, which I recommend trying out.
| atoav wrote:
| Anybody using Python and Rust should also check out maturin and
| pyo3. I run some (non public) Python modules created in Rust and
| both the performance and the testability are stellar.
| meadsteve wrote:
| Yeah these are great approaches too. I'd actually considered a
| rewrite of the core in rust before I went with mypyc. But it
| was nice not to have to do a rewrite.
| atoav wrote:
| Totally understandable. More options are better anyways.
| jblindsay wrote:
| I have the exact same experience. Both Maturin and PyO3 have
| been a game changer for the work that I have been doing lately.
| It works so seamlessly.
| kodablah wrote:
| We built the logic backing the Temporal Python SDK[0] in Rust
| and leverage PyO3 (and PyO3 Asyncio). Unfortunately Maturin
| didn't let us do some of the advanced things we needed to do
| for wheel creation (at the time, unsure now), so we use
| setuptools-rust with Poetry.
|
| 0 - https://github.com/temporalio/sdk-python
| atoav wrote:
| I had no issues with the standard maturin way of building
| wheels - but my requirements were not special at all. I also
| did this maybe 5 months ago, so maybe it has indeed gotten
| better, I cannot tell.
| wcdolphin wrote:
| Is anyone else using MyPyC in production and can share their
| experience? Did you attempt the compile-it-all approach, or add
| modules incrementally? What do compile times look like at scale?
|
| Would love to buy you a coffee and hear about your experience and
| the challenges and benefits.
| bsenftner wrote:
| Worth mentioning Taichi, a high-performance parallel programming
| language embedded in Python. I've experimented with it a bit, and
| high-performance is very true. One can pretty much just write
| ordinary Python, plus enhancing existing Python is not that
| difficult either.
|
| From their docs:
|
| You can write computationally intensive tasks in Python while
| obeying a few extra rules imposed by Taichi to take advantage of
| the latter's high performance. Use decorators @ti.func and
| @ti.kernel as signals for Taichi to take over the implementation
| of the tasks, and Taichi's just-in-time (JIT) compiler would
| compile the decorated functions to machine code. All subsequent
| calls to them are executed on multi-CPU cores or GPUs. In a
| typical compute-intensive scenario (such as a numerical
| simulation), Taichi can lead to a 50x~100x speed up over native
| Python code.
|
| Taichi's built-in ahead-of-time (AOT) system also allows you to
| export your code as binary/shader files, which can then be
| invoked in C/C++ and run without the Python environment.
|
| https://www.taichi-lang.org/
| kingkongjaffa wrote:
| Can this work with pyinstaller to make an executable faster?
| Cyphase wrote:
| I can't see why not. I've packaged some complex dependencies
| with PyInstaller - on Windows. There is always a way. This
| wouldn't even be particularly difficult.
| ok_dad wrote:
| I'll be that guy who says I love Python but it's been shoved into
| too many spaces now. It's been a great tool for me for writing
| things that require a lot of I/O and aren't CPU bound.
|
| I am even rethinking that now because I was able to write a
| program in Go with an HTTP API and using JSON as the usual API
| interchange format in one night (all stdlib too), and it was so
| easy that I plan to pitch using it for several services we need
| to rewrite at work that are currently in Python. That would be
| very similar to what I wrote in a day.
|
| If Python doesn't fix their packaging, performance, and the
| massive expansion in the language, I think it's going to start
| losing ground to other languages.
| rkrzr wrote:
| I didn't know that you can compile individual modules with mypyc.
| That's very interesting since it allows a gradual adoption of the
| compiler, which really helps with big codebases.
|
| Do you know if there are any requirements for which modules can
| be compiled? E.g. can they be imported in other modules or do
| they have to be a leaf in the import tree/graph ?
| traverseda wrote:
| Having read through the docs Mypyc has a concept of "native
| classes" and python classes, and it looks like you can use a
| "native" (compiled) class from regular python and vice-versa.
|
| So my reading is that it should be pretty seamless.
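A sketch of what that looks like in practice, assuming the mypyc documentation's description that classes defined in a compiled module become native classes by default. `Vec2` and `total` are hypothetical names. The code below is ordinary Python and behaves the same whether or not the module is compiled.

```python
"""A class that would become a mypyc native class if this module were compiled."""
from typing import List


class Vec2:
    """Plain annotated class; all attributes are assigned in __init__."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def add(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def dot(self, other: "Vec2") -> float:
        return self.x * other.x + self.y * other.y


def total(vectors: List[Vec2]) -> Vec2:
    """Sum a list of vectors; callers may live in compiled or interpreted modules."""
    result = Vec2(0.0, 0.0)
    for v in vectors:
        result = result.add(v)
    return result
```

As I read the docs, the main caveats run the other way: interpreted code cannot attach new attributes to instances of a native class, and subclassing a native class from interpreted code needs an explicit opt-in. Check the mypyc documentation before relying on either.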
| an1sotropy wrote:
| I'm curious how to compare this with a PyPy FAQ:
| https://doc.pypy.org/en/latest/faq.html#would-type-annotatio...
| which describes a bit about why type hints aren't as helpful to
| optimize code under PyPy as one (including myself) might think.
|
| Can someone explain more about how mypyc is in a better position
| to produce better optimizations than pypy, or am I confused about
| this?
| detaro wrote:
| pypy argues that considering type annotations gives them less
| useful data than their existing tracing does, and thus pypy
| wouldn't be faster if it considered them. Something like mypyc
| by design has no chance of doing tracing, and thus has to work
| with annotations. (I also don't see where you get the claim
| from that mypyc has better optimizations than pypy? But
| the two also follow different designs, so they might be good at
| different things)
| an1sotropy wrote:
| sorry I didn't mean to claim that mypyc does have better
| optimizations, I meant to be asking if that was possible. My
| superficial read was: this post about mypyc goes from type
| hints to compiling to "faster", and then I remembered the
| pypy FAQ which says type hints didn't help with that.
|
| But if mypyc has no runtime information to go on (which pypy
| does have), then certainly having some type information is
| better than having none.
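One concrete reason annotations are weak evidence for a tracing JIT: plain CPython records them but never enforces them, so a JIT still has to observe the real values. A minimal sketch (the function name is made up):

```python
"""Annotations are metadata: CPython stores them but does not enforce them."""


def double(x: int) -> int:
    return x * 2


# The hint says int, yet other types are accepted without complaint.
results = [double(21), double("ab"), double([1, 2])]

# The annotations are only recorded on the function object.
hints = double.__annotations__
```

An ahead-of-time compiler is in a different position. mypyc's documentation describes compiled code as checking annotated types at runtime, which is what lets it treat the annotations as facts rather than hints.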
| peterkelly wrote:
| mypyc is cool and all, but I can't help thinking about how Node
| just JITs everything automatically without the need for any
| special steps like this.
| BiteCode_dev wrote:
| That's what Microsoft is paying Guido for, for the next
| versions of python.
| chrisseaton wrote:
| I think that's not really the plan - they're talking about
| just basic template compilation, nothing like V8
| https://github.com/markshannon/faster-cpython/blob/master/pl....
| chrisseaton wrote:
| That's not Node - that's V8. And it's possible to do the same
| thing for Python - there's nothing magic about JavaScript
| compared to Python - it's just a lot of engineering work to do
| it, which is beyond what this project's scope is. PyPy does it,
| but not inside standard Python.
| peterkelly wrote:
| I'm well aware of V8 and pypy. I also really like Python as a
| language, especially with mypy.
|
| It just makes me sad that in a world with multiple high-
| performance JIT engines (including pypy, for Python itself),
| the standard Python version that most people use is an
| interpreter. I know it's largely due to compatibility reasons
| (C extensions being deeply intertwined with CPython's API).
|
| There _is_ a really important (if not "magic") difference
| between JavaScript and Python. JS has always (well, since IE
| added support) been a language with multiple widely-used
| implementations in the wild, which has prevented the
| emergence of a third-party package ecosystem which is heavily
| tied to one particular implementation. Python on the other
| hand is for a large proportion of the userbase considered
| CPython, with alternate implementations being second class
| citizens, despite some truly impressive efforts on the
| latter.
|
| The fact that packages written in JS are not tied to (or at
| least work best with) a single implementation is also what
| made it possible for developers of JS engines to experiment
| with different implementation approaches, including JIT.
| While I'm not intimately familiar with writing native
| extension modules for Node (having dabbled only a little), my
| understanding is the API surface is much narrower than
| Python, allowing for changes in the engine that avoid
| breaking APIs. But there is less need for native modules in
| JS, because of the presence of JIT in all major engines.
| mkoubaa wrote:
| This is in the process of being addressed - look into the
| HPy project
| zzzeek wrote:
| > It just makes me sad that in a world with multiple high-
| performance JIT engines (including pypy, for Python
| itself), the standard Python version that most people use
| is an interpreter. I know it's largely due to compatibility
| reasons (C extensions being deeply intertwined with
| CPython's API).
|
| this is misleading, if one takes the term "interpreter" to mean
| that code is represented as syntax-derived trees or other
| datastructures which are then traversed at runtime to
| produce results - someone correct me if I'm wrong but this
| would apply to well known interpreted languages like Perl
| 5. cPython is a _bytecode_ interpreter, not conceptually
| unlike the Java VM before JITs were added. It just happens
| to compile scripts to bytecode on the fly.
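The compile-to-bytecode step is easy to inspect with the standard library. The sketch below uses `compile()` and the `dis` module; the variable names are arbitrary, and the exact opcode names printed vary between CPython versions.

```python
"""Look at the bytecode CPython produces for a small statement."""
import dis

source = "total = price * quantity + shipping"

# compile() runs CPython's parser and bytecode compiler without executing anything.
code = compile(source, "<example>", "exec")

# Each instruction has an opcode name and a human-readable argument.
instructions = [(ins.opname, ins.argrepr) for ins in dis.get_instructions(code)]

# Executing the code object hands the bytecode to the interpreter loop.
namespace = {"price": 3, "quantity": 4, "shipping": 5}
exec(code, namespace)
```

Printing `instructions` (or calling `dis.dis(code)`) shows the loads, arithmetic operations and the final store that the interpreter loop steps through.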
| chrisseaton wrote:
| Bytecode is just another data structure that you traverse
| at runtime to produce results. It's a postfix
| transformation of the AST. It's still an interpreter.
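A toy illustration of that point, with an invented three-operator expression language rather than anything from CPython: the "bytecode" is a post-order flattening of the tree, and both evaluators are interpreters that walk a data structure at runtime.

```python
"""Toy example: an expression tree, its postfix "bytecode", and two interpreters."""
import operator
from typing import List, Tuple, Union

Tree = Union[int, Tuple[str, "Tree", "Tree"]]
Instr = Tuple[str, int]  # (opcode, argument); the argument is unused for operators

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_tree(node: Tree) -> int:
    """Tree-walking interpreter: recurse over the AST directly."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](eval_tree(left), eval_tree(right))


def to_postfix(node: Tree) -> List[Instr]:
    """Post-order traversal: operands first, then the operator."""
    if isinstance(node, int):
        return [("PUSH", node)]
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [(op, 0)]


def eval_postfix(program: List[Instr]) -> int:
    """Bytecode-style interpreter: a loop over instructions plus a value stack."""
    stack: List[int] = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[opcode](left, right))
    return stack.pop()


tree: Tree = ("+", ("*", 2, 3), ("-", 10, 4))
program = to_postfix(tree)
```

Both `eval_tree(tree)` and `eval_postfix(program)` produce the same value. The flat form mainly trades recursion for a simple loop and a stack.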
| zzzeek wrote:
| so you'd call the pre-JIT JVM an "interpreter" and you'd
| call Java an interpreted language?
| chrisseaton wrote:
| > so you'd call the pre-JIT JVM an "interpreter"
|
| Yeah? I think almost everyone would?
|
| > and you'd call Java an interpreted language?
|
| Java is interpreted in many ways, and compiled in many
| ways, as I said it's complicated. It's compiled to
| bytecode, which is interpreted until it's time to be
| compiled... at which point it's abstract interpreted to a
| graph, which is compiled to machine code, until it needs
| to deoptimise at which point the metadata from the graph
| is interpreted again, allowing it to jump back into the
| original interpreter.
|
| But if it didn't have the JIT it'd always be an
| interpreter running.
| an1sotropy wrote:
| Well, ok, but then isn't a CPU also just an
| interpreter, traversing the object code text of compiled
| code?
| chrisseaton wrote:
| We don't normally call hardware or firmware
| implementations an 'interpreter'.
|
| Almost all execution techniques include some combination
| of compilation and interpretation. Even some ASTs include
| aspects of transformation to construct them from the
| source code, which we could call a compiler. Native
| compilers sometimes have to interpret metadata to do
| things like roll forward for deoptimisation.
|
| But most people in the field would describe CPython
| firmly as an 'interpreter'.
| zzzeek wrote:
| I call it "bytecode interpreted" to distinguish it from
| traditional parse-tree interpretation such as Perl 5 and
| others
| [deleted]
| detaro wrote:
| That's not misleading, that's standard terminology. An
| interpreter using bytecode is still an interpreter.
| mixmastamyk wrote:
| Python is a bit more dynamic than JS, which makes it uniquely
| hard to optimize. There is more improvement to be made,
| however, and it is being made.
| chrisseaton wrote:
| Right, but I think we know how to optimise all these
| things. It's all solved problems.
| mixmastamyk wrote:
| A few things are impossible without changing/subsetting
| the language. What I was trying to get at.
| chrisseaton wrote:
| What things are you thinking of?
|
| (Not trying to interrogate you or prove you wrong, but
| I've got an interest in optimising very difficult meta-
| programming patterns.)
| mixmastamyk wrote:
| Nearly everything (or is it everything?) in memory can be
| modified at runtime. There are no real constants for
| example. The whole stack top to bottom can be
| monkeypatched on a whim.
|
| This means nothing is guaranteed and so every instruction
| must do multiple checks to make sure data structures are
| what is expected at the current moment.
|
| This is true of JS as well, but to a lesser extent.
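A short sketch of the kind of runtime mutation being described. All names are invented for the example, and the `math.sqrt` patch is restored at the end so nothing leaks out.

```python
"""Nothing is truly constant: names, methods and stdlib functions can be rebound."""
import math

MAX_RETRIES = 3  # upper case is only a naming convention


def retries_left(used: int) -> int:
    return MAX_RETRIES - used


before = retries_left(1)
MAX_RETRIES = 10          # rebinding the "constant" is perfectly legal
after = retries_left(1)   # the function sees the new value on its next call


class Greeter:
    def greet(self) -> str:
        return "hello"


g = Greeter()
Greeter.greet = lambda self: "patched"  # monkeypatch a method on a live class
patched = g.greet()                     # existing instances pick up the change

original_sqrt = math.sqrt
math.sqrt = lambda value: -1.0          # even stdlib functions can be replaced
hijacked = math.sqrt(4.0)
math.sqrt = original_sqrt               # restore so later code is unaffected
```

Because each of these lookups can change between any two calls, a naive implementation has to re-resolve them every time.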
| chrisseaton wrote:
| > so every instruction must do multiple checks
|
| Aren't all the things you mentioned already fixed by
| deoptimisation?
|
| You assume constants cannot be modified, and then get the
| code that wants to modify constants to do the work of
| stopping everyone who is assuming a constant value, and
| notify them that they need to pick up the new value?
|
| > To deoptimize means to jump from more optimised code to
| less optimized code. In practice that usually means to
| jump from just-in-time compiled machine code back into an
| interpreter. If we can do this at any point, and if we
| can perfectly restore the entire state of the
| interpreter, then we can start to throw away those checks
| in our optimized code, and instead we can deoptimize when
| the check would fail.
|
| https://chrisseaton.com/truffleruby/deoptimizing/
|
| I work on a compiler for Ruby, and mutable constants and
| the ability to monkey patch etc adds literally zero extra
| checks to optimised code.
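A toy model of that idea in Python, under heavy simplification: the class, the `SCALE` constant and the method names are invented, and a real JIT transfers execution state mid-function rather than just swapping an entry point. The point it shows is that the fast path contains no guard, because the writer pays for invalidation.

```python
"""Toy model of deoptimisation: the fast path has no guard; the writer invalidates it."""
from typing import Callable, Dict


class Runtime:
    def __init__(self) -> None:
        self.constants: Dict[str, int] = {"SCALE": 10}
        # Start in "optimised" mode: SCALE is baked into the code as a literal.
        self.scale_value: Callable[[int], int] = self._optimised
        self.deopt_count = 0

    def _optimised(self, x: int) -> int:
        return x * 10  # assumes SCALE == 10; no lookup and no check

    def _generic(self, x: int) -> int:
        return x * self.constants["SCALE"]  # slow path: always reads the current value

    def set_constant(self, name: str, value: int) -> None:
        """The writer does the work: update the value, then invalidate fast code."""
        self.constants[name] = value
        if name == "SCALE" and self.scale_value == self._optimised:
            self.scale_value = self._generic
            self.deopt_count += 1


rt = Runtime()
first = rt.scale_value(4)      # runs the optimised path
rt.set_constant("SCALE", 3)    # switches callers to the generic path
second = rt.scale_value(4)     # generic path, sees the new value
```

Reads stay cheap for as long as nobody writes. The cost of mutability is moved onto the (rare) write.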
| mixmastamyk wrote:
| No such thing as a constant in Python. You can optionally
| name a variable in uppercase to signal to others that it
| should be, but that's about it.
|
| You can write a new compiler if you'd like, as detailed
| on this page. But CPython doesn't work that way and 99%
| of the ecosystem is targeted there.
|
| There is some work on making more assumptions as it runs,
| now that the project has funding. This is about where my
| off-top-of-head knowledge ends however so someone else
| will want to chime in here. The HN search probably has a
| few blog posts and discussions as well.
| cozzyd wrote:
| I think it's more that CPython is so slow that a lot of
| things people use are implemented using the C API, and
| many optimizations will break a bunch of things. If
| everything was pure python the situation would be
| different.
| pmarreck wrote:
| Have they cleaned up Python's packaging/dependency problem yet?
| chrisseaton wrote:
| > for free ... this was a problem as a number of my tests rely on
| this [incompatible behaviour]
| meadsteve wrote:
| Maybe I should have said for cheap
| mrtranscendence wrote:
| Nice to see this. Do they have a project roadmap for mypyc?
|
| Doubling performance is nice, though it does still leave a lot of
| performance on the table. I'd be curious to see a comparison
| between this and Cython.
___________________________________________________________________
(page generated 2022-09-27 23:01 UTC)