[HN Gopher] Python extensions should be lazy
___________________________________________________________________
Python extensions should be lazy
Author : 0x63_Problems
Score : 45 points
Date : 2024-08-07 16:14 UTC (6 hours ago)
(HTM) web link (www.gauge.sh)
(TXT) w3m dump (www.gauge.sh)
| lalaland1125 wrote:
| Optimizing Python extensions is becoming increasingly important
| as Python is used in more and more compute-intensive
| environments.
|
| The key for optimizing a Python extension is to minimize the
| number of times you have to interact with Python.
|
| A couple of other tips in addition to what this article provides:
|
| 1. Object pooling is quite useful as it can significantly cut
| down on the number of allocations.
|
| 2. Be very careful about tools like pybind11 that make it easier
| to write extensions for Python. They come with a significant
| amount of overhead. For critical hotspots, always use the raw
| Python C extension API.
|
| 3. Use numpy arrays whenever possible when returning large lists
| to Python. A Python list of Python integers is amazingly
| inefficient compared to a numpy array of integers.
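
A minimal sketch of the object pooling idea from tip 1, written in Python only to show the shape of the technique. `Node` and `NodePool` are hypothetical names, not part of any library mentioned in the thread. In a real extension the pool would more likely live on the C or Rust side, and whether it pays off depends on the workload.

```python
class Node:
    """Hypothetical small object that is created and discarded often."""
    __slots__ = ("kind", "value")

    def __init__(self):
        self.kind = None
        self.value = None


class NodePool:
    """Hypothetical free-list pool: reuse released nodes instead of allocating."""

    def __init__(self):
        self._free = []        # released nodes waiting to be reused
        self.allocations = 0   # number of Node objects actually constructed

    def acquire(self, kind, value):
        if self._free:
            node = self._free.pop()
        else:
            node = Node()
            self.allocations += 1
        node.kind = kind
        node.value = value
        return node

    def release(self, node):
        # Clear the fields so stale data does not leak into the next user.
        node.kind = None
        node.value = None
        self._free.append(node)


pool = NodePool()
for i in range(1000):
    node = pool.acquire("int", i)
    # ... use the node here ...
    pool.release(node)
```

Because each node is released before the next one is acquired, the loop constructs a single `Node` and reuses it for every iteration.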
| 0x63_Problems wrote:
| Totally agree, keeping the interface with the extension as thin
| as possible makes sense.
|
| I hadn't considered object pooling in this context. It might be
| more involved since each node has distinct data, but for my use
| case it might still be a performance win.
|
| Have you ever used pyo3 for rust bindings? I haven't measured
| the overhead but I have been assuming that it's worth the
| tradeoff vs. rolling my own.
|
| (I'm the author)
| hansvm wrote:
| My last workplace used pyo3 for a project. It was slower than
| vanilla Python, and you picked up all the normal compiled-
| language problems like slow builds and cross-compilation
| toolchains.
|
| I wouldn't take away from that observation that pyo3 is slow
| (it was just a poor fit; FFI for minuscule amounts of work),
| but the fact that the binding costs were higher than vanilla
| Python computations suggests that the overhead is (was?)
| meaningful. I don't know how it compares to a hand-written
| extension.
| stabbles wrote:
| Why not go all the way and limit the times you have to interact
| with Python to zero ;)
| mkoubaa wrote:
| Because if your users want python, you have to convince them
| they don't want it. If you fail, you'll have made optimized
| code that nobody uses.
|
| Another strategy is to actually serve your users.
| coldtea wrote:
| Because then you have the warts of the new language, and the
| pain of migrating code, which could be 100s or millions of
| lines, to worry about...
| tomjakubowski wrote:
| re: 3, Python has a native numeric array type
| https://docs.python.org/3/library/array.html
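
A rough sketch comparing the stdlib `array` type with a plain list of integers, assuming CPython (where `sys.getsizeof` is available). The list estimate adds the size of every int object, so it slightly over-counts the small integers that CPython caches and shares. Treat the numbers as an order-of-magnitude illustration, not a benchmark.

```python
import sys
from array import array

n = 100_000
as_list = list(range(n))
as_array = array("q", range(n))  # "q": signed 64-bit integers stored unboxed

# A list holds pointers to separate int objects, so count both.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)

# An array stores the raw values inline in one buffer.
array_bytes = sys.getsizeof(as_array)

print(f"list : {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
```

Reading an element such as `as_array[5]` still creates a Python int on access, so the saving is in storage, not in per-element access from Python code.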
| raymondh wrote:
| We should probably get rid of that. It is old (predating
| numpy) and has limited functionality. In almost every case I
| can think of, you would be better off with numpy.
| jph00 wrote:
| If you don't want to add a dep on numpy (which is a big
| complex module) then it's nice to have a stdlib option. So
| there are certainly at least some cases where you're not
| better off with numpy.
| coldtea wrote:
| Even better if Python adds a mainline pandas/numpy-like
| C-based table structure, with a very small subset of the
| pandas/numpy functionality, that's also convertible to
| pandas/numpy/etc.
| alkh wrote:
| Re: 2, is there any good repo with raw C Python API that can be
| used as a reference for someone who is not too proficient in C?
| I took a look at numpy but it seems too complicated for me.
| raymondh wrote:
| This is an impressive post showing some nice investigative work
| that isolates a pain point and produces a performant work-around.
|
| However, the conclusion is debatable. Not everyone has this
| problem. Not everyone would benefit from the same solution.
|
| Sure, if your data can be loaded, manipulated, and summarized
| outside of Python land, then lazy object creation is a good way
| to go. But then you're giving up all of the Python tooling that
| likely drove you to Python in the first place.
|
| Most of the Python ecosystem from sets and dicts to the standard
| library is focused on manipulating native Python objects. While
| the syntax supports method calls to data encapsulated elsewhere,
| it can be costly to constantly "box and unbox" data to move back
| and forth between the two worlds.
| 0x63_Problems wrote:
| First off, thank you for all your contributions to Python!
|
| I completely take your point that there are many places where
| this approach won't fit. It was a surprise for me to trace the
| performance issue to allocations and GC, specifically because
| it is rare.
|
| WRT boxing and unboxing, I'd imagine it depends on access
| patterns primarily - given I was extracting a small portion of
| data from the AST only once each, it was a good fit. But I can
| imagine that the boxing and unboxing could be a net loss for
| more read-heavy use cases.
| coldtea wrote:
| > _However, the conclusion is debatable. Not everyone has this
| problem. Not everyone would benefit from the same solution._
|
| Everyone would benefit from developers being more performance
| minded and not doing unnecessary work though! Especially in Python,
| which has long suffered from performance issues.
|
| Love your work btw!
| formerly_proven wrote:
| > In the case of ASTs, one could imagine a kind of 'query
| language' API for Python that operates on data that is owned by
| the extension - analogous to SQL over the highly specialized
| binary representations that a database would use. This would let
| the extension own the memory, and would lazily create Python
| objects when necessary.
|
| You could make the API transparently lazy, i.e. ast.parse creates
| only one AstNode object or whatever and when you ask that object
| for e.g. its children those are created lazily from the
| underlying C struct. To preserve identity (which I assume is
| something users of ast are more likely to rely on than usual)
| you'd have to add some extra book-keeping to make it not generate
| new objects for each access, but memoize them.
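
A minimal sketch of the transparently lazy wrapper described above, under stated assumptions: `LazyNode` is a hypothetical class, and nested tuples of `(kind, children)` stand in for the underlying C struct owned by the extension. It is not how the `ast` module works today.

```python
class LazyNode:
    """Hypothetical wrapper that builds child wrappers only on first access."""

    created = 0  # counts wrappers built, for demonstration only

    def __init__(self, raw):
        self._raw = raw        # stand-in for a pointer to extension-owned data
        self._children = None  # memoized child wrappers, filled on demand
        LazyNode.created += 1

    @property
    def kind(self):
        return self._raw[0]

    @property
    def children(self):
        # Build the wrappers once, then return the same objects every time
        # so that identity checks such as `a.children[0] is a.children[0]` hold.
        if self._children is None:
            self._children = [LazyNode(child) for child in self._raw[1]]
        return self._children


# Stand-in for data the extension would own: (kind, [children]).
RAW_TREE = ("Module", [
    ("FunctionDef", [("Return", [])]),
    ("Import", []),
])

root = LazyNode(RAW_TREE)
```

Creating `root` builds one wrapper. The first access to `root.children` builds wrappers for its two direct children only, and later accesses return the same memoized objects. The extra book-keeping here is a single cached list per node.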
| 0x63_Problems wrote:
| This seems like it could be implemented without much trouble
| for consumers, but I actually think for the common case of full
| AST traversal you'd still want to avoid building objects for
| the nodes while traversing.
|
| That is to say, ast.NodeVisitor living in Python is part of the
| problem for use cases like mine. I need the extension to own
| the traversal as well so that I can avoid building objects
| except for the result set (which is typically a very small
| subset). That was what led me to imagine a query-like interface
| instead, so that Python can give concise traversal
| instructions.
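
A sketch of what a query-like interface could look like, assuming a hypothetical `query` function and `Match` result type. In this pure-Python stand-in the tree is made of tuples of `(kind, name, children)`, which are still Python objects. In a real extension the traversal would run over native data and only the matches would be turned into Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical result object, built only for nodes that match."""
    kind: str
    name: str


def query(raw_tree, kind):
    """Stand-in for extension-owned traversal: walk everything, build few objects."""
    matches = []
    stack = [raw_tree]
    while stack:
        node_kind, name, children = stack.pop()
        if node_kind == kind:
            matches.append(Match(node_kind, name))
        # Push children reversed so they are visited in source order.
        stack.extend(reversed(children))
    return matches


# Stand-in for data the extension would own: (kind, name, [children]).
RAW_TREE = ("Module", "", [
    ("Import", "os", []),
    ("FunctionDef", "main", [
        ("Import", "json", []),
        ("Return", "", []),
    ]),
])

imports = query(RAW_TREE, "Import")
```

The caller passes a concise instruction (`"Import"`) and gets back only the result set, with no visitor callback invoked from Python for each node.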
___________________________________________________________________
(page generated 2024-08-07 23:00 UTC)