[HN Gopher] Python extensions should be lazy
       ___________________________________________________________________
        
       Python extensions should be lazy
        
       Author : 0x63_Problems
       Score  : 45 points
       Date   : 2024-08-07 16:14 UTC
        
 (HTM) web link (www.gauge.sh)
        
       | lalaland1125 wrote:
       | Optimizing Python extensions is becoming increasingly important
       | as Python is used in more and more compute-intensive
       | environments.
       | 
       | The key for optimizing a Python extension is to minimize the
       | number of times you have to interact with Python.
       | 
       | A couple of other tips in addition to what this article provides:
       | 
       | 1. Object pooling is quite useful as it can significantly cut
       | down on the number of allocations.
       | 
       | 2. Be very careful about tools like pybind11 that make it easier
       | to write extensions for Python. They come with a significant
       | amount of overhead. For critical hotspots, always use the raw
       | Python C extension API.
       | 
       | 3. Use numpy arrays whenever possible when returning large lists
       | to Python. A Python list of Python integers is amazingly
       | inefficient compared to a numpy array of integers.
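Tip 1 (object pooling) can be sketched in plain Python as a free list. This is a minimal illustration under stated assumptions: in a real extension the pool would live in the native code next to the allocator, and `Node`/`NodePool` are hypothetical names, not part of any library. The caller must not touch an object after releasing it.

```python
from __future__ import annotations


class Node:
    """Hypothetical node type; its fields are overwritten on every reuse."""

    __slots__ = ("kind", "value")

    def __init__(self) -> None:
        self.kind = ""
        self.value = 0


class NodePool:
    """Free-list pool: hand back released Node objects instead of allocating new ones."""

    def __init__(self) -> None:
        self._free: list[Node] = []
        self.allocations = 0  # how many Node objects were actually created

    def acquire(self, kind: str, value: int) -> Node:
        if self._free:
            node = self._free.pop()  # reuse a previously released object
        else:
            node = Node()  # pool is empty, so allocate a fresh one
            self.allocations += 1
        node.kind = kind
        node.value = value
        return node

    def release(self, node: Node) -> None:
        self._free.append(node)


pool = NodePool()
for i in range(1_000):
    node = pool.acquire("Name", i)
    # ... hand the node's data to the caller, then recycle it ...
    pool.release(node)

print(pool.allocations)  # 1: the same Node object was reused every iteration
```

The pool only pays off when objects are released promptly; if callers hold on to every node, it degenerates into ordinary allocation plus bookkeeping.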
        
         | 0x63_Problems wrote:
         | Totally agree, keeping the interface with the extension as thin
         | as possible makes sense.
         | 
         | I hadn't considered object pooling in this context. It might be
         | more involved, since each node has distinct data, but for my use
         | case it could still be a performance win.
         | 
         | Have you ever used pyo3 for Rust bindings? I haven't measured
         | the overhead, but I have been assuming that it's worth the
         | tradeoff vs. rolling my own.
         | 
         | (I'm the author)
        
           | hansvm wrote:
           | My last workplace used pyo3 for a project. It was slower than
           | vanilla Python, and you picked up all the normal compiled-
           | language problems like slow builds and cross-compilation
           | toolchains.
           | 
           | I wouldn't take away from that observation that pyo3 is slow
           | (it was just a poor fit; FFI for minuscule amounts of work),
           | but the fact that the binding costs were higher than vanilla
           | Python computations suggests that the overhead is (was?)
           | meaningful. I don't know how it compares to a hand-written
           | extension.
        
         | stabbles wrote:
         | Why not go all the way and limit the times you have to interact
         | with Python to zero ;)
        
           | mkoubaa wrote:
           | Because if your users want Python, you have to convince them
           | they don't want it. If you fail, you'll have made optimized
           | code that nobody uses.
           | 
           | Another strategy is to actually serve your users.
        
           | coldtea wrote:
           | Because then you have the warts of the new language, and the
           | pain of migrating code, which could be 100s or millions of
           | lines, to worry about...
        
         | tomjakubowski wrote:
         | re: 3, Python has a native numeric array type
         | https://docs.python.org/3/library/array.html
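A rough way to see the list-versus-array point from tip 3, using the stdlib `array` module mentioned here so the sketch runs without numpy: compare the memory behind a list of ints with a packed array of the same values. `sys.getsizeof` figures are CPython implementation details that vary by version and platform, so only the ratio is meaningful.

```python
import array
import sys

N = 100_000
values = list(range(1_000, 1_000 + N))  # start above CPython's small-int cache (-5..256)
packed = array.array("q", values)        # "q" = signed long long, 8 bytes per item

# List: its internal array of pointers plus one heap-allocated int object per element.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# Array: a single object whose buffer stores the raw 8-byte values inline.
array_bytes = sys.getsizeof(packed)

print(f"list: {list_bytes:,} bytes, array: {array_bytes:,} bytes")
```

A numpy `int64` array has the same inline layout, which is why returning one from an extension avoids creating a Python object per element.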
        
           | raymondh wrote:
           | We should probably get rid of that. It is old (predating
           | numpy) and has limited functionality. In almost every case I
           | can think of, you would be better off with numpy.
        
             | jph00 wrote:
             | If you don't want to add a dependency on numpy (which is a big,
             | complex module), then it's nice to have a stdlib option. So
             | there are certainly at least some cases where you're not
             | better off with numpy.
        
               | coldtea wrote:
               | Even better if Python adds a mainline pandas/numpy-like,
               | C-based table structure, with a very small subset of the
               | pandas/numpy functionality, that's also convertible to
               | pandas/numpy/etc.
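Part of the "convertible" requirement is already covered by the buffer protocol: stdlib `array` objects export their memory through it, and that is the mechanism `numpy.frombuffer` uses to wrap existing memory without copying. A small stdlib-only sketch of that zero-copy sharing (the `prices` data is made up for illustration):

```python
import array

# A packed stdlib array of C doubles.
prices = array.array("d", [1.5, 2.5, 4.0])

# memoryview exposes the array's own buffer; no data is copied.
view = memoryview(prices)
print(view.format, view.itemsize)  # d 8

# Writes through the view land directly in the array's storage.
view[1] = 10.0
print(prices[1])  # 10.0

# With numpy installed, numpy.frombuffer(prices, dtype=float) would wrap
# the same memory in the same zero-copy way (not imported here).
```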
        
         | alkh wrote:
         | Re: 2, is there a good repo using the raw Python C API that can be
         | used as a reference by someone who is not very proficient in C?
         | I took a look at numpy, but it seems too complicated for me.
        
       | raymondh wrote:
       | This is an impressive post showing some nice investigative work
       | that isolates a pain point and produces a performant work-around.
       | 
       | However, the conclusion is debatable. Not everyone has this
       | problem. Not everyone would benefit from the same solution.
       | 
       | Sure, if your data can be loaded, manipulated, and summarized
       | outside of Python land, then lazy object creation is a good way
       | to go. But then you're giving up all of the Python tooling that
       | likely drove you to Python in the first place.
       | 
       | Most of the Python ecosystem from sets and dicts to the standard
       | library is focused on manipulating native Python objects. While
       | the syntax supports method calls to data encapsulated elsewhere,
       | it can be costly to constantly "box and unbox" data to move back
       | and forth between the two worlds.
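A small CPython illustration of the "box and unbox" cost: when values live in a packed buffer outside Python objects (here a stdlib `array`, standing in for extension-owned memory), every read allocates a brand-new `int`, whereas a list hands back the object it already holds. The identity results are CPython behavior, not a language guarantee.

```python
import array

# Raw 8-byte integers stored outside any Python int objects.
packed = array.array("q", [1000, 2000, 3000])

# Each read boxes the raw value into a freshly allocated int object.
first = packed[0]
second = packed[0]
print(first == second)  # True: same value
print(first is second)  # False: two distinct objects, two allocations

# A list stores pointers to existing int objects, so reading allocates nothing.
boxed = [1000, 2000, 3000]
print(boxed[0] is boxed[0])  # True
```

Code that reads the same packed data many times pays that allocation on every read, which is where keeping data "elsewhere" can lose to plain Python objects.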
        
         | 0x63_Problems wrote:
         | First off, thank you for all your contributions to Python!
         | 
         | I completely take your point that there are many places where
         | this approach won't fit. It was a surprise for me to trace the
         | performance issue to allocations and GC, precisely because
         | that is rare.
         | 
         | WRT boxing and unboxing, I'd imagine it depends primarily on
         | access patterns. Given that I was extracting a small portion of
         | data from the AST, reading each piece only once, it was a good
         | fit. But I can imagine that the boxing and unboxing could be a
         | net loss for more read-heavy use cases.
        
         | coldtea wrote:
         | > _However, the conclusion is debatable. Not everyone has this
         | problem. Not everyone would benefit from the same solution._
         | 
         | Everyone would benefit from developers being more
         | performance-minded and not doing unnecessary work, though!
         | Especially Python, which has long suffered from performance issues.
         | 
         | Love your work btw!
        
       | formerly_proven wrote:
       | > In the case of ASTs, one could imagine a kind of 'query
       | language' API for Python that operates on data that is owned by
       | the extension - analogous to SQL over the highly specialized
       | binary representations that a database would use. This would let
       | the extension own the memory, and would lazily create Python
       | objects when necessary.
       | 
       | You could make the API transparently lazy, i.e. ast.parse creates
       | only one AstNode object or whatever and when you ask that object
       | for e.g. its children those are created lazily from the
       | underlying C struct. To preserve identity (which I assume is
       | something users of ast are more likely to rely on than usual)
       | you'd have to add some extra bookkeeping so that it memoizes
       | the objects instead of generating new ones on each access.
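The transparently lazy, identity-preserving design described above might look roughly like this in Python. Everything here is hypothetical: `RAW_NODES` stands in for the extension's C structs, and `LazyTree`/`LazyNode` are illustrative names, not the real `ast` module API. A wrapper is created only when a node is first reached and is cached per index, so repeated access returns the same object.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical stand-in for extension-owned C structs:
# each row is (node kind, tuple of child row indices); row 0 is the root.
RAW_NODES = [
    ("Module", (1, 2)),
    ("Assign", (3,)),
    ("Expr", ()),
    ("Name", ()),
]


class LazyNode:
    """Python-facing view of one raw node; children are wrapped on first access."""

    __slots__ = ("_tree", "_index", "_children")

    def __init__(self, tree: LazyTree, index: int) -> None:
        self._tree = tree
        self._index = index
        self._children: Optional[list[LazyNode]] = None

    @property
    def kind(self) -> str:
        return self._tree.raw[self._index][0]

    @property
    def children(self) -> list[LazyNode]:
        if self._children is None:
            child_ids = self._tree.raw[self._index][1]
            self._children = [self._tree.node(i) for i in child_ids]
        return self._children


class LazyTree:
    def __init__(self, raw: list[tuple[str, tuple[int, ...]]]) -> None:
        self.raw = raw
        self._cache: dict[int, LazyNode] = {}  # memo table: row index -> wrapper

    def node(self, index: int) -> LazyNode:
        # Reuse an existing wrapper so identity is preserved across accesses.
        cached = self._cache.get(index)
        if cached is None:
            cached = LazyNode(self, index)
            self._cache[index] = cached
        return cached

    @property
    def root(self) -> LazyNode:
        return self.node(0)

    def materialized_count(self) -> int:
        return len(self._cache)


tree = LazyTree(RAW_NODES)
first_child = tree.root.children[0]
print(first_child is tree.root.children[0])  # True: same wrapper object
print(tree.materialized_count())  # 3: Module, Assign, Expr; Name never touched
```

The memo table keeps every visited wrapper alive for the tree's lifetime; that is the "extra bookkeeping" cost of preserving identity.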
        
         | 0x63_Problems wrote:
         | This seems like it could be implemented without much trouble
         | for consumers, but I actually think for the common case of full
         | AST traversal you'd still want to avoid building objects for
         | the nodes while traversing.
         | 
         | That is to say, ast.NodeVisitor living in Python is part of the
         | problem for use cases like mine. I need the extension to own
         | the traversal as well so that I can avoid building objects
         | except for the result set (which is typically a very small
         | subset). That was what led me to imagine a query-like interface
         | instead, so that Python can give concise traversal
         | instructions.
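A toy version of that query-style interface, assuming the extension keeps the AST as flat parallel arrays (`KINDS`, `PARENT`, and `NAMES` are hypothetical stand-ins for native memory, and `query` is an illustrative name, not an existing API). In a real extension the scan and the ancestor walk would run in native code; here they are plain Python, and the only result objects built are the `Match` records returned to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical flat AST for:
#   import os
#   from typing import Any
#   def main():
#       import sys
#       main()
# PARENT[i] == -1 marks the root (index 0, the Module).
KINDS = ["Module", "Import", "ImportFrom", "FunctionDef", "Import", "Expr"]
PARENT = [-1, 0, 0, 0, 3, 3]
NAMES = [None, "os", "typing", "main", "sys", None]


@dataclass(frozen=True)
class Match:
    """Result record: the only kind of object handed back to the caller."""

    index: int
    kind: str
    name: str | None


def _has_ancestor(i: int, kind: str) -> bool:
    """Follow parent links upward without building any node objects."""
    p = PARENT[i]
    while p != -1:
        if KINDS[p] == kind:
            return True
        p = PARENT[p]
    return False


def query(kind: str, within: str | None = None) -> list[Match]:
    """Return nodes of `kind`, optionally only those nested inside a `within` node."""
    results: list[Match] = []
    for i, k in enumerate(KINDS):
        if k != kind:
            continue
        if within is not None and not _has_ancestor(i, within):
            continue
        results.append(Match(i, k, NAMES[i]))
    return results


print(query("Import", within="FunctionDef"))
# → [Match(index=4, kind='Import', name='sys')]
```

The design choice is that Python sends one concise instruction and receives a small result set, rather than walking every node through a Python-side visitor.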
        
       ___________________________________________________________________
       (page generated 2024-08-07 23:00 UTC)