[HN Gopher] Data-Oriented Programming in Python
       ___________________________________________________________________
        
       Data-Oriented Programming in Python
        
       Author : brilee
       Score  : 83 points
       Date   : 2022-11-26 17:45 UTC (1 days ago)
        
 (HTM) web link (www.moderndescartes.com)
 (TXT) w3m dump (www.moderndescartes.com)
        
       | hgibbs wrote:
       | I'd like to plug riptable
       | (https://github.com/rtosholdings/riptable), which is (more-or-
       | less) a performance upgrade to pandas.
        
         | anigbrowl wrote:
         | Looks nice!
        
       | duped wrote:
       | I'm curious how you would do data oriented programming in a
       | language with no type system and no control over memory layout.
       | And I guess the answer is "you can't, but JITs might exist
       | someday that do it for you"
       | 
       | But you can't wave your hands around and say compiler
       | optimizations will fix performance problems. They can, but
       | they're not magic, and the arrow in the proverbial knee for
       | optimization passes is language semantics that make them
       | impossible to realize (forcing the authors to either abandon the
       | passes or rely on things like dynamic deoptimization, which is
       | not free).
        
         | sirwhinesalot wrote:
         | By using only coding patterns that are known to JIT well and
         | lower level primitive types and containers if provided by the
         | language. Maximizing the use of packages written in native code
         | also helps.
         | 
          | The resulting code is even more annoying to write than using a
          | lower-level typed language in the first place, but
          | ecosystem access sometimes makes up for it.
         | 
         | Hopefully tools like mypyc get better, letting well-typed
         | python code with reasonable usage patterns be compiled to
         | reasonably efficient native code.
         | 
         | Last time I used it I was pleased with the performance benefits
         | but it couldn't even compile all files in a module to a single
         | shared library, despite this being mentioned as possible (and
         | recommended) in the docs. Maybe I was doing something wrong,
         | but they don't answer their github issues often, alas.
         | 
         | Any little thing helps though, it's one thing for throwaway
         | scripts to be inefficient, but applications? At a large scale
         | it is a monstrous waste of time and literal energy.
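
A minimal sketch of the "lower level primitive types and containers" advice
above, assuming only the standard library's array module. The particle data
and the step function are hypothetical, chosen for illustration: each field
lives in one contiguous buffer of C doubles instead of one Python object per
record.

```python
from array import array

# Hypothetical particle data laid out as a "struct of arrays":
# one contiguous, typed buffer per field.
xs = array("d", [0.0, 1.0, 2.0])    # positions, stored as C doubles
vs = array("d", [0.5, 0.5, -1.0])   # velocities, stored as C doubles


def step(xs: array, vs: array, dt: float) -> None:
    """Advance every position in place by velocity * dt."""
    for i in range(len(xs)):
        xs[i] += vs[i] * dt


step(xs, vs, 2.0)
print(xs.tolist())             # [1.0, 2.0, 0.0]
print(xs.itemsize * len(xs))   # payload size in bytes (8 bytes per double)
```

The loop itself still runs in the interpreter, so this alone does not make the
code fast. What it shows is the layout half of the idea: typed, contiguous
storage that a native extension or a compiler can work with more readily than
a list of objects.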
        
       | wheelerof4te wrote:
       | To spare you a couple minutes of your life, the article is saying
       | this:
       | 
       | Python + C modules = Speed
       | 
       | Nothing new here, move along.
        
       | tomrod wrote:
       | This is a wonderfully technical article. I'd love to learn more
       | about Python internals as a scientific coder.
        
         | pedrovhb wrote:
         | The official Python documentation is excellent, and in many
         | ways goes beyond providing just a list of existing modules and
         | what they do. Sometimes if I'm bored I'll actually just pull up
         | documentation for something I'm not 100% familiar with and have
         | a look around, and I almost always find something new and
         | useful. A couple of interesting ones are [0][1], and [2] is a
          | nice starting point for discovering more. Not everyone's cup of
          | tea, but I also found it enjoyable to dive into asyncio with
          | the docs.
         | 
          | [0] https://docs.python.org/3/howto/descriptor.html
          | [1] https://docs.python.org/3/library/collections.html
          | [2] https://docs.python.org/3/
        
         | barefeg wrote:
         | I recommend any of the talks by James Powell at PyData. For
         | example this one https://youtu.be/cKPlPJyQrt4
         | 
         | Edit: maybe this one on Numpy may be more relevant:
         | https://youtu.be/u2yvNw49AX4
        
       | _visgean wrote:
       | Hmm nice article but imho skips over the biggest optimization:
       | numpy uses BLAS libraries so stuff like
       | 
       | > >>> multiply_by_two = homogenous_array * 2
       | 
       | will be calculated most of the time using a BLAS library -
       | whichever you are using
       | (https://numpy.org/devdocs/user/building.html)
        
         | cdavid wrote:
          | That article talks about DL, where BLAS is much less relevant.
          | The kernels are mostly CUDA (for GPU) and similar stuff for
          | other accelerators.
        
       | college_physics wrote:
       | > In practice, scientific computing users rely on the NumPy
       | family of libraries e.g. NumPy, SciPy, TensorFlow, PyTorch, CuPy,
       | JAX, etc.
       | 
       | this is a somewhat confusing statement. most of these libraries
       | actually don't rely on numpy. e.g. tensorflow ultimately wraps
       | c++/eigen tensors [0] and numpy enters somewhere higher up in
       | their python integration
       | 
       | [0]
       | https://github.com/tensorflow/tensorflow/blob/master/tensorf...
        
       ___________________________________________________________________
       (page generated 2022-11-27 23:00 UTC)