[HN Gopher] Data-Oriented Programming in Python
___________________________________________________________________
Data-Oriented Programming in Python
Author : brilee
Score : 83 points
Date : 2022-11-26 17:45 UTC (1 day ago)
(HTM) web link (www.moderndescartes.com)
(TXT) w3m dump (www.moderndescartes.com)
| hgibbs wrote:
| I'd like to plug riptable
| (https://github.com/rtosholdings/riptable), which is (more or
| less) a performance upgrade to pandas.
| anigbrowl wrote:
| Looks nice!
| duped wrote:
| I'm curious how you would do data oriented programming in a
| language with no type system and no control over memory layout.
| And I guess the answer is "you can't, but JITs might exist
| someday that do it for you"
|
| But you can't wave your hands around and say compiler
| optimizations will fix performance problems - they can, but
| they're not magic, and the arrow in the proverbial knee for
| optimization passes is language semantics that make them
| impossible to realize (forcing the authors to either abandon the
| passes or rely on things like dynamic deoptimization, which is
| not free).
| sirwhinesalot wrote:
| By using only coding patterns that are known to JIT well and
| lower level primitive types and containers if provided by the
| language. Maximizing the use of packages written in native code
| also helps.
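|
| A minimal sketch of the "lower level primitive types and
| containers" idea, assuming only the standard-library array
| module. The particle fields (xs, vs) and the step function are
| made up for illustration and do not come from the article.

```python
from array import array

# Hypothetical particle data kept as parallel typed arrays
# (struct-of-arrays) instead of a list of Python objects.
# Type code "d" stores each value as a raw C double.
xs = array("d", [0.0, 1.0, 2.0])   # positions
vs = array("d", [1.0, 0.5, -1.0])  # velocities


def step(xs: array, vs: array, dt: float) -> None:
    """Advance every position in place by velocity * dt."""
    for i in range(len(xs)):
        xs[i] += vs[i] * dt


step(xs, vs, 0.5)
print(list(xs))
```

| The values sit contiguously in memory as C doubles, but the loop
| itself still runs in the interpreter, so this only addresses
| layout, not per-element overhead.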
|
| The resulting code is even more annoying to write than using a
| lower-level typed language in the first place, but ecosystem
| access sometimes makes up for it.
|
| Hopefully tools like mypyc get better, letting well-typed
| Python code with reasonable usage patterns be compiled to
| reasonably efficient native code.
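|
| As a rough illustration of "well-typed code with reasonable
| usage patterns", here is a sketch of the kind of function a
| type-driven compiler can work with. The function name and file
| name are hypothetical; assuming mypyc is installed and this is
| saved as dot.py, running "mypyc dot.py" should build it as an
| extension module, though how much faster it gets is not
| something this sketch demonstrates.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product written as a plain, fully annotated loop."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    total: float = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

| The same file runs unchanged under the normal interpreter, which
| is what makes this approach low-risk to try.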
|
| Last time I used it, I was pleased with the performance benefits,
| but it couldn't even compile all files in a module to a single
| shared library, despite this being mentioned as possible (and
| recommended) in the docs. Maybe I was doing something wrong,
| but they don't answer their GitHub issues often, alas.
|
| Any little thing helps, though. It's one thing for throwaway
| scripts to be inefficient, but applications? At a large scale
| it is a monstrous waste of time and literal energy.
| wheelerof4te wrote:
| To spare you a couple minutes of your life, the article is saying
| this:
|
| Python + C modules = Speed
|
| Nothing new here, move along.
| tomrod wrote:
| This is a wonderfully technical article. I'd love to learn more
| about Python internals as a scientific coder.
| pedrovhb wrote:
| The official Python documentation is excellent, and in many
| ways goes beyond providing just a list of existing modules and
| what they do. Sometimes if I'm bored I'll actually just pull up
| documentation for something I'm not 100% familiar with and have
| a look around, and I almost always find something new and
| useful. A couple of interesting ones are [0][1], and [2] is a
| nice starting point for discovering more. Not everyone's cup of
| tea, but I also found it enjoyable to dive into asyncio with the
| docs.
|
| [0] https://docs.python.org/3/howto/descriptor.html
| [1] https://docs.python.org/3/library/collections.html
| [2] https://docs.python.org/3/
| barefeg wrote:
| I recommend any of the talks by James Powell at PyData. For
| example, this one: https://youtu.be/cKPlPJyQrt4
|
| Edit: this one on NumPy may be more relevant:
| https://youtu.be/u2yvNw49AX4
| _visgean wrote:
| Hmm, nice article, but imho it skips over the biggest
| optimization: numpy uses BLAS libraries, so stuff like
|
| > >>> multiply_by_two = homogenous_array * 2
|
| will be calculated most of the time using a BLAS library -
| whichever you are using
| (https://numpy.org/devdocs/user/building.html)
| cdavid wrote:
| That article talks about DL, where BLAS is much less relevant.
| The kernels are mostly CUDA (for GPU) and similar stuff for
| other accelerators.
| college_physics wrote:
| > In practice, scientific computing users rely on the NumPy
| family of libraries e.g. NumPy, SciPy, TensorFlow, PyTorch, CuPy,
| JAX, etc..
|
| This is a somewhat confusing statement. Most of these libraries
| don't actually rely on NumPy. E.g. TensorFlow ultimately wraps
| C++/Eigen tensors [0], and NumPy enters somewhere higher up in
| their Python integration.
|
| [0]
| https://github.com/tensorflow/tensorflow/blob/master/tensorf...
___________________________________________________________________
(page generated 2022-11-27 23:00 UTC)