[HN Gopher] A 100x speedup with unsafe Python
       ___________________________________________________________________
        
       A 100x speedup with unsafe Python
        
       Author : ingve
       Score  : 289 points
       Date   : 2024-05-05 08:08 UTC (2 days ago)
        
 (HTM) web link (yosefk.com)
 (TXT) w3m dump (yosefk.com)
        
       | idkdotcom wrote:
       | Safety is one of Python's greatest advantages over C or C++. Why
        | would anyone use unsafe Python when Python doesn't have other
       | features such as type safety and all the debugging tools that
       | have been built for C and C++ over time is beyond me.
       | 
       | Python is a great language, but just as the generation of kids
       | who got out of computer science programs in the 2000s were
       | clueless about anything that wasn't Java, it seems this
       | generation is clueless about anything that isn't Python.
       | 
       | There was life before Python and there will be life after Python!
        
         | jandrese wrote:
          | Seems to me that if you could do most of the work in Python and
          | then just make the critical loop unsafe and 100x faster, that
          | would certainly have some appeal.
        
           | gibolt wrote:
           | Plenty of people would gladly not have to learn another
           | language (especially C).
           | 
           | You could also benefit from testing blocks of code with
           | safety enabled to have more confidence when safety is
           | removed.
        
             | KeplerBoy wrote:
             | It was explained pretty well in the blog: Installing the
             | OpenCV python package is easy, fast and painless as long as
             | you're happy with the default binary. Building OpenCV from
             | source to get it into a C/C++ program can quickly turn into
             | a multi-hour adventure, if you're not familiar with
             | building sizeable C++ projects.
        
             | TylerE wrote:
              | Or if you're going to learn another language, you might as
              | well learn Nim. Keep most of the Python syntax, ditch the
              | performance problems and the packaging insanity.
        
         | eru wrote:
         | > Why would anyone use unsafe Python when Python doesn't have
         | other features such as type safety and all the debugging tools
         | that have been built for C and C++ over time is beyond me.
         | 
         | C and C++ 'type safety' is barely there (compared to more sane
          | languages like OCaml or Rust or even Java etc). As to why anyone
          | would do that: the question is 'why not?' It's fun.
        
         | gibolt wrote:
         | "Why would anyone use" -> "when" usage generally means lots of
         | use cases are being ignored / swept under the rug.
         | 
         | >1 billion people exist. Each has a unique opinion / viewpoint.
        
         | bongodongobob wrote:
         | I don't use Python because of type safety, I use it because
         | it's easy to write. I couldn't give a single fuck about some
         | missed pointers that weren't cleaned up in the 2.5 seconds my
         | program ran. I'm not deploying public code.
         | 
         | I tend to prototype in Python and then just rewrite the whole
         | thing in C with classes if I need the 10000x speedup.
        
           | SunlitCat wrote:
            | That 'easy to write' part is something I am not so sure about.
           | It took me ages to understand why my first try at writing a
           | blender plugin didn't work.
           | 
           | It was because of white spaces not lining up. I was like,
           | really? Using white spaces as a way to denote the body of a
           | function? Really?
        
             | nottorp wrote:
             | I'm curious... what rock have you been living under in the
             | past 33 years since python was launched?
        
               | SunlitCat wrote:
                | Let's see, I was living under the:
                | 
                | - Amiga E
                | - Assembler (on Amiga)
                | - perl
                | - tcl/tk
                | - javascript
                | - vbscript
                | - java
                | - vba
                | - c
                | - c++
                | - python
                | 
                | rock. Granted, at some of those rocks I just took a quick
                | peek (more like a glance), but a rock with whitespace being
                | important was, at that moment, new to me!
        
               | nottorp wrote:
               | Oh cmon, I can still write some z80 assembly from memory
               | and remember the zx spectrum memory layout somewhat, but
               | I check new programming languages now and then :)
        
               | SunlitCat wrote:
               | :)
               | 
                | I just meant that for someone coming from languages where
                | different kinds of delimiters are used to denote function
                | bodies, a language that puts such a huge emphasis on
                | whitespace was kinda throwing me off: I got an error
                | message slapped at me even though I was replicating the
                | example letter by letter (although, obviously, I missed the
                | importance of the whitespace).
        
               | nottorp wrote:
               | It's all fuzzy, but I'm sure I knew about the importance
               | of whitespace in python long before I actually tried
               | python.
               | 
                | Even Raymond's article, which was one of the first, whined
                | about it [1]. I definitely remember reading that one and
               | thinking I have to check out that language later.
               | 
               | [1] https://www.linuxjournal.com/article/3882
        
               | SunlitCat wrote:
               | Thank you for the article. I have to put it on my "to
               | read" pile! :)
        
         | omoikane wrote:
         | The "unsafe" in the title appears to be used in the sense of
         | "exposing memory layout details", but not in the sense of
         | "direct unbounded access to memory". It's probably not the
         | memory safety issue you are thinking of.
        
           | yosefk wrote:
           | You can absolutely get direct unbounded access to memory with
           | ctypes, with all the bugs that come from this. I just
           | think/hope the code I show in TFA happens to have no such
           | bugs.
        
         | SunlitCat wrote:
          | The thing about academia (computer science and the like) is
          | that you need a programming language you can get across easily
          | and quickly, and that gets people to produce satisfying results
          | in no time.
         | 
         | Back then it might have been java, then python, till the next
         | language comes around.
         | 
         | The java thing was so apparent, that when you looked at c++
         | code, you were able to spot the former java user pretty easily.
         | :)
         | 
          | About Python: some time ago I was looking into Arduino
          | programming in C and found someone offering a library (I think)
          | in C (or C++, I can't remember). What I remember is that this
          | person stopped working on it because they got many requests and
          | questions about being able to use their library from Python.
        
           | pjmlp wrote:
           | > The java thing was so apparent, that when you looked at c++
           | code, you were able to spot the former java user pretty
           | easily. :)
           | 
           | Well, Java was designed based on C++ GUI development patterns
           | typical in the 10 years of C++ GUI frameworks that predated
           | Java.
           | 
            | Yet people keep making these kinds of remarks.
        
         | kragen wrote:
         | while python is eminently criticable, yosef kreinin has
         | designed his own cpu instruction set and gotten it fabbed in a
         | shipping hardware product, and is the author of the c++fqa,
         | which is by far the best-informed criticism of c++; criticizing
         | him with 'this generation is clueless about anything that isn't
         | Python' seems off the mark
        
       | intelVISA wrote:
       | Isn't all Python, by design, unsafe?
        
         | eru wrote:
         | Is this supposed to be a joke?
         | 
         | Have a look at the linked article to see what meaning of
         | 'unsafe' the author has in mind.
        
           | intelVISA wrote:
           | Safety is catching (more) errors ahead of time ...for which
           | Python is grossly unsuitable imo.
           | 
           | Fun lang though, just not one that ever comes to mind when I
           | hear 'safe'.
        
             | eru wrote:
             | That's one definition of safety. But it's not the one the
             | author uses in this case.
             | 
              | The generic snide remark about Python that has nothing to do
              | with the article ain't all that informative.
             | 
              | You could make the same remark about Rust that you just made
              | about all Python being unsafe: 'Isn't all Rust, by design,
              | unsafe?' And compared to eg Agda or Idris that would be true,
              | but it wouldn't be a very useful or interesting comment when
              | talking specifically about 'unsafe' Rust vs normal Rust.
        
               | intelVISA wrote:
               | I would agree, Rust is not safe. We need to encourage
               | more formal rigor in our craft and avoid misconceptions
               | like 'safe Rust' or 'safe Python'. Thus my original
               | comment :P
        
               | eru wrote:
               | Eh, even Agda is only safe in the sense of 'does what
               | you've proven it to do'. That doesn't mean that if you eg
               | write a machine learning library in Agda, your AI won't
               | come and turn us all into paperclips.
               | 
               | So it doesn't really make sense to pretend there's some
               | single meaning of 'safe' vs 'unsafe' that's appropriate
               | in all contexts.
        
               | dahart wrote:
               | It's a term of art in this case. Or someone proposed the
               | term improper noun
               | https://news.ycombinator.com/item?id=32673100
               | 
               | Fine and good to advocate rigor, whether or not it's
               | specifically relevant to this post, but maybe be careful
               | with the interpretation and commentary lest people decide
               | the misconception is on your part?
        
             | recursive wrote:
             | Memory corruption vs type safety.
             | 
             | "Safety" is an overloaded term, but that happens a lot in
             | software. You'll probably get best results if you try to
             | understand what people are talking about, rather than just
             | assuming everyone else is an idiot.
        
         | westurner wrote:
         | ctypes, c extensions, and CFFI are unsafe.
         | 
         | Python doesn't have an unsafe keyword like Rustlang.
         | 
         | In Python, you can dereference a null pointer and cause a
         | Segmentation fault given a ctypes import.
         | 
         | lancedb/lance and/or pandas' dtype_backend="pyarrow" might work
         | with pygame
        
           | Retr0id wrote:
           | You don't even need a ctypes import for it, you'll get a
           | segfault from this:
           | eval((lambda:0).__code__.replace(co_consts=()))
           | 
           | cpython is not memory safe.
        
             | westurner wrote:
              | Static analysis of Python code should include review of
              | "unsafe" things like exec(), eval(), ctypes, C strings, and
              | memcpy.
             | 
             | Containers are considered nearly sufficient to sandbox
             | Python, which cannot be effectively sandboxed using Python
             | itself. Isn't that actually true for all languages though?
             | 
             | There's a RustPython, but it does support CFFI and
             | __builtins__.eval, so
        
               | maple3142 wrote:
               | The example given by parent does not need eval to trigger
               | though. Just create a function and replace its code
               | object then call it, it will easily segfault.
        
               | jwilk wrote:
                | Complete example without eval:
                | 
                |     def f():
                |         pass
                |     f.__code__ = f.__code__.replace(co_consts=())
                |     f()
        
               | Retr0id wrote:
               | yup, eval was just there for golfing purposes
        
         | emmanueloga_ wrote:
         | As others commented "safety" is a heavily overloaded term, for
         | instance there's "type safety" [1] and "memory safety" [2], and
         | then there's "static vs dynamic typing" [3], and "weak vs
         | strong typing" [4], so talking about types and safety offered
         | by a language can be very nuanced.
         | 
         | I suspect when you say "unsafe by design" you may be referring
         | to the dynamic type checking aspect of Python, although Python
         | has supported for a while type annotations, that can be
         | statically checked by linter-like tools like PyRight.
         | 
         | --
         | 
         | 1: https://en.wikipedia.org/wiki/Type_safety
         | 
         | 2: https://en.wikipedia.org/wiki/Memory_safety
         | 
         | 3: https://en.wikipedia.org/wiki/Type_system#Type_checking
         | 
         | 4: https://en.wikipedia.org/wiki/Strong_and_weak_typing
        
           | d0mine wrote:
            | Python is both type and memory safe, e.g. ""+1 in Python is a
            | TypeError and [][0] is an IndexError.
            | 
            | But you can perform unsafe operations, e.g.:
            | 
            |     import ctypes
            |     ctypes.string_at(0)  # access address 0 in memory -> segfault
        
       | comex wrote:
       | Do you actually need ctypes here, or can you just use np.reshape?
       | That would cut out the unsafety.
        
         | Retr0id wrote:
         | You can exploit memory unsafety in cpython using only built-in
         | methods, no imports needed at all:
         | https://github.com/DavidBuchanan314/unsafe-python/
        
           | yosefk wrote:
           | You... probably shouldn't, but, very interesting stuff in
           | that link! I didn't know CPython had no bounds checks in the
           | load_const bytecode op
        
         | im3w1l wrote:
         | I think what you want is a combination of swapaxes and flip
         | (reshape does not turn rows into columns, it turns m input rows
         | into n output rows), but yeah.
         | 
          | Actually let me include a little figure:
          | 
          |     original  reshape  swapaxes
          |     abc       ab       ad
          |     def       cd       be
          |               ef       cf
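          | 
          | A quick numpy sketch of the difference (and of flip), assuming
          | a 2x3 array:
          | 
          |     import numpy as np
          | 
          |     a = np.array([[1, 2, 3],
          |                   [4, 5, 6]])
          |     a.reshape(3, 2)     # regroups the same row-major sequence:
          |                         # [[1, 2], [3, 4], [5, 6]]
          |     a.swapaxes(0, 1)    # a real transpose (a view, no copy):
          |                         # [[1, 4], [2, 5], [3, 6]]
          |     np.flip(a, axis=1)  # reverses each row:
          |                         # [[3, 2, 1], [6, 5, 4]]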
        
         | yosefk wrote:
         | I think you'll have a problem with BGR data absent ctypes,
         | since you have a numpy array with the base pointing at the
         | first red channel value, and you want an array with the base
         | pointing 2 bytes _before_ the first red channel value. This is
         | almost definitely  "unsafe"?.. or does numpy have functions
         | knowing that due to the negative z stride, the red value before
         | the blue value is within the bounds of the original array? I
         | somehow doubt it though it would be very impressive. And I
         | think the last A value is hopelessly out of reach absent ctypes
         | since a safe API has no information that this last value is
         | within the array bounds; more strictly speaking all the alpha
         | values are out of bounds according to the shape and strides of
         | the BGRA array.
        
       | qaq wrote:
       | Well with mojo it looks like you will have both safety and
       | performance
        
         | grandma_tea wrote:
         | Closed source and not relevant.
        
           | ianbutler wrote:
           | Mojo was open sourced months ago iirc
           | 
           | https://github.com/modularml/mojo
        
             | viraptor wrote:
             | Those are examples and docs. Mojo is still closed.
        
               | ianbutler wrote:
                | Their standard lib is in there; seems more open to me.
                | 
                | There's the announcement where they open-sourced the
                | modules in their standard lib:
               | https://www.modular.com/blog/the-next-big-step-in-mojo-
               | open-...
               | 
               | There's the standard lib: https://github.com/modularml/mo
               | jo/blob/nightly/stdlib/README...
        
               | viraptor wrote:
                | Yeah, that's some minimal progress, but really not that
                | interesting. (As in, it's cool that it exists, but given
                | we've got the Python stdlib and numpy already open, it's
                | not really new/exciting.) And it doesn't allow you to port
                | to platforms they don't care enough about.
        
               | qaq wrote:
                | It might not be interesting to you. Having a lot of Rust
                | features with a much smarter compiler and Python syntax is
                | pretty interesting to me.
        
               | viraptor wrote:
                | No, Mojo itself is interesting. The stdlib that's posted
                | isn't. I mean, it's not very interesting code beyond "what
                | are the implementation details". Any decent programmer
                | could redo that based on Python's stdlib.
        
       | Animats wrote:
       | Oh, array striding.
       | 
       | This is a classic bikeshedding issue. When Go and Rust were first
       | being designed, I brought up support for multidimensional arrays.
       | For both cases, that became lost in discussions over what
       | features arrays should have. Subarrays, like slices but
       | multidimensional? That's the main use case for striding.
       | Flattening in more than one dimension? And some people want
       | sparse arrays. Stuff like that. So the problem gets pushed off
       | onto collection classes, there's no standard, and everybody using
       | such arrays spends time doing format conversion. This is why
       | FORTRAN and Matlab still have strong positions in number-
       | crunching.
        
         | lifthrasiir wrote:
          | But format conversion is inevitable because C and FORTRAN have
          | different axis orderings anyway, isn't it?
        
           | blt wrote:
           | Not really, you can always change the indexing to account for
           | it. For example, the GEMM matrix multiplication subroutines
           | from BLAS can transpose their arguments [1]. So if you have A
           | (m x n) and B (n x p) stored row-major, but you want to use a
           | column-major BLAS to compute A*B, you can instead tell BLAS
           | that A is n x m, B is p x n, and you want to compute A' * B'.
           | 
           | As the article mentions, NumPy can handle both and do all the
           | bookkeeping. So can Eigen in C++.
           | 
           | [1] https://www.math.utah.edu/software/lapack/lapack-
           | blas/dgemm....
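            | 
            | A plain-numpy sketch of one way this identity works out: the
            | same row-major buffer, read as column-major with the
            | dimensions swapped, is exactly the transpose.
            | 
            |     import numpy as np
            | 
            |     A = np.arange(6.0).reshape(2, 3)    # m x n, row-major
            |     B = np.arange(12.0).reshape(3, 4)   # n x p, row-major
            | 
            |     # A column-major routine sees these buffers as A.T and B.T;
            |     # asking it for B.T @ A.T and reading the (column-major)
            |     # result back as row-major yields (B.T @ A.T).T == A @ B.
            |     assert np.allclose((B.T @ A.T).T, A @ B)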
        
             | lifthrasiir wrote:
             | That's conceptually still a format conversion, though the
             | actual conversion might not happen. Users have to track
             | which format is being used for which matrix and I believe
                | that was what the GP was originally complaining about.
        
               | yosefk wrote:
               | Agreed, realistically you're going to either convert the
               | data or give up on using the function rather than
               | reimplementing it to support another data layout
               | efficiently; while converting the code instead of the
               | data, so to speak, is more efficient from the machine
               | resource use POV, it can be very rough on human
               | resources.
               | 
               | (The article is basically about not having to give up _in
               | some cases_ where you can tweak the input parameters and
               | make things work without rewriting the function; but it
               | 's not always possible)
        
               | semi-extrinsic wrote:
               | Talking about rough on human resources, I take it you've
               | never written iterative Krylov methods where matrices are
               | in compressed sparse row (CSR) format?
               | 
               | Because that's the kind of mental model capacity
                | scientific programmers needed to have back in the day.
               | People still do similarly brain-twisting things today. In
               | this context, converting your logic between C and Fortran
               | to avoid shuffling data is trivial.
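                | 
                | For anyone who hasn't met CSR, a small scipy sketch
                | of the three flat arrays such kernels juggle:
                | 
                |     import numpy as np
                |     from scipy.sparse import csr_matrix
                | 
                |     A = csr_matrix(np.array([[1., 0., 2.],
                |                              [0., 0., 3.],
                |                              [4., 5., 0.]]))
                |     A.data     # [1. 2. 3. 4. 5.], the stored values
                |     A.indices  # [0 2 2 0 1], column of each value
                |     A.indptr   # [0 2 3 5], row i spans
                |                # data[indptr[i]:indptr[i+1]]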
        
         | sa-code wrote:
         | What kind of support would you have hoped for?
        
           | infogulch wrote:
           | A way to reinterpret a slice of size N as a multidimensional
           | array with strides that are a factorization of N, including
           | optional reverse order strides. Basically, do the stride
           | bookkeeping internally so I can write an algorithm only
           | considering the logic and optimize the striding order
           | independently.
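            | 
            | Roughly what that looks like with numpy today (just a
            | sketch of the idea): reshape handles the factorization and
            | a negative-stride slice handles the reverse order, all
            | without copying.
            | 
            |     import numpy as np
            | 
            |     flat = np.arange(12)
            |     a = flat.reshape(3, 4)  # factor N=12 as 3x4, no copy
            |     b = a[:, ::-1]          # reverse-order stride, no copy
            |     a.strides, b.strides    # e.g. (32, 8) vs (32, -8)
            |                             # for 8-byte ints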
        
             | Animats wrote:
             | That's where you end up after heavy bikeshedding. Lots of
             | features, terrible performance, as the OP points out.
        
               | infogulch wrote:
               | I agree with you on sparse arrays and multidimensional
               | slices, but this is basically the same as what you'd do
               | manually. Saying that "track strides for me" is "lots of
               | features" is a bit uncharitable.
        
         | yosefk wrote:
         | From everything I'm seeing, it follows that Matlab & Fortran
         | have been decimated by Python and C/C++ around 1990s and 2010s,
         | respectively. Of course I could be wrong; any evidence of their
         | still strong position will be greatly appreciated, doubly so
         | evidence that this position is due to stride issues.
         | 
         | (Of course Python provides wrappers to Fortran code, eg
         | FITPACK, but this code specifically was mostly written in the
         | 80s, with small updates in this century, and is probably used
         | more thru the numpy wrappers, stride conversion issues and all,
         | than thru its native Fortran interface)
        
           | MobiusHorizons wrote:
           | I think pretty much all linear algebra libraries are still
           | Fortran and are unlikely to ever be C because Fortran is
           | legitimately faster than C for this stuff. I don't know if it
           | has anything to do with strong opinions about how values are
           | represented, I think it has more to do with lower overhead of
           | function calling, but that is just repeating what someone
           | told me about Fortran vs C in general, not necessarily
           | applicable to BLAS libraries.
           | 
           | Fortran at least used to be very common in heavy scientific
           | computing, but I would bet that relies on GPUs or other
           | accelerators these days.
        
             | yosefk wrote:
             | The question is how much new numeric code is written in
             | Fortran vs C/C++. My guess is way below 10%, certainly if
             | we measure by usage as opposed to LOC and I would guess by
             | LOC as well.
             | 
             | Is Fortran legitimately faster than C with the restrict
             | keyword? Regardless of the function call cost diffs between
             | the two - meaning, even if we assume Fortran is somehow
             | faster here - there's no way numeric production code does
             | enough function calls for this to matter. If Fortran was
             | faster than C at any point in time I can only imagine
             | pointer aliasing issues to have been the culprit and I
             | can't imagine it still being relevant, but I am of course
             | ready to stand corrected.
        
               | eyegor wrote:
                | I don't think Fortran is faster by language virtue, but
                | it's certainly easier to scrape together high-performance
                | numeric Fortran code. And ifort/aocc are amazing at
                | making code that runs well on clusters, which is not a
                | priority for any other domain. Fortran is absolutely
                | still on top for modern numerics research code that
                | involves clusters; talk to anyone who works in
                | simulations at a national lab. If your code is mostly
                | matrix math, modern Fortran is very nice to work with.
               | 
               | Emphasis on modern because maintaining 70s-80s numeric
               | mathematician cowboy code is a ninth circle of hell. It
               | has a bad rep for that reason.
        
               | semi-extrinsic wrote:
               | When you speak about code from the 1970s in particular,
               | you need to appreciate the extremely limited language
               | features that were available.
               | 
               | They did not even have while-loops in the language until
               | Fortran 77. While-loop functionality used to be
               | implemented using DO-loops mixed with GOTOs. You can't
               | fault the people of that era for using the only tools
               | that were available to them.
        
               | ant6n wrote:
               | At the core, Fortran uses multidimensional numerical
                | arrays, with the shapes being defined in the code. So the
                | compiler knows much more about the data
               | that's being operated on, which in theory allows better
               | optimization.
               | 
               | I thought blas/lapack is still written in Fortran, so
               | most numerical code would still be built on top of
               | Fortran.
        
               | josefx wrote:
                | Restrict probably helps. I can't say much about Fortran,
                | but C still has warts that can significantly impact its
                | math library. For example, almost every math function may
                | set errno, which is a side effect that isn't easy for the
                | compiler to eliminate and might bloat the code
                | significantly. With gcc, a single sqrt call turns into a
                | sqrt instruction, followed by several checks to see if it
                | succeeded, followed by a call to the library sqrt function
                | just to set errno correctly. I just started to disable
                | math-errno completely once I realized that C allows
                | several alternative ways to handle math errors, which
                | basically makes any code relying on it non-portable.
        
             | Galanwe wrote:
             | > I think pretty much all linear algebra libraries are
             | still Fortran and are unlikely to ever be C
             | 
              | That is more of an urban legend than reality in 2024. Fact
              | is, although the original BLAS implementation was Fortran,
              | for at least a decade every Linux distribution has shipped
              | either OpenBLAS or ATLAS or the MKL, all of which are
              | written in a mix of C and assembly. Support for modern
              | processors is only available in these implementations.
             | 
             | LAPACK itself is still often built from the Fortran code
             | base, but it's just glue over BLAS, and it gets performance
             | solely from its underlying BLAS. Fortran doesn't bring any
             | value to LAPACK, it's just that rewriting millions of
             | algebraic tricks and heuristics for no performance gain is
             | not enticing.
        
           | chillfox wrote:
            | If everyone is just using the Fortran libraries instead of
            | reimplementing them in a modern language, then that's evidence
            | that Fortran is still being used for that purpose.
        
           | aragilar wrote:
            | Matlab has the issue of not being open source (similarly
            | IDL), but I still see it popping up (though naturally, if
            | I've got any choice in the matter, it's going to be ported to
            | Python). I've also seen new Fortran codebases as well, mainly
            | where the (conceptual, not computational) overhead of using
            | helper libraries isn't worth it.
            | 
            | I'd suggest Python (via the various memoryview-related PEPs,
            | plus numpy) does provide most of the required information
            | (vs. e.g. Go or Rust or Ruby), so I'm not sure that proves
            | much, other than that if the combined value of a more general
            | language plus a library for multidimensional arrays beats
            | Matlab or Fortran (likely due to other ecosystem or licensing
            | effects), people will switch.
        
             | RugnirViking wrote:
             | we still get new Matlab codebases at my work, also R. In my
             | case it comes from academics in the power systems and
             | renewables fields who move into industry and keep using the
             | tools they used there. We try to gently ease them into
              | python and version control and sensible collaboration, but
              | we get new hires all the time.
        
           | samatman wrote:
           | Julia has gained a lot of mindshare in array-heavy coding,
           | with a lot of hard work put into making it possible to get
           | comparable speeds to C++ and Fortran. The manual[0] offers a
           | good introduction to what's possible: this includes sparse
           | and strided arrays, as well as the sort of zero-copy
           | reinterpretation you accomplish with pointer wizardry in the
           | Fine Article.
           | 
           | [0]: https://docs.julialang.org/en/v1/manual/interfaces/#man-
           | inte...
        
           | StableAlkyne wrote:
           | > any evidence of their still strong position will be greatly
           | appreciated
           | 
           | Fortran still dominates certain areas of scientific computing
           | / HPC, primarily computational chemistry and CFD. -
           | https://fortran-lang.org/packages/scientific - you don't hear
           | about most of them because they're generally run on HPC
           | centers by scientists in niche fields. But you do get their
           | benefit if you buy things that have the chemical sector in
           | their supply chain.
           | 
           | The common thread is generally historical codes with a lot of
           | matrix math. Fortran has some pretty great syntax for arrays
           | and their operations. And the for-loop parallelization syntax
           | in parallel compilers (like openmp) is also easy to use. The
           | language can even enforce function purity for you, which
           | removes some of the foot guns from parallel code that you get
           | in other languages.
           | 
           | The kinds of problems those packages solve tend to bottleneck
           | at matrix math, so it's not surprising a language that is
           | very ergonomic for vector math found use there.
           | 
           | Same for Matlab, it's mostly used by niche fields and
           | engineers who work on physical objects (chemical, mechanical,
           | etc). Their marketing strategy is to give discounts to
           | universities to encourage classes that use them. Like
           | Fortran, it has good syntax for matrix operations. Plus it
           | has a legitimately strong standard library. Great for
           | students who aren't programmers and who don't want to be
           | programmers. They then only know this language and ask their
           | future employer for a license. If you don't interact with a
           | lot of engineers at many companies, you aren't going to see
           | Matlab.
        
             | embwbam wrote:
             | I work for the National Solar Observatory, creating Level 2
             | data for the DKIST Telescope's observations of the sun.
              | (For example, an image of the temperature that lines up
              | with the observation.)
             | 
              | Just the other day, the solar physicist I work with said
              | "yeah, that code that runs on the supercomputer needs to be
              | rewritten in Fortran" (from C, I think).
              | 
              | He's nearing retirement, but it's not that he's behind the
              | times. He knows his stuff, and has a lot of software
              | experience in addition to physics.
        
             | moregrist wrote:
             | While it's not as ubiquitous as it used to be, Matlab is
             | still very heavily used within shops that do a lot of
             | traditional engineering (ME, EE, Aero, etc.).
             | 
             | This surprises people who just write software, but
             | consider:
             | 
             | - The documentation and support is superb. This alone can
             | justify the cost for many orgs.
             | 
             | - Did I mention the support? MathWorks has teams of
             | application support engineers to help you use the tool. The
             | ability to pay to get someone on the phone can also justify
             | the price.
             | 
             | - The toolkits tend to do what specific fields want, and
             | they tend to have a decent api. In contrast, you might end
             | up hacking together something out of scipy and 3-4 weirdly
             | incompatible data science packages. That's fine for me, but
             | your average ME/EE would rather just have a tool that does
             | what they need.
             | 
             | - It also has SIMULINK, which can help engineers get from
             | spec to simulation very quickly, and seems deeply embedded
             | in certain areas (eg: it's pretty common for a job ad with
             | control systems work to want SIMULINK experience).
             | 
             | Is Python gradually displacing it? Probably.
             | 
             | (Honestly, I wish it would happen faster. I've written one
             | large program in Matlab, and I have absolutely no desire to
             | write another.)
        
           | xdavidliu wrote:
           | > Matlab & Fortran have been decimated by Python and C/C++
           | around 1990s and 2010s, respectively.
           | 
           | Nit: this failed to be respective; the years are the other
           | way around
        
             | kbelder wrote:
             | Interesting... maybe the respective comparison was Matlab
             | to Python and Fortran to C/C++? This sentence actually had
             | three parallel clauses.
             | 
             | But that was a great nit to find.
        
       | gugagore wrote:
       | Technically, image resizing does in general care about the color
       | channel ordering, because color spaces are in general not linear.
       | https://www.alanzucconi.com/2016/01/06/colour-interpolation/
        
         | quietbritishjim wrote:
         | That article fails to touch on the fundamental issue with its
         | title "The Secrets of Colour Interpolation": RGB values are a
         | nonlinear function of the light emitted (because we are better
         | at distinguishing dark colours, so it's better to allow
         | representing more of those), so to interpolate properly you
         | need to invert that function, interpolate, then reapply. The
         | difference that makes to colour gradients is really striking.
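          | 
          | A minimal sketch of that invert/interpolate/reapply round
          | trip, assuming sRGB encoding and channel values scaled to
          | [0, 1]:
          | 
          |     import numpy as np
          | 
          |     def srgb_to_linear(c):
          |         return np.where(c <= 0.04045, c / 12.92,
          |                         ((c + 0.055) / 1.055) ** 2.4)
          | 
          |     def linear_to_srgb(c):
          |         return np.where(c <= 0.0031308, c * 12.92,
          |                         1.055 * c ** (1 / 2.4) - 0.055)
          | 
          |     def lerp(a, b, t):
          |         # interpolate in linear light, then re-encode
          |         la, lb = srgb_to_linear(a), srgb_to_linear(b)
          |         return linear_to_srgb((1 - t) * la + t * lb)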
        
           | topherclay wrote:
           | Do you mean converting the RGB value to LAB values in the
            | CIELAB color space and doing the interpolation there?
           | 
           | Is there a better way to do it?
        
             | CarVac wrote:
             | No, you just need to linearize the brightness.
        
               | drjasonharrison wrote:
               | Typically this is done by using a look-up table to
               | convert the 8-bit gamma encoded intensities to 10-bit (or
               | more) linear intensities. You can use the same look-up
               | table for R, G, and B. Alpha should be linear.
        
         | yosefk wrote:
         | While this is a valid point, cv2.resize doesn't actually
         | implement color conversion in a way addressing this issue; at
         | least in my testing I get identical results whether I interpret
         | the data as RGB or BGR. So if you want to use cv2.resize, AFAIK
         | you can count on it not caring which channel is which. And if
         | you need fast resizing, you're quite likely to settle for the
         | straightforward interpolation implemented by cv2.resize.
        
         | planede wrote:
         | Even if the RGB components correspond to sRGB, to linearize you
         | apply the same non-linear function to each component value,
         | independently. So even if you do the interpolation in a linear
         | colorspace, the order of the sRGB components does not matter.
        
       | ggm wrote:
       | _Why_ does numpy do column order data?
       | 
       | Is it because in much of the maths domain it turns out you
       | manipulate columns more often than rows?
       | 
       | (not a mathematician but I could believe if the columns represent
       | "dimensions" or "qualities" of some dataset, and you want to
       | apply a function to a given dimension, having data in column-
       | natural order makes that faster.)
       | 
        | Obviously you want to believe naively that there is no difference
        | between the X and Y in the X,Y plane, but machines are not naive
        | and sometimes there IS a difference between doing things to the
        | set of verticals and the set of horizontals.
        
         | sdeer wrote:
         | Probably because Fortran stores matrices and other
         | multidimensional arrays in column order. Traditionally most
         | numerical computation software was written in Fortran and numpy
         | calls those under the hood. Storing in row order would have
         | meant copying the data to column major order and back for any
         | call to Fortran.
        
           | Thrymr wrote:
           | NumPy can _store_ arrays in either row-major (C-style) or
           | column-major (Fortran-style) order. Row-major is the default.
           | Index order is always row, column.
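            | 
            | A small sketch of the two layouts (same values, same
            | indexing convention, different memory order):
            | 
            |     import numpy as np
            | 
            |     a = np.zeros((2, 3))      # C order (row-major) default
            |     b = np.asfortranarray(a)  # column-major storage
            |     a.flags['C_CONTIGUOUS']   # True
            |     b.flags['F_CONTIGUOUS']   # True
            |     a.strides, b.strides      # (24, 8) vs (8, 16) (float64)
            |     a[1, 2]                   # row, column either way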
        
         | thayne wrote:
         | Numpy was originally designed for mathematicians and
         | scientists, and in those domains convention often lines up
         | better with column major order. For example, vectors are a
         | single column, and the column index comes before the row index
          | in many notations. So using column major order meant familiar
          | formulas and algorithms (including translating from Fortran
          | code, which is column major, or Matlab for that matter) were
          | easy to translate to numpy code.
         | 
         | Also, numpy is built on some fortran libraries, like BLAS and
          | LAPACK, which assume a column major order. If it took row major
          | input, it would need to transpose matrices, in some cases change
          | the order, or rewrite those libraries to use row major form.
        
           | kolbusa wrote:
           | Copying is usually not necessary. Often times you can swap
           | data and/or shape arguments and get a transposed result out.
           | While it is true that Fortran BLAS only supports col-major,
           | CBLAS supports both row- and col-major. Internally, all the
           | libraries I worked on use col-major, but that is just a
           | convention.
        
         | bee_rider wrote:
         | It is possible that my brain had been damaged by convention,
         | but it looks much tidier to have your vector multiplicand on
         | the right of the matrix when doing a matrix vector
         | multiplication y=Ax, where A is the matrix and x is the vector,
         | y is the result.
         | 
         | So, x and y must be columns. So, it is nice if our language has
         | column-order as the default, that way a vector is also just a
         | good old 1d array.
         | 
         | If we did y=xA, x and y would be rows, the actual math would be
         | the same but... I dunno, isn't it just hideous? Ax is like a
         | sentence, it should start with an upper case!
         | 
         | It also fits well with how we tend to learn math. First we
         | learn about functions, f(x). Then we learn that a function can
         | be an operator. And a matrix can represent a linear operator.
         | So, we tend to have a long history of stuff sitting to the left
         | of x getting ready to transform it.
        
           | SailorJerry wrote:
           | I think the reason I prefer columns is I do the mental
           | expansion into large bracketed expressions. If x is a row and
           | kept inline, then the expansion gets really wide. To keep it
           | compact and have the symbols oriented the same as their
           | expansion, you'd have to put the x above A and that's just
           | silly.
        
           | Joker_vD wrote:
           | Well, there used to be a tradition in universal algebra (and
           | category theory) of putting functions/operators at the right
           | side of the arguments but it seems to have ultimately died
           | out. And "y = xA" is the standard notation in the linear
           | coding theory, even to this day: message vectors are bit
           | strings, not bit columns.
        
             | bee_rider wrote:
             | I think there's also some other pretty big field that tends
             | to put the vector on the left... statistics? Economics? For
             | some reason I couldn't find any links, so maybe I just got
             | this from, like, one random statistician. If only they'd
             | told me about sample sizes.
        
           | planede wrote:
           | Your reasoning starts out nicely with y=Ax, but you get the
            | wrong conclusion. The layout of x and y is not affected at all
            | by row or column order; they are just consecutive numbers in
            | either case. So you have to look at the layout of A.
           | 
           | For the row-major layout the multiplication Ax ends up being
           | more cache-efficient than for the column major layout, as for
           | calculating each component of y you need to scan x and a row
           | of A.
        
             | bee_rider wrote:
             | I'm pretty sure BLAS will tile out the matrix
              | multiplication internally anyway, so it doesn't matter. At
              | least for matmat it would; is matvec special?
        
               | planede wrote:
               | However matmat is done, row-major vs column-major for
               | both matrices shouldn't make a difference (for square
               | matrices at least).
               | 
               | I don't know if tiling is done for matvec. I don't think
               | it makes sense there, but I didn't think about it too
               | hard.
        
           | montebicyclelo wrote:
           | y = x @ W + b
           | 
           | Is how you'd write it in NumPy and most (/all?) deep learning
           | libraries, with x.shape=[batch, ndim]
           | 
           | I'd personally prefer that to be the convention, for
           | consistency.
        
         | rdtsc wrote:
         | I think in math columns as vectors is a more common
         | representation. Especially if we talk about matrix
         | multiplication and linear systems.
        
         | Pinus wrote:
         | Hang on, doesn't numpy use C (row major) array ordering by
         | default? _checks docs_ Seems like it does. However, numpy array
         | indexing also follows the maths convention where 2D arrays are
         | indexed by row, column (as does C, by the way), so to access
         | pixel x, y you need to say im[y, x]. And the image libraries
         | where I have toyed with with numpy integration (only Pillow, to
         | be honest) seem to work just fine like this -- a row of pixels
         | is stored contiguously in memory. So I don't quite see why the
         | author claims that numpy stores a _column_ of pixels
         | contiguously, but I have only glanced at the article, so quite
         | probably I have missed something.
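          | 
          | A small sketch of what that looks like with Pillow, assuming
          | a 320x200 image:
          | 
          |     import numpy as np
          |     from PIL import Image
          | 
          |     img = Image.new('RGB', (320, 200))  # width x height in PIL
          |     arr = np.asarray(img)
          |     arr.shape                           # (200, 320, 3): rows first
          |     arr[10, 20]                         # pixel at x=20, y=10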
        
           | KeplerBoy wrote:
           | You're right. Numpy stores arrays in row-major order by
           | default. One can always just have a look at the flags
           | (ndarray.flags returns some information about the order and
           | underlying buffer).
        
       | ruined wrote:
       | oh THIS is why image byte order and dimensions are so confusing
       | every time i fuck with opencv and pygame
       | 
       | well, half of why. for some reason i keep doing everything
       | directly on /dev/fb0
        
         | TeMPOraL wrote:
         | Image byte order, axis directions, coordinate system handedness
         | when in 3D... after enough trying, you eventually figure out
         | the order of looping at any given stage of your program, and
         | then it Just Works, and then you _never touch it again_.
        
       | Too wrote:
       | Clickbait title. The speedup only applies to one particular
       | interaction between SDL and numpy.
        
       | C4stor wrote:
       | All of this seems unnecessary, and easily replaced in the
       | provided benchmark by :
       | 
       | i2 = np.ascontiguousarray(pg.surfarray.pixels3d(isurf))
       | 
       | Which does the 100x speedup too and is a "safe" way to adjust
       | memory access to numpy strides.
       | 
        | Whether the output is correct or not is left as an exercise to
        | the author; since the provided benchmark only uses np.zeros(),
        | it's kind of hard to verify.
        
         | yosefk wrote:
         | I just measured this with the np.ascontiguousarray call (inside
         | the loop of course, if you do it outside the loop when i2 is
         | assigned to, then ofc you don't measure the overhead of
         | np.ascontiguousarray) and the runtime is about the same as
         | without this call, which I would expect given everything TFA
         | explains. So you still have the 100x slowdown.
         | 
         | TFA links to a GitHub repo with code resizing non-zero images,
         | which you could use both to easily benchmark your suggested
         | change, and to check whether the output is still correct.
        
       | londons_explore wrote:
       | > What can we do about this? We can't change the layout of pygame
       | Surface data. And we seriously don't want to copy the C++ code of
       | cv2.resize, with its various platform-specific optimizations,
       | 
       | Or... you could have sent a ~25 line pull request to opencv to
       | fix this performance problem not just for you, but for thousands
       | of other developers and millions of users.
       | 
       | I think your fix would go here:
       | 
       | https://github.com/opencv/opencv/blob/ba65d2eb0d83e6c9567a25...
       | 
       | And you could have tracked down that slow code easily by running
       | your slow code in gdb, hitting Ctrl+C while it's doing the slow
       | thing, and then "bt" to get a stack trace of what it's doing and
       | you'd see it constructing this new image copy because the format
       | isn't correct.
        
         | yosefk wrote:
         | 25 line pull request doing what? Supporting this format
         | efficiently is probably more LOC and unlikely to get merged as
         | few need this and it complicates the code. Doing the thing I
         | did in Python inside OpenCV C++ code instead (reinterpreting
         | the data) is not really possible since you have less knowledge
         | about the input data at this point (like, you're going to
         | assume it's an RGBA image and access past the area defined as
         | the allocated array data range for the last A value? The user
         | gives a 3D array and you blithely work on the 4th dimension?)
         | 
         | And what about the other examples I measured, including pure
         | numpy code? More patches to submit?
        
           | londons_explore wrote:
           | inside the python opencv bindings?
           | 
           | Or just special case resize, since there the channel order
           | doesn't matter. If you interpret R as A you'll still get the
           | right answer.
        
             | yosefk wrote:
             | The pygame API gives you a 3D RGB array and a separate 2D
             | alpha array, which happen to reference the same RGBA/BGRA
             | memory. Are you suggesting that OpenCV Python bindings (in
             | most of their functions, and likewise other libraries
             | including numpy operators themselves) should have code
             | assuming that a WxHx3 RGB array really references the
             | memory of a WxHx4 RGBA array?..
             | 
             | If you really want a solution which is a patch rather than
             | a function in your own code, I guess the way to go would be
             | to persuade the maintainers of pygame and separately, the
             | maintainers of its fork, pygame-ce, to add a
             | pixels4d_maybe_rgba_and_maybe_bgra() function returning a
             | WxHx4 array. Whether they would want to add something like
             | that I don't know. But in any case, I think that since
             | closed source projects, open source projects, and code by
             | sister teams exist that will not accept patches, or will
             | not accept them in a timely manner, it's interesting what
             | you can do without changing someone else's code.
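              | 
              | For concreteness, a sketch of the two views being talked
              | about (assuming a 32-bit surface with per-pixel alpha):
              | 
              |     import pygame as pg
              | 
              |     surf = pg.Surface((4, 4), flags=pg.SRCALPHA, depth=32)
              |     rgb = pg.surfarray.pixels3d(surf)        # WxHx3 view
              |     alpha = pg.surfarray.pixels_alpha(surf)  # WxH alpha view
              |     rgb.shape, alpha.shape                   # (4, 4, 3), (4, 4)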
        
       | antirez wrote:
        | MicroPython allows you to do even more than that; it's fantastic
        | and I wish it were part of standard Python as well. Using a
        | decorator you can write functions in Viper, a subset of Python
        | that has pointers to bytearrays and only fast integer types that
        | behave similarly to C's (fixed size, they wrap around and so
        | forth). It's a very nice way to speed things up 10 or 100x
        | without resorting to C extensions.
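        | 
        | A rough sketch of the kind of Viper function meant here (it
        | runs under MicroPython, not CPython; the function itself is
        | made up for illustration):
        | 
        |     import micropython
        | 
        |     @micropython.viper
        |     def fill(buf, val: int):
        |         p = ptr8(buf)       # raw byte pointer into a bytearray
        |         n = int(len(buf))
        |         for i in range(n):  # plain C-like integer loop
        |             p[i] = val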
        
         | yosefk wrote:
         | I dunno if it's "more than that" since they're not directly
         | comparable? What you describe sounds more like numba or Cython
         | than what TFA describes which is a different use case?..
        
           | antirez wrote:
           | Indeed, I don't mean they are directly comparable, but it's a
           | great incarnation of the "Unsafe Python" idea, because it
           | does not go too far with the subset of Python that you can
            | use, like introducing a completely different syntax or the like.
           | There are strict rules, but if you follow them you still
           | kinda write Python that goes super fast.
        
       | Joker_vD wrote:
       | Yep, welcome to the world of RGBA and ARGB storage and memory
       | representation formats, with little- and big-endianness thrown in
       | for the mix, it's all very bloody annoying for very little gain.
        
       | sylware wrote:
       | I wonder how it would perform with the latest openstreetmap tile
       | renderer which was released not long ago.
        
       | jofer wrote:
        | I'm confused on the need for ctypes/etc here. You can directly
       | modify somearr.strides and somearr.shape. And if you need to set
       | them both together, then there's
       | numpy.lib.stride_tricks.as_strided. Unless I'm missing something,
       | you can do the same assignment that ctypes is used for here
       | directly in numpy. But I might also be missing a lot - I'm still
       | pre-coffee.
       | 
       | On a different note, I'm surprised no one has mentioned Cython in
       | this context. This is distinct from, but broadly related to
       | things like @cython.boundscheck(False). Cython is _really_ handy,
       | and it's a shame that it's kinda fallen out of favor in some ways
       | lately.
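        | 
        | For reference, a minimal as_strided sketch (it reinterprets the
        | existing buffer, but it can't move the base pointer backwards
        | the way the ctypes trick in the article does):
        | 
        |     import numpy as np
        |     from numpy.lib.stride_tricks import as_strided
        | 
        |     a = np.zeros(16, dtype=np.uint8)
        |     # same 16-byte buffer viewed as 4x4; as_strided trusts the
        |     # shape/strides you pass, so wrong values read out of bounds
        |     v = as_strided(a, shape=(4, 4), strides=(4, 1))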
        
         | yosefk wrote:
          | You need to modify the base pointer in this example, specifically
          | to point 2 bytes before the current base pointer (moving back
          | from the first red pixel value to the first blue value). I
          | don't think you can do it without ctypes, but maybe I'm wrong.
         | 
         | What does Cython do better than numbs except static
         | compilation? Honest q, I know little about both
        
           | jofer wrote:
           | You actually can do the "offset by 2 bytes back" with a
           | reshape + indexing + reshape back. But I suspect I'm still
           | missing something. I need to read things in more depth and
           | try it out locally.
           | 
           | By "numbs" here, I'm assuming you mean numba. If that's not
           | correct, I apologize in advance!
           | 
           | There are several things where Cython is a better fit than
           | numba (and to be fair, several cases where the opposite is
           | true). There are two that stick out in my mind:
           | 
           | First off, the optimizations for cython are explicit and a
           | matter of how you implement the actual code. It's a separate
           | language (technically a superset of python). There's no
           | "magic". That's often a significant advantage in and of
           | itself. Personally, I often find larger speedups with Cython,
           | but then again, I'm more familiar with it, and understand
           | what settings to turn on/off in different situations. Numba
            | is much more of a black box given that it's a JIT compiler.
           | With that said, it can also do things that Cython can't
           | _because_ it's a JIT compiler.
           | 
           | If you want to run python code as-is, then yeah, numba is the
           | better choice. You won't see a speedup at all with Cython
           | unless you change the way you've written things.
           | 
           | The second key thing is one that's likely the most
           | overlooked. Cython is arguably the best way to write a python
           | wrapper around C code where you want to expose things as
           | numpy arrays. It takes a _huge_ amount of the boilerplate
           | out. That alone makes it worth learning, i.m.o.
        
             | jofer wrote:
             | Ah, right, you mean _outside_ the memory block of the
             | array! Sorry, my mind was just foggy this morning.
             | 
              | That's not strictly possible, but "circular" references are
              | possible with "stride tricks". Those can accomplish similar
              | things in some circumstances. But with that said, I don't
              | think that would work in this case.
        
       ___________________________________________________________________
       (page generated 2024-05-07 23:02 UTC)