[HN Gopher] Some reasons to avoid Cython
       ___________________________________________________________________
        
       Some reasons to avoid Cython
        
       Author : EntICOnc
       Score  : 60 points
       Date   : 2023-09-21 06:12 UTC (1 days ago)
        
 (HTM) web link (pythonspeed.com)
 (TXT) w3m dump (pythonspeed.com)
        
       | jonatron wrote:
       | You have to pretend that you can't do
       | 
       |     from libcpp.vector cimport vector
       | 
       | for this blog post to make sense.
        
         | bjourne wrote:
         | Perhaps you don't want to link to STL or can't tolerate some of
         | its idiosyncratic semantics? Some platforms Python runs on may
         | not even come with STL.
        
       | Reubend wrote:
       | I think these criticisms are valid (at least for Cython 2) and
       | they are well explained. But I don't see this article mention the
       | main benefit of Cython from my experience, which is the speed
       | increase you can get from Pythonic code annotated with a few
       | types. The suggested alternatives don't really address that same
       | use case.
        
       | sdfghswe wrote:
       | If you wanna write code in a high-level language that lets you
       | optimize individual assembly lines, have a look at Julia.
       | 
       | https://docs.julialang.org/en/v1/stdlib/InteractiveUtils/#In...
        
       | joss82 wrote:
       | `ctypes` is part of Python's standard library and allows you to
       | directly call C functions from Python code.
       | 
       | It's glorious in its simplicity.
       | 
       | https://docs.python.org/3/library/ctypes.html
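
A minimal sketch of the kind of call described above, assuming a POSIX system where `ctypes.CDLL(None)` exposes the C library already loaded into the process (on Windows you would load a specific DLL instead). `strlen` and the wrapper name `c_strlen` are chosen only for illustration. Declaring `argtypes` and `restype` explicitly avoids relying on ctypes' default assumption that functions return `int`.

```python
import ctypes

# Assumption: POSIX. CDLL(None) opens the running process itself,
# which makes the already-loaded libc symbols available.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string via C's strlen."""
    return libc.strlen(data)


print(c_strlen(b"hello"))
```

Getting `argtypes` or `restype` wrong is not caught at import time, which is the failure mode the replies below complain about.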
        
         | masklinn wrote:
         | "Glorious" and "simplicity" are definitely words I've never
         | read about ctypes before.
         | 
         | "Wonky" and "terrifying", a lot more. ctypes is... useful, but
         | it also uses somewhat strange terminology which can be hard to
         | match to C's as it's trying to bridge C and Python. And when
         | getting it wrong is UB, it's pretty frustrating.
        
         | jofer wrote:
         | It also doesn't help you use numpy arrays with C functions,
         | which is one of the big selling points of cython.
        
           | ali_m wrote:
           | You can absolutely use numpy arrays with C functions using
           | ctypes. Numpy has `numpy.ctypeslib` which takes care of some
           | of the boilerplate involved.
        
             | jofer wrote:
             | Yes, you can, but it is easier in cython, and that is one
             | of the key selling points of cython.
             | 
             | Nothing wrong with using ctypes. It's the right solution
             | for some things. However, cython is generally easier with
             | numpy than numpy.ctypeslib
        
               | ali_m wrote:
               | I think ctypes shines when it comes to fast prototyping,
               | since you can iterate on the python bindings without a
               | compilation step. It can also simplify distribution since
               | the bindings can be pure python. Where it's arguably not
               | so good is performance and maintainability.
        
       | intalentive wrote:
       | There are other ways to get performance boosts out of Python,
       | like taichi and mypyc.
       | 
       | Rust evangelists are big on safety guarantees and while that's a
       | nice feature I'm not convinced it's The Most Important Thing
       | Ever.
        
       | stabbles wrote:
       | Also, don't include generated C files when distributing the
       | package. They're notoriously not forward-compatible with newer
       | Python versions, and generating C code takes much less time than
       | compiling, which has to happen anyway.
        
       | tehsauce wrote:
       | "Notice that Rust has a built-in vector class, as well as
       | iterators"
       | 
       | pybind11 with C++ will also automatically convert between Python
       | and C++ standard types.
        
       | zzzeek wrote:
       | sure if you use malloc directly in your cython code, you're out
       | on a limb. That's not how simple use of cython goes. You can
       | apply cython to Python code directly as a code inliner and
       | there's little to no risk of C-style issues being introduced.
       | 
       | "two compiler passes being a problem" again this is if you are
       | writing big tracts of C code in your cython; not how it's
       | normally used.
       | 
       | "No standardized package or build system for dependencies" / "all
       | the incentives push you to write everything from scratch in
       | Cython, rather than reuse preexisting libraries." - I don't really
       | understand this part. Is this just the general point that C/C++
       | does not encourage the use of other native dependencies? We are
       | using Cython to write Python code that is more optimized than
       | plain Python. Our dependencies are normally going to be other
       | Python dependencies. If our Cython is to wrap some well-known
       | native library, then that has to be installed also when the Python
       | wheel is installed, and that doesn't change if your Python wheel
       | was built from Rust source or C/C++ source.
       | 
       | We use Cython in SQLAlchemy to tremendous effect and excellent
       | integration with existing Python code, including being able to
       | fall back to pure Python (so that our source install runs even if
       | you don't have a compiler or are using PyPy), and we've had zero
       | user issues / bugs / anything. We will consider the Rust tools
       | once they've had several years of maturity and widespread use
       | under their belts (meaning, they'd have to meet or surpass
       | Cython's popularity), otherwise we aren't going to foist that on
       | our userbase anytime soon.
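
The "fall back to pure Python" arrangement mentioned above can be sketched roughly as follows. This is not SQLAlchemy's actual code: the module name `_hypothetical_speedups` and the `dot` function are made up for illustration, standing in for an optional extension compiled from Cython.

```python
import importlib


def _pure_python_dot(xs, ys):
    """Pure-Python fallback: dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(xs, ys))


def load_dot(module_name="_hypothetical_speedups"):
    """Return (implementation, used_compiled).

    `_hypothetical_speedups` is a made-up name standing in for a module
    compiled from Cython; it is not a real package.
    """
    try:
        compiled = importlib.import_module(module_name)
        return compiled.dot, True
    except (ImportError, AttributeError):
        # No compiler at install time, PyPy, etc.: use the plain-Python version.
        return _pure_python_dot, False


dot, used_compiled = load_dot()
print(used_compiled, dot([1, 2, 3], [4, 5, 6]))
```

Because the compiled module is optional, a source install without a compiler still works, only more slowly.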
        
         | cb321 wrote:
         | Indeed. It is pretty easy to just write some Cython routine
         | against the data pointer & range lifted out of some NumPy array
         | and then still let Python do all the memory management for you.
         | 
         | I think Cython is great for just speeding up profiling-revealed
         | hot spots. And `cython --annotate` is even a nice helper along
         | that path. { In fact, I think gcc should have a similar system
         | one could integrate so that you can click-expand the Python to
         | get the C and then click again the C to get the assembly. :-) }
         | It really makes Python more like the gradually typed system
         | Common Lisp always was.
         | 
         | In fact, there was talk back in the very early noughties of
         | bundling a precursor of Cython with the Python interpreter
         | itself. I was always a bit disappointed that didn't go very
         | far. Ah well.
        
       | PaulHoule wrote:
       | The article doesn't mention
       | 
       | https://www.pypy.org/
       | 
       | which gives a big boost to plain ordinary Python code,
       | particularly branchy and dynamic stuff like
       | 
       | https://rdflib.readthedocs.io/en/stable/
       | 
       | where it made the difference between a system I was working on
       | being tolerable and not tolerable.
        
         | srean wrote:
         | I love PyPy. It is a marvelous piece of engineering and design.
         | Unfortunately, the benefits of PyPy do not translate into one
         | of the most important use cases of Python -- those that call
         | into 3rd party array, ML, stats and science modules and
         | libraries.
        
         | klyrs wrote:
         | My favorite feature of LPython is that they have a list of
         | other python compilers:
         | 
         | https://lpython.org/
        
       | btwillard wrote:
       | In case anyone is wondering, this is essentially a few complaints
       | about the basic transpilation/source-to-source approach taken by
       | Cython and then some promotion for Rust. It unfortunately mixes
       | some general C/C++ complaints in there, too.
        
         | scarygliders wrote:
         | Beat me to it.
         | 
         | tl;dr "Don't use Python/Cython or C/C++. Use Rust instead, it's
         | better." is basically that article.
        
       | itamarst wrote:
       | Author here: Note that this hasn't yet been updated for Cython 3,
       | which does fix or improve some of these (but not the fundamental
       | limitation that you're stuck with C or C++).
        
         | klyrs wrote:
         | Pardon me, but your implementation is a strawman. Pick on this
         | (which doesn't require Cython 3):
         | 
         |     from libcpp.vector cimport vector
         |     from libcpp.pair cimport pair
         | 
         |     cdef class PointVec:
         |         cdef vector[pair[float, float]] vec
         | 
         |         def __init__(self, points: list[tuple[float, float]]):
         |             self.vec = points
         | 
         |         def __repr__(self):
         |             result = ", ".join(f"({x}, {y})" for x, y in self.vec)
         |             return f"PointVec({result})"
         | 
         |         def __setitem__(
         |             self, index, point: tuple[float, float]
         |         ):
         |             cdef pair[float, float] *p = &self.vec.at(index)
         |             p.first = point[0]
         |             p.second = point[1]
         | 
         |         def __getitem__(self, index):
         |             return self.vec.at(index)
        
           | IshKebab wrote:
           | You can't disprove that a language is error prone by
           | providing a 20 line example that happens to be correct.
        
             | klyrs wrote:
             | Nor can you prove that a language is error prone by
             | providing a 40 line example written in an antiquated style
             | that deliberately avoids using the safety features at one's
             | disposal.
        
       | eigenvalue wrote:
       | My new favorite way to write very fast libraries for Python is to
       | just use Rust and Maturin:
       | 
       | https://github.com/PyO3/maturin
       | 
       | It basically automates everything for you. If you use it with
       | Github actions, it will compile wheels for you on each release
       | for every platform and python version you want, and even upload
       | them to PyPi (pip) for you. Everything feels very modern and well
       | thought out. People really care about good tooling in the Rust
       | world.
        
         | IshKebab wrote:
         | Maturin is great. I've used it to distribute a Rust program
         | that has absolutely nothing to do with Python. It compiled it
         | fine and once I had navigated the usual mess of Python to find
         | out how to upload packages to PyPi (not to be confused with
         | PyPy), it worked pretty well.
         | 
         | I got the idea from CMake, which also has absolutely nothing to
         | do with Python but is best installed via Pip. It's a package
         | manager that basically works and is basically always available
         | on Linux and Mac (among programmers anyway).
         | 
         | One of the few areas of Python that doesn't completely suck.
        
         | abdullahkhalids wrote:
         | I want to convert about 5 functions/100 lines of my
         | Python project into Rust. I cobbled together the Maturin
         | integration with my project earlier this week, which seems to
         | work on some test functions. But I don't know any Rust!
         | 
         | What's the best way to learn enough Rust to do this? My code is
         | basically just some Numpy array manipulation, with some
         | unfortunate for-loops which can't be vectorized, which is the
         | source of the slow speeds.
        
           | aardshark wrote:
           | Open up ChatGPT, paste your functions and ask it to convert
           | them to Rust. Go through them one by one, see if you
           | understand, and ask questions about anything you don't
           | recognise. Don't expect the output to be perfectly logically
           | correct; you will have to ensure that yourself.
           | 
           | I've found ChatGPT to be really excellent for quickly
           | getting myself up to speed with languages that I'm not
           | familiar with.
        
             | eigenvalue wrote:
             | Yes, that's my advice as well. Set up vscode with rust
             | analyzer and paste any errors it shows back into the same
             | ChatGPT conversation and it will debug everything for you.
        
           | eigenvalue wrote:
           | You can see how I did something similar in my library here:
           | 
           | https://github.com/Dicklesworthstone/fast_vector_similarity/...
           | 
           | Basically you use ndarray instead of numpy, try to vectorize
           | anything you can, and for the for loops that can't be
           | vectorized, you can use rayon to do them in parallel.
        
         | cozzyd wrote:
         | this is indeed what the article advocates for
         | 
         | (yes, this particular bit of rust evangelism was not obvious
         | from the headline)
        
       | huac wrote:
       | I would love some examples of how to do non-trivial data interop
       | between Rust and Python. My experience is that PyO3/Maturin is
       | excellent when converting between simple datatypes but
       | conversions get difficult when there are non-standard types, e.g.
       | Python Numpy arrays or Rust ndarrays or whatever other custom
       | thing.
       | 
       | Polars seems to have a good model where it uses the Arrow in
       | memory format, which has implementations in Python and Rust, and
       | makes a lot of the ndarray stuff easier. However, if the Rust
       | libraries are not written with Arrow first, they become quite
       | hard to work with. For example, there are many libraries written
       | with https://github.com/rust-ndarray/ndarray, which is
       | challenging to interop with Numpy.
       | 
       | (I am not an expert at all, please correct me if my
       | characterizations are wrong!)
        
       | atemerev wrote:
       | I _knew_ there would be Rust at the end.
       | 
       | Sorry, not interested. I can't think in Rust. Tried many times.
       | Things like dynamically updated graphs are nearly impossible to
       | write in Rust, and concurrency is less than pleasant. Fighting
       | the borrow checker is not my idea of a good time.
       | 
       | I don't understand why everyone is so fascinated with Rust. I am
       | like 3 times less productive there, and there is absolutely no
       | pleasure for me in writing Rust code.
       | 
       | I'll stick to Python and C++, thank you.
        
         | slowhadoken wrote:
         | I prefer the C, C++, and Python communities too.
        
         | klyrs wrote:
         | The borrow checker shouldn't be a showstopper for generic
         | graphs. I understand the difficulty with linked lists, trees,
         | etc.; but if your graph is represented by a container, I don't
         | see the obstacle. (nb: I don't use rust, I also get annoyed by
         | people over-selling rust)
        
         | fsloth wrote:
         | "dynamically updated graphs are nearly impossible to write in
         | Rust"
         | 
         | Can you expand on this? I've taken only a cursory look at Rust
         | and it's not obvious to me what specific constraints would
         | cause this.
        
           | PaulHoule wrote:
           | You can fall back to
           | 
           | https://doc.rust-lang.org/book/ch15-04-rc.html
           | 
           | for things that are too dynamic for borrow checking.
           | 
           | Reference counting works great for the things it is good for
           | but it does get into trouble with cycles and many of us would
           | say that Java's memory allocator/garbage collector is worth
           | what it costs.
           | 
           | My opinion is that automated memory management is a key
           | concept for software reuse and if you look at the problems of
           | the C/C++ world this is pivotal. That is, the range of memory
           | management relationships you might want between a library and
           | its client is pretty wide, I mean sometimes you want a
           | library to make its own buffers, other times you want to hand
           | it an existing buffer, if it is building graphy structures it
           | needs to allocate stuff, do you really want it to use malloc?
           | do you want to pass it your own malloc? etc.
           | 
           | The Java answer of providing a standard answer to allocation
           | and garbage collection makes libraries composable with code
           | in a way that Rust struggles with. (In the end rust libraries
           | have to fall back to RC when complexity gets too high)
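
The point above about reference counting getting into trouble with cycles can be observed directly in CPython, which pairs reference counting with a separate cycle collector. The sketch below assumes CPython (other implementations such as PyPy do not use reference counting) and uses a made-up `Node` class. It switches the cycle collector off to show what plain reference counting does with a two-node cycle.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None


def cycle_survives_refcounting():
    """Return (alive_after_del, alive_after_gc) for a two-node cycle."""
    gc.disable()  # rely on reference counting only for the first check
    try:
        a, b = Node("a"), Node("b")
        a.other, b.other = b, a      # a <-> b reference cycle
        probe = weakref.ref(a)       # a weak reference does not keep `a` alive
        del a, b                     # refcounts never drop to zero
        alive_after_del = probe() is not None
        gc.collect()                 # an explicit pass of the cycle collector
        alive_after_gc = probe() is not None
        return alive_after_del, alive_after_gc
    finally:
        gc.enable()


print(cycle_survives_refcounting())  # (True, False) on CPython
```

Rust's `Rc` has no such backup collector, so a cycle of `Rc` pointers leaks unless one direction is a `Weak` reference.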
        
             | fsloth wrote:
             | Ah, I see. IMO indices are a good way to implement all data
             | structures, gc or no gc. I.e array for storage, indices for
             | links. Referring the array via indices is not slower than
             | following pointers.
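
The "array for storage, indices for links" idea can be sketched as below. It is written in Python only to keep one language for the examples in this thread; the `Graph` class and its method names are illustrative, and in Rust the same shape would be a `Vec` of nodes with `usize` indices.

```python
class Graph:
    """Nodes live in one list; edges are lists of integer indices into it."""

    def __init__(self):
        self.values = []      # node payloads, addressed by index
        self.neighbors = []   # neighbors[i] = indices of nodes adjacent to i

    def add_node(self, value):
        self.values.append(value)
        self.neighbors.append([])
        return len(self.values) - 1   # the index acts as the node's handle

    def add_edge(self, src, dst):
        self.neighbors[src].append(dst)

    def reachable(self, start):
        """Iterative depth-first search; returns the set of reachable indices."""
        seen, stack = set(), [start]
        while stack:
            i = stack.pop()
            if i not in seen:
                seen.add(i)
                stack.extend(self.neighbors[i])
        return seen


g = Graph()
a, b, c, d = (g.add_node(x) for x in "abcd")
g.add_edge(a, b)
g.add_edge(b, c)
g.add_edge(c, a)   # a cycle is just three integers; no node owns another
print(sorted(g.reachable(a)))  # [0, 1, 2]
```

Since links are plain integers, the container is the single owner of every node, which is why this layout sidesteps both borrow-checker fights and reference cycles.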
        
             | atemerev wrote:
             | You start with Rc, and you end with
             | 
             |     fn new<'a>(datum: &'static str, arena: &'a TypedArena<Node<'a>>) -> &'a Node<'a>
             | 
             | (or Rc<RefCell<T>> and the like).
             | 
             | (and basically implementing your own cycle-aware garbage
             | collector, which again is not my idea of a good time).
        
           | itishappy wrote:
           | The borrow checker really does not like recursive structures.
           | 
           | https://rust-unofficial.github.io/too-many-lists/
        
             | atemerev wrote:
             | And I really like recursion, I come from Scala and Common
             | Lisp.
        
         | TwentyPosts wrote:
         | > Things like dynamically updated graphs are nearly impossible
         | > to write in Rust
         | 
         | Just curious, have you tried to handroll these, or have you
         | used libraries? (eg. petgraph, though I don't know if it'd
         | suit your usecases.)
         | 
         | I'm a Rust connoisseur, but I'd agree with 'nearly impossible
         | to write', which is why I'd (first of all) try to grab a
         | library, assuming I'm doing anything complicated with graphs.
         | If it's very simple and specific, I'd try to go through the
         | list of possible graph representations (eg. adjacency lists),
         | and pick a suitable one, but never store nodes directly, rather
         | store indices (while the nodes are stored in some sort of
         | vector).
        
           | pimeys wrote:
           | The index/vector strategy is also perfect for basic trees, if
           | you need to have cyclic dependencies between the nodes, and
           | as a cherry on top it serializes super well.
           | 
           | Requires a bit of boilerplate in the beginning, but pays off
           | when actually needing to work with your data.
        
           | atemerev wrote:
           | This is where I ended up (adjacency lists), but yes, it was
           | quite unintuitive (because you need some central entity to
           | manage adjacency lists, and that idea somehow eluded me for a
           | long time). Rust really doesn't like updating references (and
           | anything non-hierarchical in general).
        
       | rvanlaar wrote:
       | I'm working on a project to revive old QTVR movies[1].
       | 
       | After writing a couple of python decoders [2] for movie encodings
       | from the 90's it got old quickly.
       | 
       | As luck would have it, FFmpeg has support for almost all video
       | encodings under the sun. For my use case I wanted to send one
       | frame at a time to FFmpeg to decode.
       | 
       | Luckily I found PyAV[3]. It's a Cython project which binds to
       | FFmpeg.
       | 
       | Which brings me to the article. It reads more like "C bad, Rust
       | good". Cython's tagline is: 'Cython gives you the combined power
       | of Python and C'.
       | 
       | If you just want speed and fewer memory bugs, then Rust will fare
       | better. If you want to have the combined power of Python and C
       | then Cython is pretty cool.
       | 
       | [1] https://github.com/rvanlaar/QTVR
       | [2] https://github.com/rvanlaar/QTVR/tree/master/qtvr/decoders
       | [3] https://github.com/PyAV-Org/PyAV
        
       ___________________________________________________________________
       (page generated 2023-09-22 23:01 UTC)