[HN Gopher] Some reasons to avoid Cython
___________________________________________________________________
Some reasons to avoid Cython
Author : EntICOnc
Score : 60 points
Date : 2023-09-21 06:12 UTC (1 days ago)
(HTM) web link (pythonspeed.com)
(TXT) w3m dump (pythonspeed.com)
| jonatron wrote:
| You have to pretend that you can't do
|
|     from libcpp.vector cimport vector
|
| for this blog post to make sense.
| [deleted]
| bjourne wrote:
| Perhaps you don't want to link to STL or can't tolerate some of
| its idiosyncratic semantics? Some platforms Python runs on may
| not even come with STL.
| Reubend wrote:
| I think these criticisms are valid (at least for Cython 2) and
| they are well explained. But I don't see this article mention the
| main benefit of Cython from my experience, which is the speed
| increase you can get from Pythonic code annotated with a few
| types. The suggested alternatives don't really address that same
| use case.
| sdfghswe wrote:
| If you wanna write code in a high-level language that lets you
| optimize individual assembly lines, have a look at Julia.
|
| https://docs.julialang.org/en/v1/stdlib/InteractiveUtils/#In...
| joss82 wrote:
| `ctypes` is part of Python's standard library and allows you to
| directly call C functions from Python code.
|
| It's glorious in its simplicity.
|
| https://docs.python.org/3/library/ctypes.html
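
To make the ctypes suggestion concrete, here is a minimal sketch of calling a C function straight from Python. It assumes a POSIX system where the C library can be located at runtime; the wrapper name `c_strlen` is invented for illustration and is not from the thread.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. find_library may return None in minimal environments,
# in which case CDLL(None) exposes the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature explicitly: size_t strlen(const char *s).
# Without this, ctypes assumes int arguments and an int return value.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper: length of a NUL-terminated byte string."""
    return libc.strlen(data)
```

Declaring `argtypes` and `restype` is where most of the care goes: a mismatch with the real C signature is not caught by Python and can corrupt memory.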
| masklinn wrote:
| "Glorious" and "simplicity" are definitely words I've never
| read about ctypes before.
|
| "Wonky" and "terrifying", a lot more. ctypes is... useful, but
| it also uses somewhat strange terminology which can be hard to
| match to C's as it's trying to bridge C and Python. And when
| getting it wrong is UB, it's pretty frustrating.
| jofer wrote:
| It also doesn't help you use numpy arrays with C functions,
| which is one of the big selling points of cython.
| ali_m wrote:
| You can absolutely use numpy arrays with C functions using
| ctypes. Numpy has `numpy.ctypeslib` which takes care of some
| of the boilerplate involved.
| jofer wrote:
| Yes, you can, but it is easier in cython, and that is one
| of the key selling points of cython.
|
| Nothing wrong with using ctypes. It's the right solution
| for some things. However, cython is generally easier with
| numpy than numpy.ctypeslib
| ali_m wrote:
| I think ctypes shines when it comes to fast prototyping,
| since you can iterate on the python bindings without a
| compilation step. It can also simplify distribution since
| the bindings can be pure python. Where it's arguably not
| so good is performance and maintainability.
| intalentive wrote:
| There are other ways to get performance boosts out of Python,
| like taichi and mypyc.
|
| Rust evangelists are big on safety guarantees and while that's a
| nice feature I'm not convinced it's The Most Important Thing
| Ever.
| stabbles wrote:
| Also, don't include generated C files when distributing the
| package. It's notoriously non-forward compatible with Python, and
| generating C code takes much less time than compiling, which has
| to happen anyways.
| tehsauce wrote:
| "Notice that Rust has a built-in vector class, as well as
| iterators."
|
| Pybind with C++ will also automatically convert between Python
| and C++ standard types.
| zzzeek wrote:
| sure if you use malloc directly in your cython code, you're out
| on a limb. That's not how simple use of cython goes. You can
| apply cython to Python code directly as a code inliner and
| there's little to no risk of C-style issues being introduced.
|
| "two compiler passes being a problem" again this is if you are
| writing big tracts of C code in your cython; not how it's
| normally used.
|
| "No standardized package or build system for dependencies" / "all
| the incentives push you to write everything from scratch in
| Cython, rather than reuse preexisting libraries." - I don't
| really understand this part. Is this just a general complaint
| that C/C++ does not encourage the use of other native
| dependencies? We are using Cython to write Python code that is
| more optimized than plain Python. Our dependencies are normally
| going to be other Python dependencies. If our Cython is to wrap
| some well-known native library, then that has to be installed
| also when the Python wheel is installed, and that doesn't change
| if your Python wheel was built from Rust source or C/C++ source.
|
| We use Cython in SQLAlchemy to tremendous effect and excellent
| integration with existing Python code, including being able to
| fall back to pure Python (so that our source install runs even if
| you don't have a compiler or are using PyPy), and we've had zero
| user issues / bugs / anything. We will consider the Rust tools
| once they've had several years of maturity and widespread use
| under their belts (meaning, they'd have to meet or surpass
| Cython's popularity), otherwise we aren't going to foist that on
| our userbase anytime soon.
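
The "fall back to pure Python" arrangement described above can be sketched roughly as follows. This is not SQLAlchemy's actual code: the module name `_example_cython_speedups` and the function `dot` are hypothetical, chosen only to show the import pattern.

```python
# Prefer a compiled extension when one is installed, otherwise fall back
# to an equivalent pure-Python implementation with the same signature.
try:
    # Hypothetical module name, assumed to be built from Cython source.
    from _example_cython_speedups import dot
    USING_COMPILED = True
except ImportError:
    USING_COMPILED = False

    def dot(a, b):
        """Pure-Python fallback: dot product of two equal-length sequences."""
        if len(a) != len(b):
            raise ValueError("length mismatch")
        return sum(x * y for x, y in zip(a, b))
```

Callers import `dot` from one place and never need to know which implementation they got, which is what lets a source install work without a compiler.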
| cb321 wrote:
| Indeed. It is pretty easy to just write some Cython routine
| against the data pointer & range lifted out of some NumPy array
| and then still let Python do all the memory management for you.
|
| I think Cython is great for just speeding up profiling-revealed
| hot spots. And `cython --annotate` is even a nice helper along
| that path. { In fact, I think gcc should have a similar system
| one could integrate so that you can click-expand the Python to
| get the C and then click again the C to get the assembly. :-) }
| It really makes Python more like the gradually typed system
| Common Lisp always was.
|
| In fact, there was talk back in the very early noughties of
| bundling a precursor of Cython with the Python interpreter
| itself. I was always a bit disappointed that didn't go very
| far. Ah well.
| PaulHoule wrote:
| The article doesn't mention
|
| https://www.pypy.org/
|
| which gives a big boost to plain ordinary Python code,
| particularly branchy and dynamic stuff like
|
| https://rdflib.readthedocs.io/en/stable/
|
| where it made the difference between a system I was working on
| being tolerable and not tolerable.
| srean wrote:
| I love PyPy. It is a marvelous piece of engineering and design.
| Unfortunately, the benefits of PyPy do not translate into one
| of the most important use cases of Python -- those that call
| into 3rd party array, ML, stats and science modules and
| libraries.
| klyrs wrote:
| My favorite feature of LPython is that they have a list of
| other python compilers:
|
| https://lpython.org/
| btwillard wrote:
| In case anyone is wondering, this is essentially a few complaints
| about the basic transpilation/source-to-source approach taken by
| Cython and then some promotion for Rust. It unfortunately mixes
| some general C/C++ complaints in there, too.
| scarygliders wrote:
| Beat me to it.
|
| tl;dr "Don't use Python/Cython or C/C++. Use Rust instead, it's
| better." is basically that article.
| [deleted]
| itamarst wrote:
| Author here: Note that this hasn't yet been updated for Cython 3,
| which does fix or improve some of these (but not the fundamental
| limitation that you're stuck with C or C++).
| klyrs wrote:
| Pardon me, but your implementation is a strawman. Pick on this
| (which doesn't require Cython 3):
|
|     from libcpp.vector cimport vector
|     from libcpp.pair cimport pair
|
|     cdef class PointVec:
|         cdef vector[pair[float, float]] vec
|
|         def __init__(self, points: list[tuple[float, float]]):
|             self.vec = points
|
|         def __repr__(self):
|             result = ", ".join(f"({x}, {y})" for x, y in self.vec)
|             return f"PointVec({result})"
|
|         def __setitem__(
|             self, index, point: tuple[float, float]
|         ):
|             cdef pair[float, float] *p = &self.vec.at(index)
|             p.first = point[0]
|             p.second = point[1]
|
|         def __getitem__(self, index):
|             return self.vec.at(index)
| IshKebab wrote:
| You can't disprove that a language is error prone by
| providing a 20 line example that happens to be correct.
| klyrs wrote:
| Nor can you prove that a language is error prone by
| providing a 40 line example written in an antiquated style
| that deliberately avoids using the safety features at one's
| disposal.
| eigenvalue wrote:
| My new favorite way to write very fast libraries for Python is to
| just use Rust and Maturin:
|
| https://github.com/PyO3/maturin
|
| It basically automates everything for you. If you use it with
| Github actions, it will compile wheels for you on each release
| for every platform and python version you want, and even upload
| them to PyPi (pip) for you. Everything feels very modern and well
| thought out. People really care about good tooling in the Rust
| world.
| IshKebab wrote:
| Maturin is great. I've used it to distribute a Rust program
| that has absolutely nothing to do with Python. It compiled it
| fine and once I had navigated the usual mess of Python to find
| out how to upload packages to PyPi (not to be confused with
| PyPy), it worked pretty well.
|
| I got the idea from CMake, which also has absolutely nothing to
| do with Python but is best installed via Pip. It's a package
| manager that basically works and is basically always available
| on Linux and Mac (among programmers anyway).
|
| One of the few areas of Python that doesn't completely suck.
| abdullahkhalids wrote:
| I want to convert about 5 functions/100 lines of my
| Python project into Rust. I cobbled together the Maturin
| integration with my project earlier this week, which seems to
| work on some test functions. But I don't know any Rust!
|
| What's the best way to learn enough Rust to do this? My code is
| basically just some Numpy array manipulation, with some
| unfortunate for-loops which can't be vectorized, which is the
| source of the slow speeds.
| aardshark wrote:
| Open up Chat GPT, paste your functions and ask it to convert
| them to rust. Go through them 1 by 1, see if you understand
| and ask questions about anything you don't recognise. Don't
| expect the output to be perfectly logically correct, you will
| have to ensure that yourself.
|
| I've found Chat GPT to be really excellent for quickly
| getting myself up to speed with languages that I'm not
| familiar with.
| eigenvalue wrote:
| Yes, that's my advice as well. Set up vscode with rust
| analyzer and paste any errors it shows back into the same
| ChatGPT conversation and it will debug everything for you.
| eigenvalue wrote:
| You can see how I did something similar in my library here:
|
| https://github.com/Dicklesworthstone/fast_vector_similarity/...
|
| Basically you use ndarray instead of numpy, try to vectorize
| anything you can, and for the for loops that can't be
| vectorized, you can use rayon to do them in parallel.
| cozzyd wrote:
| this is indeed what the article advocates for
|
| (yes, this particular bit of rust evangelism was not obvious
| from the headline)
| huac wrote:
| I would love some examples of how to do non-trivial data interop
| between Rust and Python. My experience is that PyO3/Maturin is
| excellent when converting between simple datatypes but
| conversions get difficult when there are non-standard types, e.g.
| Python Numpy arrays or Rust ndarrays or whatever other custom
| thing.
|
| Polars seems to have a good model where it uses the Arrow in
| memory format, which has implementations in Python and Rust, and
| makes a lot of the ndarray stuff easier. However, if the Rust
| libraries are not written with Arrow first, they become quite
| hard to work with. For example, there are many libraries written
| with https://github.com/rust-ndarray/ndarray, which is
| challenging to interop with Numpy.
|
| (I am not an expert at all, please correct me if my
| characterizations are wrong!)
| atemerev wrote:
| I _knew_ there would be Rust at the end.
|
| Sorry, not interested. I can't think in Rust. Tried many times.
| Things like dynamically updated graphs are nearly impossible to
| write in Rust, and concurrency is less than pleasant. Fighting
| the borrow checker is not my idea of a good time.
|
| I don't understand why everyone is so fascinated with Rust. I am
| like 3 times less productive there, and there is absolutely no
| pleasure for me in writing Rust code.
|
| I'll stick to Python and C++, thank you.
| slowhadoken wrote:
| I prefer the C, C++, and Python communities too.
| klyrs wrote:
| The borrow checker shouldn't be a showstopper for generic
| graphs. I understand the difficulty with linked lists, trees,
| etc.; but if your graph is represented by a container, I don't
| see the obstacle. (nb: I don't use rust, I also get annoyed by
| people over-selling rust)
| fsloth wrote:
| "dynamically updated graphs are nearly impossible to write in
| Rust"
|
| Can you expand on this? I've taken only a cursory look at Rust
| and it's not obvious to me which specific constraints would
| cause this.
| PaulHoule wrote:
| You can fall back to
|
| https://doc.rust-lang.org/book/ch15-04-rc.html
|
| for things that are too dynamic for borrow checking.
|
| Reference counting works great for the things it is good for
| but it does get into trouble with cycles and many of us would
| say that Java's memory allocator/garbage collector is worth
| what it costs.
|
| My opinion is that automated memory management is a key
| concept for software reuse and if you look at the problems of
| the C/C++ world this is pivotal. That is, the range of memory
| management relationships you might want between a library and
| its client is pretty wide, I mean sometimes you want a
| library to make its own buffers, other times you want to hand
| it an existing buffer, if it is building graphy structures it
| needs to allocate stuff, do you really want it to use malloc?
| do you want to pass it your own malloc? etc.
|
| The Java answer of providing a standard answer to allocation
| and garbage collection makes libraries composable with code
| in a way that Rust struggles with. (In the end rust libraries
| have to fall back to RC when complexity gets too high)
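
The point about reference counting getting into trouble with cycles can be seen in CPython itself, which pairs reference counting with a separate cycle collector. The sketch below is a Python analogy rather than Rust code, and assumes CPython specifically; `Node` and `cycle_survives_refcounting` are names made up for the illustration.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []


def cycle_survives_refcounting():
    """Return (alive after del, alive after gc.collect()) for a 2-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # leave only reference counting active
    try:
        a, b = Node("a"), Node("b")
        a.neighbors.append(b)
        b.neighbors.append(a)  # a <-> b now keep each other alive
        probe = weakref.ref(a)  # observes a without owning it
        del a, b  # refcounts never reach zero because of the cycle
        alive_after_del = probe() is not None
        gc.collect()  # the cycle collector finds and frees the pair
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

With only counting, the two nodes leak; the tracing pass is what reclaims them, which is the part a plain `Rc` does not provide.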
| fsloth wrote:
| Ah, I see. IMO indices are a good way to implement all data
| structures, gc or no gc. I.e array for storage, indices for
| links. Referring the array via indices is not slower than
| following pointers.
| atemerev wrote:
| You start with Rc, and you end with
|
|     fn new<'a>(datum: &'static str,
|                arena: &'a TypedArena<Node<'a>>) -> &'a Node<'a>
|
| (or Rc<RefCell<T>> and the like).
|
| (and basically implementing your own cycle-aware garbage
| collector, which again is not my idea of a good time).
| itishappy wrote:
| The borrow checker really does not like recursive structures.
|
| https://rust-unofficial.github.io/too-many-lists/
| atemerev wrote:
| And I really like recursion, I come from Scala and Common
| Lisp.
| TwentyPosts wrote:
| > Things like dynamically updated graphs are nearly impossible
| > to write in Rust
|
| Just curious, have you tried to handroll these, or have you used
| libraries? (e.g. petgraph, though I don't know if it'd suit your
| use cases.)
|
| I'm a Rust connoisseur, but I'd agree with 'nearly impossible
| to write', which is why I'd (first of all) try to grab a
| library, assuming I'm doing anything complicated with graphs.
| If it's very simple and specific, I'd try to go through the
| list of possible graph representations (eg. adjacency lists),
| and pick a suitable one, but never store nodes directly, rather
| store indices (while the nodes are stored in some sort of
| vector).
| pimeys wrote:
| The index/vector strategy is also perfect for basic trees, if
| you need to have cyclic dependencies between the nodes, and
| as a cherry on top it serializes super well.
|
| Requires a bit of boilerplate in the beginning, but pays off
| when actually needing to work with your data.
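
The index/vector strategy discussed in this subthread can be sketched in a few lines. The example is in Python for brevity, but the layout (one list of nodes, edges stored as integer indices) is the same idea; `IndexGraph` and its methods are names invented here, not taken from petgraph or any other library.

```python
import json


class IndexGraph:
    """Directed graph whose nodes live in a list and whose edges are indices."""

    def __init__(self):
        self.nodes = []      # node payloads, addressed by position
        self.adjacency = []  # adjacency[i] lists the successors of node i

    def add_node(self, payload):
        self.nodes.append(payload)
        self.adjacency.append([])
        return len(self.nodes) - 1  # the index acts as the node's handle

    def add_edge(self, src, dst):
        self.adjacency[src].append(dst)

    def reachable(self, start):
        """Return the set of indices reachable from start (cycles are fine)."""
        seen = set()
        stack = [start]
        while stack:
            i = stack.pop()
            if i not in seen:
                seen.add(i)
                stack.extend(self.adjacency[i])
        return seen

    def to_json(self):
        # Indices are plain integers, so the whole graph serializes directly.
        return json.dumps({"nodes": self.nodes, "adjacency": self.adjacency})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        g = cls()
        g.nodes = data["nodes"]
        g.adjacency = data["adjacency"]
        return g


g = IndexGraph()
a, b, c = (g.add_node(name) for name in "abc")
g.add_edge(a, b)
g.add_edge(b, c)
g.add_edge(c, a)  # closes a cycle without any node referencing another
restored = IndexGraph.from_json(g.to_json())
```

Because nodes never point at each other, the central container is the only owner, which is why this shape sits comfortably with a borrow checker and round-trips through JSON unchanged.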
| atemerev wrote:
| This is where I ended up (adjacency lists), but yes, it was
| quite unintuitive (because you need some central entity to
| manage adjacency lists, and that idea somehow eluded me for a
| long time). Rust really doesn't like updating references (and
| anything non-hierarchical in general).
| rvanlaar wrote:
| I'm working on a project to revive old QTVR movies[1].
|
| After writing a couple of python decoders [2] for movie encodings
| from the 90's it got old quickly.
|
| As luck would have it, FFmpeg has support for almost all video
| encodings under the sun. For my use case I wanted to send one
| frame at a time to FFmpeg to decode.
|
| Luckily I found PyAV[3]. It's a Cython project which binds to
| FFmpeg.
|
| Which brings me to the article. It reads more like "C bad, Rust
| good". Cython's tagline is: "Cython gives you the combined power
| of Python and C".
|
| If you just want speed and fewer memory bugs, then Rust will fare
| better. If you want to have the combined power of Python and C
| then Cython is pretty cool.
|
| [1] https://github.com/rvanlaar/QTVR
| [2] https://github.com/rvanlaar/QTVR/tree/master/qtvr/decoders
| [3] https://github.com/PyAV-Org/PyAV
___________________________________________________________________
(page generated 2023-09-22 23:01 UTC)