[HN Gopher] gh-116167: Allow disabling the GIL
___________________________________________________________________
gh-116167: Allow disabling the GIL
Author : freediver
Score : 348 points
Date : 2024-03-11 16:21 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| helsinki wrote:
| While the title is correct, it is a bit misleading, because
| disabling the GIL breaks the asyncio tests. It's like saying the
| engine can be removed from my car. Sure, it can, but the car
| won't work.
| TylerE wrote:
| This comment feels very disingenuous because non-threaded
| programs do in fact work.
| ollien wrote:
| I mean, you're not wrong, but also it's a huge feat to provide
| a toggle for a major feature like the GIL. Though, if it's just
| asyncio that's broken, perhaps it's not like removing your
| engine, but rather your antilock brakes :)
|
| EDIT:
|
| > [the test synchronous programs] all seem to run fine, and
| very basic threaded programs work, sometimes
|
| Perhaps this is closer to removing the oil pan
| Kranar wrote:
| Well this release will break any code that uses threads. The
| goal of this particular release is to work for thread-free
| programs.
| ollien wrote:
| How do single-threaded programs benefit from a lack of GIL?
| protomikron wrote:
| They don't.
| gtirloni wrote:
| It could remove the locking/unlocking operations.
| Retr0id wrote:
| Doesn't removing the GIL imply adding back new, more
| granular locks?
| fiddlerwoaroof wrote:
| Sort of, but the biased reference counting scheme they're
| using avoids a lot of locks for the common case.
| sapiogram wrote:
| Removing the GIL requires _more_ locking/unlocking
| operations. For single-threaded programs, it's a
| performance penalty on average:
| https://peps.python.org/pep-0703/#performance
| Kranar wrote:
| They don't benefit much from a lack of GIL, perhaps a small
| reduction in overhead. This feature is a first step towards
| being able to disable the GIL completely. It is being
| implemented very conservatively, bit by bit, so for this
| first step it is only expected to work for thread-free
| code.
| kjqgqkejbfefn wrote:
| Disabling the GIL can unlock true multi-core parallelism
| for multi-threaded programs, but this requires code to be
| restructured for safe concurrency, which, it seems, isn't
| that difficult:
|
| > When we found out about the "nogil" fork of Python it
| took a single person less than half a working day to adjust
| the codebase to use this fork and the results were
| astonishing. Now we can focus on data acquisition system
| development rather than fine-tuning data exchange
| algorithms.
|
| https://peps.python.org/pep-0703/
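The restructuring the quote describes usually amounts to moving CPU-bound work onto ordinary threads. A minimal sketch (illustrative timings only) of the kind of pure-Python workload that only benefits once the GIL is off:

```python
import threading
import time

def countdown(n):
    # Pure-Python CPU-bound loop: with the GIL, only one thread at a
    # time executes bytecode, so threads give no speedup here.
    while n > 0:
        n -= 1

N = 2_000_000

start = time.perf_counter()
countdown(N)
countdown(N)
serial = time.perf_counter() - start

threads = [threading.Thread(target=countdown, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On a GIL build, `threaded` is about the same as `serial` (or worse,
# from contention); on a free-threaded build it approaches serial / 2.
print(f"serial={serial:.2f}s threaded={threaded:.2f}s")
```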
| kjqgqkejbfefn wrote:
| >We frequently battle issues with the Python GIL at
| DeepMind. In many of our applications, we would like to
| run on the order of 50-100 threads per process. However,
| we often see that even with fewer than 10 threads the GIL
| becomes the bottleneck. To work around this problem, we
| sometimes use subprocesses, but in many cases the inter-
| process communication becomes too big of an overhead. To
| deal with the GIL, we usually end up translating large
| parts of our Python codebase into C++. This is
| undesirable because it makes the code less accessible to
| researchers.
| actionfromafar wrote:
| Maybe they should look in to translating parts of their
| code base to Shedskin Python. It compiles (a subset of)
| Python to C++.
| logicchains wrote:
| How's it different from Cython, which compiles a subset
| of Python to C or C++?
| actionfromafar wrote:
| Shedskin has stricter typing, and about 10-100 times
| performance vs Cython.
| TylerE wrote:
| Speed. Admittedly not as much as it could be, given the way
| this patch is implemented: it just short-circuits the extra
| function calls rather than omitting them entirely.
| 0cf8612b2e1e wrote:
| Removing the GIL results in slower execution. Without the
| guarantees of single thread action, the interpreter needs
| to utilize more locks under the hood.
| TylerE wrote:
| Not in single threaded code.
| 0cf8612b2e1e wrote:
| Umm, yes it does? For the longest time, Guido's defense
| for the GIL was that all previous efforts resulted in an
| unacceptable hit to single threaded performance.
|
| Read PEP-703
| (https://peps.python.org/pep-0703/#performance) where the
| performance hit is currently 5-8%
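The 5-8% figure can be reproduced by timing the same single-threaded, pure-Python workload under a default build and a free-threaded build. A sketch, assuming `sys._is_gil_enabled()` (a helper on free-threading-capable builds; the `getattr` fallback keeps it runnable elsewhere):

```python
import sys
import timeit

# sys._is_gil_enabled() only exists on free-threaded builds;
# assume the GIL is on everywhere else.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

# A pure-Python, single-threaded workload: the per-object locking and
# biased-refcount bookkeeping of the no-GIL build shows up here.
elapsed = timeit.timeit("sum(i * i for i in range(1000))", number=2000)
print(f"GIL enabled: {gil_enabled}, elapsed: {elapsed:.3f}s")
```

Running this under both builds and comparing `elapsed` is the whole experiment.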
| OskarS wrote:
| Really, any code? I thought they were adding fine-grained
| locks to the python objects themselves? Are you saying that
| if I share a python list between two threads and modify it on
| one and read it on the other, I can segfault python?
| Kranar wrote:
| With this particular release, yes it will segfault. But
| down the road what you state is correct, this is just a
| first step towards that goal.
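For reference, this is the kind of program being discussed. On a GIL build it is safe, because each `list.append` is one atomic C-level call; an early no-GIL build without per-object locks could corrupt the list or crash:

```python
import threading

shared = []

def append_many(n):
    # With the GIL, each list.append is atomic, so concurrent appends
    # never corrupt the list (you only lose atomicity across
    # *sequences* of operations).
    for i in range(n):
        shared.append(i)

threads = [threading.Thread(target=append_many, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 40000: no lost updates on a GIL build
```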
| thibaut_barrere wrote:
| Couldn't it work if each thread only touches thread-specific
| data structures?
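That is the classic escape hatch: `threading.local` gives every thread its own attribute namespace, so no mutable structure is ever shared. A minimal sketch:

```python
import threading

local = threading.local()
results = {}

def worker(name):
    # `local.buffer` is a distinct object in every thread, so there is
    # no shared mutable state to race on.
    local.buffer = []
    for i in range(1000):
        local.buffer.append(i)
    results[name] = sum(local.buffer)  # one write per distinct key

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```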
| neilkk wrote:
| This isn't correct. TFA said that small threaded programs had
| been run successfully, but that the test suite broke in
| asyncio.
|
| Async I/O and threads are two different things, and either
| can be present in real code without the other.
| wolletd wrote:
| "small threaded programs had been run successfully"
|
| I have successfully run a lot of programs containing race
| conditions many times, until I ran into an issue.
| Kranar wrote:
| Not quite sure what your comment means exactly or how it
| implies what I said is incorrect.
|
| At any rate, test_asyncio contains a lot of tests that
| involve threads and specifically thread safety between
| coroutines and those tests fail. As far as async I/O and
| threads being distinct, I mean sure that is true of a lot
| of features but people mix features together and mixing
| asyncio with threads will not work with this particular
| release.
| znpy wrote:
| > While the title is correct, it is a bit misleading, because
| disabling the GIL breaks the asyncio tests. It's like saying
| the engine can be removed from my car. Sure, it can, but the
| car won't work.
|
| You're not supposed to drive a car that hasn't got out of the
| research and development laboratory either, so there's that.
| petters wrote:
| You also need to compile Python with a special flag activated.
| It's not only an environment variable or a command line option.
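Both knobs exist, but they stack: the runtime toggle that this PR (gh-116167) is about only works on an interpreter built with free-threading support. A sketch of the spellings, per PEP 703 and the PR (flag names as they landed, hedged as illustrative):

```shell
# First, the interpreter must be compiled with free-threading support:
./configure --disable-gil
make

# On such a build, the GIL can then be toggled at startup:
PYTHON_GIL=0 ./python app.py
./python -X gil=0 app.py
```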
| pharrington wrote:
| Being able to remove the engine from my car with the push of a
| button would be a pretty amazing feature!
| jagged-chisel wrote:
| Analogy breaking down and all, but ...
|
| Only as long as it's as easy to put back in
| Retr0id wrote:
| Is there a good overview of the bigger picture here?
| gtirloni wrote:
| https://peps.python.org/pep-0703/
| dang wrote:
| Related:
|
| _Intent to approve PEP 703: making the GIL optional_ -
| https://news.ycombinator.com/item?id=36913328 - July 2023
| (499 comments)
| protomikron wrote:
| Although this is nice, the problems with the GIL are often blown
| out of proportion: people stating that you couldn't do efficient
| (compute-bounded) multi-processing, which was never the case as
| the `multiprocessing` module works just fine.
| liuliu wrote:
| `multiprocessing` works fine for serving HTTP requests or
| doing some other subset of embarrassingly-parallel problems.
| skrause wrote:
| > _`multiprocessing` works fine for serving HTTP requests_
|
| Not if you use Windows; then it's a mess. I have a suspicion
| that people who say that multiprocessing works just fine
| never had to seriously use Python on Windows.
| ptx wrote:
| Why is it a mess? What's wrong with it on Windows?
| skrause wrote:
| * A lack of fork() makes starting new processes slow.
|
| * All Python webservers that somewhat support
| multiprocessing on Windows disable the IOCP asyncio event
| loop when using more than one process (because it breaks
| in random ways), so you're left with the slower select()
| event loop which doesn't support more than 512
| connections.
| colatkinson wrote:
| Adding on to the other comment, multiprocessing is also
| kinda broken on Linux/Mac.
|
| 1. Because global objects are refcounted, CoW effectively
| isn't a thing on Linux. They did add a way to avoid this
| [0], but you have to manually call it once your main
| imports are done.
|
| 2. On Mac, turns out a lot of the system libs aren't
| actually fork-safe [1]. Since these get imported
| inadvertently all the time, Python on Mac actually uses
| `spawn` [2] -- so it's roughly as slow as on Windows.
|
| I haven't worked in Python in a couple years, but
| handling concurrency while supporting the major OSes was
| a goddamn mess and a half.
|
| [0]:
| https://docs.python.org/3.12/library/gc.html#gc.freeze
|
| [1]: https://bugs.python.org/issue33725
|
| [2]: https://docs.python.org/3.12/library/multiprocessing
| .html#co...
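Point 1 in practice: the mitigation in [0] is to call `gc.freeze()` after the heavy imports and before forking. A sketch (the two imports stand in for real application imports):

```python
import gc

# Heavy imports happen in the parent first (stand-ins here)...
import json
import xml.etree.ElementTree  # noqa: F401

# ...then gc.freeze() moves every tracked object into a permanent
# generation that the cyclic collector never scans again, so GC runs
# in forked children stop writing into those objects' headers and the
# copy-on-write pages stay shared. (Plain refcount updates on access
# can still dirty pages; freeze only removes the collector's writes.)
gc.freeze()
frozen = gc.get_freeze_count()
print(f"{frozen} objects moved to the permanent generation")

# ... os.fork() worker processes here ...

gc.unfreeze()
```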
| rmbyrro wrote:
| Probably a very small minority of Python codebases run on
| Windows, no? That's my impression. It would explain why so
| many people are unaware of multiprocessing issues on
| Windows. I've never ran any serious Python code on
| windows...
| kroolik wrote:
| Managing processes is more annoying than threads, though. Incl.
| data passing and so forth.
| pillusmany wrote:
| The "ray" library makes running Python code on multiple
| cores and clusters very easy.
| smcl wrote:
| Interesting - looking at their homepage they seem to lean
| heavily into the idea that it's for optimising AI/ML work,
| not multi-process generally.
| pillusmany wrote:
| You can use just ray.core to do multi process.
|
| You can do whatever you want in the workers, I parse
| JSONs and write to sqlite files.
| kroolik wrote:
| Although it's great that the library helps with multicore
| Python, the existence of such a package shouldn't be an
| excuse not to improve the state of things in std Python.
| vita7777777 wrote:
| On the other hand, this particular argument also gets overused.
| Not all compute-bounded parallel workloads are easily solved by
| dropping into multiprocessing. When you need to share non-
| trivial data structures between the processes you may quickly
| run into un/marshalling issues and inefficiency.
| ynik wrote:
| multiprocessing only works fine when you're working on problems
| that don't require 10+ GB of memory _per process_. Once you
| have significant memory usage, you really need to find a way to
| share that memory across multiple CPU cores. For non-trivial
| data structures partly implemented in C++ (as optimization,
| because pure python would be too slow), that means messing with
| allocators and shared memory. Such GIL workarounds have
| easily cost our company several man-years of engineering
| time, and we still have a bunch of embarrassingly parallel
| stuff that we cannot parallelize due to the GIL and the lack
| of shared-memory allocation support for it.
|
| Once the Python ecosystem supports either subinterpreters or
| nogil, we'll happily migrate to those and get rid of our hacky
| interprocess code.
|
| Subinterpreters with independent GILs, released with 3.12,
| theoretically solve our problems but practically are not yet
| usable, as none of Cython/pybind11/nanobind support them yet.
| In comparison, nogil feels like it'll be easier to support.
| pillusmany wrote:
| "Ray" can share Python objects' memory between processes.
| It's also much easier to use than multiprocessing.
| ptx wrote:
| How does that work? I'm not familiar with Ray, but I'm
| assuming you might be referring to actors [1]? Isn't that
| basically the same idea as multiprocessing's Managers [2],
| which also allow client processes to manipulate a remote
| object through message-passing? (See also DCOM.)
|
| [1] https://docs.ray.io/en/latest/ray-
| core/walkthrough.html#call...
|
| [2] https://docs.python.org/3/library/multiprocessing.html#
| manag...
| pillusmany wrote:
| Shared memory:
|
| https://docs.ray.io/en/latest/ray-core/objects.html
| ptx wrote:
| According to the docs, those shared memory objects have
| significant limitations: they are immutable and only
| support numpy arrays (or must be deserialized).
|
| Sharing arrays of numbers is supported in multiprocessing
| as well: https://docs.python.org/3/library/multiprocessin
| g.html#shari...
| ebiester wrote:
| And I guess what I don't understand is why people choose
| Python for these use cases. I am not in the "Rustify"
| everything camp, but Go + C, Java + JNI, Rust, and C++ all
| seem like more suitable solutions.
| oivey wrote:
| Notably, all of those are static languages and none of them
| have array types as nice as PyTorch or NumPy, among many
| other packages in the Python ecosystem. Those two facts are
| likely closely related.
| samatman wrote:
| If only there were a dynamic language which performs
| comparably to C and Fortran, and was specifically
| designed to have excellent array processing facilities.
|
| Unfortunately, the closest thing we have to that is
| Julia, which fails to meet none of the requirements.
| Alas.
| rmbyrro wrote:
| If only there was a car that could fly, but was still as
| easy and cheap to buy and maintain :D
| abdullahkhalids wrote:
| Python is just the more popular language. Julia array
| manipulation is mostly better (better syntax, better
| integration, larger standard library) or as good as
| python. Julia is also dynamically typed. It is also
| faster than Python, except for the jit issues.
| oivey wrote:
| Preaching to the choir here.
|
| Julia's threading API is really nice. One deficiency is
| that it can be tricky to maintain type stability across
| tasks / fetches.
| znpy wrote:
| > It is also faster than Python, except for the jit
| issues.
|
| I was intrigued by Julia a while ago, but didn't have
| time to properly learn it.
|
| So just out of curiosity: what's the issues with jit and
| Julia ?
| jakobnissen wrote:
| Julia's JIT compiles code when it's first executed, so
| Julia has a noticeable delay from when you start the program
| until it starts running. This is anywhere from a few
| hundred milliseconds for small scripts, to tens of
| seconds or even minutes for large packages.
| cjalmeida wrote:
| The "issue" is Julia is not Just-in-Time, but a "Just-
| Ahead-of-Time" language. This means code is compiled
| before getting executed, and this can get expensive for
| interactive use.
|
| The famous "Time To First Plot" problem was about taking
| several minutes to do something like `using Plots;
| Plots.plot(sin)`.
|
| But to be fair, recent Julia releases have improved this a
| lot; the code above takes 1.5s in Julia 1.10 on my 3-year-
| old laptop.
| zamadatix wrote:
| People choose Python for the use case, regardless of what
| that is, because it's quick and easy to work with. When
| Python can't realistically be extended to a use case it's
| lamented; when it can, it's celebrated. Even Go, while
| probably the friendliest of that bunch when it comes to
| parallel work, is on a different level.
| esafak wrote:
| Why do people use python for anything beyond glue code?
| Because it took off, and machine learning and data science
| now rely on it.
|
| I think Python is a terrible language that exemplifies the
| maxim "worse is better".
|
| https://en.wikipedia.org/wiki/Worse_is_better
| rmbyrro wrote:
| Some speculate that universities adopted it as an
| introductory language for its expressiveness and gentle
| learning curve. Scientific / research projects in those
| unis started picking Python, since all students already
| knew it. And now here we are.
| spprashant wrote:
| I have no idea if this is verifiably true in a broad
| sense, but I work at the university and this is
| definitely the case. PhD students are predominantly using
| Python to develop models across domains - transportation,
| finance, social sciences etc. They then transition to
| industry, continuing to use Python for prototyping.
| KaiserPro wrote:
| > but Go + C, Java + JNI, Rust, and C++ all seem like more
| suitable solutions.
|
| apart from go (maybe java) those are all "scary" languages
| that require a bunch of engineering to get to the point
| that you can prototype.
|
| even then you can normally pybind the bits that are compute
| bound.
|
| If Microsoft had been better back in the day, then C#
| would have been the go-to language of choice. It has the
| best tradeoff of speed/handholding/rapid prototyping. It's
| also statically typed, unless you tell it not to be.
| jcranmer wrote:
| > as the `multiprocessing` module works just fine.
|
| Something that tripped me up when I last did `multiprocessing`
| was that communication between the processes requires
| marshaling all the data into a binary format to be unmarshaled
| on the other side; if you're dealing with 100s of MB of data or
| more, that can be quite some significant expense.
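That cost is easy to make visible: everything crossing a `multiprocessing` Queue or Pipe is pickled on one side and unpickled on the other. A sketch of the round trip, with a payload standing in for the hundreds-of-MB case:

```python
import pickle
import time

# A moderately large payload; every multiprocessing send pays this
# serialize/deserialize round trip.
data = list(range(1_000_000))

start = time.perf_counter()
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
dump_s = time.perf_counter() - start

start = time.perf_counter()
restored = pickle.loads(blob)
load_s = time.perf_counter() - start

print(f"{len(blob) / 1e6:.1f} MB serialized in {dump_s:.3f}s, "
      f"deserialized in {load_s:.3f}s")
```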
| dvas wrote:
| Extra links on the nogil work, for anyone else curious about
| this [0], [1].
|
| [0] Multithreaded Python without the GIL
| https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsD...
|
| [1] Github repo https://github.com/colesbury/nogil
| tentacleuno wrote:
| Further context on noGIL in general:
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
| Dowwie wrote:
| We are now one step closer to PythonOTP
| behnamoh wrote:
| I've been programming in Python for over 6 years now and every
| week I learn something new. But recently I've been thinking about
| moving to a more capable language with proper concurrency for
| backend API requests (FastAPI sucked).
|
| I also want types, so Elixir is not in the picture. I dabbled in
| Rust a bit. Although I was able to get the hang of things and
| build a CLI tool pretty quickly, I'm worried I'll have to deal
| with numerous quirks later if I keep using Rust (like its
| many string types). Is that something to be worried about if all I
| want from Rust is Python+Types+Concurrency?
| jondwillis wrote:
| Swift or Kotlin might be what you're looking for but nobody
| uses Swift for backend really, and I'm not sure about Kotlin.
| kuschku wrote:
| Kotlin has basically replaced java for many spring shops.
| It's really common in backend nowadays.
| imbusy111 wrote:
| Sounds like you want Julia. It looks like Python, but also
| has what you ask for.
|
| You can even run Python from Julia, so that alleviates the
| problem with a lack of libraries somewhat.
| sparks1970 wrote:
| Golang? Pretty easy to pick up coming from Python and proper
| concurrency.
| scubbo wrote:
| Python supports types! https://www.mypy-lang.org/
| cmeacham98 wrote:
| This seems a bit like saying "JavaScript supports types!"
| because of typescript.
| ralphist wrote:
| It's not a separate language, you can just start typing
| your programs right now.
| nextlevelwizard wrote:
| Except nothing enforces your types at run time; you can
| have type hints all you want and everyone else can ignore
| them.
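This point is easy to demonstrate: annotations are inert metadata at run time, and only an external checker like mypy would complain about the call below.

```python
def double(x: int) -> int:
    # The annotations say int, but CPython never checks them when the
    # function is called.
    return x * 2

result = double("ab")   # a str sails straight through
print(result)           # "abab" -- mypy flags this, the runtime won't
```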
| aunderscored wrote:
| Same exists with laws. And in most languages, if I want to
| avoid the type system and hand you a goldfish instead of
| an int, I can; it just may take more effort. Other
| languages will blow up too if you hand them strange
| things. Just like Python, those languages have ways to
| verify you're not passing goldfish; you just may need
| more or less effort to use them.
| wiseowise wrote:
| Both true. What's wrong with them?
| posix_monad wrote:
| Types are way more mainstream in the JS ecosystem than they
| are in the Python ecosystem. If you want a "scripty"
| language with types, then TypeScript is a reasonable
| choice.
| nick238 wrote:
| Python actually has type safety though, since you can't do
| `'1' + 1` like in JS (not that a linter wouldn't scream at
| you). If I hear another "I compile <insert language> so I
| know it will work, but you can't do that in Python" I'll
| lose it. A compiler not complaining about your types is not
| effing "testing".
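The `'1' + 1` point, concretely: Python is dynamically but strongly typed, so the interpreter raises at run time instead of coercing the way JavaScript does.

```python
# JavaScript coerces: '1' + 1 === '11'. Python refuses:
try:
    "1" + 1
    raised = False
except TypeError:
    raised = True

print(raised)        # True
print("1" + str(1))  # "11" -- any conversion must be explicit
print(int("1") + 1)  # 2
```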
| scubbo wrote:
| > Having the compiler not complain that the types match
| is not effing "testing".
|
| It absolutely is - it's just testing at a _very_ low
| level of correctness, and is not sufficient for testing
| actual high-level functionality.
| rightbyte wrote:
| > Python+Types+Concurrency
|
| Sounds like Groovy. But I wouldn't recommend it. Also the
| career padding hype is gone.
| willcipriano wrote:
| What sucked about FastAPI?
| behnamoh wrote:
| Not fast enough! I used it to call llama.cpp server but it
| would crash if requests were "too fast". Calling the
| llama.cpp server directly solved the issue.
| dekhn wrote:
| Honestly, that doesn't sound right. I'm curious what you
| mean by crash though.
| ralphist wrote:
| Did you get "connection reset by peer" when you sent a bit
| too many requests perchance? I've never found the source of
| that in my programs. There's no server logging about it,
| connections are just rejected. None of the docs talk about
| this.
| behnamoh wrote:
| I don't remember the exact error name but the FastAPI
| server would just freeze.
| dekhn wrote:
| a freeze is not a crash.
| dumdedum wrote:
| This sounds like a layer 8 problem
| KaiserPro wrote:
| Interesting, I've used fastAPI to serve many thousands of
| requests a second (per process) for a production system.
| How were you buffering the requests?
| docc wrote:
| I moved from Python to Rust and I really like it. Moving to
| a statically typed language is a bit of a mind fuck, but
| cargo is amazing, and so are the speed and portability of
| Rust.
| spprashant wrote:
| As someone in a similar boat, just pick Golang. People often
| dislike the basic syntax, explicit error checking and the lack
| of algebraic data types. I did. Rust just seems like it offers
| so much more and you fear missing out on something really cool.
|
| But once you get over it, you realize Golang has a good type
| system, concurrency model, package manager that's not pip, fast
| compile times, and static binaries. For most cases it will also
| offer great performance.
|
| It has everything you need to build APIs, CLI tools, web
| servers, microservices - pieces which will form the building
| blocks of your software infrastructure. I have heard numerous
| stories of people being productive in Go in a few days,
| sometimes even hours.
|
| If Python is 0 quality of life and Rust is 100, Golang gets
| you all the way up to 80-90. That last bit is something you
| might never need.
|
| Rust is a great language and something I hope to be
| proficient in someday, but I'll save it for where I actually
| need that last microsecond of performance.
| Hugsun wrote:
| It will be very exciting to see how much faster they'll be able
| to make vanilla python. The value proposition is being challenged
| by the plethora of tools aiming to alleviate those issues. Speed
| improvers like Mojo, pytorch, triton, numba, taichi come to mind.
|
| There are so many different attempts at solving this problem
| that the last time I wanted to try one of them, I found myself
| overwhelmed with options. I chose taichi which is pretty fun and
| easy to use, although somewhat limited in scope.
| nerpderp82 wrote:
| Mojo should be viewed as an attack on the Python ecosystem due
| to it being a superset. It can _consume_ Python, but it itself
| is not Python.
|
| Taichi is really underrated, it works across all platforms
| (including Metal), has tons of examples and the code is easy to
| write. And lastly, it integrates with the ecosystem and doesn't
| displace it.
|
| https://github.com/taichi-dev
|
| great demo reel of what Taichi can do,
| https://www.youtube.com/watch?v=oXRJoQGCYFg
|
| https://www.youtube.com/watch?v=WNh4Q7-OSJs
|
| https://www.taichi-lang.org/
| objektif wrote:
| Never heard of taichi before; looks promising. Do you know
| any shop that uses it for prod code?
| fragmede wrote:
| ETH Zurich is using it for their physics sim courses,
| University of Utah is using it for simulations (SIGGRAPH
| 2022), OPPO (they make smart devices running Android),
| Kuaishou uses it for liquid and gas simulation on GPUs.
| Lots of GPU accelerated sim stuff.
|
| https://www.taichi-lang.org/
|
| https://www.researchgate.net/publication/337118128_Taichi_a
| _...
|
| https://github.com/taichi-dev/taichi
| KerrAvon wrote:
| I think "attack" is a bit much; C++ isn't an attack on C.
| Kranar wrote:
| While Ken Thompson never used the word attack, he certainly
| didn't have a positive opinion of the language or of Bjarne
| Stroustrup either in terms of his technical contributions
| or his handling of C++ adoption:
|
| https://gigamonkeys.wordpress.com/2009/10/16/coders-c-
| plus-p...
| hn_throwaway_99 wrote:
| Thanks for posting that, I thought it was a great read
| (as someone who last used C++ probably about 25 years
| ago...)
|
| Given that so many of the criticisms were about C++ being
| over-complicated, I do worry about languages just
| becoming more and more difficult over time as everyone
| wants their pet feature added, but due to backwards-
| compatibility concerns old/obsolete features are rarely
| removed. For example, take Java. I think that a _ton_ of
| goodness has been added to Java over the decades, and for
| people who have been working with it throughout, it's
| great. But it feels like the learning curve for someone
| just getting involved with Java would be really steep,
| not just because there is just a ton of stuff, but
| because without having the context of the history and how
| things were added over time (usually with an eye towards
| backwards-compatibility) it feels like it would be hard
| to wrap your head around everything. If you're writing
| your own new program that's not really a problem as you
| can just stick to what you know, but if you're getting
| into an existing codebase that could use lots of
| different features it feels like it could be daunting.
|
| It's been quite a while since I've programmed in Java, so
| I'm just speculating, but would be curious how other
| folks relatively new to the language in production
| environments find the learning curve.
| dimitrios1 wrote:
| I was doing primarily Go development from when it was first
| released up until a few years ago, when the pandemic gave
| me the opportunity to move into a full-time remote gig
| doing primarily Java development. So I can answer this: I
| hadn't done Java at that point for over 10 years, so I felt
| completely new (and what Java I did before that, I was
| mostly trying to _not_ do, by using Play Framework or JRuby
| on Rails).
|
| As someone in the boat you mentioned (sort of) the short
| answer is modern Java development for 90% of tasks is not
| complicated at all: it's very much like any programming
| language used in a bizdev/corp environment -- you are
| mostly using a framework and a bunch of DSLs. Almost
| everyone uses Intellij and Gradle for IDE and build, and
| Junit5 or Spock for unit testing. I passed a technical
| interview mostly on Spring Framework concepts knowing
| almost nothing about it, nor having ever used it in
| production by simply just having the documentation open
| while I was being interviewed so I could look up the
| answers. Any language that is popular is going to have
| frameworks with decent documentation that help you be
| productive quickly, so I just jumped in doing Spring. The
| java stuff came as needed, or I referenced something like
| Effective Java (great book), or a Baeldung article. Java
| world has made some great strides since the 2000's and
| early 2010s of XML chaos. It took a while, but I feel
| like it's in a really good spot and getting better.
|
| As an aside, if it hasn't been mentioned to you before,
| if you like simplicity in a language, but still
| incredibly productive, you might enjoy Go.
| nerpderp82 wrote:
| I didn't mention C++ at all. Was that my argument? This
| thread is about Python, the GIL. Mojo was brought up as a
| way to speed up Python code.
|
| C++ predates on C in a similar way to how Mojo predates on
| Python. At least C++ has extern C.
|
| https://docs.modular.com/mojo/manual/python/#call-mojo-
| from-...
|
| > As shown above, you can call out to Python modules from
| Mojo. However, there's currently no way to do the reverse--
| import Mojo modules from Python or call Mojo functions from
| Python.
|
| One way street. Classic commons harvesting.
| android42 wrote:
| I wasn't sure whether to agree with this or not, so I
| finally took a slightly closer look at Mojo just now.
|
| This depends on how they license it going forward, and
| whether they make it open, or use being a superset as a way
| to capture then trap python users in their ecosystem, and I
| don't think we have a certain answer which path they'll
| take yet.
|
| The way they let you mix python compatible code with their
| similar but more performant code [1] looks interesting and
| provides a nice path for gradual migration and performance
| improvements. It looks like one of the ways they do this is
| by letting you define functions that only use typed
| variables which is something I would like to see make its
| way back to CPython someday (that is optionally enforcing
| typing in modules and getting some performance gains out of
| it).
|
| [1] https://en.wikipedia.org/wiki/Mojo_(programming_languag
| e)#Pr...
| its-summertime wrote:
| Cython is also a superset, is Cython also guilty of such
| crimes?
| misoukrane wrote:
| Finally, looking forward to the benchmarks of many tools!
| xuhu wrote:
| PEP-703 predicted in June 2023 an overhead of 15% when running
| _with_ NoGIL: https://discuss.python.org/t/pep-703-making-the-
| global-inter...
| rnmmrnm wrote:
| More than seeing it in main, I'm happy for the "python thread
| slow" meme officially going away now.
| scubbo wrote:
| I wish I had your optimism. Thoughtless bandwagon-y "criticism"
| is extraordinarily persistent.
| samatman wrote:
| There's no need to pretend Python has virtues which it lacks.
| It's not a fast language. It's fast enough for many purposes,
| sure, but it isn't fast, and this work is unlikely to change
| that. Fast _er_, sure, and that's great.
| markhahn wrote:
| You seem to be implying that there is something inherently
| slow about Python. What?
|
| This topic is an example: a detail of one particular
| implementation, since GIL is definitely not inherent to the
| language. Just the usual worry about looseness of types?
| oivey wrote:
| Python is inherently slow. That's why people tend to
| rewrite bits that need high performance in C/C++.
| Removing the GIL is a massively welcome change, but it
| isn't going to make C extensions go away.
| doctorpangloss wrote:
| There are worse hills to die on than this. But the Python
| ecosystem is very slow. It's a cultural thing.
|
| The biggest impact would be completely redoing package
| discovery. Not in some straightforward sense of "what if
| PyPi showed you a Performance Measurement?" No, that's
| symptomatic of the same problem: harebrained and
| simplistic stuff for the masses.
|
| But who's going to get rid of PyPi? Conda tried and it
| sucks, it doesn't change anything fundamental, they're
| too small and poor to matter.
|
| Meta should run its own package index and focus on
| setuptools. This is a decision PyTorch has already taken,
| maybe the most exciting package in Python today, and for
| all the headaches that decision causes, look: torch
| "won," it is high performance Python with a vibrant high
| performance ecosystem.
|
| These same problems exist in NPM too. It isn't an
| engineering or language problem. Poetry and Conda are not
| solutions, they're symptoms. There are already too many
| ideas. The ecosystem already has too much manic energy
| spread way too thinly.
|
| Golang has "fixed" this problem as well as it could for
| non-commercial communities.
| pphysch wrote:
| The "Python ecosystem" includes packages like numpy,
| pytorch & derivatives which are responsible for a large
| chunk of HPC and research computing nowadays.
|
| Or did you mean to say the "Python language"?
| doctorpangloss wrote:
| > The "Python ecosystem" includes packages like numpy,
| pytorch & derivatives which are responsible for a large
| chunk of HPC and research computing nowadays.
|
| The "& derivatives" part is the problem! Torch does not
| have derivatives. It won. You just use it and its
| extensions, and you're done. That is what people use to
| do exciting stuff in Python.
|
| It's the manic developers writing manic derivatives that
| make the Python ecosystem shitty. I mean I hate ragging
| on those guys, because they're really nice people who
| care a lot about X, but if only they could focus all
| their energy to work together! Python has like 20 ideas
| for accelerated computing. They all abruptly stopped
| mattering because of Torch. If the numba and numpy and
| scikit-learn and polars and pandas and... all those
| people, if they would focus on working on one package
| together, instead of reinventing the same thing over and
| over again - high level cross compilers or an HPC DSL or
| whatever, the ecosystem would be so much nicer and
| performance would be better.
|
| This idea that it's a million little ideas incubating and
| flourishing, it's cheerful and aesthetically pleasing but
| it isn't the truth. CUDA has been around for a long time,
| and it was obviously the fastest per dollar & watt HPC
| approach throughout its whole lifetime, so most of those
| little flourishing ideas were DOA. They should have all
| focused on Torch from the beginning instead of getting
| caught up in little manic compiler projects. We have
| enough compilers and languages and DSLs. I don't want
| another DataFrame DSL!
|
| I see this in new, influential Python projects made even
| now, in 2024. Library authors are always, constantly,
| reinventing the wheel because the development is driven
| by one person's manic energy more than anything else.
| Just go on GitHub and look how many packages are written
| by one person. GitHub & Git, PyPi are just not adequate
| ways to coordinate the energies of these manic developers
| on a single valuable task. They don't merge PRs, they
| stake out pleasing names on PyPi, and they complain
| relentlessly about other people's stuff. It's NIH
| syndrome on the 1m+ repository scale.
| fragmede wrote:
| yeah. like xkcd 927 to the nth degree.
| sneed_chucker wrote:
| CPython is slow. That's not really something you can
| dispute.
|
| It is a non-optimizing bytecode interpreter and it makes
| no use of JIT compilation.
|
| JavaScript with V8 or any other modern JIT JS engine runs
| circles around it.
|
| Go, Java, and C# are an order of magnitude faster but
| they have type systems that make optimizing compilation
| much easier.
|
| There's no language-inherent reason why Python can't be
| at least as fast as JavaScript.
| mixmastamyk wrote:
| I've read that it can't even be as fast as JS, because
| everything is monkey-patchable at runtime. Maybe they can
| optimize for that when it doesn't happen, but that remains
| to be seen.
| sneed_chucker wrote:
| I've heard similar claims but I don't think it's true.
|
| JavaScript is just as monkey-patchable. You can reassign
| class methods at runtime. You can even reassign an
| object's prototype.
|
| Existing Python JIT runtimes and compilers are already
| pretty fast.
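To make the dynamism under discussion concrete, here is a minimal sketch: any Python method can be rebound at runtime, so an optimizing JIT must guard (and deoptimize) any call site it inlined, exactly as JavaScript engines do.

```python
class Greeter:
    def greet(self) -> str:
        return "hello"

g = Greeter()
before = g.greet()

# Rebind the method at runtime; existing instances see the change.
# A JIT that inlined Greeter.greet must detect this and deoptimize,
# which is what V8-style engines already do for JavaScript.
Greeter.greet = lambda self: "patched"
after = g.greet()

print(before, after)  # hello patched
```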
| rmbyrro wrote:
| Although that's true, it doesn't mean Python's performance
| can't be improved.
|
| Working with threads is a pain in Python. If you want to
| spawn 10-20+ threads in a process, it can quickly become
| way slower than running a single thread.
|
| Removing the GIL and refactoring some of the core will
| unlock levels of concurrency that are currently not
| feasible with Python. And that's a great deal, in my
| opinion. Well worth the trouble they're going through.
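A rough sketch of the problem being described: pure-Python CPU-bound work does not speed up with threads under the GIL. Timings vary by machine, so treat this as an illustration rather than a benchmark.

```python
import threading
import time

def spin(n: int) -> None:
    # Pure-Python CPU-bound loop; under the GIL only one thread
    # executes bytecode at a time, so adding threads doesn't help.
    while n:
        n -= 1

N = 1_000_000

# Sequential baseline: run the work twice on one thread.
start = time.perf_counter()
spin(N)
spin(N)
sequential = time.perf_counter() - start

# Two threads: with the GIL this typically takes about as long as
# the sequential run; a free-threaded build could approach half.
start = time.perf_counter()
threads = [threading.Thread(target=spin, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential={sequential:.2f}s threaded={threaded:.2f}s")
```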
| bb88 wrote:
| Working with threads is a pain regardless of which
| language you use.
|
| Some might say: "Use Go!" Alas:
| https://songlh.github.io/paper/go-study.pdf
|
| After a couple decades of coding, I can say that
| threading is better if it's tightly controlled, limited
| to usages of tight parallelism of an algorithm.
|
| Where it doesn't work is in a generic worker pool where
| you need to put mutex locks around everything -- and then
| prod randomly deadlocks in ways the developer boxes can't
| recreate.
| heinrich5991 wrote:
| Concurrency with rayon in Rust isn't a pain, I'd say. It's
| basically hidden away from the user.
| jcranmer wrote:
| > After a couple decades of coding, I can say that
| threading is better if it's tightly controlled, limited
| to usages of tight parallelism of an algorithm.
|
| This may be a case of violent agreement, but there are a
| few clear cases where multithreading is easily viable.
| The best case is some sort of parallel-for construct,
| even if you include parallel reductions, although there
| may need to be some smarts around how to do the reduction
| (e.g., different methods for reduce-within-thread versus
| reduce-across-thread). You can extend this to
| heterogeneous parallel computations, a general,
| structured fork-join form of concurrency. But in both
| cases, you essentially have to forbid inter-thread
| communication between the fork and the join points.
| There's another case you might be able to make work,
| where you have a thread act as an internal server that
| runs all requests to completion before attempting to take
| on more work.
|
| What the paper you link to is pointing out, in short, is
| that message passing doesn't necessarily free you from
| the burden of shared-mutable-state-is-bad concurrency.
| The underlying problem is largely that communication
| between different threads (or even tasks within a thread)
| can only safely occur at a limited number of safe slots,
| and any communication outside of that is risky, be it an
| atomic RMW access, a mutex lock, or waiting on a message
| in a channel.
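The structured fork-join pattern described above can be sketched with the standard library. Processes are used here rather than threads, since today's GIL would serialize a CPU-bound reduction; the point is the two-level reduce (within a worker, then across workers at the join).

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum_of_squares(chunk: range) -> int:
    # Reduce within the worker: no inter-worker communication
    # happens between the fork and the join.
    return sum(x * x for x in chunk)

def sum_of_squares(n: int, workers: int = 4) -> int:
    step = -(-n // workers)  # ceiling division
    chunks = [range(i, min(i + step, n)) for i in range(0, n, step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Reduce across workers only at the join point.
        return sum(pool.map(partial_sum_of_squares, chunks))

if __name__ == "__main__":
    print(sum_of_squares(1_000))  # 332833500
```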
| bmitc wrote:
| > Working with threads is a pain regardless of which
| language you use.
|
| That's not true at all. F#, Elixir, Erlang, LabVIEW, and
| several other languages make it _very easy_. Python makes
| it incredibly tough.
| amethyst wrote:
| > Python makes it incredibly tough.
|
| I disagree, Python makes it incredibly easy to work with
| threads in many different ways. It just doesn't make
| threads _faster_.
| rmbyrro wrote:
| The whole purpose of threads is to improve overall speed
| of execution. Unless you're working with a very small
| number of threads (single digits), that's a very hard goal
| to achieve in Python. I wouldn't count this as easy to
| use. It's easy to program, yes, but not easy to get
| working with reasonably acceptable performance.
| bmitc wrote:
| In what way? Threading, asyncio, tasks, event loops,
| multiprocessing, etc. are all complicated and interact
| poorly if at all. In other languages, these are
| effectively the same thing, lighter weight, _and_
| actually use multicore.
|
| If I launch 50 threads with runaway while loops in
| Python, it takes minutes to launch and barely works
| after. I can run hundreds of thousands and even millions
| of runaway processes in Elixir/Erlang that launch very
| fast and processes keep chugging along just fine.
| rmbyrro wrote:
| It's not such a big pain in every language. And certainly
| not as hard to get working with acceptable performance in
| many languages.
|
| Even if you have zero shared resources, zero mutexes, no
| communication whatsoever between threads, it's a huge
| pain in Python if you need 10-ish threads or more going.
| And many times the GIL is the bottleneck.
| KaiserPro wrote:
| > If you want to spawn +10-20 threads in a process, it
| can quickly become way slower than running a single
| thread.
|
| As you know, that's mostly threads in general. Any
| optimisation has a drawback, so you need to choose wisely.
|
| I once made a horror of a thing that synced S3 with
| another S3, but not quite object store. I needed to move
| millions of files, but on the S3 like store every
| metadata operation took 3 seconds.
|
| So I started with async (pro tip: it's never a good idea
| to use async. It's basically gotos with two dimensions of
| surprise: 1. when the function returns, 2. when you get an
| exception). I then moved to threads, which got a tiny bit
| of extra performance but much easier debuggability. Then I
| moved to multiprocess pools of threads (fuck yeah, super
| fast), but then I started hitting network IO limits.
|
| So then I busted out to an Airflow-like system with
| operators spawning 10 processes with 500 threads.
|
| It wasn't very memory efficient, but it moved many
| thousands of files a second.
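A hedged sketch of the thread-pool step in that progression. `copy_object` is a hypothetical stand-in for the real 3-second metadata call; threads help here because the workers spend their time blocked on the network, not holding the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

def copy_object(key: str) -> str:
    # Stand-in for a slow network call (e.g. HEAD + PUT against an
    # S3-like store); real code would use an S3 client here.
    return key

def sync_keys(keys: list[str], max_threads: int = 500) -> list[str]:
    # I/O-bound work releases the GIL while blocked, so large
    # thread counts can pay off until network limits are hit.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map preserves input order in its results.
        return list(pool.map(copy_object, keys))

print(sync_keys(["a/1", "b/2", "c/3"]))  # ['a/1', 'b/2', 'c/3']
```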
| scubbo wrote:
| This is entirely fair, and I wish I'd been a little less
| grumpy in my initial reply (I assign some blame to just
| getting over an illness). Thank you for the gentle
| correction!
|
| That said - I think it's fair to be irritated by people who
| write Python off as entirely useless because it is not _the
| fastest_ language. As you rightly say - it's fast enough
| for many purposes. It does bother me to see Python
| immediately counted out of discussions because of its speed
| when the app in question is extremely insensitive to speed.
| wongarsu wrote:
| In some ways the weakness was even a virtue. Because
| Python threads are slow, Python has incredible toolsets
| for multiprocess communication, task queues, job systems,
| etc.
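For instance, a minimal job queue over processes, using only the standard library (a sketch of the general pattern, not any particular package):

```python
from multiprocessing import Process, Queue

def square(x: int) -> int:
    return x * x

def worker(jobs, results) -> None:
    # Each worker process is a separate interpreter with its own
    # GIL, so CPU-bound jobs run in true parallel.
    while (item := jobs.get()) is not None:
        results.put(square(item))

if __name__ == "__main__":
    jobs, results = Queue(), Queue()
    procs = [Process(target=worker, args=(jobs, results)) for _ in range(2)]
    for p in procs:
        p.start()
    for i in range(10):
        jobs.put(i)
    for _ in procs:
        jobs.put(None)  # one shutdown sentinel per worker
    out = sorted(results.get() for _ in range(10))
    for p in procs:
        p.join()
    print(out)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```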
| nick238 wrote:
| Maybe it'll shut up "architects" who hack up a toy example
| in <new fast language hotness>, drop it on a team to add
| all the actual features, tests, deployment strategy, and
| maintain, and fly away to swoop and poop on someone else.
| Gee thanks for your insight; this API serves maybe 1
| request a second, tops. Glad we optimized for SPEEEEEED of
| service over speed of development.
| fragmede wrote:
| "Faster, sure" seems unnecessarily dismissive. That's the
| whole point of all this work.
| bmitc wrote:
| It isn't thoughtless. I'm working in Python after having come
| from more designed languages, and concurrency in Python is an
| absolute nightmare. It feels like using a language from the
| 60s. An effectively single threaded language in 2024! That's
| really astonishing.
| nextlevelwizard wrote:
| most software doesnt need multi threading. most times
| people cry about pythons performance then write trivial
| shit programs that take milliseconds to run in python as
| well
| bmitc wrote:
| Nearly every time I've interacted with Python, its
| execution speed is absolutely an issue.
| scubbo wrote:
| If your criticism isn't thoughtless, then that's not what
| I'm complaining about. Specifically, I'm annoyed about
| people who _just_ say "Python isn't fast enough, therefore
| it's not suitable to our use-case", when their use-case
| doesn't require significant speed or concurrency. If you
| thoughtfully discount Python as being unsuitable for a use-
| case that it's _actually_ unsuitable for, then good luck to
| you!
| znpy wrote:
| I still hear the "java slow" meme from time to time... Memes
| are slow to die, sadly. Some people just won't catch on to
| the fact that Java has had just-in-time compilation for like 15
| years now (it was one of the first major platforms to get
| that), has had a fully concurrent garbage collector for a
| number of releases (zgc since java 11) and can be slimmed down
| a lot (jlink).
|
| I work on low-latency stuff and we routinely get server-side
| latencies on the order of single to low double-digit
| microseconds.
|
| If python ever becomes fully concurrent (python threads being
| free of any kind of GIL) we'll see the "python slow" meme for a
| number of years... Also doesn't help that python gets updated
| very very slowly in the industry (although things are getting
| better).
| RyEgswuCsn wrote:
| I feel Java deserves better. When Python finally gets true
| thread concurrency, JIT (numba and the like), comprehensive
| static analysis (type hints), some sophisticated GC, and
| better performance, people will realise Java has had them
| all this time.
| thorncorona wrote:
| GraalVM is a pretty magical tool
| PhilipRoman wrote:
| I think java being slow has less to do with the
| implementation (which is pretty good) and more to do with the
| culture of overengineering (including in the standard
| library). Everything creates objects (which the JIT cannot
| fully eliminate, escape analysis is not magic), cache usage
| is abysmal. Framework writers do their best to defeat the
| compiler by abusing reflection. And all these abstractions
| are far from zero cost, which is why even the JDK has to have
| hardcoded special cases for Streams of primitives and
| ByteBuffers.
|
| Of course, if you have a simple fastpath you can make it fast
| in any language with a JIT, latency is also generally not an
| issue anymore, credit where credit is due - java GCs are
| light years ahead of everything else.
|
| Regarding jlink - my main complaint is that everything
| requires java.base which already is 175M. And thats not
| counting the VM, etc. But I don't actively work with java
| anymore so please correct me if there is a way to get smaller
| images.
| IshKebab wrote:
| Well, technically it still won't be able to use the full power
| of threads in many situations because (I assume) it doesn't
| have shared memory. It'll presumably be like Web Workers /
| isolates, so Go, C++, Rust, Zig, etc. will still have a
| fundamental advantage for most applications even ignoring
| Python's inherent slowness.
|
| Probably the right design though.
| Difwif wrote:
| Why would you think it's not shared memory? Maybe I'm wrong
| here but by default Python's existing threading
| implementation uses shared memory.
|
| AFAIK we're just talking about removing the global
| interpreter lock. I'm pretty sure the threading library uses
| system threads. So running without the GIL means actual
| parallelism across system threads with shared memory access.
| IshKebab wrote:
| Yeah I think you're right actually. Seems like they do per-
| object locking instead.
| vlovich123 wrote:
| Does anyone know why the biased reference counting approach
| described in https://peps.python.org/pep-0703/ just has a single
| thread affinity requiring atomic increments/decrements when
| accessed from a different thread? What I've seen other
| implementations do (e.g. various Rust crates implementing biased
| reference counting) is that you only increment atomically when
| moving to a new thread & then that thread does non-atomic
| increments/decrements until 0 is hit again and then an atomic
| decrement is done. Is it because it's being retrofitted into an
| existing system where you have a single PyObject & can't exchange
| to point to a new thread-local object?
| colesbury wrote:
| We could implement ownership transfer in CPython in the future,
| but it's a bit trickier. In Rust, "move" to transfer ownership
| is part of the language, but there isn't an equivalent in C or
| Python, so it's difficult to determine when to transfer
| ownership and which thread should be the new owner. We could
| use heuristics: we might give up or transfer ownership when
| putting an object in a queue.SimpleQueue, but even there it's
| hard to know ahead of time which thread will "get" the enqueued
| object.
|
| I think the performance benefit would also be small. Many
| objects are only accessed by a single thread, some objects are
| accessed by many threads, but few objects are exclusively
| accessed by one thread and then exclusively accessed by a
| different thread.
| vlovich123 wrote:
| I think you would do it on first access - "if new thread,
| increment atomic & exchange for a new object reference that
| has the local thread id affinity". That way you don't care
| about whether an object actually has thread affinity or not
| and you solve the "accessed by many threads" piece. But
| thanks for answering - I figured complexity was the reason a
| simpler choice was made to start with.
| orf wrote:
| But this would now make the reference count increment
| require a conditional? It's a very hot path, and this would
| cause a slowdown for single-threaded Python code.
| vlovich123 wrote:
| It's already taking a conditional. Take a look at the
| PEP:
|
|     if (op->ob_tid == _Py_ThreadId())
|         op->ob_ref_local = new_local;
|     else
|         atomic_add(&op->ob_ref_shared, 1 << _Py_SHARED_SHIFT);
|
| So you're either getting a correct branch prediction or
| an atomic operation which will dominate the overhead of
| the branch anyway. All this is saying is in the else
| branch where you're doing the atomic add, create a new
| PythonObj instance that has `ob_tid` equal to
| `_Py_ThreadId`. This presumes that Py_INCREF changes the
| return type from void to `PythonObj*` and this propagates
| out so that further on-thread references use the newer
| affinity (branch condition is always taken to the non-
| atomic add instead of the atomic one). It's easier said
| than done and there may be technical reasons why that's
| difficult / not possible, but worth exploring eventually
| so that access by multiple threads of a single object
| doesn't degrade to taking atomic reference counts
| constantly.
|
| https://peps.python.org/pep-0703/
| karmasimida wrote:
| This is exciting, can't wait
| tommiegannert wrote:
| First I read the news of tranched bread, and now this?! What a
| time!
|
| I was a bit disheartened when the Unladen Swallow project [1]
| fizzled out. Great to see Python back on the core optimization
| track.
|
| [1] https://en.wikipedia.org/wiki/CPython#Unladen_Swallow
| fragmede wrote:
| tranched bread?
| KMnO4 wrote:
| I could be wrong, but I think it's a clever alternative to
| the expression "best thing since sliced bread".
| fragmede wrote:
| ahahah. I was thinking that it was a new python library or
| something that I hadn't heard of and was coming up short
| with Google.
|
| "tranced bread" is a fun name for some sort of library that
| breaks up files into pieces for better resilience for
| sending, like over BitTorrent.
| maest wrote:
| GP is joking that this is the best thing since sliced
| (tranched) bread.
| whalesalad wrote:
| A great video tour of the GIL -
| https://www.youtube.com/watch?v=Obt-vMVdM8s
| tiffanyh wrote:
| ELI5
|
| I get in concept what the GIL is.
|
| But what's the impact of this change?
|
| Packages will now break, for the hope of better overall
| performance?
| nextaccountic wrote:
| If any package depends on the GIL, the GIL will stay
| enabled. Packages won't break.
| dathery wrote:
| Previously people basically didn't bother to write
| multithreaded Python at all due to the GIL. Threads were
| primarily used when you had multiple pieces of work to do which
| could end up blocked on independent I/O. Which is common and
| useful of course, but doesn't help with the performance of CPU-
| bound Python code.
|
| Even outside of high-intensity CPU work, this can be useful. A
| problem lately is that a lot of code is written using Python's
| native asyncio language features. These run single-threaded
| with async/await to yield execution, much like in NodeJS, and
| can achieve pretty good throughput even with a single thread
| (thousands of reqs/second).
|
| However, a big problem is that any time you do _any_ CPU work,
| you block all other coroutines, which causes all kinds of
| obscure issues and ruins your reqs/second. For example, you
| might see random IO timeouts in one coroutine which are
| actually caused by a totally different coroutine hogging the
| CPU for a bit. It can be very hard to get observability into
| why this is happening. asyncio provides an
| `asyncio.to_thread()` function [1] which can help take
| blocking work off the main thread, but because of the GIL
| it doesn't truly allow the CPU-bound work to avoid
| interfering with other coroutines.
|
| [1] https://docs.python.org/3/library/asyncio-
| task.html#asyncio....
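A small sketch of the `asyncio.to_thread()` escape hatch described above. The heartbeat coroutine keeps ticking while the blocking call runs on a worker thread; with the GIL the two still contend for the interpreter, which is why this helps but does not fully solve the problem.

```python
import asyncio

def blocking_work(n: int) -> int:
    # CPU-bound loop: awaited directly in a coroutine, this would
    # freeze every other coroutine on the event loop until done.
    total = 0
    for i in range(n):
        total += i
    return total

async def heartbeat(ticks: list) -> None:
    for _ in range(5):
        await asyncio.sleep(0.01)
        ticks.append("tick")  # the loop stays responsive

async def main() -> tuple[int, int]:
    ticks: list = []
    # to_thread moves the blocking call off the event loop thread;
    # under the GIL it still competes for the interpreter, but the
    # event loop at least gets scheduled in between.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_work, 200_000),
        heartbeat(ticks),
    )
    return result, len(ticks)

result, n_ticks = asyncio.run(main())
print(result, n_ticks)  # 19999900000 5
```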
| pgraf wrote:
| If anyone wondered: GIL = Global Interpreter Lock.
___________________________________________________________________
(page generated 2024-03-11 23:00 UTC)