[HN Gopher] The first year of free-threaded Python
___________________________________________________________________
The first year of free-threaded Python
Author : rbanffy
Score : 230 points
Date : 2025-05-16 09:42 UTC (13 hours ago)
(HTM) web link (labs.quansight.org)
(TXT) w3m dump (labs.quansight.org)
| AlexanderDhoore wrote:
| Am I the only one who sort of fears the day when Python loses the
| GIL? I don't think Python developers know what they're asking
| for. I don't really trust complex multithreaded code in any
| language. Python, with its dynamic nature, I trust least of all.
| DHolzer wrote:
| I was thinking that too. I am really not a professional
| developer though.
|
| OFC it would be nice to just write python and everything would
| be 12x accelerated, but I don't see how there would not be any
| draw-backs that would interfere with what makes python so
| approachable.
| NortySpock wrote:
| I hope at least the option remains to enable the GIL, because I
| don't trust me to write thread-safe code on the first few
| attempts.
| txdv wrote:
| how does the language being dynamic negatively affect the
| complexity of multithreading?
| nottorp wrote:
| Is there so much legacy python multithreaded code anyway?
|
| Considering everyone knew about the GIL, I'm thinking most
| people just wouldn't bother.
| toxik wrote:
| There is, and what's worse, it assumes a global lock will
| keep things synchronized.
| rowanG077 wrote:
| Does it? The GIL only ensured each interpreter
| instruction is atomic. But any group of instructions is
| not protected. This makes it very hard to rely on the GIL
| for synchronization unless you really know what you are
| doing.
| immibis wrote:
| AFAIK a group of instructions is only non-protected if
| one of the instructions does I/O. Explicit I/O - page
| faults don't count.
| kfrane wrote:
| If I understand that correctly, it would mean that
| running a function like this on two threads f(1) and f(2)
| would produce a list of 1 and 2 without interleaving.
| def f(x):
|     for _ in range(N):
|         l.append(x)
|
| I've tried it out and they start interleaving when N is
| set to 1000000.
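The experiment kfrane describes can be reproduced with a short script (a sketch; the shared list and two threads mirror the comment, and the exact interleaving pattern varies from run to run):

```python
import threading

N = 1_000_000
l = []  # shared list, as in the comment above

def f(x):
    # Each individual append is atomic under the GIL, but the
    # interpreter may switch threads between appends, so the
    # values from the two threads end up interleaved.
    for _ in range(N):
        l.append(x)

t1 = threading.Thread(target=f, args=(1,))
t2 = threading.Thread(target=f, args=(2,))
t1.start(); t2.start()
t1.join(); t2.join()

# No appends are lost (the list always has 2*N elements), but the
# 1s and 2s are generally not two solid runs.
print(len(l))
```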
| breadwinner wrote:
| When the language is dynamic there is less rigor. Statically
| checked code is more likely to be correct. When you add
| threads to "fast and loose" code things get really bad.
| jaoane wrote:
| Unless your claim is that the same error can happen more
| times per minute because threading can execute more code in
| the same timespan, this makes no sense.
| breadwinner wrote:
| Some statically checked languages and tools can catch
| potential data races at compile time. Example: Rust's
| ownership and borrowing system enforces thread safety at
| compile time. Statically typed functional languages like
| Haskell or OCaml encourage immutability, which reduces
| shared mutable state -- a common source of concurrency
| bugs. Statically typed code can enforce usage of thread-
| safe constructs via types (e.g., Sync/Send in Rust or
| ConcurrentHashMap in Java).
| jerf wrote:
| I have a hypothesis that being dynamic has no particular
| effect on the complexity of multithreading. I think the
| apparent effect is a combination of two things: 1. All our
| dynamic scripting languages in modern use date from the 1990s
| before this degree of threading was a concern for the
| languages and 2. It is _really hard_ to retrofit code written
| for not being threaded to work in a threaded context, and the
| "deeper" the code in the system the harder it is. Something
| like CPython is about as "deep" as you can go, so it's
| really, really hard.
|
| I think if someone set out to write a new dynamic scripting
| language today, from scratch, that multithreading it would
| not pose any particular challenge. Beyond the fact that it's
| naturally a difficult problem, I mean, but nothing _special_
| compared to the many other languages that have implemented
| threading. It's all about all that code from before the
| threading era that's the problem, not the threading itself.
| And Python has a _loooot_ of that code.
| rocqua wrote:
| Dynamic(ally typed) languages, by virtue of not requiring
| strict typing, often lead to more complicated function
| signatures. Such functions are generally harder to reason
| about, because they tend to require inspection of the
| function body to see what is really going on.
|
| Multithreaded code is incredibly hard to reason about. And
| reasoning about it becomes a lot easier if you have certain
| guarantees (e.g. this argument / return value always has this
| type, so I can always do this to it). Code written in dynamic
| languages will more often lack such guarantees, because of
| the complicated signatures. This makes it even harder to
| reason about multithreaded code, increasing the risk posed by
| multithreaded code.
| miohtama wrote:
| GIL or no-GIL concerns only people who want to run multicore
| workloads. If you are not already spending time threading or
| multiprocessing your code there is practically no change. Most
| race condition issues which you need to think about are there
| regardless of the GIL.
| immibis wrote:
| With the GIL, multithreaded Python gives concurrent I/O
| without worrying about data structure concurrency (unless you
| do I/O in the middle of it) - it's a lot like async in this
| way - data structure manipulation is atomic between "await"
| expressions (except the "await" is implicit, and you might
| have written one without realizing it, in which case you have
| a bug). Meanwhile you still get to use threads to handle
| several concurrent I/O operations. I bet a _lot_ of Python
| code is written this way and will start randomly crashing if
| the data manipulation becomes non-atomic.
| rowanG077 wrote:
| Afaik the only guarantee there is, is that a bytecode
| instruction is atomic. Built in data structures are mostly
| safe I think on a per operation level. But combining them
| is not. I think by default, every few milliseconds the
| interpreter checks for other threads to run, even if there
| are no IO or async actions. See `sys.getswitchinterval()`.
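A sketch of the lost-update race this allows even with the GIL: `count += 1` compiles to separate load/add/store bytecodes, and a switch between them can silently drop an increment, so the final value is only guaranteed to be at most 2 * N:

```python
import sys
import threading

print(sys.getswitchinterval())  # default switch interval, typically 0.005s

N = 100_000
count = 0

def worker():
    global count
    for _ in range(N):
        # Load count, add 1, store back: three bytecodes. A thread
        # switch between the load and the store loses an increment.
        count += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Never crashes under the GIL, but count may come out below 2*N.
print(count)
```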
| hamandcheese wrote:
| This is the nugget of information I was hoping for. So
| indeed even GIL threaded code today can suffer from
| concurrency bugs (more so than many people here seem to
| think).
| ynik wrote:
| Bytecode instructions have never been atomic in Python's
| past. It was always possible for the GIL to be
| temporarily released, then reacquired, in the middle of
| operations implemented in C. This happens because C code
| is often manipulating the reference count of Python
| objects, e.g. via the `Py_DECREF` macro. But when a
| reference count reaches 0, this might run a `__del__`
| function implemented in Python, which means the "between
| bytecode instructions" thread switch can happen inside
| that reference-counting-operation. That's a lot of
| possible places!
|
| Even more fun: allocating memory could trigger Python's
| garbage collector which would also run `__del__`
| functions. So every allocation was also a possible (but
| rare) thread switch.
|
| The GIL was only ever intended to protect Python's
| internal state (esp. the reference counts themselves);
| any extension modules assuming that their own state would
| also be protected were likely already mistaken.
| rowanG077 wrote:
| Well I didn't think of this myself. It's literally what
| the python official doc says:
|
| > A global interpreter lock (GIL) is used internally to
| ensure that only one thread runs in the Python VM at a
| time. In general, Python offers to switch among threads
| only between bytecode instructions; how frequently it
| switches can be set via sys.setswitchinterval(). Each
| bytecode instruction and therefore all the C
| implementation code reached from each instruction is
| therefore atomic from the point of view of a Python
| program.
|
| https://docs.python.org/3/faq/library.html#what-kinds-of-
| glo...
|
| If this is not the case please let the official python
| team know their documentation is wrong. It indeed does
| state that if Py_DECREF is invoked the bets are off. But
| a ton of operations never do that.
| imtringued wrote:
| You start talking about GIL and then you talk about non-
| atomic data manipulation, which happen to be completely
| different things.
|
| The only code that is going to break because of "no GIL"
| is C extensions, for a very obvious reason: C code can now
| be called from multiple threads, which wasn't possible
| before. Python code could always be called from multiple
| Python threads, even in the presence of the GIL.
| OskarS wrote:
| That doesn't match with my understanding of free-threaded
| Python. The GIL is being replaced with fine-grained locking
| on the objects themselves, so sharing data-structures
| between threads is still going to work just fine. If you're
| talking about concurrency issues like this causing out-of-
| bounds errors:
|
|     if len(my_list) > 5:
|         print(my_list[5])
|
| (i.e. because a different thread can pop from the list in-
| between the check and the print), that could just as easily
| happen today. The GIL makes sure that only one python
| interpreter runs at once, but it's entirely possible that
| the GIL is released and switches to a different thread
| after the check but before the print, so there's no extra
| thread-safety issue in free-threaded mode.
|
| The problems (as I understand it, happy to be corrected),
| are mostly two-fold: performance and ecosystem. Using fine-
| grained locking is potentially much less efficient than
| using the GIL in the single-threaded case (you have to take
| and release many more locks, and reference count updates
| have to be atomic), and many, many C extensions are written
| under the assumption that the GIL exists.
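The check-then-act race in the snippet above needs the same fix with or without the GIL: hold a lock across the check and the access. A minimal sketch (the lock and helper names are illustrative):

```python
import threading

my_list = [0, 1, 2, 3, 4, 5, 6]
lock = threading.Lock()

def safe_read():
    # The length check and the index access happen as one atomic
    # step relative to any other thread that also takes the lock.
    with lock:
        if len(my_list) > 5:
            return my_list[5]
        return None

def safe_pop():
    with lock:
        return my_list.pop() if my_list else None

print(safe_read())  # → 5
```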
| fulafel wrote:
| A lot of Python usage is leveraging libraries with parallel
| kernels inside written in other languages. A subset of those
| is bottlenecked on Python side speed. A sub-subset of those
| are people who want to try no-GIL to address the bottleneck.
| But if no-GIL becomes pervasive, it could mean Python
| becomes less safe for the "just parallel kernels" users.
| kccqzy wrote:
| Yes sure. Thought experiment: what happens when these
| parallel kernels suddenly need to call back into Python?
| Let's say you have a multithreaded sorting library. If you
| are sorting numbers then fine nothing changes. But if you
| are sorting objects you need to use a single thread because
| you need to call PyObject_RichCompare. These new parallel
| kernels will then try to call PyObject_RichCompare from
| multiple threads.
| quectophoton wrote:
| I don't want to add more to your fears, but also remember that
| LLMs have been trained on decades worth of Python code that
| assumes the presence of the GIL.
| rocqua wrote:
| This could, indeed, be quite catastrophic.
|
| I wonder if companies will start adding this to their system
| prompts.
| dotancohen wrote:
| As a Python dabbler, what should I be reading to ensure my
| multi-threaded code in Python is in fact safe.
| cess11 wrote:
| The literature on distributed systems is huge. It depends a
| lot on your use case what you ought to do. If you're lucky
| you can avoid shared state entirely, and with it race
| conditions at either end of your executions.
|
| https://www.youtube.com/watch?v=_9B__0S21y8 is fairly concise
| and gives some recommendations for literature and techniques,
| obviously making an effort in promoting PlusCal/TLA+ along
| the way but showcases how even apparently simple algorithms
| can be problematic as well as how deep analysis has to go to
| get you a guarantee that the execution will be bug free.
| dotancohen wrote:
| My current concern is a CRUD interface that transcribes
| audio in the background. The transcription is triggered by
| user action. I need the "transcription" field disabled
| until the transcript is complete and stored in the
| database, then allow the user to edit the transcription in
| the UI.
|
| Of course, while the transcription is in action the rest of
| the UI (Qt via Pyside) should remain usable. And multiple
| transcription requests should be supported - I'm thinking
| of a pool of transcription threads, but I'm uncertain how
| many to allocate. Half the quantity of CPUs? All the CPUs
| under 50% load?
|
| Advice welcome!
| realreality wrote:
| Use `concurrent.futures.ThreadPoolExecutor` to submit
| jobs, and `Future.add_done_callback` to flip the
| transcription field when the job completes.
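A sketch of that pattern; `transcribe` is a stand-in for the real transcription work, and in the actual app the callback would notify the UI rather than append to a list:

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe(audio_path):
    # Placeholder for the real, long-running transcription job.
    return f"transcript of {audio_path}"

results = []

def on_done(future):
    # Runs in a pool thread once the job finishes; this is where
    # the transcription field would be re-enabled.
    results.append(future.result())

with ThreadPoolExecutor(max_workers=4) as executor:
    future = executor.submit(transcribe, "clip.wav")
    future.add_done_callback(on_done)
# Leaving the with-block waits for jobs (and their callbacks).
print(results)
```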
| ptx wrote:
| Although keep in mind that the callback will be "called
| in a thread belonging to the process" (say the docs),
| presumably some thread that is not the UI thread. So the
| callback needs to post an event to the UI thread's event
| queue, where it can be picked up by the UI thread's event
| loop and only then perform the UI updates.
|
| I don't know how that's done in Pyside, though. I
| couldn't find a clear example. You might have to use a
| QThread instead to handle it.
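One dependency-free way to express that hand-off: the worker thread never touches the UI and only posts results to a queue, which the UI thread drains from its own event loop (in Pyside you would drain it from a QTimer callback, or use a signal/slot connection, which Qt delivers across threads safely):

```python
import queue
import threading

ui_events = queue.Queue()  # written by workers, read by the UI thread only

def worker(job_id):
    result = f"transcript {job_id}"          # stand-in for the real work
    ui_events.put(("done", job_id, result))  # Queue.put is thread-safe

threading.Thread(target=worker, args=(1,)).start()

# In the UI thread (e.g. inside a periodic timer callback):
kind, job_id, result = ui_events.get(timeout=5)
print(kind, job_id)  # only now is it safe to update widgets
```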
| dotancohen wrote:
| Thank you. Perhaps I should trigger the transcription
| thread from the UI thread, then? It is a UI button that
| initiates it after all.
| dotancohen wrote:
| Thank you.
| sgarland wrote:
| Just use multiprocessing. If each job is independent and
| you aren't trying to spread it out over multiple workers,
| it seems much easier and less risky to spawn a worker for
| each job.
|
| Use SharedMemory to pass the data back and forth.
| HDThoreaun wrote:
| Honestly, unless you're willing to devote a solid 4+ hours to
| learning about multithreading, stick with asyncio.
| dotancohen wrote:
| I'm willing to invest an afternoon learning. That's been
| the premise of my entire career!
| bayindirh wrote:
| More realistically, as it happened in ML/AI scene, the
| knowledgeable people will write the complex libraries and will
| hand these down to scientists and other less experienced, or
| risk-averse developers (which is _not_ a bad thing).
|
| With the critical mass Python acquired over the years, GIL
| becomes a very sore bottleneck in some cases. This is why I
| decided to learn Go, for example: a properly threaded (and
| green-threaded) programming language which is higher level
| than C/C++ but lower than Python, and which allows me to do
| things I can't do with Python. Compilation is another reason,
| but it was secondary to threading.
| jillesvangurp wrote:
| You are not the only one who is afraid of change and a bit
| change-resistant. I think the issue here is that the reasons
| for this fear are not very rational. And also the interest of
| the wider community is to deal with technical debt. And the GIL
| is pure technical debt. Defensible 30 years ago, a bit awkward
| 20 years ago, and downright annoying and embarrassing now that
| world + dog does all their AI data processing with python at
| scale for the last 10 years. It had to go in the interest of
| future-proofing the platform.
|
| What changes for you? Nothing unless you start using threads.
| You probably weren't using threads anyway because there is
| little to no point in using them in Python. Most Python code
| bases completely ignore the threading module and instead use
| non blocking IO, async, or similar things. The GIL thing only
| kicks in if you actually use threads.
|
| If you don't use threads, removing the GIL changes nothing.
| There's no code that will break. All those C libraries that
| aren't thread safe are still single threaded, etc. Only if you
| now start using threads do you need to pay attention.
|
| There's of course some threaded Python code that people may
| have written somewhat naively in the hope that it would make
| things faster, but that is constantly hitting the GIL and is
| effectively single threaded. That code now might run a
| little faster. And probably with more bugs because naive
| threaded code tends to have those.
|
| But a simple solution to address your fears: simply don't use
| threads. You'll be fine.
|
| Or learn how to use threads. Because now you finally can and it
| isn't that hard if you have the right abstractions. I'm sure
| those will follow in future releases. Structured concurrency is
| probably high on the agenda of some people in the community.
| HDThoreaun wrote:
| > But a simple solution to address your fears: simply don't
| use threads. You'll be fine.
|
| I'm not worried about new code. I'm worried about stuff written
| 15 years ago by a monkey who had no idea how threads work and
| just read something on stack overflow that said to use
| threading. This code will likely break when run post-GIL. I
| suspect there is actually quite a bit of it.
| bgwalter wrote:
| If it is C-API code: Implicit protection of global
| variables by the GIL is a documented feature, which makes
| writing extensions much easier.
|
| Most C extensions that will break are not written by
| monkeys, but by conscientious developers that followed best
| practices.
| bayindirh wrote:
| Software rots, software tools evolve. When Intel released
| performance primitives libraries which required
| recompilation to analyze multi-threaded libraries, we were
| amazed. Now, these tools are built into _processors_ as
| performance counters and we have way more advanced tools to
| analyze how systems behave.
|
| Older code will break, but code breaks all the time. A
| language changes how something behaves in a new revision, and
| suddenly 20-year-old bedrock tools are getting massively
| patched to accommodate both new and old behavior.
|
| Is it painful, ugly, unpleasant? Yes, yes and yes. However
| change is inevitable, because some of the behavior was
| rooted in _inability to do some things_ with current
| technology, and as hurdles are cleared, we change how
| things work.
|
| My father's friend told me that the length of a variable's
| name used to affect compile/link times. Now we can test whether
| we have memory leaks in Rust. That thing was impossible 15
| years ago due to performance of the processors.
| delusional wrote:
| > Software rots
|
| No it does not. I hate that analogy so much because it
| leads to such bad behavior. Software is a digital
| artifact that does not degrade. With the right
| attitude, you'd be able to execute the same binary on new
| machines for as long as you desired. That is not true of
| organic matter that actually rots.
|
| The only reason we need to change software is that we
| trade that off against something else. Instructions are
| reworked, because chasing the universal Turing machine
| takes a few sacrifices. If all software has to run on the
| same hardware, those two artifacts have to have a
| dialogue about what they need from each other.
|
| If we didn't want the universal machine to do anything
| new, and we had a valuable product, we could just keep
| making the machine that executes that product. It never
| rots.
| kstrauser wrote:
| That's not what the phrase implies. If you have a C
| program from 1982, you can still compile it on a 1982
| operating system and toolchain and it'll work just as
| before.
|
| But if you tried to compile it on today's libc, making
| today's syscalls... good luck with that.
|
| Software "rots" in the sense that it has to be updated to
| run on today's systems. They're a moving target. You can
| still run HyperCard on an emulator, but good luck running
| it unmodded on a Mac you buy today.
| dahcryn wrote:
| yes it does.
|
| If software is implicitly built on wrong understanding,
| or undefined behaviour, I consider it rotting when it
| starts to fall apart as those undefined behaviours get
| defined. We do not need to sacrifice a stable future
| because of a few 15 year old programs. Let the people who
| care about the value that those programs bring, manage
| the update cycle and fix it.
| eblume wrote:
| Software is written with a context, and the context
| degrades. It must be renewed. It rots, sorry.
| igouy wrote:
| You said it's the context that rots.
| bayindirh wrote:
| It's a matter of perspective, I guess...
|
| When you look from the program's perspective, the context
| changes and becomes unrecognizable, IOW, it rots.
|
| When you look from the context's perspective, the program
| changes by not evolving and keeping up with the context,
| IOW, it rots.
|
| Maybe we anthropomorphize both and say "they grow apart".
| :)
| igouy wrote:
| We say the context has breaking changes.
|
| We say the context is not backwards compatible.
| indymike wrote:
| >> Software rots
|
| > No it does not.
|
| I'm thankful that it does, or I would have been out of
| work long ago. It's not that the files change (literal
| rot), it is that hardware, OSes, libraries, and
| everything else changes. I'm also thankful that we have
| not stopped innovating on all of the things the software
| I write depends on. You know, another thing changes -
| what we are using the software for. The accounting
| software I wrote in the late 80s... would produce
| financial reports that were what was expected then, but
| would not meet modern GAAP requirements.
| rocqua wrote:
| Fair point, but there is an interesting question posed.
|
| Software doesn't rot, it remains constant. But the
| context around it changes, which means it loses
| usefulness slowly as time passes.
|
| What is the name for this? You could say 'software
| becomes anachronistic'. But is there a good verb for
| that? It certainly seems like something that a lot more
| than just software experiences. Plenty of real world
| things that have been perfectly preserved are now much
| less useful because the context changed. Consider an
| Oxen-yoke, typewriters, horse-drawn carriages, envelopes,
| phone switchboards, etc.
|
| It really feels like this concept should have a verb.
| igouy wrote:
| obsolescence
| cestith wrote:
| My only concern is that this kind of change in semantics for
| existing syntax is more worthy of a major revision than a
| point release.
| rbanffy wrote:
| It's opt-in at the moment. It won't be the default
| behavior for a couple releases.
|
| Maybe we'll get Python 4 with no GIL.
|
| /me ducks
| spookie wrote:
| The other day I compiled a 1989 C program and it did the
| job.
|
| I wish more things were like that. Tired of building
| things on shaky grounds.
| rbanffy wrote:
| If you go into mainframes, you'll compile code that was
| written 50 years ago without issue. In fact, you'll run
| code that was compiled 50 years ago and all that'll
| happen is that it'll finish much sooner than it did on
| the old 360 it originally ran on.
| actinium226 wrote:
| If code has been unmaintained for more than a few years,
| it's usually such a hassle to get it working again that 99%
| of the time I'll just write my own solution, and that's
| without threads.
|
| I feel some trepidation about threads, but at least for
| debugging purposes there's only one process to attach to.
| dhruvrajvanshi wrote:
| > I'm not worried about new code. I'm worried about stuff
| written 15 years ago by a monkey who had no idea how
| threads work and just read something on stack overflow that
| said to use threading. This code will likely break when run
| post-GIL. I suspect there is actually quite a bit of it.
|
| I was with OP's point but then you lost me. You'll always
| have to deal with that coworker's shitty code, GIL or not.
|
| Could they make a worse mess with multithreading? Sure. Is
| their single threaded code as bad anyway because at the end
| of the day, you can't even begin understand it? Absolutely.
|
| But yeah I think python people don't know what they're
| asking for. They think GIL-less Python is gonna give
| everyone free puppies.
| zahlman wrote:
| >Im worried about stuff written 15 years ago
|
| Please don't - it isn't relevant.
|
| 15 years ago, new Python code was still dominantly for 2.x.
| Even code written back then with an eye towards 3.x
| compatibility (or, more realistically, lazily run through
| `2to3` or `six`) will have quite little chance of running
| acceptably on 3.14 regardless. There have been considerable
| removals from the standard library, `async` is no longer a
| valid identifier name (you laugh, but that broke Tensorflow
| once). The attitude taken towards """strings""" in a lot of
| 2.x code results in constructs that can be automatically
| made into valid syntax that _appears_ to preserve the
| original intent, but which are not at all automatically
| _fixed_.
|
| Also, the modern expectation is of a lock-step release
| cadence. CPython only supports up to the last 5 versions,
| released annually; and whenever anyone publishes a new
| version of a package, generally they'll see no point in
| supporting unsupported Python versions. Nor is anyone who
| released a package in the 3.8 era going to patch it if it
| breaks in 3.14 - because _support for 3.14 was never
| advertised anyway_. In fact, in most cases, support for
| _3.9_ wasn't originally advertised, and you _can't update
| the metadata_ for an existing package upload (you have to
| make a new one, even if it's just a "post-release") even if
| you test it and it _does_ work.
|
| Practically speaking, pure-Python packages _usually do_
| work in the next version, and in the next several versions,
| perhaps beyond the support window. But you can really never
| predict what's going to break. You can only offer a new
| version when you find out that it's going to break - and a
| lot of developers are going to just roll that fix into the
| feature development they were doing anyway, because life's
| too short to backport everything for everyone. (If there's
| no longer active development and only maintenance, well,
| good luck to everyone involved.)
|
| If 5 years isn't long enough for your purposes, practically
| speaking you need to maintain an environment with an
| outdated interpreter, and find a third party (RedHat seems
| to be a popular choice here) to maintain it.
| dkarl wrote:
| > What changes for you? Nothing unless you start using
| threads
|
| Coming from the Java world, you don't know what you're
| missing. Looking inside an application and seeing a bunch of
| threadpools managed by competing frameworks, debugging
| timeouts and discovering that tasks are waiting more than a
| second to get scheduled on the wrong threadpool, tearing your
| hair out because someone split a tiny sub-10ms bit of
| computation into two tasks and scheduling the second takes a
| hundred times longer than the actual work done, adding a
| library for a trivial bit of functionality and discovering
| that it spins up yet another threadpool when you initialize
| it.
|
| (I'm mostly being tongue in cheek here because I know it's
| nice to have threading when you need it.)
| rbanffy wrote:
| > There's some threaded python code of course
|
| A fairly common pattern for me is to start a terminal UI
| updating thread that redraws the UI every second or so while
| one or more background threads do their thing. Sometimes,
| it's easier to express something with threads and we do it
| not to make the process faster (we kind of accept it will be
| a bit slower).
|
| The real enemy is state that can be mutated from more than
| one place. As long as you know who can change what, threads
| are not that scary.
| zem wrote:
| this looks extremely promising
| https://microsoft.github.io/verona/pyrona.html
| freeone3000 wrote:
| I'm sure you'll be happy using the last language that has to
| fork() in order to thread. We've only had consumer-level
| multicore processors for 20 years, after all.
| im3w1l wrote:
| You have to understand that people come from very different
| angles with Python. Some people write web servers in Python,
| where speed equals money saved. Other people write little UI
| apps where speed is a complete non-issue. Yet others write
| AI/ML code that spends most of its time in GPU code, but then
| they want to do just a little data massaging in Python, which
| can easily bottleneck the whole thing. And some people write
| scripts that don't use a .env but rather OS libraries.
| bratao wrote:
| This is a common mistake and very badly communicated. The GIL
| does not make Python code thread-safe. It only protects the
| internal CPython state. Multi-threaded Python code is not
| thread-safe today.
| amelius wrote:
| Well, I think you can manipulate a dict from two different
| threads in Python, today, without any risk of segfaults.
| pansa2 wrote:
| You can do so in free-threaded Python too, right? The dict
| is still protected by a lock, but one that's much more
| fine-grained than the GIL.
| amelius wrote:
| Sounds good, yes.
| porridgeraisin wrote:
| Internal cpython state also includes say, a dictionary's
| internal state. So for practical purposes it is safe. Of
| course, TOCTOU, stale reads and various race conditions are
| not (and can never be) protected by the GIL.
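A sketch of that distinction: each individual dict operation is safe on its own, but a read-modify-write composed of two operations is a race the GIL never protected, so it needs a lock either way (the names here are illustrative):

```python
import threading

counts = {}
lock = threading.Lock()

def unsafe_inc(key):
    # get() and the assignment are two separate dict operations;
    # another thread can run in between and its update be lost.
    counts[key] = counts.get(key, 0) + 1

def safe_inc(key):
    # Holding a lock makes the read-modify-write one atomic step.
    with lock:
        counts[key] = counts.get(key, 0) + 1

threads = [threading.Thread(target=safe_inc, args=("hits",))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counts["hits"])  # → 8
```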
| kevingadd wrote:
| This should not have been downvoted. It's true that the GIL
| does not make python code thread-safe implicitly, you have to
| either construct your code carefully to be atomic (based on
| knowledge of how the GIL works) or make use of mutexes,
| semaphores, etc. It's just memory-safe and can still have
| races etc.
| tialaramex wrote:
| You're not the only one. David Baron's note certainly applies:
| https://bholley.net/blog/2015/must-be-this-tall-to-write-mul...
|
| In a language conceived for this kind of work it's not as easy
| as you'd like. In most languages you're going to write nonsense
| which has no coherent meaning whatsoever. Experiments show that
| humans can't successfully understand non-trivial programs
| unless they exhibit Sequential Consistency - that is, they can
| be understood as if (which is not reality) all the things which
| happen do happen in some particular order. This is not the
| reality of how the machine works, for subtle reasons, but
| without it merely human programmers are like "Eh, no idea, I
| guess everything is computer?". It's really easy to write
| concurrent programs which do not satisfy this requirement in
| most of these languages, you just can't debug them or reason
| about what they do - a disaster.
|
| As I understand it Python without the GIL will enable more
| programs that lose SC.
| qznc wrote:
| Worst case is probably that it is like a "Python4": Things
| break when people try to update to no-GIL, so they'd rather
| stay with the old version for decades.
| odiroot wrote:
| It's called job security. We'll be rewriting decades of code
| that's broken by that transition.
| almostgotcaught wrote:
| Do you understand what you're implying?
|
| "Python programmers are so incompetent that Python succeeds as
| a language only because it lacks features they wouldn't know to
| use"
|
| Even if it's circumstantially true, doesn't mean it's the right
| guiding principle for the design of the language.
| frollogaston wrote:
| What reliance did you have in mind? All sorts of calls in
| Python can release the GIL, so you already need locking, and
| there are race conditions just like in most languages. It's not
| like JS where your code is guaranteed to run in order until you
| "await" something.
|
| I don't fully understand the challenge with removing it, but
| thought it was something about C extensions, not something most
| users have to directly worry about.
| pawanjswal wrote:
| This is some serious groundwork for the next era of performance!
| pansa2 wrote:
| Does removal of the GIL have any _other_ effects on multi-
| threaded Python code (other than allowing it to run in parallel)?
|
| My understanding is that the GIL has lasted this long not because
| multi-threaded Python depends on it, but because removing it:
|
| - Complicates the implementation of the interpreter
|
| - Complicates C extensions, and
|
| - Causes single-threaded code to run slower
|
| Multi-threaded Python code already has to assume that it can be
| pre-empted on the boundary between any two bytecode instructions.
| Does free-threaded Python provide the same guarantees, or does it
| require multi-threaded Python to be written differently, e.g. to
| use additional locks?
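The pre-emption pansa2 describes can be made concrete with a toy sketch (illustrative, not from the thread): `counter += 1` compiles to separate load, add, and store bytecodes, so with or without the GIL a thread can be pre-empted mid-update and the increment needs a lock.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    # "counter += 1" is a LOAD, an ADD, and a STORE in bytecode;
    # a thread can be pre-empted between any two of them, with
    # the GIL or without it, so we guard it with a lock.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 -- drop the lock and you may see less
```

The same locking discipline is required under the GIL; free-threading just makes the unlocked version lose updates far more often.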
| rfoo wrote:
| > Does free-threaded Python provide the same guarantees
|
| Mostly. Some of the "can be pre-empted on the boundary between
| any two bytecode instructions" bugs are really hard to hit
| without free-threading, though. And without free-threading
| people don't use as much threading stuff. So by nature it
| exposes more bugs.
|
| Now, my rants:
|
| > have any other effects on multi-threaded Python code
|
| It stops people from using multi-process workarounds. Hence, it
| simplifies user-code. IMO totally worth it to make the
| interpreter more complex.
|
| > Complicates C extensions
|
| The alternative (sub-interpreters) complicates C extensions
| more than free-threading does, and the single most important
| C extension in the entire ecosystem, numpy, stated that they
| can't and don't want to support sub-interpreters. On the
| contrary, they already support free-threading today and are
| actively sorting out remaining bugs.
|
| > Causes single-threaded code to run slower
|
| That's the trade-off. Personally I think a single-digit
| percentage slow-down of single-threaded code is worth it.
| celeritascelery wrote:
| > That's the trade-off. Personally I think a single-digit
| percentage slow-down of single-threaded code is worth it.
|
| Maybe. I would expect that 99% of python code going forward
| will still be single threaded. You just don't need that extra
| complexity for most code. So I would expect that python code
| as a whole will have worse performance, even though a handful
| of applications will get faster.
| pphysch wrote:
| But the bar to parallelizing code gets much lower, in
| theory. Your serial code got 5% slower but has a direct
| path to being 50% faster.
|
| And if there's a good free-threaded HTTP server
| implementation, the RPS of "Python code as a whole" could
| increase dramatically.
| weakfish wrote:
| Is there any news from FastAPI folks and/or Gunicorn on
| their support?
| fjasdfas wrote:
| You can do multiple processes with SO_REUSEPORT.
|
| free-threaded makes sense if you need shared state.
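A minimal sketch of the `SO_REUSEPORT` approach (Linux 3.9+ or a BSD with support; in a real server each worker process would do its own bind):

```python
import socket

def make_listener(host, port):
    # Setting SO_REUSEPORT before bind() lets several sockets share
    # one port; the kernel load-balances incoming connections
    # across them, giving per-process parallelism with no IPC.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen()
    return s

a = make_listener("127.0.0.1", 0)        # pick a free port
port = a.getsockname()[1]
b = make_listener("127.0.0.1", port)     # second bind on the same port succeeds
same = (b.getsockname()[1] == port)
print(same)  # True
a.close()
b.close()
```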
| pphysch wrote:
| Any webserver that wants to cache and reuse content cares
| about shared state, but usually has to outsource that to
| a shared in-memory database because the language can't
| support it.
| rfoo wrote:
| That's the mindset that leads to the funny result that `uv
| pip` is like 10x faster than `pip`.
|
| Is it because Rust is just fast? Nope. For anything after
| resolving dependency versions, raw CPU performance doesn't
| matter at all. It's that writing concurrent PLUS parallel
| code in Rust is easier: you don't need to spawn a few
| processes and wait for the interpreter to start in each, and
| you don't need to constantly serialize whatever you want to
| run. So, someone did it!
|
| Yet, there's a pip maintainer who actively sabotages free-
| threading work. Nice.
| notpushkin wrote:
| > Yet, there's a pip maintainer who actively sabotages
| free-threading work.
|
| Wow. Could you elaborate?
| foresto wrote:
| As I recall, CPython has also been getting speed-ups
| lately, which ought to make up for the minor single-
| threaded performance loss introduced by free threading.
| With that in mind, the recent changes seem like an overall
| win to me.
| rocqua wrote:
| Note that there is an entire order of magnitude range for a
| 'single digit'.
|
| A 1% slowdown seems totally fine. A 9% slowdown is pretty
| bad.
| jacob019 wrote:
| Your understanding is correct. You can use all the cores but
| it's much slower per thread and existing libraries may need to
| be reworked. I tried it with PyTorch, it used 10x more CPU to
| do half the work. I expect these issues to improve, still great
| to see after 20 years wishing for it.
| btilly wrote:
| It makes race conditions easier to hit, and that will require
| multi-threaded Python to be written with more care to achieve
| the same level of reliability.
| heybrendan wrote:
| I am a Python user, but far from an expert. Occasionally, I've
| used 'concurrent.futures' to kick off running some very simple
| functions, at the same time.
|
| How are 'concurrent.futures' users impacted? What will I need to
| change moving forward?
| rednafi wrote:
| It's going to get faster since threads won't be serialized
| by the GIL. If you're locking shared objects correctly or
| not using them at all, then you should be good.
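In practice "locking shared objects correctly" with `concurrent.futures` looks roughly like this (a hypothetical sketch; the pool API itself needs no changes):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

results = {}
results_lock = threading.Lock()

def work(key):
    value = key * key            # stand-in for real computation
    with results_lock:           # explicit lock around shared state
        results[key] = value
    return value

with ThreadPoolExecutor(max_workers=4) as pool:
    returned = list(pool.map(work, range(10)))

print(len(results), sum(returned))  # 10 285
```

Code written this way runs unchanged on both GIL and free-threaded builds; only code that implicitly relied on the GIL for atomicity needs rework.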
| 0x000xca0xfe wrote:
| I know it's just an AI image... but a snake with two tails?
| C'mon!
| brookst wrote:
| Confusoborus
| vpribish wrote:
| shh. don't complain too loudly or we'll lose an important tell.
| python articles using snake illustrations can usually be
| ignored because they are not clueful.
|
| -- python, monty
| bgwalter wrote:
| This is just an advertisement for the company. Fact is, free-
| threading is still up to 50% slower, the tail call interpreter
| isn't much faster at all, and free-threading is still flaky.
|
| Things they won't tell you at PyCon.
| tomrod wrote:
| Quansight isn't a formal company though, it's a skunkworks/OSS
| research group run by Travis Oliphant.
| lenerdenator wrote:
| I don't see how any of that's a problem given that it's not the
| default for how people run Python.
|
| It's a big project that's going to take lots of time by lots of
| people to finish. Keep it behind opt-in, keep accepting pull
| requests after rigorous testing, and it's fine.
| pjmlp wrote:
| In other news, Microsoft dumped the whole faster Python team,
| apparently the 2025 earnings weren't enough to keep the team
| around.
|
| https://www.linkedin.com/posts/mdboom_its-been-a-tough-coupl...
|
| Let's see what performance improvements still land on CPython,
| unless another company sponsors the work.
|
| I guess Facebook (no need to correct me on the name) is still
| sponsoring part of it.
| falcor84 wrote:
| It wouldn't have bothered me if you just said "Facebook" - I
| probably wouldn't have even noticed it. But I'm really curious
| why you chose to write "Facebook", then apparently noticed the
| issue, and instead of replacing it with "Meta" decided to add
| the much longer "(no need to correct me on the name)". What axe
| are you grinding?
| pjmlp wrote:
| Yes, because I am quite certain someone without anything
| better to do would correct me on that.
|
| For me Facebook will always be Facebook, and Twitter will
| always be Twitter.
| falcor84 wrote:
| > Yes, because I am quite certain someone without anything
| better to do would correct me on that.
|
| Well, you sure managed to avoid that by setting up camp on
| that hill. Kudos on so much time saved.
|
| > For me Facebook will always be Facebook, and Twitter will
| always be Twitter.
|
| Well, for me the product will always be "Thefacebook", but
| that's because I haven't used it since then. But I do respect that
| there's a company running it now that does more stuff and
| contributes to open source projects.
| biorach wrote:
| > Well, you sure managed to avoid that by setting up camp
| on that hill. Kudos on so much time saved.
|
| Why are you picking a fight about this?
| falcor84 wrote:
| I think I'm taking it personally because I had previously
| changed my name and had people repeatedly call me by my
| old name just to annoy/hurt me.
|
| Obviously I know that companies aren't people and don't
| have feelings, but I can't understand why you would
| intentionally avoid using their chosen name, even when
| it's more effort to you.
| kstrauser wrote:
| I wouldn't do that to a person. I'm not worried about
| hurting Twitter's feelings, though.
| Flamentono2 wrote:
| With money which destroyed our society
| rbanffy wrote:
| > Twitter will always be Twitter.
|
| If Elon can deadname his daughter, then we can deadname his
| company.
| kstrauser wrote:
| That's the rationale I've been using.
| rich_sasha wrote:
| Ah that's very, very sad. I guess they have embraced and
| extended, there's only one thing left to do.
| biorach wrote:
| At this stage the cliched and clueless comments about
| embrace/extend/extinguish are tiresome and inevitable
| whenever Microsoft is mentioned.
|
| A few decades ago MS did indeed have a playbook which they
| used to undermine open standards. Laying off some members of
| the Python team bears no resemblance whatsoever to that. At
| worst it will delay the improvement of free-threaded Python.
| That's all.
|
| Your comment is lazy and unfounded.
| kstrauser wrote:
| _cough_ Bullshit _cough_
|
| * VSCode got popular and they started preventing forks from
| installing its extensions.
|
| * They extended the open-source pyright language server
| into the proprietary pylance. They don't even sell it. It's
| just there to make the FOSS version less useful.
|
| * They bought GitHub and started rate limiting it to
| unlogged in visitors.
|
| Every time Microsoft touches a thing, they end up locking
| it down. They can't help it. It's their nature. And if
| you're the frog carrying that scorpion across the pond and
| it stings you, well, you can only blame it so much. You
| knew this when they offered the deal.
|
| Every time. It hasn't changed substantially since they
| declared that Linux is cancer, except to be more subtle in
| their attacks.
| oblio wrote:
| I actually hate this trope more because of what it says
| about the poster. Which, I guess, would be that they're
| someone wearing horse blinders.
|
| There's a part of me that wants to scream at them:
|
| "Look around you!!! It's not 1999 anymore!!! These days
| we have Google, Amazon, Apple, Facebook, etc, which are
| just as bad if not worse!!! Cut it out with the 20+ year
| old bad jokes!!!"
|
| Yes, Microsoft is bad. The reason Micr$oft was the enemy
| back in the day is because they... won. They were bigger
| than anyone else in the fields that mattered (except for
| server-side, where they almost won). Now they're just one
| in a gang of evils. There's nothing special about them
| anymore. I'm more scared of Apple and Google.
| kstrauser wrote:
| That's only reasonable if you believe you can only
| distrust one company at a time. I distrust every one you
| mentioned there, for different reasons, in different
| ways. I don't think that Apple is trying to exclusively
| own the field of programming tools to their own profit,
| nor do I think that Facebook is. I don't think Apple is
| trying to own all data about every human. I don't think
| Microsoft is trying to force all vendors to sell through
| their app store.
|
| But the thing is that Microsoft hasn't seemed to
| fundamentally change since 1999. They appear kinder and
| friendlier but they keep running the same EEE playbook
| everywhere they can. Lots of us give them a free pass
| because they let us run a nifty free-for-now programming
| editor. That doesn't change the leopard's spots, though.
| mixmastamyk wrote:
| All these posts and no one mentioned their numerous,
| recent, abusive deeds around Windows or negligent
| security posture, all the while having captured Uncle Sam
| and other governments.
|
| MS has continued to metastasize and is in some ways worse
| than the old days, even if they've finally accepted the
| utility of open source as a loss leader.
|
| They have the only BigTech products I've been forced to
| use if I want to eat.
| oblio wrote:
| Yet I only ever see these tired EEE memes for Microsoft
| when Chrome is basically the web, for example.
| kstrauser wrote:
| I don't know what to tell you, except that you obviously
| haven't read a lot of my stuff on that topic. (Not that I
| would expect anyone to have, mind you. I'm nobody.) I
| agree with you. I only use Chrome when I must, like when
| I'm updating a Meshtastic radio and the flasher app
| doesn't run on Firefox or Safari.
|
| I'm not anti-MS as much as anti their behavior, whoever
| is acting that way. This thread is directly related to MS
| so I'm expressing my opinion on MS here. I'll be more
| than happy to share my thoughts on Chrome in a Google
| thread.
| biorach wrote:
| None of those were independent projects or open
| standards. VScode and pyright are both MS projects from
| the get-go.
|
| Sabotaging forks is scummy, but the forks were extending
| MS functionality, not the other way around.
|
| GitHub was a private company before it was bought by MS.
| Rate limiting is.... not great, but certainly not an
| extinguish play.
|
| EEE refers to the subversion of open standards or
| independent free software projects. It does not apply to
| any of the above.
|
| MS are still scummy but at least attack them on their own
| demerits, and don't parrot some schtick from decades ago.
| kstrauser wrote:
| It's not just EEE, though. They have a history of getting
| devs all in on a thing and then killing it with
| corporate-grade ADHD. They bought Visual FoxPro, got
| bored with it, and told everyone to rewrite into Visual
| Basic (which they then killed). Then the future was
| Silverlight, until it wasn't. There are a thousand of
| these things that weren't deliberately evil in the EEE,
| but defined the word rugpull before we called it that.
|
| So even without EEE, I think it's supremely risky to
| hitch your wagon to their tech or services (unless you're
| writing primarily for Windows, which is what they'd love
| to help you migrate to). And I can't be convinced the
| GitHub acquisition wasn't some combination of these dark
| patterns.
|
| Step 1: Get a plurality of the world's FOSS into one
| place.
|
| Step 2: Feed it into a LLM and then embed it in a popular
| free editor so that everyone can use GPL code without
| actually having to abide the license.
|
| Step 3: Make it increasingly hard to use for FOSS
| development by starting to add barriers a little at a
| time. _< = we are here_
|
| As a developer, they've done nothing substantial to earn
| my trust. I think a lot of Microsoft employees are good
| people who don't subscribe to all this and who want to do
| the right thing, but corporate culture just won't let
| that be.
| biorach wrote:
| > I think it's supremely risky to hitch your wagon to
| their tech or services
|
| OK, finally, yes, this is very true, for specific parts
| of their tech.
|
| But banging on about EEE just distracts from this, more
| important message.
|
| > Make it increasingly hard to use for FOSS development
| by starting to add barriers a little at a time. <= we are
| here
|
| ....and now you've lost me again
| kstrauser wrote:
| Note I wasn't the one who said EEE upstream. I was just
| replying to the thread.
|
| Hanlon's razor is a thing, and I generally follow it.
| It's just that I've seen Microsoft make so many "oops,
| our bad!" mistakes over the years that purely
| coincidentally gave them an edge up over their
| competition, that I tend to distrust such claims from
| them.
|
| I don't feel that way about all corps. Oracle doesn't
| make little mistakes that accidentally harm the
| competition while helping themselves. No, they'll look
| you in the eye and explain that they're mugging you while
| they take your wallet. It's kind of refreshingly honest
| in its own way.
| dhruvrajvanshi wrote:
| > Oracle doesn't make little mistakes that accidentally
| harm the competition while helping themselves. No,
| they'll look you in the eye and explain that they're
| mugging you while they take your wallet. It's kind of
| refreshingly honest in its own way.
|
| Fucking hell bud :D
| kstrauser wrote:
| Tell me I'm wrong! :D
| stusmall wrote:
| That shows a misunderstanding of what EEE was. This team was
| sending changes upstream which is the exact opposite of
| "extend" step of the strategy. The idea of "extend" was to
| add propriety extensions on top of an open standard/project
| locking customers into the MSFT implementation.
| jerrygenser wrote:
| Ok so a better example of what you describe might be
| vscode.
| nothrabannosir wrote:
| What existing open standard did vscode Embrace? I thought
| Microsoft created v0 themselves.
|
| A classic example is ActiveX.
| biorach wrote:
| > A classic example is ActiveX.
|
| Nah, even that was based on earlier MS technologies - OLE
| and COM
|
| A good starter list of EEE plays is on the wikipedia
| page: https://en.wikipedia.org/wiki/Embrace,_extend,_and_
| extinguis...
| nothrabannosir wrote:
| Funny you linked that page because that's where I got
| activex from :D
|
| _> Examples by Microsoft_
|
| _> Browser incompatibilities_
|
| _> The plaintiffs in an antitrust case claimed Microsoft
| had added support for ActiveX controls in the Internet
| Explorer Web browser to break compatibility with Netscape
| Navigator, which used components based on Java and
| Netscape 's own plugin system._
| biorach wrote:
| ah ok, sorry. I thought you were saying that they tried
| an EEE play on ActiveX.
|
| You meant they used ActiveX in an EEE play in the browser
| wars.
| nothrabannosir wrote:
| Honestly I kept it vague because I didn't actually know
| so your call-out was totally valid. I know it better now
| than without your clarification so thanks :+1:
| JacobHenner wrote:
| VSCode displaced Atom, pre-GitHub acquisition, by
| building on top of Atom's rendering engine Electron.
| bgwalter wrote:
| They were quite a bit behind the schedule that was promised
| five years ago.
|
| Additionally, at this stage the severe political and governance
| problems cannot have escaped Microsoft. I imagine that no
| competent Microsoft employee wants to give his expertise to
| CPython, only later to suffer group defamation from a couple of
| elected mediocre people.
|
| CPython is an organization that overpromises, allocates jobs to
| the obedient and faithful while weeding out competent
| dissenters.
|
| It wasn't always like that. The issues are entirely self-
| inflicted.
| biorach wrote:
| > CPython is an organization that overpromises, allocates
| jobs to the obedient and faithful while weeding out competent
| dissenters.
|
| This stinks of BS
| wisty wrote:
| It sounds like an oblique reference to that time they
| temporarily suspended one of the most valuable
| members of the community, apparently for having the
| audacity to suggest that their powers to suspend members of
| the community seemed a little arbitrary and open to abuse.
| biorach wrote:
| Well they could just say that instead of wasting people's
| time with oblique references
| robertlagrant wrote:
| Saying "This stinks of BS" is going to mean you have
| little standing to criticise other people for wasting
| time.
| make3 wrote:
| Microsoft also fired a whole lot of other open source people
| unrelated to Python in this current layoff
| pjmlp wrote:
| Notably MAUI, ASP.NET, Typescript and AI frameworks.
| vlovich123 wrote:
| That's unfortunate but I called it when people were claiming
| that Microsoft had committed to this effort for the long term.
| mtzaldo wrote:
| Could we do a crowdfunding campaign so we can keep paying
| them? The whole world is/will benefit from their work.
| morkalork wrote:
| Didn't Google lay off their entire Python development team in
| the last year as well? I wonder if there is some impetus behind
| both.
| make3 wrote:
| doesn't print money right away = cut by executive #3442
| amelius wrote:
| The snake in the header image appears to have two tail-ends ...
| cestith wrote:
| I guess it's spawned a second thread in the same process.
| sgarland wrote:
| > Instead, many reach for multiprocessing, but spawning processes
| is expensive
|
| Agreed.
|
| > and communicating across processes often requires making
| expensive copies of data
|
| SharedMemory [0] exists. Never understood why this isn't used
| more frequently. There's even a ShareableList which does exactly
| what it sounds like, and is awesome.
|
| [0]:
| https://docs.python.org/3/library/multiprocessing.shared_mem...
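A minimal sketch of `ShareableList` (hypothetical worker; behavior as documented in `multiprocessing.shared_memory`): the child attaches to the segment by name and mutates it in place, with no pickling of the payload.

```python
from multiprocessing import Process
from multiprocessing.shared_memory import ShareableList

def worker(name):
    # Attach to the existing shared segment by name -- no copy is made.
    sl = ShareableList(name=name)
    sl[0] += 1                   # visible to the parent immediately
    sl.shm.close()

if __name__ == "__main__":
    sl = ShareableList([0, "hello", 3.14])
    p = Process(target=worker, args=(sl.shm.name,))
    p.start()
    p.join()
    result = sl[0]               # read back what the child wrote
    sl.shm.close()
    sl.shm.unlink()              # free the segment
    print(result)  # 1
```

Note the documented caveat: slots are fixed-size, so a string can only be replaced by one of equal or smaller encoded length.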
| ogrisel wrote:
| You cannot share arbitrarily structured objects in the
| `ShareableList`, only atomic scalars and bytes / strings.
|
| If you want to share structured Python objects between
| instances, you have to pay the cost of
| `pickle.dump`/`pickle.load` (CPU overhead for interprocess
| communication) + the memory cost of replicated objects in the
| processes.
| tomrod wrote:
| I can fit a lot of json into bytes/strings though?
| cjbgkagh wrote:
| Perhaps flatbuffers would be better?
| tomrod wrote:
| I love learning from folks on HN -- thanks! Will check it
| out.
| notpushkin wrote:
| Take a look at https://capnproto.org/ as well, while at
| it.
|
| Neither solve the copying problem, though.
| frollogaston wrote:
| Ah, I forgot capnproto doesn't let you edit a serialized
| proto in-memory, it's read-only. In theory this should be
| possible as long as you're not changing the length of
| anything, but I'm not surprised such trickery is
| unsupported.
|
| So this doesn't seem like a versatile solution for
| sharing data structs between two Python processes. You're
| gonna have to reserialize the whole thing if one side
| wants to edit, which is basically copying.
| tinix wrote:
| let me introduce you to quickle.
| vlovich123 wrote:
| That's even worse than pickle.
| tomrod wrote:
| pickle pickles to pickle binary, yeah? So can stream that
| too with an io Buffer :D
| frollogaston wrote:
| If all your state is already json-serializable, yeah. But
| that's just as expensive as copying if not more, hence what
| cjbgkagh said about flatbuffers.
| frollogaston wrote:
| oh nvm, that doesn't solve this either
| reliabilityguy wrote:
| What's the point? The whole idea is to share an object, and
| not to serialize them whether it's json, pickle, or
| whatever.
| tomrod wrote:
| I mean, the answer to this is pretty straightforward --
| because we can, not because we should :)
| notpushkin wrote:
| We need a dataclass-like interface on top of a ShareableList.
| sgarland wrote:
| So don't do that? Send data to workers as primitives, and
| have a separate process that reads the results and serializes
| it into whatever form you want.
| modeless wrote:
| Yeah I've had great success sharing numpy arrays this way.
| Explicit sharing is not a huge burden, especially when compared
| with the difficulty of debugging problems that occur when you
| accidentally share things between threads. People vastly
| overstate the benefit of threads over multiprocessing and I
| don't look forward to all the random segfaults I'm going to
| have to debug after people start routinely disabling the GIL in
| a library ecosystem that isn't ready.
|
| I wonder why people never complained so much about JavaScript
| not having shared-everything threading. Maybe because
| JavaScript is so much faster that you don't have to reach for
| it as much. I wish more effort was put into baseline
| performance for Python.
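The numpy-over-shared-memory pattern described above looks roughly like this (a sketch assuming numpy is installed; function and variable names are made up):

```python
import numpy as np
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

def double_in_place(shm_name, shape):
    # Reattach to the same physical pages and view them as an array.
    shm = SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=np.int64, buffer=shm.buf)
    arr *= 2                     # the parent observes this write
    del arr                      # release the buffer before closing
    shm.close()

if __name__ == "__main__":
    shm = SharedMemory(create=True, size=4 * 8)   # four int64 slots
    a = np.ndarray((4,), dtype=np.int64, buffer=shm.buf)
    a[:] = [1, 2, 3, 4]
    p = Process(target=double_in_place, args=(shm.name, a.shape))
    p.start()
    p.join()
    result = a.tolist()
    del a
    shm.close()
    shm.unlink()
    print(result)  # [2, 4, 6, 8]
```

The sharing is explicit: only arrays you deliberately place in the segment are visible to both processes, which is exactly the property the parent comment argues for.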
| dhruvrajvanshi wrote:
| > I wonder why people never complained so much about
| JavaScript not having shared-everything threading. Maybe
| because JavaScript is so much faster that you don't have to
| reach for it as much. I wish more effort was put into
| baseline performance for Python.
|
| This is a fair observation.
|
| I think a part of the problem is that the things that make
| GIL-less Python hard are also the things that make faster
| baseline performance hard. I.e. an over-reliance of the
| ecosystem on the shape of the CPython data structures.
|
| What makes python different is that a large percentage of
| python code isn't python, but C code targeting the CPython
| api. This isn't true for a lot of other interpreted
| languages.
| com2kid wrote:
| > I wonder why people never complained so much about
| JavaScript not having shared-everything threading. Maybe
| because JavaScript is so much faster that you don't have to
| reach for it as much. I wish more effort was put into
| baseline performance for Python.
|
| Nobody sane tries to do math in JS. Backend JS is recommended
| for situations where processing is minimal and it is mostly
| lots of tiny IO requests that need to be shunted around.
|
| I'm a huge JS/Node proponent and if someone says they need to
| write a backend service that crunches a lot of numbers, I'll
| recommend choosing a different technology!
|
| For some reason Python peeps keep trying to do actual
| computations in Python...
| frollogaston wrote:
| Python peeps tend to do heavy numbers calc in numpy, but
| sometimes you're doing expensive things with
| dictionaries/lists.
| zahlman wrote:
| > I wish more effort was put into baseline performance for
| Python.
|
| There has been. That's why the bytecode is incompatible
| between minor versions. It was a major selling(?) point for
| 3.11 and 3.12 in particular.
|
| But the "Faster CPython" team at Microsoft was apparently
| just laid off (https://www.linkedin.com/posts/mdboom_its-
| been-a-tough-coupl...), and all of the optimization work has
| to my understanding been based around fairly traditional
| techniques. The C part of the codebase has decades of legacy
| to it, after all.
|
| Alternative implementations like PyPy often post impressive
| results, and are worth checking out if you need to worry
| about native Python performance. Not to mention the benefits
| of shifting the work onto compiled code like NumPy, as you
| already do.
| frollogaston wrote:
| "I wonder why people never complained so much about
| JavaScript not having shared-everything threading"
|
| Mainly cause Python is often used for data pipelines in ways
| that JS isn't, causing situations where you do want to use
| multiple CPU cores with some shared memory. If you want to
| use multiple CPU cores in NodeJS, usually it's just a load-
| balancing webserver without IPC and you just use throng, or
| maybe you've got microservices.
|
| Also, JS parallelism simply excelled from the start at
| waiting on tons of IO, there was no confusion about it.
| Python later got asyncio for this, and by now regular threads
| have too much momentum. Threads are the worst of both worlds
| in Py, cause you get the overhead of an OS thread and the
| possibility of race conditions without the full parallelism
| it's supposed to buy you. And all this stuff is confusing to
| users.
| chubot wrote:
| Spawning processes generally takes much less than 1 ms on Unix
|
| Spawning a PYTHON interpreter process might take 30 ms to 300
| ms before you get to main(), depending on the number of imports
|
| It's 1 to 2 orders of magnitude difference, so it's worth being
| precise
|
| This is the fallacy with, say, CGI: a CGI program in C,
| Rust, or Go works perfectly well.
|
| e.g. sqlite.org runs with a process PER REQUEST -
| https://news.ycombinator.com/item?id=3036124
| Sharlin wrote:
| Unix is not the only platform though (and is process creation
| fast on all Unices or just Linux?) The point about
| interpreter init overhead is, of course, apt.
| btilly wrote:
| Process creation should be fast on all Unices. If it isn't,
| then the lowly shell script (heavily used in Unix) is going
| to perform very poorly.
| kragen wrote:
| While I think you've been using Unix longer than I have,
| shell scripts are known for performing very poorly, and
| on PDP-11 Unix (where perhaps shell scripts were most
| heavily used, since Perl didn't exist yet) fork()
| couldn't even do copy-on-write; it had to literally copy
| the process's entire data segment, which in most cases
| also contained a copy of its code. Moving to paged
| machines like the VAX and especially the 68000 family
| made it possible to use copy-on-write, but historically
| speaking, Linux has often been an order of magnitude
| faster than most other Unices at fork(). However, I think
| people mostly don't use _those_ Unices anymore. I imagine
| the BSDs have pretty much caught up by now.
|
| https://news.ycombinator.com/item?id=44009754 gives some
| concrete details on fork() speed on current Linux: 50μs
| for a small process, 700μs for a regular process, 1300μs
| for a venti Python interpreter process, 30000-50000μs for
| Python interpreter creation. This is on a CPU of about 10
| billion instructions per second per core, so forking
| costs on the order of 1/2-10 million instructions.
| fredoralive wrote:
| Python runs on other operating systems, like NT, where
| AIUI processes are rather more heavyweight.
|
| Not all use cases of Python and Windows intersect (how
| much web server stuff is a Windows / IIS / SQL Server /
| Python stack? Probably not many, although WISP is a nice
| acronym), but you've still got to bear it in mind for
| people doing heavy numpy stuff on their work laptop or
| whatever.
| charleshn wrote:
| > Spawning processes generally takes much less than 1 ms on
| Unix
|
| It depends on whether one uses clone, fork, posix_spawn etc.
|
| Fork can take a while depending on the size of the address
| space, number of VMAs etc.
| crackez wrote:
| Fork on Linux should use copy-on-write vmpages now, so if
| you fork inside python it should be cheap. If you launch a
| new Python process from let's say the shell, and it's
| already in the buffer cache, then you should only have to
| pay the startup CPU cost of the interpreter, since the IO
| should be satisfied from buffer cache...
| charleshn wrote:
| > Fork on Linux should use copy-on-write vmpages now, so
| if you fork inside python it should be cheap.
|
| No, that's exactly the point I'm making, copying PTEs is
| not cheap on a large address space, with many VMAs.
|
| You can run a simple python script allocating a large
| list and see how it affects fork time.
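That experiment is easy to sketch (Unix-only; iteration counts and the ballast size are arbitrary, and absolute numbers vary widely by machine):

```python
import os
import time

def time_fork():
    # One fork/exit/wait round-trip; the child does no work, so the
    # measured cost is dominated by duplicating kernel bookkeeping
    # (page tables, VMAs) for the parent's address space.
    t0 = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    os.waitpid(pid, 0)
    return time.perf_counter() - t0

small = min(time_fork() for _ in range(20))
ballast = list(range(2_000_000))      # inflate the address space
big = min(time_fork() for _ in range(20))

print(f"small heap: {small * 1e6:.0f} us, large heap: {big * 1e6:.0f} us")
```

On a typical Linux box the second figure comes out noticeably larger even though copy-on-write means no page contents are copied.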
| knome wrote:
| for glibc and linux, fork just calls clone. as does
| posix_spawn, using the flag CLONE_VFORK.
| LPisGood wrote:
| My understanding is that spawning a thread takes just a few
| microseconds, so whether you're talking about a process or a
| Python interpreter process there are still orders of
| magnitude to be gained.
| kragen wrote:
| To be concrete about this,
| http://canonical.org/~kragen/sw/dev3/forkovh.c took 670μs to
| fork, exit, and wait on the first laptop I tried it on, but
| only 130μs compiled with dietlibc instead of glibc, and on a
| 2.3 GHz E5-2697 Xeon it took 130μs compiled with glibc.
|
| httpdito http://canonical.org/~kragen/sw/dev3/server.s (which
| launches a process per request) seems to take only about 50μs
| because it's not linked with any C library and therefore only
| maps 5 pages. Also, that doesn't include the time for exit()
| because it runs multiple concurrent child processes.
|
| On _this_ laptop, a Ryzen 5 3500U running at 2.9GHz, forkovh
| takes about 330μs built with glibc and about 130-140μs built
| with dietlibc, and `time python3 -c True` takes about
| 30000-50000μs. I wrote a Python version of forkovh
| http://canonical.org/~kragen/sw/dev3/forkovh.py and it takes
| about 1200μs to fork(), _exit(), and wait().
|
| If anyone else wants to clone that repo and test their own
| machines, I'm interested to hear the results, especially if
| they aren't in Linux. `make forkovh` will compile the C
| version.
|
| 1200μs is pretty expensive in some contexts but not others.
| Certainly it's cheaper than spawning a new Python interpreter
| by more than an order of magnitude.
| jaoane wrote:
| >Spawning a PYTHON interpreter process might take 30 ms to
| 300 ms before you get to main(), depending on the number of
| imports
|
| That's lucky. On constrained systems launching a new
| interpreter can very well take 10 seconds. Python is
| ssssslllloooowwwww.
| morningsam wrote:
| >Spawning a PYTHON interpreter process might take 30 ms to
| 300 ms
|
| Which is why, at least on Linux, Python's multiprocessing
| doesn't do that but fork()s the interpreter, which takes low-
| single-digit ms as well.
| zahlman wrote:
| Even when the 'spawn' strategy is used (default on Windows,
| and can be chosen explicitly on Linux), the overhead can
| largely be avoided. (Why choose it on Linux? Apparently
| forking can cause problems if you also use threads.) Python
| imports can be deferred (`import` is a _statement_ , not a
| compiler or pre-processor directive), and child processes
| (regardless of the creation strategy) name the main module
| as `__mp_main__` rather than `__main__`, allowing the
| programmer to distinguish. (Being able to distinguish is of
| course _necessary_ here, to avoid making a fork bomb -
| since the top-level code runs automatically and `if
| __name__ == '__main__':` is normally top-level code.)
|
| But also keep in mind that _cleanup_ for a Python process
| also takes time, which is harder to trace.
|
| Refs:
|
| https://docs.python.org/3/library/multiprocessing.html#cont
| e... https://stackoverflow.com/questions/72497140
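A minimal sketch of the guard pattern described above (a hypothetical example; with the 'spawn' start method the child re-imports the main module, so heavy imports can be deferred into the worker function and top-level work kept behind the `__main__` guard):

```python
import multiprocessing as mp

def work(n):
    # Deferred import: paid only in processes that actually call
    # work(), not at module import time in every spawned child.
    import math
    return math.factorial(n)

if __name__ == '__main__':
    # Only the parent reaches this block; a spawned child re-imports
    # the module as '__mp_main__', so this guard prevents a fork bomb.
    ctx = mp.get_context('spawn')
    with ctx.Pool(2) as pool:
        print(pool.map(work, [5, 6]))  # [120, 720]
```

Using `get_context('spawn')` rather than `set_start_method` keeps the choice local instead of mutating global state.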
| ori_b wrote:
| As another example: I run https://shithub.us with shell
| scripts, serving a terabyte or so of data monthly (mostly due
| to AI crawlers that I can't be arsed to block).
|
| I'm launching between 15 and 3000 processes per request.
| While Plan 9 is about 10x faster at spawning processes than
| Linux, it's telling that 3000 C processes launching in a
| shell is about as fast as one python interpreter.
| isignal wrote:
| Processes can die independently, so the state of a concurrent
| shared-memory data structure can be difficult to manage when a
| process dies while modifying it under a lock. Postgres, which
| uses shared-memory data structures, can sometimes need to kill
| all its backend processes because it cannot fully recover from
| such a state.
|
| In contrast, no one thinks about what happens if a thread dies
| independently because the failure mode is joint.
| wongarsu wrote:
| > In contrast, no one thinks about what happens if a thread
| dies independently because the failure mode is joint.
|
| In Rust if a thread holding a mutex dies the mutex becomes
| poisoned, and trying to acquire it leads to an error that has
| to be handled. As a consequence every rust developer that
| touches a mutex has to think about that failure mode. Even if
| in 95% of cases the best answer is "let's exit when that
| happens".
|
| The operating system tends to treat your whole process as one
| and shut down everything or nothing. But a thread can still
| crash on its own due to an unhandled OOM, assertion failures
| or any number of other issues.
| jcalvinowens wrote:
| > But a thread can still crash in its own due to unhandled
| oom, assertion failures or any number of other issues
|
| That's not really true on POSIX. Unless you're doing nutty
| things with clone(), or you actually have explicit code
| that calls pthread_exit() or gettid()/pthread_kill(), the
| whole process is always going to die at the same time.
|
| POSIX signal dispositions are process-wide, the only way
| e.g. SIGSEGV kills a single thread is if you write an
| explicit handler which actually does that by hand.
| Unhandled exceptions usually SIGABRT, which works the same
| way.
|
| ** Just to expand a bit: there is a subtlety in that, while
| dispositions are process-wide, one individual thread does
| indeed take the signal. If the signal is handled, only that
| thread sees -EINTR from a blocking syscall; but if the
| signal is not handled, the default disposition affects _all
| threads in the process simultaneously_ no matter which
| thread is actually signalled.
| wahern wrote:
| It would be nice if someday we got per-thread signal
| handlers to complement per-thread signal masking and per-
| thread alternate signal stacks.
| jcalvinowens wrote:
| This is a solvable problem though, the literature is
| overflowing with lock-free implementations of common data
| structures. The real question is how much performance you
| have to sacrifice for the guarantee...
| tinix wrote:
| shared memory only works on dedicated hardware.
|
| if you're running in something like AWS fargate, there is no
| shared memory. have to use the network and file system which
| adds a lot of latency, way more than spawning a process.
|
| copying processes through fork is a whole different problem.
|
| green threads and an actor model will get you much further in
| my experience.
| bradleybuda wrote:
| Fargate is just a container runtime. You can fork processes
| and share memory like you can in any other Linux environment.
| You may not want to (because you are running many cheap /
| small containers) but if your Fargate containers are running
| 0.25 vCPUs then you probably don't want traditional
| multiprocessing or multithreading...
| tinix wrote:
| Go try it and report back.
|
| Fargate isn't just ECS and plain containers.
|
| You cannot use shared memory in fargate, there is literally
| no /dev/shm.
|
| See "sharedMemorySize" here: https://docs.aws.amazon.com/Am
| azonECS/latest/developerguide/...
|
| > If you're using tasks that use the Fargate launch type,
| the sharedMemorySize parameter isn't supported.
| sgarland wrote:
| Well don't use Fargate, there's your problem. Run programs on
| actual servers, not magical serverless bullshit.
| YouWhy wrote:
| Hey, I've been developing professionally with Python for 20
| years, so wanted to weigh in:
|
| Decent threading is awesome news, but it only affects a small
| minority of use cases. Threads are only strictly necessary when
| it's prohibitive to message pass. The Python ecosystem these days
| includes a playbook solution for literally any such case.
| Considering the multiple major pitfalls of threads (e.g.,
| locking), they are likely to become a thing useful only in
| specific libraries/domains and not as a general-purpose tool.
|
| Additionally, with all my love to vanilla Python, anyone who
| needs to squeeze the juice out of their CPU (which is actually
| memory bandwidth) has plenty of other tools -- off-the-shelf
| libraries written in native code. (Honorable mentions to PyPy,
| numba and such.)
|
| Finally, the one dramatic performance innovation in Python has
| been async programming - I warmly encourage everyone not familiar
| with it to consider taking a look.
| kstrauser wrote:
| I haven't been using it that much longer than you, and I agree
| with most of what you're saying, but I'd characterize it
| differently.
|
| Python has a lot of solid workarounds for avoiding threading,
| because until now Python threading has absolutely sucked. I had
| naively tried to use it to make a CPU-bound workload twice as
| fast and soon realized the implications of the GIL, so I threw
| all that code away and made it multiprocessing instead. That
| sucked in its own way because I had to serialize lots of large
| data structures to pass around, so 2x the cores got me about
| 1.5x the speed and a warmer server room.
|
| I would _love_ to have good threading support in Python. It's
| not always the right solution, but there are a lot of
| circumstances where it'd be absolutely peachy, and today we're
| faking our way around its absence with whole playbooks of
| alternative approaches to avoid the elephant in the room.
|
| But yes, use async when it makes sense. It's a thing of beauty.
| (Yes, Glyph, we hear the "I told you so!" You were right.)
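The tradeoff described above can be sketched roughly (an illustrative benchmark, not the original workload; on a GIL build the thread pool gains nothing for CPU-bound work, while the process pool must pickle every argument and result):

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def busy(n: int) -> int:
    # CPU-bound work: no I/O, so on a GIL build threads serialize it.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, jobs):
    # Run the same jobs through the given executor and time them.
    start = time.perf_counter()
    with executor_cls(max_workers=2) as ex:
        results = list(ex.map(busy, jobs))
    return results, time.perf_counter() - start

if __name__ == '__main__':
    jobs = [2_000_000] * 4
    _, t_threads = timed(ThreadPoolExecutor, jobs)
    _, t_procs = timed(ProcessPoolExecutor, jobs)
    print(f"threads: {t_threads:.2f}s  processes: {t_procs:.2f}s")
```

On a free-threaded build the thread-pool timing should drop to roughly the process-pool timing, without the pickling overhead.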
| sylware wrote:
| Got myself a shiny python 3.13.3 (ssl module still unable to
| compile with libressl) replacing a 3.12.2, feels clearly slower.
|
| What's wrong?
| ipsum2 wrote:
| python 3.13 doesn't ship with free-threaded Python compiled
| AFAIK.
| sylware wrote:
| You mean it is not default anymore?
| ipsum2 wrote:
| It's never been the default.
| sylware wrote:
| huh... then why does it feel significantly slower, since I
| did not touch the build conf?
| jdsleppy wrote:
| Did you compile the Python yourself? If so, you may need to add
| optimization flags https://devguide.python.org/getting-
| started/setup-building/i...
| aitchnyu wrote:
| Whats currently stopping me (apart from library support) from
| running a single command that starts up WSGI workers and Celery
| workers in a single process?
| gchamonlive wrote:
| Nothing, it's just that these aren't first class features of
| the language. Also someone already explained that the GIL is
| mostly about technical debt in the CPython interpreter, so
| there are reasons other than full parallelism to get rid of the
| GIL.
| hello_computer wrote:
| Opting to enable low-level parallelism for user code in an
| imperative, dynamically typed scripting language seems like a
| regression. It's less bad for LISP because of its pure-functional
| nature. It's less bad for BEAM languages & Clojure due to
| immutability. It is less bad for C/C++/Rust because you have a
| stronger type system--allowing for deeper static analysis. For
| Python, this is " _high priests of a low cult_ " shitting things
| up for corporate agendas and/or street cred.
| p0w3n3d wrote:
| Look behind! A free-threaded Python!
| EGreg wrote:
| I thought this was mostly a solved problem.
| Fibers, green threads, coroutines, actors, queues (e.g. GCD),
| ...
|
| Basically you need to reason about what your thing will do.
|
| Separate concerns. Each thing is a server (microservice?) with
| its own backpressure.
|
| They schedule jobs on a queue.
|
| The jobs come with some context, I don't care if it's a closure
| on the heap or a fiber with a stack or whatever. Javascript being
| single threaded with promises wastefully unwinds the entire stack
| for each tick instead of saving context. With callbacks you can
| save context in closures. But even that is pretty fast.
|
| Anyway then you can just load-balance the context across
| machines. Easiest approach is just to have server affinity for
| each job. The servers just contain a cache of the data so if the
| servers fail then their replacements can grab the job from an
| indexed database. The insertion and the lookup is O(log n) each.
| And jobs are deleted when done (maybe leaving behind a small log
| that is compacted) so there are no memory leaks.
|
| Oh yeah and whatever you store durably should be sharded and
| indexed properly, so practically unlimited amounts can be stored.
| Availability in a given shard is a function of replicating the
| data, and the economics of it is that the client should pay with
| credits for every time they access. You can even replicate on
| demand (like bittorrent re-seeding) to handle spikes.
|
| This is the general framework whether you use Erlang, Go, Python
| or PHP or whatever. It scales within a company and even across
| companies (as long as you sign/encrypt payloads
| cryptographically).
|
| It doesn't matter so much whether you use php-fpm with threads,
| or swoole, or the new kid on the block, FrankenPHP. Well, I
| should say I prefer the shared-nothing architecture of PHP and
| APC. But in Python, it is the same thing with eg Twisted vs just
| some SAPI.
|
| You're welcome.
| kccqzy wrote:
| It's only a mostly solved problem for concurrent I/O heavy
| workloads. It's not solved in the Python world for parallel
| CPU-bound workloads.
| henry700 wrote:
| I find it peculiar how, in a language so riddled with simple
| concurrency architectural issues, the approach is to
| painstakingly fix every library after fixing the runtime,
| instead of just using some better language. Why does the
| community insist on such a bad language when literally even
| fucking Javascript has a saner execution model?
| mylons wrote:
| i find it peculiar how tribal people are about languages.
| python is fantastic. you're not winning anyone over with
| comments like this. just go write your javascript and be happy,
| bud.
| forrestthewoods wrote:
| > instead of just using some better language
|
| Python the language is pretty bad. Python the ecosystem of
| libraries and tools has no equal, unfortunately.
|
| Switching a language is easy. Switching a billion lines of
| library less so.
|
| And the tragic part is that many of the top "python libraries"
| are just Python interfaces to a C library! But if you want to
| switch to a "better language" that fact isn't helpful.
| kubb wrote:
| I wonder if we get automatic LLM translation of codebases
| from language to language soon - this could close the library
| gap and diminish the language lock in factor.
| dash2 wrote:
| I think the opposite. Every language has flaws. What's
| impressive about Python is their ongoing commitment to work on
| theirs, even the deepest-rooted. It makes me optimistic that
| this is a language to stick with for the long run.
| rednafi wrote:
| I agree about using other languages that have better
| concurrency support if concurrency is your bottleneck.
|
| But changing the language in a brownfield project is hard. I
| love Go, and these days I don't bother with Python if I know
| the backend needs to scale.
|
| But Python's ecosystem is huge, and for data work, there's
| little alternative to it.
|
| With all that said, JavaScript ain't got shit on any language.
| The only good thing about it is Google's runtime, and that has
| nothing to do with the language. JS doesn't have true
| concurrency and is a mess of a language in general. Python is
| slow, riddled with concurrency problems, but at least it's a
| real language created by a guy who knew what he was doing.
| make3 wrote:
| I hate how these threads always devolve into insane discussions
| about why not using threads is better, while most people who
| have actually tried to speed up real-world Python code realize
| how amazing it would be to have proper threads with shared
| memory, instead of processes with their many limitations:
| forcing you to pickle objects back and forth, fork often just
| not working in cloud settings, and spawn being slow in a lot of
| applications. The usage of processes is just much heavier and
| less straightforward.
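The pickling cost mentioned above can be made concrete (a rough sketch with illustrative sizes; multiprocessing serializes arguments and results much like this on every hop between processes):

```python
import pickle

# A large-ish structure of the kind that must cross a process boundary.
data = {i: list(range(50)) for i in range(10_000)}

# With multiprocessing, arguments and results are serialized roughly
# like this on every hop; threads would simply share the object.
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
print(f"{len(blob)} bytes pickled per hop")

# The receiving process then pays again to deserialize.
restored = pickle.loads(blob)
assert restored == data
```

Both the serialization time and the transient memory for the byte blob are pure overhead relative to shared-memory threads.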
___________________________________________________________________
(page generated 2025-05-16 23:00 UTC)