[HN Gopher] Python performance myths and fairy tales
___________________________________________________________________
Python performance myths and fairy tales
Author : todsacerdoti
Score : 204 points
Date : 2025-08-06 08:36 UTC (14 hours ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| NeutralForest wrote:
| Cool article. I think a lot of those issues are not Python
| specific, so it's a good overview of what other languages can
| learn from a now 30-year-old language! I think we'll probably
| go down the JS/TS route, where another compiler (PyPy or mypyc
| or something else) will work alongside CPython, but I don't see
| Python4 happening.
| tweakimp wrote:
| I thought we would never see the GIL go away and yet, here we
| are. Never say never. Maybe Python4 is Python with another
| compiler.
| pjmlp wrote:
| It required Facebook and Microsoft to change the point of
| view on it, and now the Microsoft team is no more.
|
| So let's see what remains of the CPython performance efforts.
| ngrilly wrote:
| I'm not sure I understand the reference to JS/TS: TS is only a
| type checker and has zero effect on runtime performance.
| 2d8a875f-39a2-4 wrote:
| Do you still need an add-on library to use more than one core?
| BlackFly wrote:
| The latest version officially supports free-threaded mode:
| https://docs.python.org/3.14/whatsnew/3.14.html#whatsnew314-...
| franktankbank wrote:
| Eh? Multiprocessing has existed since 2.X days.
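To franktankbank's point, no add-on library is needed: a minimal sketch of spreading work across cores with only the standard library (multiprocessing has shipped with Python since 2.6; recent releases additionally offer an official free-threaded build):

```python
# Minimal sketch: using more than one core with only the standard
# library. multiprocessing has been in the stdlib since Python 2.6.
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
```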
| writebetterc wrote:
| Good job on dispelling the myth of "compiler = fast". I hope
| SPython will be able to transfer some of its ideas to CPython
| with time.
| nromiun wrote:
| You would think LuaJIT would have convinced people by now. But
| most people still think you need a static language and an AOT
| compiler for performance.
| pjmlp wrote:
| Also Smalltalk (Pharo, Squeak, Cincom, Dolphin), Common Lisp
| (SBCL, Clozure, Allegro, LispWorks), Self,....
|
| But yeah.
| mrkeen wrote:
| Where was this dispelled?
| quantumspandex wrote:
| So we are paying 99% of the performance just for the 1% of cases
| where it's nice to code in.
|
| Why do people think it's a good trade-off?
| nromiun wrote:
| Because it's nice to code in. Not everything needs to scale or
| be fast.
|
| Personally I think it is more crazy that you would optimize 99%
| of the time just to need it for 1% of the time.
| BlackFly wrote:
| It isn't an either-or choice. The people interested in
| optimizing the performance are typically different people
| than those interested in implementing syntactic sugar. It is
| certainly true that growing the overall codebase risks
| introducing tensions for some feature sets but that is just a
| consideration you take when diligently adding to the
| language.
| nomel wrote:
| That's why Python is the second best language for everything.
|
| The amount of complexity you can code up in a short time,
| that most everyone can contribute to, is incredible.
| dgfitz wrote:
| I can say with certainty I've never paid a penny. Have you?
| lmm wrote:
| Because computers are more than 100x faster than they were when
| I started programming, and they were already fast enough back
| then? (And meanwhile my coding ability isn't any better, if
| anything it's worse)
| jonathrg wrote:
| It's much more than 1%, it is what enables commonly used
| libraries like pytest and Pydantic.
| pjmlp wrote:
| Because many never used Smalltalk, Common Lisp, Self, Dylan,...
| so they think CPython is the only way there is; plus they
| already have their computer resources wasted by tons of
| Electron apps anyway, so they hardly question CPython's
| performance, or lack thereof.
| wiseowise wrote:
| Has it ever crossed your mind that they just like Python?
| pjmlp wrote:
| And slow code, yes it has crossed my mind.
|
| Usually they also call libraries that are 95% C code
| "Python".
| Fraterkes wrote:
| The hypocrisy gets even worse: the C code then gets
| compiled to assembly!
| Hilift wrote:
| It isn't. There are many things Python isn't up to. However,
| it has been around forever, and in some influential niche
| verticals like cybersecurity, Python was as useful as or more
| useful than native tooling, and it works on multiple platforms.
| bluGill wrote:
| Most of the time you are waiting on a human or at least
| something other than the cpu. Most of the time more time is
| spent by the programmer writing the code than all the users
| combined waiting for the program to run.
|
| Between those two, most often performance is just fine to
| trade off.
| Krssst wrote:
| Performance is worthless if the code isn't correct. It's easier
| to write correct code reasonably quickly in Python in simple
| cases (integers don't overflow like in C, don't wrap around
| like in C#, no absurd implicit conversions like in other
| scripting languages).
|
| Also you don't need code to be fast a lot of the time. If you
| just need some number crunching that is occasionally run by a
| human, taking a whole second is fine. Pretty good replacement
| for shell scripting too.
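The overflow point is easy to demonstrate: Python ints are arbitrary-precision, so there is no silent overflow or wrap-around to reason about.

```python
# Python integers are arbitrary-precision: no silent overflow as in C,
# and no wrap-around as with fixed-width ints in C#.
big = 2 ** 64                    # already out of range for a C uint64_t
print(big + 1)                   # 18446744073709551617
print((big * big).bit_length())  # 129 - the value just keeps growing
```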
| Spivak wrote:
| I mean you can see it with your own experience, folks will
| post a 50-line snippet of ordinary C code in a blog post
| which looks like you're reading a long dead ancient language
| littered with macros and then be like "this is a lot to grok
| here's the equivalent code in Python / Ruby" and it's 3 lines
| and completely obvious.
|
| Folks on HN are so weird when it comes to why these languages
| exist and why people keep writing in them. For all their
| faults and dynamism and GC and lack of static typing in the
| real world with real devs you get code that is more correct
| written faster when you use a higher level language. It's
| Go's raison d'etre.
| pavon wrote:
| But many of the language decisions that make Python so slow
| don't make code easier to write correctly. Like monkey
| patching; it is very powerful and can be useful, but it can
| also create huge maintainability issues, and its existence as
| a feature hinders making the code faster.
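A tiny illustration of the monkey patching being described, using only the standard library: any module attribute can be rebound at runtime, which is exactly the assumption a compiler cannot make away.

```python
# Monkey patching: rebinding a library function at runtime. Handy for
# tests, but it means the runtime can never assume json.dumps is still
# the original function when optimizing a call site.
import json

original = json.dumps
json.dumps = lambda obj, **kw: "patched"     # rebind the module attribute
assert json.dumps({"a": 1}) == "patched"
json.dumps = original                        # restore the real function
assert json.dumps({"a": 1}) == '{"a": 1}'
```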
| Mawr wrote:
| I don't think anyone aware of this thinks it's a good tradeoff.
|
| The more interesting question is why the tradeoff was made in
| the first place.
|
| The answer is, it's relatively easy for us to see and
| understand the impact of these design decisions because we've
| been able to see their outcomes over the last 20+ years of
| Python. Hindsight is 20/20.
|
| Remember that Python was released in 1991, before even Java.
| What we knew about programming back then vs what we know now is
| very different.
|
| Oh and also, these tradeoffs are very hard to make in general.
| A design decision that you may think is irrelevant at the time
| may in fact end up being crucial to performance later on, but
| by that point the design is set in stone due to backwards
| compatibility.
| nromiun wrote:
| I really hope PyPy gets more popular so that I don't have to
| argue Python is pretty fast for the nth time.
|
| Even if you have to stick with CPython, Numba, Pythran, etc.
| can give you amazing performance for minimal code changes.
| meinersbur wrote:
| Is it just me or does the talk actually confirm all its Python
| "myths and fairy tales"?
| daneel_w wrote:
| It confirms that Python indeed has poor executional
| performance.
| xg15 wrote:
| Well, the fairy tale was that Python was fast, or "fast enough"
| or "fast if we could compile it and get rid of the GIL".
| btown wrote:
| I think an important bit of context here is that computers are
| very, very good at speculative happy-path execution.
|
| The examples in the article seem gloomy: how could a JIT possibly
| do all the checks to make sure the arguments aren't funky before
| adding them together, in a way that's meaningfully better than
| just running the interpreter? But in practice, a JIT _can_ create
| code that does these checks, and modern processors will branch-
| predict the happy path and effectively run it _in parallel with_
| the checks.
|
| JavaScript, too, has complex prototype chains and common use of
| boxed objects - but v8 has made common use cases extremely fast.
| I'm excited for the future of Python.
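A hypothetical Python-level sketch of the shape btown describes (names here are illustrative, not CPython internals): a cheap, highly predictable type guard in front of a specialized fast path, with a fallback to fully generic dispatch. The guard branch is what the CPU can speculate past.

```python
# Illustrative sketch of JIT-style speculation: a cheap type guard, a
# specialized fast path, and a "deopt" fallback to generic dispatch.
# generic_add/specialized_add are hypothetical names, not real APIs.
def generic_add(a, b):
    # Slow path: full dynamic dispatch through the object protocol.
    return a.__add__(b)

def specialized_add(a, b):
    # Guard: confirm the speculated types still hold.
    if type(a) is int and type(b) is int:
        return a + b            # happy path the branch predictor learns
    return generic_add(a, b)    # bail out to the generic routine
```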
| DanielHB wrote:
| The main problem is when the optimizations silently fail
| because of seemingly innocent changes and suddenly your
| performance tanks 10x. This is a problem with any language
| really (CPU cache misses are a thing after all, and many non-
| dynamic languages have boxed objects), but it is much, much
| worse in dynamic languages like Python, JS and Ruby.
|
| Most of the time it doesn't matter, most high-throughput python
| code just invokes C/C++ where these concerns are not as big of
| a problem. Most JS code just invokes C/C++ browser DOM objects.
| As long as the hot-path is not in those languages you are not
| at such high risk of "innocent change tanked performance"
|
| Even server-side most JS/Python/Ruby code is just simple HTTP
| stack handlers and invoking databases and shuffling data
| around. And often a large part of the process of handling a
| request (encoding JSON/XML/etc, parsing HTTP messages, etc) can
| be written in lower-level languages.
| nxobject wrote:
| To be slightly flip, we could say that the Lisp Machine CISC-
| supports-language full stack design philosophy lives on in how
| massive M-series reorder buffers and ILP supports
| JavaScriptCore.
| jerf wrote:
| That makes it so that in absolute terms, Python is not as slow
| as you might naively expect.
|
| But we don't measure programming language performance in
| absolute terms. We measure them in relative terms, generally
| against C. And while your Python code is speculating about how
| this Python object will be unboxed, where its methods are, how
| to unbox its parameters, what methods will be called on those,
| etc., compiled code is speculating on _actual code_ the
| programmer has written, running _that_ in parallel, such that
| by the time the Python interpreter has successfully speculated
| on how some method call will resolve with actual objects, the
| compiled language is already done with ~50 lines of code of
| similar grammatical complexity. (Which is a sloppy term, since
| this is a bit of a sloppy conversation, but consider a series
| of "p.x = y"-level statements in Python versus C as the case
| I'm looking at here.)
|
| There's no way around it. You can spend your amazingly capable
| speculative parallel CPU on churning through Python
| interpretation or you can spend it on doing real work, but you
| can't do both.
|
| After all, the interpreter is just C code too. It's not like it
| gets access to special speculation opcodes that no other
| program does.
| Demiurge wrote:
| I love this "real work". Real work, like writing linked
| lists, array bounds checking, all the error handling for
| opening files, etc, etc? There is a reason Python and C both
| have a use case, and it's obvious Python will never be as
| fast as C doing "1 + 1". The real "real work" is in getting
| stuff done, not just making sure the least amount of cpu
| cycles are used to accomplish some web form generation.
|
| Anyway, I think you're totally right, in your general
| message. Python will never be the fastest language in all
| contexts. Still, there is a lot of room for optimization, and
| given it's a popular language, it's worth the effort.
| btown wrote:
| To put it another way, I choose Python _because_ of its
| semantics around dynamic operator definition, duck typing
| etc.
|
| Just because I don't write the bounds-checking and type-
| checking and dynamic-dispatch and error-handling code
| myself, doesn't make it any less a conscious decision I
| made by choosing Python. It's all "real work."
| kragen wrote:
| Type checking and bounds checking aren't "real work" in
| the sense that, when somebody checks their bank account
| balance on your website or applies a sound effect to an
| audio track in their digital audio workstation, they
| don't think, "Oh good! The computer is going to do some
| type checking for me now!" Type checking and bounds
| checking may be good means to an end, but they are not
| the end, from the point of view of the outside world.
|
| Of course, the bank account is only a means to the end of
| paying the dentist for installing crowns on your teeth
| and whatnot, and the sound effect is only a means to the
| end of making your music sound less like Daft Punk or
| something, so it's kind of fuzzy. It depends on what
| people are _thinking_ about achieving. As programmers,
| because we know the experience of late nights debugging
| when our array bounds overflow, we think of bounds
| checking and type checking as ends in themselves.
|
| But only up to a point! Often, type checking and bounds
| checking can be done at compile time, which is more
| efficient. When we do that, as long as it works
| correctly, we _never_ + feel disappointed that our
| program isn't doing run-time type checks. We never look
| at our running programs and say, "This program would be
| better if it did more of its type checks at runtime!"
|
| No. Run-time type checking is purely a deadweight loss:
| wasting some of the CPU on computation that doesn't move
| the program toward achieving the goals we were trying to
| achieve when we wrote it. It may be a worthwhile tradeoff
| (for simplicity of implementation, for example) but we
| must weigh it on the debit side of the ledger, not the
| credit side.
|
| ______
|
| + Well, unless we're trying to debug a PyPy type-
| specialization bug or something. Then we might work hard
| to construct a program that forces PyPy to do more type-
| checking at runtime, and type checking does become an
| end.
| rightbyte wrote:
| > and the sound effect is only a means to the end of
| making your music sound less like Daft Punk or something
|
| What do you mean. Daft Punk is not daft punk. Why single
| them out :)
| kragen wrote:
| Well, originally I wrote "more like Daft Punk", but then
| I thought someone might think I was stereotyping
| musicians as being unoriginal and derivative, so I swung
| the other way.
| jerf wrote:
| I can't figure out what your first paragraph is about. The
| topic under discussion is Python performance. We do not
| generally try to measure something as fuzzy as "real work"
| as you seem to be using the term in performance discussions
| because what even is that. There's a reason my post
| referenced "lines of code", still a rather fuzzy thing
| (which I _already pointed out_ in my post), but it gets
| across the idea that while Python has to do a lot of work
| for "x.y = z" for all the things that "x.y" _might_ mean
| including the possibility that the user has changed what it
| means since the last time this statement ran, compiled
| languages generally do over an order of magnitude less
| "work" in resolving that.
|
| This is one of the issues with Python I've pointed out
| before, to the point I suggest that someone could make a
| language around this idea:
| https://jerf.org/iri/post/2025/programming_language_ideas/#s...
| In Python you pay and pay
| and pay and pay and pay for all this dynamic functionality,
| but in practice you aren't actually dynamically modifying
| class hierarchies and attaching arbitrary attributes to
| arbitrary instances with arbitrary types. You pay for the
| feature but you benefit from them far less often than the
| number of times Python is paying for them. Python spends
| rather a lot of time spinning its wheels double-checking
| that it's still safe to do the thing it thinks it can do,
| and it's hard to remove that even in JIT because it is
| extremely difficult to _prove_ it can eliminate those
| checks.
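The cost being described is easy to see in miniature. Because a class can be mutated after instances exist, even `p.x` can change meaning under the interpreter's feet, so every attribute access has to re-check its assumptions:

```python
# Why `x.y` can never be resolved once and cached naively: the class can
# be mutated after the fact, and a data descriptor (like property) then
# shadows the value already stored on the instance.
class Point:
    pass

p = Point()
p.x = 1
assert p.x == 1                       # plain instance attribute

Point.x = property(lambda self: 42)   # mutate the class afterwards
assert p.x == 42                      # the property now wins...
assert p.__dict__["x"] == 1           # ...though the old value remains
```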
| mmcnl wrote:
| You claimed churning through Python interpretation is not
| "real work". You now correctly ask the question: what is
| "real work"? Why is interpreting Python not real work, if
| it means I don't have to check for array bounds?
| Demiurge wrote:
| I understand what you're saying. In a way, my comment is
| actually off-topic to most of your comment. What I was
| saying in my first paragraph is that the words you use in
| your context of a language runtime in-effeciency, can be
| used to describe why these in-effeciences exist, in the
| context of higher level processes, like business
| effeciency. I find your choice of words amusing, given
| the juxtoposition of these contexts, even saying "you
| pay, pay, pay".
| Calavar wrote:
| I believe they are talking about the processor doing real
| work, not the programmer.
| Demiurge wrote:
| Yeah, I get it, but I found the choice of words funny,
| because these words can apply in the larger context. It's
| like saying, Python transfers work from your man hours to
| cpu hours :)
| dragonwriter wrote:
| > After all, the interpreter is just C code too.
|
| What interpreter? We're talking about JITting Python to
| native code.
| qaq wrote:
| Welp there is Mojo so looks like soon you will not really
| need to care that much. Prob will get better performance than
| C too.
| jerf wrote:
| I've been hearing promises about "better than C"
| performance from Python for over 25 years. I remember them
| on comp.lang.python, back on that Usenet thing most people
| reading this have only heard about.
|
| At this point, you just shouldn't be making that promise.
| Decent chance that promise is already older than you are.
| Just let the performance be what it is, and if you need
| better performance today, be aware that there are a wide
| variety of languages of all shapes and sizes standing by to
| give you ~25-50x better single threaded performance and
| even more on multi-core performance _today_ if you need it.
| If you need it, waiting for Python to provide it is not a
| sensible bet.
| hnfong wrote:
| You're probably right, Mojo seems to be more "Python-
| like" than actually source-compatible with Python. A bunch
| of features, notably classes, are missing.
| qaq wrote:
| Give 'em a bit of time, it's a pretty young language.
| qaq wrote:
| I am a bit older than Python :). I imagine the creator of
| Clang and LLVM has a fairly good grasp on making things
| performant. Think of Mojo as Rust with better ergonomics
| and a more advanced compiler that you can mix and match
| with regular Python.
| lenkite wrote:
| Mojo feels less like a real programming language for humans
| and more like a language primarily for AIs. The docs for the
| language immediately dive into chatbots and AI prompts.
| qaq wrote:
| I mean, that's the use case they care about for obvious
| reasons, but it's not the only use case.
| CraigJPerry wrote:
| > And while your Python code is speculating about how this
| Python object will be unboxed
|
| This is wrong, I think? The GP is talking about JIT'd code.
| fpoling wrote:
| Although JS supports prototype mutations, the with operator and
| other constructs that make optimization harder, typical JS code
| does not use them. Thus a JIT can add a few checks for the
| presence of problematic constructs to direct execution to a
| slow path, while optimizing a not particularly big set of
| common patterns. And the JS JIT does not need to care much
| about calling arbitrary native code, as the browser internals
| can be adjusted/refactored to suit the JIT's needs.
|
| With Python that does not work. There are simply more
| optimization-unfriendly constructs and popular libraries use
| those. And Python calls arbitrary C libraries with fixed ABI.
|
| So optimizing Python is inherently more difficult.
| josefx wrote:
| > but v8 has made common use cases extremely fast. I'm excited
| for the future of Python.
|
| Isn't v8 still entirely single threaded with limited message
| passing? Python just went through a lot of work to make
| multithreaded code faster, it would be disappointing if it had
| to scrap threading entirely and fall back to multiprocessing on
| shared memory in order to match v8.
| zozbot234 wrote:
| Multithreaded code is usually bottlenecked by memory
| bandwidth, even more so than raw compute. C/C++/Rust are
| great at making efficient use of memory bandwidth, whereas
| scripting languages are rather wasteful of it by comparison.
| So I'm not sure that multithreading will do much to bridge
| the performance gap between binary compiled languages and
| scripting languages like Python.
| loeg wrote:
| JS is single-threaded. Python isn't.
| mcdeltat wrote:
| I wonder if branch prediction can still hide the performance
| loss when the happy path checks become large/complex. Branch
| prediction is a very low level optimisation. And if the
| predictor is right you don't get everything for free. The CPU
| must still evaluate the condition, which takes resources,
| albeit it's no longer on the critical path. However I'd think
| the CPU would stall if it got too far ahead of the condition
| execution (ultimately all the code must execute before the
| program completes). Perhaps given the nature of Python, the
| checks would be so complex that in a tight loop they'd exert
| significant resource pressure?
| mrkeen wrote:
| I didn't read with 100% focus, but this lwn account of the talk
| seemed to confirm those myths instead of debunking them.
| diegocg wrote:
| Yep, for me it confirms all the reasons why I think python is
| slow and not a good language for anything that goes beyond a
| script. I work with it everyday, and I have learned that I
| can't even trust tooling such as mypy because it's full of
| corner cases - turns out that not having a clear type design in
| a language is not something that can be fundamentally fixed by
| external tools. Tests are the only thing that can make me trust
| code written in this language
| jdhwosnhw wrote:
| > Yep, for me it confirms all the reasons why I think python
| is slow
|
| Yes, that is literally the explicit point of the talk. The
| first myth of the article was "python is not slow"
| postexitus wrote:
| A more careful reading of the article is required.
|
| The first myth is "Python is not slow" - it is debunked, it is
| slow.
|
| The second myth is ""it's just a glue language / you just need
| to rewrite the hot parts in C/C++" - it is debunked, just
| rewriting stuff in C/Rust does not help.
|
| The third myth is "Python is slow because it is interpreted" -
| it is debunked; interpretation is only one part of why it is
| slow.
| mrkeen wrote:
| Thanks! As a Python outsider, I was primed for a Python
| insider to be trying to change my views, not confirm them,
| and I did indeed misread.
| zahlman wrote:
| My impression is that GvR conceded a long time ago that
| Python is slow, and doesn't particularly care (and
| considers it trolling to keep bringing it up). The point is
| that in the real world this doesn't matter a lot of the
| time, at least as long as you aren't making big-O mistakes
| -- and easier-to-use languages make it easier to avoid
| those mistakes.
|
| For that matter, I recently saw a talk in the Python world
| that was about convincing people to let their computer do
| more work locally in general, because computers really are
| just that fast now.
| ActorNightly wrote:
| >just rewriting stuff in C/Rust does not help.
|
| Except it does. The key is to figure out which part you
| actually need to go fast, and write it in C - unless most of
| your use case is dominated by network latency, in which case
| none of this matters anyway.
|
| Overall, people seem to miss the point of Python. The best
| way to develop software is "make it work, make it good, make
| it fast" - the first part gets you to an end to end prototype
| that gives you a testable environment, the second part
| establishes the robustness and consistency, and the third
| part lets you focus on optimizing the performance with a
| robust framework that lets you ensure that your changes are
| not breaking anything.
|
| Python's focus is on the first part. The idea is that you
| spend less time making it work. Once you have it working,
| it's much easier to do the second part (adding tests,
| type checking, whatever else), and then the third part. Now
| with LLMs, it's actually pretty straightforward to take a
| python file and translate it to .c/.h files, especially with
| agents that do additional "thinking" loops.
|
| However, even given all of that, in practice you often don't
| need to move away from Python. For example, I have a project
| that datamines Strava Heatmaps (i.e I download png tiles for
| entire US). The amount of time that it took me to write it in
| Python in addition to running it (which takes about a day) is
| much shorter than it would have taken me to write it in
| C++/Rust and then run it with speedup in processing.
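A sketch of the "figure out which part actually needs to go fast" step, using only the standard library profiler (the function names here are illustrative, not from the comment):

```python
# Profile first, rewrite later: cProfile shows which function dominates
# the runtime, and only that one is a candidate for a C/Rust rewrite.
import cProfile
import pstats

def hot_loop():
    return sum(i * i for i in range(50_000))

def glue():
    return {"status": "ok"}

profiler = cProfile.Profile()
profiler.enable()
hot_loop()
glue()
profiler.disable()

# Stats keys are (filename, lineno, function_name) tuples; inspecting
# them tells you where the time actually went.
names = {name for (_, _, name) in pstats.Stats(profiler).stats}
```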
| IshKebab wrote:
| In fairness I wouldn't really call those "myths", just bad
| defences of Python's slowness. I don't think the people
| saying them _really_ believe it - if it came to life or
| death. They just really like Python and are trying to avoid
| the cognitive dissonance of liking a really slow language.
|
| Like, I wouldn't say it's a "myth" that Linux is easy to use.
| akkad33 wrote:
| > The first myth is "Python is not slow" - it is debunked, it
| is slow
|
| This is strange. Most people in programming community know
| python is slow. If it has any reputation, it's that it is
| quite slow
| pjmlp wrote:
| Basically, leave Python for OS and application scripting tasks,
| and as BASIC replacement for those learning to program.
| aragilar wrote:
| And yet, most of what people end up doing ends up being
| effectively OS and application scripting. Most ML projects are
| really just setting up a pipeline and telling the computer to
| go and run it. Cloud deployments are "take this yaml and
| transform it into some other yaml". Inasmuch as I don't want to
| use Fortran to parse a yaml file, I don't really want to write
| an OS (or a database) in Python. Even something like django is
| mostly deferring off tasks to faster systems, and is really
| about being a DSL-as-programming-language while still being
| able to call out to other things (e.g. ML code).
| pjmlp wrote:
| I would rather use Fortran actually, not all of us are stuck
| with Fortran 77.
|
| Ironically Fortran support is one of the reasons CUDA won
| over OpenCL.
|
| Having said that, plenty of programming languages with
| JIT/AOT toolchains have nice YAML parsers, I don't see the
| need to bother with Python for that.
| Ulti wrote:
| Feels like Mojo is worth a shoutout in this context:
| https://www.modular.com/mojo It addresses this by being a
| syntactic superset of Python, where "fn" functions (instead of
| "def") are assumed statically typed and compilable with Numba-
| style optimisations.
| _aavaa_ wrote:
| Mojo NOT being open-source is a complete non-starter.
| alankarmisra wrote:
| Genuinely curious; while I understand why we would want a
| language to be open-source (there's plenty of good reasons),
| do you have anecdotes where the open-sourceness helped you
| solve a problem?
| yupyupyups wrote:
| Not the OP, but I have needed to patch Qt due to bugs that
| couldn't be easily worked around.
|
| I have also been frustrated while trying to interoperate
| with expensive proprietary software because documentation
| was lacking, and the source code was unavailable.
|
| In one instance, a proprietary software had the source code
| "exposed", which helped me work around its bugs and use it
| properly (also poorly documented).
|
| There are of course other advantages of having that
| transparency, like being able to independently audit the
| code for vulnerabilities or unacceptable "features", and
| fix those.
|
| Open source is oftentimes a prerequisite for us to be able
| to control our software.
| _aavaa_ wrote:
| It has helped prevent problems. I am not worried about
| Python suddenly adding a clause stating that I can't
| release an ML framework...
| Philpax wrote:
| In the earlier days of rustc, it was handy to be able to
| look at the context for a specific compiler error (this is
| before the error reporting it is now known for). Using
| that, I was able to diagnose what was wrong with my code
| and adjust it accordingly.
| Ulti wrote:
| More of a question of /will/ Mojo eventually be entirely open
| source, chunks of it already are. The intent from Modular is
| eventually it will be, just not everything all at once and
| not whilst they're internally doing loads of dev for their
| own commercial entity. Which seems fair enough to me.
| Importantly they have open sourced lots of the stdlib which
| is probably what anyone external would contribute to or want
| to change anyway?
| https://www.modular.com/blog/the-next-big-step-in-mojo-open-...
| _aavaa_ wrote:
| _When_ it has become open source I will consider building
| up expertise and a product on it. Until it has happened
| there are no guarantees that it will.
| Ulti wrote:
| Well, the "expertise" is mostly just Python; that's sort of
| the value prop. But yeah, building an actual AI product on
| top, I'd be more worried about the early-stage nature of
| Modular than about the implementation being closed source.
| _aavaa_ wrote:
| Sure, that's the value prop of numba too. But reality is
| different.
| abhijeetpbodas wrote:
| An earlier version of the talk is at
| https://www.youtube.com/watch?v=ir5ShHRi5lw (I could not find the
| EuroPython one).
| fragebogen wrote:
| Here's a newer one https://www.youtube.com/watch?v=1uFMW0IcZuw
| ic_fly2 wrote:
| It's a good article on speed.
|
| But honestly, the thing that makes any of my programs slow is
| network calls. And there a nice async setup goes a long way.
| And then k8s for the scaling.
| nicolaslem wrote:
| This. I maintain an ecommerce platform written in Python. Even
| with Python being slow, less than 30% of our request time is
| spent executing code, the rest is talking to stuff over the
| network.
| stackskipton wrote:
| SRE here: that horizontal scaling with Python has costs, as
| it means more connections to the database and so forth, so
| you are impacting things even if you don't see it.
| ic_fly2 wrote:
| Meh, even with basic async I've been able to overload the
| memory capacity of Azure's premium AMQP offering.
|
| But yes managing db connections is a pain. But I don't think
| it's any better in Java (my only other reference at this
| scale)
| gen220 wrote:
| I think articles like this cast too wide a net when they say
| "performance" or "<language> is fast/slow".
|
| A bunch of SREs discussing which languages/servers/runtimes are
| fast/slow/efficient in comparable production setups would give
| more practical guidance.
|
| If you're building an http daemon in a traditional three-tiered
| app (like a large % of people on HN), IME, Python has quietly
| become a great language in that space, compared to its peers,
| over the last 8 years.
| ntoll wrote:
| Antonio is a star. He's also a very talented artist.
| dgan wrote:
| "Rewrite the hot path in C/C++" is also a landmine because of
| how inefficient the boundary crossing is, so you really need to
| "dispatch as much as possible at once" instead of continuously
| calling into the native code.
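The "dispatch as much as possible at once" point can be sketched with just the standard library: one call whose loop runs in C, versus crossing the interpreter/native boundary once per element.

```python
# Batching work into a single call that loops in C (the built-in sum
# over an array) vs. paying bytecode dispatch and float boxing for
# every element in a Python-level loop.
import array

data = array.array("d", range(10_000))

def per_element():
    total = 0.0
    for x in data:       # boxes a float and dispatches per element
        total += x
    return total

def one_call():
    return sum(data)     # the whole loop runs inside CPython's C code

# Same result either way; the second form is the batched dispatch.
assert per_element() == one_call() == 49995000.0
```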
| aragilar wrote:
| Isn't this just a specific example of the general rule of
| pulling out repeated use of the same operation in a loop? I'm
| not sure calls out to C are specifically slow in CPython (given
| many operations are really just calling C underneath).
| KeplerBoy wrote:
| The key is to move the entire loop to a compiled language
| instead of just the inner operation.
| dgan wrote:
| They are specifically slow. There was a project which
| measured FFI cost in different languages, and Python is
| awfully bad.
| Twirrim wrote:
| The serialisation cost of translating data representations
| between python and C (or whatever compiled language you're
| using) is notable. Instead of having the compiled code sit in
| the centre of a hot loop, it's significantly better to have
| the loop in the compiled code and call it once.
|
| https://pythonspeed.com/articles/python-extension-performanc...
| morkalork wrote:
| The overhead of copying and moving data around in Python is
| frustrating. When you are CPU bound on a task, you can't
| use threads (which do have shared memory) because of the
| GIL, so you end up using whole processes and then waste a
| bunch of cycles communicating stuff back and forth. And
| yes, you can create shared memory buffers between Python
| processes but that is nowhere near as smooth as say two
| Java threads working off a shared data structure that's got
| synchronized sprinkled on it.
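For reference, the mechanism being described is the standard library's multiprocessing.shared_memory (Python 3.8+). A minimal single-process sketch of attaching to the same buffer by name; in a real program the second handle would be opened from a worker process:

```python
from multiprocessing import shared_memory

# Create a named shared block; a worker process would attach by name.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# A second handle to the same memory: no copying, no pickling.
other = shared_memory.SharedMemory(name=shm.name)
other.buf[0] = ord("j")          # write through one handle...
round_trip = bytes(shm.buf[:5])  # ...and read it through the other

other.close()
shm.close()
shm.unlink()
print(round_trip)
```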
| kragen wrote:
| You don't have to serialize data or translate data
| representations between CPython and C. That article is
| wrong. What's slow in their example is storing data (such
| as integers) the way CPython likes to store it, not
| translating that form to a form easily manipulated in C,
| such as a native integer in a register. That's just a
| single MOV instruction, once you get past all the type
| checking and reference counting.
|
| You can avoid that problem to some extent by implementing
| your own data container as part of your C extension (the
| article's solution #1); frobbing that from a Python loop
| can still be significantly faster than allocating and
| deallocating boxed integers all the time, with dynamic
| dispatch and reference counting. But, yes, to really get
| reasonable performance you want to not be running bytecodes
| in the Python interpreter loop at all (the article's
| solution #2).
|
| But that's not because of serialization or other kinds of
| data format translation.
| ActorNightly wrote:
| >how inefficient the boundary crossing is
|
  | For 99.99% of the programs that people write, modern M.2
  | NVMe drives are plenty fast, and that's the laziest way to
  | load data into a C extension or process.
|
  | Then there are Unix pipes, which are sufficiently fast.
|
| Then there is shared memory, which basically involves no
| loading.
|
| As with Python, all depends on the setup.
| zahlman wrote:
    | The problem isn't loading the data, but marshalling it (i.e.,
| transforming it into a data structure that makes sense for
| the faster language to operate on, and back again). Or if you
| don't transform (or the data is special-cased enough that no
| transformation makes sense) then the available optimizations
| become much more limited.
| ActorNightly wrote:
      | That's all just design, nothing to do with any particular
      | language.
| jononor wrote:
| There are several datastructures for numeric data that do
| not need marshalling, and are suitable for very efficient
      | interoperation between Python and C/C++/Rust, etc. Examples
      | include array.array (in the standard library), numpy.array, and
| PyArrow.
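A small stdlib illustration of why no marshalling is needed with such structures: array.array stores machine-typed values contiguously, and ctypes can view that same buffer in place.

```python
import array
import ctypes

# A contiguous buffer of machine doubles, not boxed Python floats.
a = array.array("d", [1.0, 2.0, 3.0])

# A zero-copy C-typed view over the same memory; nothing is marshalled.
c_view = (ctypes.c_double * len(a)).from_buffer(a)

c_view[0] = 9.0  # write through the C-typed view...
print(a[0])      # ...and the Python-level array sees the change
```

The same buffer protocol is what lets NumPy and PyArrow hand their data to native code without a copy.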
| didip wrote:
| These days it's "rewrite in Rust".
|
| Typically Python is just the entry and exit point (with a
| little bit of massaging), right?
|
| And then the overwhelming majority of the business logic is
| done in Rust/C++/Fortran, no?
| 01HNNWZ0MV43FF wrote:
| With computer vision you end up wanting to read and write to
| huge buffers that aren't practical to serialize and are
| difficult to share. And even allocating and freeing multi-
| megabyte framebuffers at 60 FPS can put a little strain on
| the allocator, so you want to reuse them, which means you
| have to think about memory safety.
|
| That is probably why his demo was Sobel edge detection with
| Numpy. Sobel can run fast enough at standard resolution on a
| CPU, but once that huge buffer needs to be read or written
| outside of your fast language, things will get tricky.
|
| This also comes up in Tauri, since you have to bridge between
| Rust and JS. I'm not sure if Electron apps have the same
| problem or not.
| aeroevan wrote:
| In the data science/engineering world apache arrow is the
| bridge between languages, so you don't actually need to
| serialize into language specific structures which is really
| nice
| jononor wrote:
| The "numpy" Sobel code is not that good, unfortunately -
| all the iteration is done in Python, so there is not much
| benefit from involving numpy. If one would use say
| scipy.convolve2d on a numpy.array, it would be much faster.
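A sketch of the point, assuming NumPy is available: the Sobel x-gradient written with whole-array slicing so the per-pixel iteration happens inside NumPy's C loops rather than in the Python interpreter (scipy.signal.convolve2d would be another option).

```python
import numpy as np

def sobel_x(img):
    # Sobel x kernel [[-1,0,1],[-2,0,2],[-1,0,1]] applied to the
    # interior of img via slices; no Python-level pixel loop.
    return (
        (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:])
        - (img[:-2, :-2] + 2 * img[1:-1, :-2] + img[2:, :-2])
    )

# A horizontal ramp has a constant x-gradient.
img = np.tile(np.arange(5.0), (5, 1))
print(sobel_x(img))
```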
| pavon wrote:
| One use of Python as a "glue language" I've seen that actually
| avoids the performance problems of those bindings is GNU Radio.
| That is because its architecture basically uses python as a
| config language that sets up the computation flow-graph at
| startup, and then the rest of runtime is entirely in compiled
| code (generally C++). Obviously that approach isn't applicable
| to all problems, but it really shaped my opinion of when/how a
| slow glue language is acceptable.
| slt2021 wrote:
| This. Use python only for control flow, and offload data flow
| to a library that is better suited for this: written in C,
| uses packed structs, cache friendly, etc.
|
| if you want multiprocessing, use the multiprocessing library,
| scatter and gather type computation, etc
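A minimal sketch of that scatter/gather shape with the standard-library multiprocessing.Pool; the worker function and chunking helper are illustrative names, not from the article.

```python
from multiprocessing import Pool

def square_chunk(chunk):
    # The work each worker does on its slice of the data.
    return [x * x for x in chunk]

def scatter(data, n):
    # Split data into n contiguous, roughly equal chunks.
    k, m = divmod(len(data), n)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]

if __name__ == "__main__":
    data = list(range(8))
    with Pool(processes=2) as pool:
        parts = pool.map(square_chunk, scatter(data, 2))  # scatter
    result = [x for part in parts for x in part]          # gather
    print(result)
```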
| IshKebab wrote:
| And it's not just inefficiency. Even with fancy FFI generators
| like PyO3 or SWIG, adding FFI adds a ton of work, complexity,
| makes debugging harder, distribution harder, etc.
|
| In my opinion in most cases where you might want to write a
| project in two languages with FFI, it's usually better not to
| and just use one language even if that language isn't optimal.
| In this case, just write the whole thing in C++ (or Rust).
|
| There are some exceptions but generally FFI is a _huge_ cost
  | and Python doesn't bring enough to the table to justify its
| use if you are already using C++.
| robmccoll wrote:
| Python as a language will likely never have a "fast"
| implementation and still be Python. It is way too dynamic to be
| predictable from the code alone or even an execution stream in a
| way that allows you to simplify the actual code that will be
| executed at runtime through either AOT or JIT compilation. The
| language itself is also quite large in terms of syntax and
| built-in capability at this point, which makes new
| feature-complete implementations without major trade-offs quite
| challenging. Given how capable LLMs are at translating code, it
| seems like the perfect time to build a language with similar
| syntax, but better scoped behavior, stricter rules around typing,
| and tooling to make porting code and libraries automated and
| relatively painless. What would existing candidates be and why
| won't they work as a replacement?
| pjmlp wrote:
| Self and Smalltalk enter the room.
|
| As for the language with similar syntax, do you want Nim, Mojo
| or Scala 3?
| BlackFly wrote:
  | The secret, as stated, is the complexity of a JIT. In practice,
  | that dynamism just isn't used much, particularly in
  | optimization targets. The JIT analyses the code
  | paths, sees that no writes to the target are possible, and so
  | treats it as a constant.
|
| Java has similar levels of dynamism-with invokedynamic
| especially, but already with dynamic dispatch-in practice the
| JIT monomorphises to a single class even though by default
| classes default to non-final in Java and there may even be
| multiple implementations known to the JVM when it
| monomorphises. Such is the strength of the knowledge that a JIT
| has compared to a local compiler.
| pjmlp wrote:
| Yes, Java syntax might look like C++, but the execution
| semantics are closer to Objective-C and Smalltalk, which is
| why adopting StrongTalk JIT for Java Hotspot was such a win.
| acmj wrote:
    | PyPy is 10x faster and is compatible with most CPython code.
    | IMHO it was a big mistake not to adopt a JIT during the 2-to-3
    | transition.
| cestith wrote:
| That "most" is doing a big lift there. At some point you
| might consider that you're actually programming in the
| language of Pypy and not pure Python. It's effectively a
| dialect of the language like Turbo Pascal vs ISO Pascal or
| RPerl instead of Perl.
| cma wrote:
| Most is more CPython code than python 3 was compatible
| with. But the port of the broken code was likely much
| easier than if it had moved to a JIT at the same time too.
| rirze wrote:
| Isn't there an incoming JIT in 3.14?
| nu11ptr wrote:
| The primary focus here is good and something I hadn't considered:
| python memory being so dynamic leads to poor cache locality.
| Makes sense. I will leave that to others to dig into.
|
| That aside, I was expecting some level of a pedantic argument,
| and wasn't disappointed by this one:
|
| "A compiler for C/C++/Rust could turn that kind of expression
| into three operations: load the value of x, multiply it by two,
| and then store the result. In Python, however, there is a long
| list of operations that have to be performed, starting with
| finding the type of p, calling its __getattribute__() method,
| through unboxing p.x and 2, to finally boxing the result, which
| requires memory allocation. None of that is dependent on whether
| Python is interpreted or not, those steps are required based on
| the language semantics."
|
| The problem with this argument is the user isn't trying to do
| these things, they are trying to do multiplication, so the fact
| that the language has to do all these things in the end DOES mean
| it is slow. Why? Because if these things weren't done, the end
| result could still be achieved. They are pure overhead, for no
| value in this situation. In other words, if Python had a
| sufficiently intelligent compiler/JIT, these things could be
| optimized away (in this use case, but certainly not all). The
| argument is akin
| to: "Python isn't slow, it is just doing a lot of work". That
| might be true, but you can't leave it there. You have to ask if
| this work has value, and in this case, it does not.
|
| By the same argument, someone could say that any interpreted
| language that is highly optimized is "fast" because the
| interpreter itself is optimized. But again, this is the wrong way
| to think about this. You always have to start by asking "What is
| the user trying to do? And (in comparison to what is considered a
| fast language) is it fast to compute?". If the answer is "no",
| then the language isn't fast, even if it meets the expected
| objectives. Playing games with things like this is why users get
| confused on "fast" vs "slow" languages. Slow isn't inherently
| "bad", but call a spade a spade. In this case, I would say the
| proper way to talk about this is to say: "It has a fast
| interpreter". The last word tells any developer with sufficient
| experience what they need to know (since they understand
| statically compiled/JIT and interpreted languages are in
| different speed classes and shouldn't be directly compared for
| execution speed).
| andylei wrote:
| The previous paragraph is
|
| > Another "myth" is that Python is slow because it is
| interpreted; again, there is some truth to that, but
| interpretation is only a small part of what makes Python slow.
|
  | He concedes it's slow, he's just saying it's not related to how
| interpreted it is.
| nu11ptr wrote:
| I would argue this isn't true. It is a big part of what makes
| it slow. The fastest interpreted languages are one to two
| orders of magnitude slower than for example C/C++/Rust. If
| your language does math 20-100 times slower than C, it isn't
| fast from a user perspective. Full stop. It might, however,
| have a "fast interpreter". Remember, the user doesn't care if
| it is a fast for an interpreted language, they are just
| trying to obtain their objective (aka do math as fast as
| possible). They can get cache locality perfect, and Python
| would still be very slow (from a math/computation
| perspective).
| nyrikki wrote:
      | The 20-100 times slower is a bit cherry-picked, but use
| case does matter.
|
| Typically from a user perspective, the initial starting
| time is either manageable or imperceptible in the cases of
| long running services, although there are other costs.
|
| If you look at examples that make the above claim, they are
| almost always tiny toy programs where the cost of producing
| byte/machine code isn't easily amortized.
|
| This quote from the post is an oversimplification too:
|
| > But the program will then run into Amdahl's law, which
| says that the improvement for optimizing one part of the
| code is limited by the time spent in the now-optimized code
|
| I am a huge fan of Amdahl's law, but also realize it is
| pessimistic and most realistic with parallelization.
|
| It runs into serious issues when you are multiprocessing vs
| parallel processing due to preemption, etc .
|
| Yes you still have the costs of abstractions etc...but in
| today's world, zero pages on AMD, 16k pages and a large
| number of mapped registers on arm, barrel shifters etc...
| make that much more complicated especially with C being
| forced into trampolines etc...
|
| If you actually trace the CPU operations, the actual
| operations for 'math' are very similar.
|
| That said modern compilers are a true wonder.
|
      | Interpreted languages are often all that is necessary and
| sufficient. Especially when you have Internet, database and
| other aspects of the system that also restrict the benefits
| of the speedups due to...Amdahl's law.
| nu11ptr wrote:
| I'm not so much cherry picking as I am specifically
| talking compute (not I/O,stdlib) performance. However,
| when measured for general purpose tasks, that would
| involve compute and things like I/O, stdlib performance,
| etc., Python on the whole is typically NOT 20-100x times
| slower for a given task. Its I/O layer is written in C
| like many other languages, so the moment you are waiting
| on I/O you have leveled the playing field. Likewise,
| Python has a very fast dict implementation in C, so when
        | doing heavy map work, you also amortize the time
| between the (brutally slow) compute and the very fast
| maps.
|
| In summary, it depends. I am talking about compute
| performance, not I/O or general purpose task
| benchmarking. Yes, if you have a mix of compute and I/O
| (which admittedly is a typical use case), it isn't going
| to be 20-100x slower, but more likely "only" 3-20x
| slower. If it is nearly 100% I/O bound, it might not be
| any slower at all (or even faster if properly buffered).
| If you are doing number crunching (w/o a C lib like
| NumPy), your program will likely be 40-100x slower than
| doing it in C, and many of these aren't toy programs.
| nyrikki wrote:
| Even with compute performance it is probably closer than
| you expect.
|
| Python isn't evaluated line-by-line, even in micropython,
| which is about the only common implementation that
| doesn't work in the same way.
|
          | The CPython VM compiles to bytecode, and binary
          | operations just end up popping off a stack, or you can
          | use a JIT like PyPy.
|
| How efficiently you can keep the pipeline fed is more
          | critical than computation costs.
          |
          |     int a = 5;
          |     int b = 10;
          |     int sum = a + b;
          |
          | Is compiled to:
          |
          |     MOV EAX, 5
          |     MOV EBX, 10
          |     ADD EAX, EBX
          |     MOV [sum_variable], EAX
|
| In the PVM binary operations remove the top of the stack
| (TOS) and the second top-most stack item (TOS1) from the
| stack. They perform the operation, and put the result
| back on the stack.
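The stack discipline described above can be seen directly with the standard-library dis module:

```python
import dis

def add(a, b):
    return a + b

# The compiled bytecode pushes both operands onto the value stack,
# then a single binary-op instruction pops them and pushes the result.
ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)
```

(The exact opcode names vary by CPython version, e.g. BINARY_ADD before 3.11 and BINARY_OP after.)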
|
| That pop, pop isn't much more expensive on modern CPUs
| and some C compilers will use a stack depending on many
| factors. And even in C you have to use structs of arrays
| etc... depending on the use case. Stalled pipelines and
| fetching due to the costs is the huge difference.
|
          | It is the setup costs, GC, GIL, etc. that make Python
          | slower in many cases.
          |
          | While I am not suggesting it is as slow as Python, Java
          | is also bytecode, and often its assumptions and design
          | decisions are even better than or at least nearly equal
          | to C in the general case unless you highly optimize.
|
| But the actual equivalent computations are almost
| identical, optimizations that the compilers make differ.
| andylei wrote:
| i'll answer your argument with the initial paragraph you
| quoted:
|
| > A compiler for C/C++/Rust could turn that kind of
| expression into three operations: load the value of x,
| multiply it by two, and then store the result. In Python,
| however, there is a long list of operations that have to be
| performed, starting with finding the type of p, calling its
| __getattribute__() method, through unboxing p.x and 2, to
| finally boxing the result, which requires memory
| allocation. None of that is dependent on whether Python is
| interpreted or not, those steps are required based on the
| language semantics.
| immibis wrote:
| Typically a dynamic language JIT handles this by
| observing what actual types the operation acts on, then
| hardcoding fast paths for the one type that's actually
| used (in most cases) or a few different types. When the
| type is different each time, it has to actually do the
| lookup each time - but that's very rare.
|
| i.e.
|
        |     if (a->type != int_type || b->type != int_type)
        |         abort_to_interpreter();
        |
        |     result = ((intval*)a)->val + ((intval*)b)->val;
|
| The CPU does have to execute both lines, but it does them
| in parallel so it's not as bad as you'd expect. Unless
| you abort to the interpreter, of course.
| ActivePattern wrote:
| A "sufficiently smart compiler" can't legally skip Python's
| semantics.
|
| In Python, p.x * 2 means dynamic lookup, possible descriptors,
| big-int overflow checks, etc. A compiler can drop that only if
| it proves they don't matter or speculates and adds guards--
| which is still overhead. That's why Python is slower on scalar
| hot loops: not because it's interpreted, but because its
| dynamic contract must be honored.
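A tiny demonstration of why `p.x * 2` can't be compiled down to a load and a multiply without guards; the class here is hypothetical, purely for illustration:

```python
class P:
    def __init__(self):
        self.x = 5

    def __getattribute__(self, name):
        # Arbitrary user code runs on *every* attribute access, so a
        # compiler cannot legally turn p.x into a plain memory load
        # without proving (or speculating) that this hook is benign.
        if name == "x":
            return 100
        return object.__getattribute__(self, name)

p = P()
print(p.x * 2)  # the "load" of p.x executed user code
```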
| pjmlp wrote:
    | In Smalltalk, p x * 2 has that flow as well, and even
    | worse: let's assume the value returned by the p x message
    | selector does not understand the * message. It will then
    | break into the debugger, the developer will add the *
    | message to the object via the code browser, hit save, and
    | exit the debugger with redo, ending the execution with
    | success.
|
| Somehow Smalltalk JIT compilers handle it without major
| issues.
| ActivePattern wrote:
| Smalltalk JITs make p x * 2 fast by speculating on types
| and inserting guards, not by skipping semantics. Python
| JITs do the same (e.g. PyPy), but Python's dynamic features
| (like __getattribute__, unbounded ints, C-API hooks) make
| that harder and costlier to optimize away.
|
| You get real speed in Python by narrowing the semantics
| (e.g. via NumPy, Numba, or Cython) not by hoping the
| compiler outsmarts the language.
| pjmlp wrote:
        | People keep forgetting about image-based development,
        | semantics, the debugger, metaclasses, messages like
        | _becomes:_, ...
|
| There is to say everything dynamic that can be used as
| Python excuse, Smalltalk and Self, have it, and double
| up.
| tekknolagi wrote:
| If I may toot my own horn:
| https://bernsteinbear.com/blog/typed-python/
| cma wrote:
| edit and continue is available on lots of JIT-runtime
| languages
| nu11ptr wrote:
| First, we need to add the word 'only': "not ONLY because it's
| interpreted, but because its dynamic contract must be
| honored." Interpreted languages are slow by design. This
| isn't bad, it just is a fact.
|
| Second, at most this describes WHY it is slow, not that it
| isn't, which is my point. Python is slow. Very slow (esp. for
| computation heavy workloads). And that is okay, because it
| does what it needs to do.
| rstuart4133 wrote:
| > The problem with this argument is the user isn't trying to do
| these things,
|
  | I'd argue differently. I'd say the problem isn't that the
  | user is doing those things, it's that the language doesn't know
  | what they're trying to do.
|
| Python's explicit goal was always ergonomics, and it was always
| ergonomics over speed or annoying compile time error messages.
| "Just run the code as written dammit" was always the goal. I
  | remember when the new class model was introduced,
  | necessitating the introduction of __getattribute__(). My first
| reaction as a C programmer was "gee you took a speed hit
| there". A later reaction was to use it to twist the new system
| into something it's inventors possibly never thought of. It was
| a LR(1) parser, that let you write the grammars as regular
| Python statements.
|
  | While they may not have thought of abusing the language in that
  | particular way, I'm sure the explicit goal was to create a
  | framework that lets any idea be expressed with minimal code.
  | Others also used the hooks they provided into the way the
  | language builds objects to create things like pydantic and
  | spyne. Spyne, for example, lets you express the on-the-wire
  | serialisation formats used by RPC as Python class declarations,
  | and then compile them into JSON, XML, SOAP or whatever.
  | SQLAlchemy lets you express SQL using Python syntax, although
  | in a more straightforward way.
|
| All of them are very clever in how they twist the language.
| Inside those frameworks, "a = b + c" does not mean "add b to c,
| and place the result in a". In the LR(1) parser for example it
| means "there is a production called 'a', that is a 'b' followed
| by a 'c'". 'a' in that formulation holds references to 'b' and
| 'c'. Later the LR(1) parser will consume that, compiling it
| into something very different. The result is a long way from
| two's compliment addition.
|
  | It is possible to use a more powerful type system in a
  | similar way. For example, I've seen FPGA designs expressed in
  | Scala. However, because Scala's type system insists on
  | knowing what is going on at compile time, Scala had a fair
  | idea of what the programmer was building. The compiled result
  | isn't going to be much slower than any other code. Python
| achieved the same flexibility by abandoning type checking at
| compile time almost entirely, pushing it all to run time. Thus
  | the compiler has no idea what is going to be executed in the end
| (the + operation in the LR parser only gets executed once for
| example), which is what I said above "it's that the language
| doesn't know what the programmer is trying to do".
|
| You argue that since it's an interpreted language, it's the
| interpreters jobs to figure out what the programmer is trying
| to do at run time. Surely it can figure out that "a = b + c"
| really is adding two 32 bit integers that won't overflow.
  | That's true, but that creates a lot of work to do at run time.
| Which is a round about way of saying the same thing as the
| talk: electing to do it at run time means the language chose
| flexibility over speed.
|
| You can't always fix this in an interpreter. Javascript has
| some of the best interpreters around, and they do make the
| happy path run quickly. But those interpreters come with
| caveats, usually of the form "if you muck around with the
| internals of classes, by say replacing function definitions at
| run time, we abandon all attempts to JIT it". People don't
| typically do such things in Javascript, but as it happens,
| Python's design with it's meta classes, dynamic types created
| with "type(...)", and "__new__(..)" almost could be said
| encourage that coding style. That is, again, a language design
| choice, and it's one that favours flexibility over speed.
| teo_zero wrote:
| I don't know Python so well as to propose any meaningful
| contribution, but it seems to me that most issues would be
| mitigated by a sort of "final" statement or qualifier, that
| prohibits any further changes to the underlying data structure,
| thus enabling all the nice optimizations, tricks and shortcuts
| that compilers and interpreters can't afford when data is allowed
| to change shape under their feet.
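Python already has a narrow version of this idea: `__slots__` fixes an instance's attribute layout at class-creation time, which also lets CPython store such instances more compactly. A sketch:

```python
class Point:
    # The instance "shape" is sealed: only x and y, no per-instance
    # __dict__, so the layout can never change under the runtime.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
try:
    p.z = 3  # adding a new attribute is rejected
    blocked = False
except AttributeError:
    blocked = True
print(blocked)
```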
| Fraterkes wrote:
| I assume people dislike those kinds of solutions because the
| extreme dynamism is used pretty rarely in a lot of meat and
  | potatoes Python scripts. So a lot of "regular" Python scripts
| would have to just plaster "final" everywhere to make it as
| fast as it can be.
|
  | At that point you'd maybe want to have some sort of broader way
  | to signify which parts of your script are dynamic. But then,
  | you'd have a language that can be dynamic even in how dynamic it
| is...
| game_the0ry wrote:
| I know I am going to get some hate for this from the "Python-
| stans" but..."python" and "performance" should never be
| associated with each other, and same for any
| scripting/interpreted programming language. Especially if it has
| a global interpreter lock.
|
| While performance (however you may mean that) is always a worthy
| goal, you may need to question your choice of language if you
| start hitting performance ceilings.
|
| As the saying goes - "Use the right tool for the job." Use case
| should dictate tech choices, with few exceptions.
|
| Ok, now that I have said my piece, now you can down vote me :)
| danielrico wrote:
| That's used by some people as excuse to write the most
| inefficient code.
|
  | Ok, you are not competing with C++, but you also shouldn't be
  | redoing all the calculations because you haven't figured out
  | the data access pattern.
| ahoka wrote:
| Have you read the fine article?
| throwaway6041 wrote:
| > the "Python-stans"
|
| I think the term "Pythonistas" is more widely used
|
| > you may need to question your choice of language if you start
| hitting performance ceilings.
|
| Developers should also question if a "fast" language like Rust
| is really needed, if implementing a feature takes longer than
| it would in Python.
|
| I don't like bloat in general, but sometimes it can be worth
| spinning up a few extra instances to get to market faster. If
| Python lets you implement a feature a month earlier, the new
| sales may even cover the additional infrastructure costs.
|
| Once you reach a certain scale you may need to rewrite parts of
| your system anyway, because the assumptions you made are often
| wrong.
| game_the0ry wrote:
| > Developers should also question if a "fast" language like
| Rust is really needed...
|
| Agreed.
| wiseowise wrote:
| Do you get off from bashing on languages or what?
| ActorNightly wrote:
| >"Use the right tool for the job."
|
| Python + C covers pretty much anything you really ever need to
| build, unless you are doing something with game engines that
| require the use of C++/C#. Rust is even more niche.
| crabbone wrote:
| Again and again, the most important question is "why?" not
| "how?". Python isn't made to be fast. If you wanted a language
| that can go fast, you needed to build it into the language from
| the start: give developers tools to manage memory layout, give
| developers tools to manage execution flow, hint the compiler
| about situations that present potential for optimization,
| restrict dispatch and polymorphism, restrict semantics to fewer
| interpretations.
|
| Python has none of that. It's a hyper-bloated language with
| extremely poor design choices all around. Many ways of doing the
| same thing, many ways of doing stupid things, no way of
| communicating programmer's intention to the compiler... So why
| even bother? Why not use a language that's designed by a sensible
| designer for this specific purpose?
|
| The news about performance improvements in Python just sound to
| me like spending useful resources on useless goals. We aren't
| going forward by making Python slightly faster and slightly more
| bloated, we just make this bad language even harder to get rid
| of.
| Danmctree wrote:
| The frustrating thing is that the math and AI support in the
| python ecosystem is arguably the best. These happen to also be
| topics where performance is critical and where you want things
| to be tight.
|
| c++ has great support too but often isn't usable in communities
| involving researchers and juniors because it's too hard for
| them. Startup costs are also much higher.
|
  | And so you're often stuck with Python.
|
| We desperately need good math/AI support in faster languages
| than python but which are easier than c++. c#? Java?
| adsharma wrote:
| The most interesting part of this article is the link to SPy.
| Attempts to find a subset of python that could be made
| performant.
| ajross wrote:
| Honestly that seems Sisyphean to me. The market doesn't want a
| "performant subset". The market is very well served by
| performant languages. The market wants Python's expressivity.
| The market wants duck typing and runtime-inspectable type
| hierarchies and mutable syntax and decorators. It loves it.
| It's why Python is successful.
|
| My feeling is that numba has exactly the right tactic here.
| Don't try to subset python from on high, give developers the
| tools[1] so that they can limit _themselves_ to the fast
| subset, for the code they actually want. And let them make the
| call.
|
| (The one thing numba completely fails on though is that it
| insists on using its own 150+MB build of LLVM, so it's not
| nearly as cleanly deployable as you'd hope. Come on folks, if
| you use the system libc you should be prepared to use the
| system toolchain.)
|
| [1] Simple ones, even. I mean, to first approximation you just
| put "@jit" on the stuff you want fast and make sure it sticks
| to a single numeric type and numpy arrays instead of python
| data structures, and you're done.
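A sketch of that tactic; the try/except falls back to a no-op decorator so the snippet stays runnable when Numba isn't installed:

```python
try:
    from numba import jit
except ImportError:
    # No-op stand-in so the example runs without numba installed.
    def jit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap

@jit(nopython=True)
def harmonic(n):
    # Sticks to a single numeric type and plain loops, so numba can
    # compile it to machine code with no boxed objects on the hot path.
    s = 0.0
    for i in range(1, n + 1):
        s += 1.0 / i
    return s

print(harmonic(1000))
```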
| zozbot234 wrote:
| > The market wants duck typing and runtime-inspectable type
| hierarchies and mutable syntax and decorators. It loves it.
|
| These features have one thing in common: they're only useful
| for prototype-quality throwaway code, if at all. Once your
| needs shift to an increased focus on production use and
| maintainability, they become serious warts. It's not just
| about performance (though it's obviously a factor too),
| there's real reasons why most languages don't do this.
| ajross wrote:
| > These features have one thing in common: they're only
| useful for prototype-quality throwaway code, if at all.
|
| As a matter of practice: the python community disagrees
| strongly. And the python community ate the world.
|
| It's fine to have an opinion, but you're not going to
| change python.
| Philpax wrote:
| The existence of several type-checkers and Astral's
| largely-successful efforts to build tooling that pulls
| Python out of its muck seems to suggest otherwise.
|
| Better things are possible, and I'm hoping that higher
| average quality of Python code is one of those things.
| adsharma wrote:
| That assumes python is one monolithic thing and everyone
| agrees what it is.
|
| True, the view you express here has strong support in the
| community and possibly in the steering committee.
|
| But there are differing ideas on what python is and why
| it's successful.
| ajross wrote:
| > That assumes python is one monolithic thing and
| everyone agrees what it is.
|
| It's exactly the opposite! I'm saying that python is _BIG
| AND DIVERSE_ and that attempts like SPy to invent a new
| (monolithic!) subset language that everyone should use
          | instead are doomed, because it won't meet the needs of
| all the yahoos out there doing weird stuff the SPy
| authors didn't think was important.
|
| It's fine to have "differing ideas on what python is",
| but if those ideas don't match those of _all_ of the
| community, and not just what you think are the good
          | parts, it's not really about what "python" is, is it?
| adsharma wrote:
| My cursory reading is that SPy is generous in what it
| accepts.
|
| The subset I've been working with is even narrower. Given my
| stance on pattern matching, it may not even be a subset.
|
| https://github.com/py2many/py2many/blob/main/doc/langspec.md
| pabe wrote:
  | The SPy demo is really good at showing the difference in
| performance between Python and their derivative. Well done!
| hansvm wrote:
| In the "dynamic" section, it's much worse than the author
| outlines. You can't even assume that the constant named "10" will
| point to a value which behaves like you expect the number 10 to
| behave.
| zahlman wrote:
| I guess you mean "N". 10 is a literal, not a name. The part "N
| cannot be assumed to be ten, because that could be changed
| elsewhere in the code" implies well enough that the change
| could be to a non-integer value. (For that matter, writing `N:
| int = 10` does nothing to fix that.)
| hansvm wrote:
| No, I mean the literal. CPython is more flexible than it has
| any right to be, and you're free to edit the memory pointed
| to by the literal 10.
| zahlman wrote:
| Care to show how you believe this can be achieved, from
| within Python?
| hansvm wrote:
| import ctypes
|
| ten = 10
| addr = id(ten)
|
| class PyLongObject(ctypes.Structure):
|     _fields_ = [
|         ("ob_refcnt", ctypes.c_ssize_t),
|         ("ob_type", ctypes.c_void_p),
|         ("ob_size", ctypes.c_ssize_t),
|         ("ob_digit", ctypes.c_uint32 * 1),
|     ]
|
| long_obj = PyLongObject.from_address(addr)
| long_obj.ob_digit[0] = 3
|
| assert 10 == 3
|
| # using an auxiliary variable to prevent any inlining
| # done at the interpreter level before actually querying
| # the value of the literal `10`
| x = 3
| assert 10 * x == 9
| assert 10 + x == 6
| zahlman wrote:
| Okay, but this is going out of one's way to view the
| runtime itself as a C program and connect to it with the
| FFI. For that matter, the notion that the result of `id`
| (https://docs.python.org/3/library/functions.html#id)
| could sensibly be passed to `from_address` is an
| implementation detail. This is one reason the language
| suffers from not having a formal specification: it's
| unclear exactly how much of this madness alternative
| implementations like PyPy are expected to validate
| against. But I think people would agree that poking at
| the runtime's own memory cannot be expected to give
| deterministic results, and thus the implementation should
| in fact consider itself free to assume that isn't
| happening. (After all, we could take that further; e.g.
| what if we had another process do the dirty work?)
| hansvm wrote:
| Except, that sort of thing is important in places like
| gevent, pytest, and numba, and that functionality isn't
| easy to replace without a lot of additional
| language/stdlib work (no sane developer would reach for
| it if other APIs sufficed).
|
| The absurd example of overwriting the literal `10` is
| "obviously" bad, but your assertion that the interpreter
| should be able to assume nobody is overwriting its memory
| isn't borne out in practice.
| zahlman wrote:
| > Except, that sort of thing is important in places like
| gevent, pytest, and numba
|
| What, mutating the data representation of built-in types
| documented to be immutable? For what purpose?
| pu_pe wrote:
| Python and other high-level languages may actually decrease in
| popularity with better LLMs. If you are not the one programming
| it, might as well do it in a more performant language from the
| start.
| richard_todd wrote:
| In my workflows I already tend to tell LLMs to write scripts in
| Go instead of python. The LLM doesn't care about the increased
| tediousness and verbosity that would drive me to Python, and
| the result will be much faster.
| Philpax wrote:
| I saw a short post to this effect here:
| https://solmaz.io/typed-languages-are-better-suited-for-
| vibe...
| fumeux_fume wrote:
| Slow or fast ultimately matter in the context for which you need
| to use it. Perhaps these are only myths and fairy tales for an
| incredibly small subset of people who value execution speed as
| the highest priority, but choose to use Python for
| implementation.
| Mithriil wrote:
| > His "sad truth" conclusion is that "Python cannot be super-
| fast" without breaking compatibility.
|
| A decent case of Python 4.0?
|
| > So, maybe, "a JIT compiler can solve all of your problems";
| they can go a long way toward making Python, or any dynamic
| language, faster, Cuni said. But that leads to "a more subtle
| problem". He put up a slide with a trilemma triangle: a dynamic
| language, speed, or a simple implementation. You can have two of
| those, but not all three.
|
| This trilemma keeps getting me back towards Julia. It's less
| simple than Python, but much faster (mitigated by pre-compilation
| time), and almost as dynamic. I'm glad this language didn't die.
| Alex3917 wrote:
| > A decent case of Python 4.0?
|
| I definitely agree with this eventually, but for now why not
| just let developers set `dynamic=False` on objects and make it
| opt in? This is how Google handles breaking Angular upgrades,
| and in practice it works great because people have multiple
| years to prepare for any breaking changes.
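The proposed `dynamic=False` flag is hypothetical, but CPython already has a
narrow opt-in of this flavor at the class level: `__slots__` trades away
per-instance dynamism (no `__dict__`, no ad-hoc attributes) for lower memory
use and faster attribute access. A sketch of the existing mechanism:

```python
class Point:
    __slots__ = ("x", "y")  # no per-instance __dict__ is allocated

    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
try:
    p.z = 3  # attributes not listed in __slots__ are rejected
except AttributeError:
    print("dynamic attribute blocked")  # dynamic attribute blocked
```

A broader opt-out would have to cover module and builtin rebinding too, which
is where the hard compatibility questions live.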
| zahlman wrote:
| > A decent case of Python 4.0?
|
| I think "Python 4.0" is going to have to be effectively a new
| language by a different team that simply happens to bear strong
| syntactic similarities. (And at least part of why that isn't
| already happening is that everyone keeps getting scared off by
| the scale of the task.)
|
| Thanks for the reminder that I never got around to checking out
| Julia.
| olejorgenb wrote:
| Isn't that kinda what Mojo is?
| zahlman wrote:
| I haven't tried it, but that matches my understanding,
| yeah.
|
| Personally I'd be more interested in designing from
| scratch.
| rirze wrote:
| If Julia fixes its package manager problems (does it still take
| a while to load imports?), I think it could become popular.
| rybosome wrote:
| Yeah, this is a case of "horses for courses", as you suggest.
|
| I love Python. It's amazing with uv; I just implemented a
| simple CLI this morning for analyzing data with inline
| dependencies that's absolutely perfect for what I need and is
| extremely easy to write, run, and tweak.
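The inline-dependencies workflow mentioned here is PEP 723 script metadata,
which uv reads to provision an isolated environment per script. The file name
and the `rich` dependency below are illustrative; the script body itself only
uses the stdlib.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = ["rich"]
# ///
# Run with `uv run stats.py`: uv parses the metadata block above and
# supplies the listed packages in a throwaway environment.
from statistics import mean

values = [1.5, 2.5, 4.0]
print("mean:", mean(values))  # mean: 2.6666666666666665
```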
|
| Based on previous experience, I would not suggest Python should
| be used for an API server where performance - latency,
| throughput - and scalability of requests is a concern. There's
| lots of other great tools for that. And if you need to write an
| API server and it's ok not to have super high performance, then
| yeah Python is great for that, too.
|
| But it's great for what it is. If they do make a Python 4.0
| with some breaking changes, I hope they keep the highly
| interpreted nature such that something like Pydantic continues
| to work.
| taeric wrote:
| It's amusing to see the top comment on the site be about how Common
| LISP approached this. And hard not to agree with it.
|
| I don't understand how we had super dynamic systems decades ago
| that were easier to optimize than people care to understand.
| Heaven help folks if they ever get a chance to use Mathematica.
| tuna74 wrote:
| In computing terms, saying something is "slow" is kind of
| pointless. Saying something is "effective" or "low latency"
| provides much more information.
| Redoubts wrote:
| Wonder if mojo has gotten anywhere further, since they're trying
| to bring speed while not sacrificing most of the syntax
|
| https://docs.modular.com/mojo/why-mojo/#a-member-of-the-pyth...
| actinium226 wrote:
| A lot of the examples he gives, like the numpy/calc function, are
| easily converted to C/C++/Rust. The article sort of dismisses
| this at the start, and that's fine if we want to focus on the
| speed of Python itself, but it seems like both the only solution
| and the obvious solution to many of the problems specified.
| lkirk wrote:
| For me, in my use of Python as a data analysis language, it's not
| python's speed that is an annoyance or pain point, it's the
| concurrency story. Julia's built-in concurrency primitives are
| much more ergonomic in my opinion.
| 1vuio0pswjnm7 wrote:
| "He started by asking the audience to raise their hands if they
| thought "Python is slow or not fast enough";"
|
| Wrong question
|
| Maybe something like, "Python startup time is as fast as other
| interpreters"
|
| Comparatively, Python (startup time) is slow(er)
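Startup cost is easy to measure directly by spawning fresh interpreters; a
rough sketch (absolute numbers vary widely by machine and Python version):

```python
# Time a bare interpreter start versus one that pays for a few common
# stdlib imports, which usually dominates real-world script startup.
import subprocess
import sys
import time

def startup_seconds(args):
    start = time.perf_counter()
    subprocess.run([sys.executable, *args], check=True)
    return time.perf_counter() - start

bare = startup_seconds(["-c", "pass"])
with_imports = startup_seconds(["-c", "import json, argparse, subprocess"])
print(f"bare: {bare:.3f}s  with imports: {with_imports:.3f}s")
```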
| ehsantn wrote:
| The article highlights important challenges regarding Python
| performance optimization, particularly due to its highly dynamic
| nature. However, a practical solution involves viewing Python
| fundamentally as a Domain Specific Language (DSL) framework,
| rather than purely as a general-purpose interpreted language.
| DSLs can effectively be compiled into highly efficient machine
| code.
|
| Examples such as Numba JIT for numerical computation, Bodo
| JIT/dataframes for data processing, and PyTorch for deep learning
| demonstrate this clearly. Python's flexible syntax enables
| creating complex objects and their operators such as array and
| dataframe operations, which these compilers efficiently transform
| into code approaching C++-level performance. DSL operator
| implementations can also leverage lower-level languages such as
| C++ or Rust when necessary. Another important aspect not
| addressed in the article is parallelism, which DSL compilers
| typically handle quite effectively.
|
| Given that data science and AI are major use cases for Python,
| compilers like Numba, Bodo, and PyTorch illustrate how many
| performance-critical scenarios can already be effectively
| addressed. Investing further in DSL compilers presents a
| practical pathway to enhancing Python's performance and
| scalability across numerous domains, without compromising
| developer usability and productivity.
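The DSL framing can be illustrated without any of those libraries: a toy
sketch (not Numba, Bodo, or PyTorch) in which Python operator overloading
records an expression graph that a "compiler" then lowers to a single
callable. Real systems lower to machine code or GPU kernels instead, but the
front-end trick is the same.

```python
# Overloaded operators build an expression tree instead of computing
# eagerly; compile_expr() then lowers the tree to one callable.
class Expr:
    def __add__(self, other):
        return Op("+", self, wrap(other))

    def __mul__(self, other):
        return Op("*", self, wrap(other))

class Var(Expr):
    def __init__(self, name):
        self.name = name

    def emit(self):
        return self.name

class Const(Expr):
    def __init__(self, value):
        self.value = value

    def emit(self):
        return repr(self.value)

class Op(Expr):
    def __init__(self, sym, lhs, rhs):
        self.sym, self.lhs, self.rhs = sym, lhs, rhs

    def emit(self):
        return f"({self.lhs.emit()} {self.sym} {self.rhs.emit()})"

def wrap(x):
    return x if isinstance(x, Expr) else Const(x)

def compile_expr(expr, *names):
    # "Lower" the tree to a plain Python function; a real DSL compiler
    # would emit optimized machine code at this point.
    return eval(f"lambda {', '.join(names)}: {expr.emit()}")

x, y = Var("x"), Var("y")
f = compile_expr(x * x + y * 2, "x", "y")
print(f(3, 4))  # 17
```

Because the graph is captured before execution, the compiler sees whole
expressions at once, which is exactly what lets these frameworks sidestep
per-operation interpreter overhead.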
|
| Disclaimer: I have previously worked on Numba and Bodo JIT.
| echoangle wrote:
| Was this comment written by an LLM?
___________________________________________________________________
(page generated 2025-08-06 23:01 UTC)