[HN Gopher] Python performance myths and fairy tales
       ___________________________________________________________________
        
       Python performance myths and fairy tales
        
       Author : todsacerdoti
       Score  : 204 points
       Date   : 2025-08-06 08:36 UTC (14 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | NeutralForest wrote:
        | Cool article. I think a lot of these issues are not Python-
        | specific, so it's a good overview of what others can learn
        | from a now 30-year-old language! I think we'll probably go down
        | the JS/TS route where another compiler (PyPy or mypyc or
        | something else) will work alongside CPython, but I don't see
        | Python 4 happening.
        
         | tweakimp wrote:
          | I thought we would never see the GIL go away, and yet here we
          | are. Never say never. Maybe Python 4 is Python with another
          | compiler.
        
           | pjmlp wrote:
            | It took Facebook and Microsoft to change the point of
            | view on it, and now the Microsoft team is no more.
            | 
            | So let's see what remains of the CPython performance efforts.
        
         | ngrilly wrote:
         | I'm not sure I understand the reference to JS/TS: TS is only a
         | type checker and has zero effect on runtime performance.
        
       | 2d8a875f-39a2-4 wrote:
       | Do you still need an add-on library to use more than one core?
        
         | BlackFly wrote:
          | The latest version officially supports free-threaded mode:
         | https://docs.python.org/3.14/whatsnew/3.14.html#whatsnew314-...
        
         | franktankbank wrote:
         | Eh? Multiprocessing has existed since 2.X days.
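For context, a minimal sketch of the stdlib approach that comment refers to: `multiprocessing` sidesteps the GIL by running workers in separate interpreter processes (the `square` helper is a made-up example):

```python
# Minimal multiprocessing sketch: each worker is a separate
# process with its own interpreter, so the GIL is no bottleneck.
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
```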
        
       | writebetterc wrote:
       | Good job on dispelling the myth of "compiler = fast". I hope
       | SPython will be able to transfer some of its ideas to CPython
       | with time.
        
         | nromiun wrote:
          | You would think LuaJIT would have convinced people by now. But
          | most people still think you need a static language and an AOT
          | compiler for performance.
        
           | pjmlp wrote:
           | Also Smalltalk (Pharo, Squeak, Cincom, Dolphin), Common Lisp
           | (SBCL, Clozure, Allegro, LispWorks), Self,....
           | 
           | But yeah.
        
         | mrkeen wrote:
         | Where was this dispelled?
        
       | quantumspandex wrote:
       | So we are paying 99% of the performance just for the 1% of cases
       | where it's nice to code in.
       | 
       | Why do people think it's a good trade-off?
        
         | nromiun wrote:
         | Because it's nice to code in. Not everything needs to scale or
         | be fast.
         | 
          | Personally I think it is crazier to optimize 99% of the time
          | just to need it 1% of the time.
        
           | BlackFly wrote:
           | It isn't an either or choice. The people interested in
           | optimizing the performance are typically different people
           | than those interested in implementing syntactic sugar. It is
           | certainly true that growing the overall codebase risks
           | introducing tensions for some feature sets but that is just a
           | consideration you take when diligently adding to the
           | language.
        
           | nomel wrote:
           | That's why Python is the second best language for everything.
           | 
           | The amount of complexity you can code up in a short time,
           | that most everyone can contribute to, is incredible.
        
         | dgfitz wrote:
         | I can say with certainty I've never paid a penny. Have you?
        
         | lmm wrote:
         | Because computers are more than 100x faster than they were when
         | I started programming, and they were already fast enough back
         | then? (And meanwhile my coding ability isn't any better, if
         | anything it's worse)
        
         | jonathrg wrote:
         | It's much more than 1%, it is what enables commonly used
         | libraries like pytest and Pydantic.
        
         | pjmlp wrote:
         | Because many never used Smalltalk, Common Lisp, Self, Dylan,...
         | so they think CPython is the only way there is, plus they
          | already have their computer resources wasted by tons of
          | Electron apps anyway, so they hardly question CPython's
          | performance, or lack thereof.
        
           | wiseowise wrote:
           | Has it ever crossed your mind that they just like Python?
        
             | pjmlp wrote:
              | And slow code, yes it has crossed my mind.
              | 
              | Usually they also call libraries that are 95% C code
              | "Python".
        
               | Fraterkes wrote:
               | The hypocrisy gets even worse: the C code then gets
               | compiled to assembly!
        
         | Hilift wrote:
          | It isn't. There are many tasks Python isn't up to.
          | However, it has been around forever, and in some influential
          | niche verticals like cyber security, Python was as useful as
          | or more useful than native tooling, and works on multiple
          | platforms.
        
         | bluGill wrote:
          | Most of the time you are waiting on a human, or at least on
          | something other than the CPU. And most of the time, more time
          | is spent by the programmer writing the code than by all the
          | users combined waiting for the program to run.
          | 
          | Between those two, performance is most often just fine to
          | trade off.
        
         | Krssst wrote:
         | Performance is worthless if the code isn't correct. It's easier
         | to write correct code reasonably quickly in Python in simple
         | cases (integers don't overflow like in C, don't wrap around
         | like in C#, no absurd implicit conversions like in other
         | scripting languages).
         | 
         | Also you don't need code to be fast a lot of the time. If you
         | just need some number crunching that is occasionally run by a
         | human, taking a whole second is fine. Pretty good replacement
         | for shell scripting too.
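The overflow point above is easy to demonstrate: Python's `int` is arbitrary-precision, so the C-style wraparound simply cannot happen:

```python
# Python ints grow without bound: no 64-bit truncation or wraparound.
UINT64_MAX = 2**64 - 1           # where C's uint64_t would wrap to 0
print(UINT64_MAX + 1)            # 18446744073709551616, exact
print((UINT64_MAX + 1) * 2)      # still exact at 65 bits
assert (2**64) % (2**64) == 0    # arithmetic is mathematically exact
```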
        
           | Spivak wrote:
            | I mean, you can see it in your own experience: folks will
            | post a 50-line snippet of ordinary C code in a blog post
            | which looks like you're reading a long-dead ancient language
            | littered with macros, and then say "this is a lot to grok,
            | here's the equivalent code in Python / Ruby" and it's 3 lines
            | and completely obvious.
           | 
            | Folks on HN are so weird when it comes to why these languages
            | exist and why people keep writing in them. For all their
            | faults (dynamism, GC, lack of static typing), in the
            | real world with real devs you get more correct code, written
            | faster, when you use a higher-level language. It's
            | Go's raison d'être.
        
           | pavon wrote:
           | But many of the language decisions that make Python so slow
           | don't make code easier to write correctly. Like monkey
           | patching; it is very powerful and can be useful, but it can
           | also create huge maintainability issues, and its existence as
           | a feature hinders making the code faster.
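A toy illustration of that tension (the `Greeter` class is hypothetical, but the mechanism is exactly what an optimizer must guard against):

```python
# Monkey patching: method lookup can change at runtime, so compiled
# or specialized code can never assume Greeter.greet is fixed.
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
assert g.greet() == "hello"

# Replace the method on the class, after instances already exist.
Greeter.greet = lambda self: "goodbye"
assert g.greet() == "goodbye"   # the existing instance sees the patch
```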
        
         | Mawr wrote:
         | I don't think anyone aware of this thinks it's a good tradeoff.
         | 
         | The more interesting question is why the tradeoff was made in
         | the first place.
         | 
         | The answer is, it's relatively easy for us to see and
         | understand the impact of these design decisions because we've
         | been able to see their outcomes over the last 20+ years of
         | Python. Hindsight is 20/20.
         | 
         | Remember that Python was released in 1991, before even Java.
         | What we knew about programming back then vs what we know now is
         | very different.
         | 
         | Oh and also, these tradeoffs are very hard to make in general.
         | A design decision that you may think is irrelevant at the time
         | may in fact end up being crucial to performance later on, but
         | by that point the design is set in stone due to backwards
         | compatibility.
        
       | nromiun wrote:
       | I really hope PyPy gets more popular so that I don't have to
       | argue Python is pretty fast for the nth time.
       | 
        | Even if you have to stick to CPython, Numba, Pythran, etc. can
        | give you amazing performance for minimal code changes.
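As a hedged sketch of the "minimal code changes" claim: with Numba, decorating a hot numeric loop is often the only change (the `try/except` fallback here keeps the sketch runnable even where Numba isn't installed):

```python
# One-decorator speedup sketch: @njit JIT-compiles the function on
# first call; the plain-Python fallback keeps semantics identical.
try:
    from numba import njit
except ImportError:
    njit = lambda f: f  # no-op fallback if Numba isn't available

@njit
def collatz_steps(n):
    # Tight integer loop: the kind of code Numba accelerates most.
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(collatz_steps(27))  # 111
```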
        
       | meinersbur wrote:
       | Is it just me or does the talk actually confirm all its Python
       | "myths and fairy tales"?
        
         | daneel_w wrote:
         | It confirms that Python indeed has poor executional
         | performance.
        
         | xg15 wrote:
         | Well, the fairy tale was that Python was fast, or "fast enough"
         | or "fast if we could compile it and get rid of the GIL".
        
       | btown wrote:
       | I think an important bit of context here is that computers are
       | very, very good at speculative happy-path execution.
       | 
       | The examples in the article seem gloomy: how could a JIT possibly
       | do all the checks to make sure the arguments aren't funky before
       | adding them together, in a way that's meaningfully better than
       | just running the interpreter? But in practice, a JIT _can_ create
       | code that does these checks, and modern processors will branch-
       | predict the happy path and effectively run it _in parallel with_
       | the checks.
       | 
       | JavaScript, too, has complex prototype chains and common use of
       | boxed objects - but v8 has made common use cases extremely fast.
       | I'm excited for the future of Python.
        
         | DanielHB wrote:
          | The main problem is when the optimizations silently fail
          | because of seemingly innocent changes and suddenly your
          | performance tanks 10x. This is a problem with any language
          | really (CPU cache misses are a thing after all, and many non-
          | dynamic languages have boxed objects), but it is much, much
          | worse in dynamic languages like Python, JS and Ruby.
         | 
         | Most of the time it doesn't matter, most high-throughput python
         | code just invokes C/C++ where these concerns are not as big of
         | a problem. Most JS code just invokes C/C++ browser DOM objects.
         | As long as the hot-path is not in those languages you are not
         | at such high risk of "innocent change tanked performance"
         | 
          | Even server-side, most JS/Python/Ruby code is just simple HTTP
          | stack handlers, invoking databases and shuffling data
          | around. And often a large part of the process of handling a
          | request (encoding JSON/XML/etc., parsing HTTP messages, etc.)
          | can be written in lower-level languages.
        
         | nxobject wrote:
         | To be slightly flip, we could say that the Lisp Machine CISC-
         | supports-language full stack design philosophy lives on in how
         | massive M-series reorder buffers and ILP supports
         | JavaScriptCore.
        
         | jerf wrote:
         | That makes it so that in absolute terms, Python is not as slow
         | as you might naively expect.
         | 
         | But we don't measure programming language performance in
         | absolute terms. We measure them in relative terms, generally
         | against C. And while your Python code is speculating about how
         | this Python object will be unboxed, where its methods are, how
         | to unbox its parameters, what methods will be called on those,
         | etc., compiled code is speculating on _actual code_ the
         | programmer has written, running _that_ in parallel, such that
         | by the time the Python interpreter is done speculating
         | successfully on how some method call will resolve with actual
          | objects, the compiled-language code is now done with ~50 lines
          | of code of similar grammatical complexity. (Which is a sloppy
          | term, since this is a bit of a sloppy conversation, but
          | consider a series of "p.x = y"-level statements in Python
          | versus C as the case I'm looking at here.)
         | 
         | There's no way around it. You can spend your amazingly capable
         | speculative parallel CPU on churning through Python
         | interpretation or you can spend it on doing real work, but you
         | can't do both.
         | 
         | After all, the interpreter is just C code too. It's not like it
         | gets access to special speculation opcodes that no other
         | program does.
        
           | Demiurge wrote:
           | I love this "real work". Real work, like writing linked
           | lists, array bounds checking, all the error handling for
           | opening files, etc, etc? There is a reason Python and C both
           | have a use case, and it's obvious Python will never be as
           | fast as C doing "1 + 1". The real "real work" is in getting
           | stuff done, not just making sure the least amount of cpu
           | cycles are used to accomplish some web form generation.
           | 
           | Anyway, I think you're totally right, in your general
           | message. Python will never be the fastest language in all
           | contexts. Still, there is a lot of room for optimization, and
           | given it's a popular language, it's worth the effort.
        
             | btown wrote:
             | To put it another way, I choose Python _because_ of its
             | semantics around dynamic operator definition, duck typing
             | etc.
             | 
             | Just because I don't write the bounds-checking and type-
             | checking and dynamic-dispatch and error-handling code
             | myself, doesn't make it any less a conscious decision I
             | made by choosing Python. It's all "real work."
        
               | kragen wrote:
               | Type checking and bounds checking aren't "real work" in
               | the sense that, when somebody checks their bank account
               | balance on your website or applies a sound effect to an
               | audio track in their digital audio workstation, they
               | don't think, "Oh good! The computer is going to do some
               | type checking for me now!" Type checking and bounds
               | checking may be good means to an end, but they are not
               | the end, from the point of view of the outside world.
               | 
               | Of course, the bank account is only a means to the end of
               | paying the dentist for installing crowns on your teeth
               | and whatnot, and the sound effect is only a means to the
               | end of making your music sound less like Daft Punk or
               | something, so it's kind of fuzzy. It depends on what
               | people are _thinking_ about achieving. As programmers,
               | because we know the experience of late nights debugging
               | when our array bounds overflow, we think of bounds
               | checking and type checking as ends in themselves.
               | 
               | But only up to a point! Often, type checking and bounds
               | checking can be done at compile time, which is more
               | efficient. When we do that, as long as it works
               | correctly, we _never_ + feel disappointed that our
               | program isn't doing run-time type checks. We never look
               | at our running programs and say, "This program would be
               | better if it did more of its type checks at runtime!"
               | 
               | No. Run-time type checking is purely a deadweight loss:
               | wasting some of the CPU on computation that doesn't move
               | the program toward achieving the goals we were trying to
               | achieve when we wrote it. It may be a worthwhile tradeoff
               | (for simplicity of implementation, for example) but we
               | must weigh it on the debit side of the ledger, not the
               | credit side.
               | 
               | ______
               | 
               | + Well, unless we're trying to debug a PyPy type-
               | specialization bug or something. Then we might work hard
               | to construct a program that forces PyPy to do more type-
               | checking at runtime, and type checking does become an
               | end.
        
               | rightbyte wrote:
               | > and the sound effect is only a means to the end of
               | making your music sound less like Daft Punk or something
               | 
               | What do you mean. Daft Punk is not daft punk. Why single
               | them out :)
        
               | kragen wrote:
               | Well, originally I wrote "more like Daft Punk", but then
               | I thought someone might think I was stereotyping
               | musicians as being unoriginal and derivative, so I swung
               | the other way.
        
             | jerf wrote:
             | I can't figure out what your first paragraph is about. The
             | topic under discussion is Python performance. We do not
             | generally try to measure something as fuzzy as "real work"
             | as you seem to be using the term in performance discussions
             | because what even is that. There's a reason my post
             | referenced "lines of code", still a rather fuzzy thing
             | (which I _already pointed out_ in my post), but it gets
             | across the idea that while Python has to do a lot of work
             | for  "x.y = z" for all the things that "x.y" _might_ mean
             | including the possibility that the user has changed what it
             | means since the last time this statement ran, compiled
             | languages generally do over an order of magnitude less
             | "work" in resolving that.
             | 
             | This is one of the issues with Python I've pointed out
             | before, to the point I suggest that someone could make a
                | language around this idea:
                | https://jerf.org/iri/post/2025/programming_language_ideas/#s...
                | In Python you pay and pay and pay and pay and pay for all
                | this dynamic functionality,
             | but in practice you aren't actually dynamically modifying
             | class hierarchies and attaching arbitrary attributes to
             | arbitrary instances with arbitrary types. You pay for the
             | feature but you benefit from them far less often than the
             | number of times Python is paying for them. Python spends
             | rather a lot of time spinning its wheels double-checking
             | that it's still safe to do the thing it thinks it can do,
             | and it's hard to remove that even in JIT because it is
             | extremely difficult to _prove_ it can eliminate those
             | checks.
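The "user has changed what it means" point can be made concrete with a toy class (the names are hypothetical): the interpreter cannot treat `x.y = z` as a plain store, because any class may intercept it at any time:

```python
# Why "x.y = z" is never just a store in Python: attribute assignment
# dispatches through the type, and the type can change at runtime.
class Point:
    pass

p = Point()
p.x = 1                      # today: a plain dict store

def noisy_setattr(self, name, value):
    # Count every attribute write, using object.__setattr__ to
    # avoid recursing into ourselves.
    object.__setattr__(self, name, value)
    object.__setattr__(self, "writes", getattr(self, "writes", 0) + 1)

Point.__setattr__ = noisy_setattr   # change what "p.x = ..." means
p.x = 2                             # same statement, new behavior
p.y = 3
print(p.writes)  # 2
```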
        
               | mmcnl wrote:
               | You claimed churning through Python interpretation is not
               | "real work". You now correctly ask the question: what is
               | "real work"? Why is interpreting Python not real work, if
               | it means I don't have to check for array bounds?
        
               | Demiurge wrote:
                | I understand what you're saying. In a way, my comment is
                | actually off-topic to most of your comment. What I was
                | saying in my first paragraph is that the words you use in
                | your context of language runtime inefficiency can be
                | used to describe why these inefficiencies exist in the
                | context of higher-level processes, like business
                | efficiency. I find your choice of words amusing, given
                | the juxtaposition of these contexts, even saying "you
                | pay, pay, pay".
        
             | Calavar wrote:
             | I believe they are talking about the processor doing real
             | work, not the programmer.
        
               | Demiurge wrote:
               | Yeah, I get it, but I found the choice of words funny,
               | because these words can apply in the larger context. It's
               | like saying, Python transfers work from your man hours to
               | cpu hours :)
        
           | dragonwriter wrote:
           | > After all, the interpreter is just C code too.
           | 
           | What interpreter? We're talking about JITting Python to
           | native code.
        
           | qaq wrote:
           | Welp there is Mojo so looks like soon you will not really
           | need to care that much. Prob will get better performance than
           | C too.
        
             | jerf wrote:
             | I've been hearing promises about "better than C"
             | performance from Python for over 25 years. I remember them
             | on comp.lang.python, back on that Usenet thing most people
             | reading this have only heard about.
             | 
             | At this point, you just shouldn't be making that promise.
             | Decent chance that promise is already older than you are.
             | Just let the performance be what it is, and if you need
             | better performance today, be aware that there are a wide
             | variety of languages of all shapes and sizes standing by to
             | give you ~25-50x better single threaded performance and
             | even more on multi-core performance _today_ if you need it.
             | If you need it, waiting for Python to provide it is not a
             | sensible bet.
        
               | hnfong wrote:
                | You're probably right. Mojo seems to be more "Python-
                | like" than actually source-compatible with Python. A
                | bunch of features, notably classes, are missing.
        
               | qaq wrote:
                | Give 'em a bit of time, it's a pretty young lang.
        
               | qaq wrote:
                | I am a bit older than Python :). I imagine the creator of
                | Clang and LLVM has a fairly good grasp on making things
                | performant. Think of Mojo as Rust with better ergonomics
                | and a more advanced compiler that you can mix and match
                | with regular Python.
        
             | lenkite wrote:
              | Mojo feels less like a real programming language for humans
              | and more like a language primarily for AIs. The docs for
              | the language immediately dive into chatbots and AI prompts.
        
               | qaq wrote:
                | I mean, that's the use case they care about for obvious
                | reasons, but it's not the only use case.
        
           | CraigJPerry wrote:
           | > And while your Python code is speculating about how this
           | Python object will be unboxed
           | 
           | This is wrong i think? The GP is talking about JIT'd code.
        
         | fpoling wrote:
          | Although JS supports prototype mutations, the `with` operator,
          | and other constructs that make optimization harder, typical JS
          | code does not use them. Thus the JIT can add a few checks for
          | the presence of problematic constructs to direct execution to
          | a slow path, while optimizing a not particularly big set of
          | common patterns. And the JS JIT does not need to care much
          | about calling arbitrary native code, as the browser internals
          | can be adjusted/refactored to suit the JIT's needs.
         | 
         | With Python that does not work. There are simply more
         | optimization-unfriendly constructs and popular libraries use
         | those. And Python calls arbitrary C libraries with fixed ABI.
         | 
         | So optimizing Python is inherently more difficult.
        
         | josefx wrote:
         | > but v8 has made common use cases extremely fast. I'm excited
         | for the future of Python.
         | 
         | Isn't v8 still entirely single threaded with limited message
         | passing? Python just went through a lot of work to make
         | multithreaded code faster, it would be disappointing if it had
         | to scrap threading entirely and fall back to multiprocessing on
         | shared memory in order to match v8.
        
           | zozbot234 wrote:
           | Multithreaded code is usually bottlenecked by memory
           | bandwidth, even more so than raw compute. C/C++/Rust are
           | great at making efficient use of memory bandwidth, whereas
           | scripting languages are rather wasteful of it by comparison.
           | So I'm not sure that multithreading will do much to bridge
           | the performance gap between binary compiled languages and
           | scripting languages like Python.
        
           | loeg wrote:
           | JS is single-threaded. Python isn't.
        
         | mcdeltat wrote:
          | I wonder if branch prediction can still hide the performance
          | loss when the happy-path checks become large/complex. Branch
          | prediction is a very low-level optimisation. And even if the
          | predictor is right, you don't get everything for free: the CPU
          | must still evaluate the condition, which takes resources,
          | albeit no longer on the critical path. However, I'd think
          | the CPU would stall if it got too far ahead of the condition
          | execution (ultimately all the code must execute before the
          | program completes). Perhaps, given the nature of Python, the
          | checks would be so complex that in a tight loop they'd exert
          | significant resource pressure?
        
       | mrkeen wrote:
       | I didn't read with 100% focus, but this lwn account of the talk
       | seemed to confirm those myths instead of debunking.
        
         | diegocg wrote:
          | Yep, for me it confirms all the reasons why I think Python is
          | slow and not a good language for anything that goes beyond a
          | script. I work with it every day, and I have learned that I
          | can't even trust tooling such as mypy because it's full of
          | corner cases - it turns out that not having a clear type
          | design in a language is not something that can be
          | fundamentally fixed by external tools. Tests are the only
          | thing that can make me trust code written in this language.
        
           | jdhwosnhw wrote:
           | > Yep, for me it confirms all the reasons why I think python
           | is slow
           | 
           | Yes, that is literally the explicit point of the talk. The
           | first myth of the article was "python is not slow"
        
         | postexitus wrote:
         | A more careful reading of the article is required.
         | 
         | The first myth is "Python is not slow" - it is debunked, it is
         | slow.
         | 
         | The second myth is ""it's just a glue language / you just need
         | to rewrite the hot parts in C/C++" - it is debunked, just
         | rewriting stuff in C/Rust does not help.
         | 
          | The third myth is "Python is slow because it is interpreted" -
          | it is debunked: being interpreted is not the only reason it is
          | slow.
        
           | mrkeen wrote:
           | Thanks! As a Python outsider, I was primed for a Python
           | insider to be trying to change my views, not confirm them,
           | and I did indeed misread.
        
             | zahlman wrote:
             | My impression is that GvR conceded a long time ago that
             | Python is slow, and doesn't particularly care (and
             | considers it trolling to keep bringing it up). The point is
             | that in the real world this doesn't matter a lot of the
             | time, at least as long as you aren't making big-O mistakes
             | -- and easier-to-use languages make it easier to avoid
             | those mistakes.
             | 
             | For that matter, I recently saw a talk in the Python world
             | that was about convincing people to let their computer do
             | more work locally in general, because computers really are
             | just that fast now.
        
           | ActorNightly wrote:
           | >just rewriting stuff in C/Rust does not help.
           | 
            | Except it does. The key is to figure out which part you
            | actually need to go fast, and write that part in C - unless
            | most of your use case is dominated by network latency, in
            | which case even that isn't needed.
           | 
           | Overall, people seem to miss the point of Python. The best
           | way to develop software is "make it work, make it good, make
           | it fast" - the first part gets you to an end to end prototype
           | that gives you a testable environment, the second part
           | establishes the robustness and consistency, and the third
           | part lets you focus on optimizing the performance with a
           | robust framework that lets you ensure that your changes are
           | not breaking anything.
           | 
            | Python's focus is on the first part. The idea is that you
            | spend less time making it work. Once you have it working,
            | then it's much easier to do the second part (adding tests,
            | type checking, whatever else), and then the third part. Now
            | with LLMs, it's actually pretty straightforward to take a
            | Python file and translate it to .c/.h files, especially with
            | agents that do additional "thinking" loops.
           | 
            | However, even given all of that, in practice you often don't
            | need to move away from Python. For example, I have a project
            | that datamines Strava heatmaps (i.e. I download PNG tiles
            | for the entire US). The amount of time that it took me to
            | write it in Python plus running it (which takes about a day)
            | is much shorter than it would have taken me to write it in
            | C++/Rust and then run it with the processing speedup.
        
           | IshKebab wrote:
            | In fairness I wouldn't really call those "myths", just bad
            | defences of Python's slowness. I don't think the people
            | saying them _really_ believe it if it came down to life or
            | death. They just really like Python and are trying to avoid
            | the cognitive dissonance of liking a really slow language.
            | 
            | Like, I wouldn't say it's a "myth" that Linux is easy to use.
        
           | akkad33 wrote:
           | > The first myth is "Python is not slow" - it is debunked, it
           | is slow
           | 
            | This is strange. Most people in the programming community
            | know Python is slow. If it has any reputation, it's that it
            | is quite slow.
        
       | pjmlp wrote:
       | Basically, leave Python for OS and application scripting tasks,
       | and as BASIC replacement for those learning to program.
        
         | aragilar wrote:
          | And yet, most of what people do ends up being effectively
          | OS and application scripting. Most ML projects are really
          | just setting up a pipeline and telling the computer to go
          | and run it. Cloud deployments are "take this YAML and
          | transform it into some other YAML". Inasmuch as I don't
          | want to use Fortran to parse a YAML file, I don't really
          | want to write an OS (or a database) in Python. Even
          | something like Django is
         | mostly deferring off tasks to faster systems, and is really
         | about being a DSL-as-programming-language while still being
         | able to call out to other things (e.g. ML code).
        
           | pjmlp wrote:
           | I would rather use Fortran actually, not all of us are stuck
           | with Fortran 77.
           | 
           | Ironically Fortran support is one of the reasons CUDA won
           | over OpenCL.
           | 
           | Having said that, plenty of programming languages with
           | JIT/AOT toolchains have nice YAML parsers, I don't see the
           | need to bother with Python for that.
        
       | Ulti wrote:
        | Feel like Mojo is worth a shoutout in this context:
        | https://www.modular.com/mojo It solves this by being a
        | syntactic superset of Python, where "fn" (instead of "def")
        | functions are assumed statically typed and compilable with
        | Numba-style optimisations.
        
         | _aavaa_ wrote:
         | Mojo NOT being open-source is a complete non-starter.
        
           | alankarmisra wrote:
           | Genuinely curious; while I understand why we would want a
           | language to be open-source (there's plenty of good reasons),
           | do you have anecdotes where the open-sourceness helped you
           | solve a problem?
        
             | yupyupyups wrote:
             | Not the OP, but I have needed to patch Qt due to bugs that
             | couldn't be easily worked around.
             | 
             | I have also been frustrated while trying to interoperate
             | with expensive proprietary software because documentation
             | was lacking, and the source code was unavailable.
             | 
             | In one instance, a proprietary software had the source code
             | "exposed", which helped me work around its bugs and use it
             | properly (also poorly documented).
             | 
             | There are of course other advantages of having that
              | transparency, like being able to independently audit the
             | code for vulnerabilities or unacceptable "features", and
             | fix those.
             | 
             | Open source is oftentimes a prerequisite for us to be able
             | to control our software.
        
             | _aavaa_ wrote:
              | It has helped prevent problems. I am not worried about
              | Python suddenly adding a clause stating that I can't
              | release an ML framework...
        
             | Philpax wrote:
             | In the earlier days of rustc, it was handy to be able to
             | look at the context for a specific compiler error (this is
             | before the error reporting it is now known for). Using
             | that, I was able to diagnose what was wrong with my code
             | and adjust it accordingly.
        
           | Ulti wrote:
           | More of a question of /will/ Mojo eventually be entirely open
            | source; chunks of it already are. The intent from Modular
            | is that eventually it will be, just not everything all at
            | once and
           | not whilst they're internally doing loads of dev for their
           | own commercial entity. Which seems fair enough to me.
           | Importantly they have open sourced lots of the stdlib which
           | is probably what anyone external would contribute to or want
           | to change anyway? https://www.modular.com/blog/the-next-big-
           | step-in-mojo-open-...
        
             | _aavaa_ wrote:
              | _When_ it has become open source, I will consider
              | building up expertise and a product on it. Until then,
              | there is no guarantee that it will happen.
        
               | Ulti wrote:
                | Well, the "expertise" is mostly just Python; that's
                | sort of the value prop. But yeah, building an actual
                | AI product on top, I'd be more worried about the
                | early-stage nature of Modular than about the
                | implementation being closed source.
        
               | _aavaa_ wrote:
               | Sure, that's the value prop of numba too. But reality is
               | different.
        
       | abhijeetpbodas wrote:
       | An earlier version of the talk is at
       | https://www.youtube.com/watch?v=ir5ShHRi5lw (I could not find the
       | EuroPython one).
        
         | fragebogen wrote:
         | Here's a newer one https://www.youtube.com/watch?v=1uFMW0IcZuw
        
       | ic_fly2 wrote:
       | It's a good article on speed.
       | 
       | But honestly the thing that makes any of my programs slow is
        | network calls. And there, a nice async setup goes a long way.
        | And then k8s for the scaling.
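        | 
        | A minimal sketch of that kind of async setup (illustrative
        | only; fetch() is a stand-in for a real network call):

```python
import asyncio
import time

async def fetch(i: int) -> int:
    # Stand-in for a network call; real code would use e.g. aiohttp.
    await asyncio.sleep(0.1)
    return i * 2

async def main():
    # Issue all "requests" concurrently instead of awaiting them
    # one by one.
    return await asyncio.gather(*(fetch(i) for i in range(10)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
# The ten 0.1 s "calls" overlap, so wall time stays near 0.1 s.
```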
        
         | nicolaslem wrote:
         | This. I maintain an ecommerce platform written in Python. Even
         | with Python being slow, less than 30% of our request time is
         | spent executing code, the rest is talking to stuff over the
         | network.
        
         | stackskipton wrote:
          | SRE here: that horizontal scaling with Python has costs, as
          | it means more connections to the database and so forth, so
          | you are impacting things even if you don't see it.
        
           | ic_fly2 wrote:
           | Meh, even with basic async I've been able to overload azure's
           | premium ampq offering memory capacity.
           | 
           | But yes managing db connections is a pain. But I don't think
           | it's any better in Java (my only other reference at this
           | scale)
        
         | gen220 wrote:
         | I think articles like this cast too wide a net when they say
         | "performance" or "<language> is fast/slow".
         | 
         | A bunch of SREs discussing which languages/servers/runtimes are
         | fast/slow/efficient in comparable production setups would give
         | more practical guidance.
         | 
         | If you're building an http daemon in a traditional three-tiered
         | app (like a large % of people on HN), IME, Python has quietly
         | become a great language in that space, compared to its peers,
         | over the last 8 years.
        
       | ntoll wrote:
       | Antonio is a star. He's also a very talented artist.
        
       | dgan wrote:
        | "Rewrite the hot path in C/C++" is also a landmine because of
        | how inefficient the boundary crossing is, so you really need
        | to "dispatch as much as possible at once" instead of
        | continuously calling into the native code.
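        | 
        | A sketch of the difference (assuming numpy, which the comment
        | above does not mention):

```python
import numpy as np

x = np.arange(100_000, dtype=np.float64)

# Crossing the Python/C boundary once per element: each scalar
# np.sqrt call pays the full dispatch overhead, 100,000 times.
slow = np.array([np.sqrt(v) for v in x])

# Crossing the boundary once: a single call hands the whole array
# to compiled code, which runs the loop internally.
fast = np.sqrt(x)

assert np.allclose(slow, fast)
```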
        
         | aragilar wrote:
         | Isn't this just a specific example of the general rule of
         | pulling out repeated use of the same operation in a loop? I'm
         | not sure calls out to C are specifically slow in CPython (given
         | many operations are really just calling C underneath).
        
           | KeplerBoy wrote:
           | The key is to move the entire loop to a compiled language
           | instead of just the inner operation.
        
           | dgan wrote:
            | They are specifically slow. There was a project which
            | measured FFI cost in different languages, and Python is
            | awfully bad.
        
           | Twirrim wrote:
           | The serialisation cost of translating data representations
           | between python and C (or whatever compiled language you're
           | using) is notable. Instead of having the compiled code sit in
           | the centre of a hot loop, it's significantly better to have
           | the loop in the compiled code and call it once
           | 
           | https://pythonspeed.com/articles/python-extension-
           | performanc...
        
             | morkalork wrote:
             | The overhead of copying and moving data around in Python is
             | frustrating. When you are CPU bound on a task, you can't
             | use threads (which do have shared memory) because of the
             | GIL, so you end up using whole processes and then waste a
             | bunch of cycles communicating stuff back and forth. And
             | yes, you can create shared memory buffers between Python
             | processes but that is nowhere near as smooth as say two
             | Java threads working off a shared data structure that's got
             | synchronized sprinkled on it.
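              | 
              | For reference, the stdlib shared-memory route looks
              | roughly like this (a minimal sketch; workable, but
              | indeed less smooth than shared objects in Java):

```python
from multiprocessing import Process, shared_memory

def worker(name: str) -> None:
    # Attach to the existing block and write into it in place:
    # no pickling or copying of the payload between processes.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[0] = 42
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=1024)
    p = Process(target=worker, args=(shm.name,))
    p.start()
    p.join()
    assert shm.buf[0] == 42  # the child's write is visible here
    shm.close()
    shm.unlink()
```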
        
             | kragen wrote:
             | You don't have to serialize data or translate data
             | representations between CPython and C. That article is
             | wrong. What's slow in their example is storing data (such
             | as integers) the way CPython likes to store it, not
             | translating that form to a form easily manipulated in C,
             | such as a native integer in a register. That's just a
             | single MOV instruction, once you get past all the type
             | checking and reference counting.
             | 
             | You can avoid that problem to some extent by implementing
             | your own data container as part of your C extension (the
             | article's solution #1); frobbing that from a Python loop
             | can still be significantly faster than allocating and
             | deallocating boxed integers all the time, with dynamic
             | dispatch and reference counting. But, yes, to really get
             | reasonable performance you want to not be running bytecodes
             | in the Python interpreter loop at all (the article's
             | solution #2).
             | 
             | But that's not because of serialization or other kinds of
             | data format translation.
        
         | ActorNightly wrote:
         | >how inefficient the boundary crossing is
         | 
          | For 99.99% of the programs that people write, modern M.2
          | NVMe drives are plenty fast, and that's the laziest way to
          | load data into a C extension or process.
         | 
          | Then there are Unix pipes, which are sufficiently fast.
         | 
         | Then there is shared memory, which basically involves no
         | loading.
         | 
         | As with Python, all depends on the setup.
        
           | zahlman wrote:
            | The problem isn't loading the data, but marshalling it (i.e.,
           | transforming it into a data structure that makes sense for
           | the faster language to operate on, and back again). Or if you
           | don't transform (or the data is special-cased enough that no
           | transformation makes sense) then the available optimizations
           | become much more limited.
        
             | ActorNightly wrote:
              | That's all just design, nothing to do with any
              | particular language.
        
             | jononor wrote:
          | There are several data structures for numeric data that do
          | not need marshalling and are suitable for very efficient
          | interoperation between Python and C/C++/Rust, etc. Examples
             | include array.array (in standard library), numpy.array, and
             | PyArrow.
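          | 
          | For instance, an array.array stores machine-native values
          | contiguously, so native code can view the same memory
          | without any copying (sketch using ctypes to stand in for a
          | C extension):

```python
import array
import ctypes

# array.array stores machine-native doubles contiguously, so native
# code can read them through the buffer protocol: no per-element
# boxing or marshalling.
a = array.array("d", [1.0, 2.0, 3.0, 4.0])
addr, length = a.buffer_info()

# View the same memory as a C double[] without copying anything.
c_view = (ctypes.c_double * length).from_address(addr)
assert list(c_view) == [1.0, 2.0, 3.0, 4.0]

# Writes through the C view are visible on the Python side:
# shared memory, not a marshalled copy.
c_view[0] = 99.0
assert a[0] == 99.0
```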
        
         | didip wrote:
         | These days it's "rewrite in Rust".
         | 
         | Typically Python is just the entry and exit point (with a
         | little bit of massaging), right?
         | 
         | And then the overwhelming majority of the business logic is
         | done in Rust/C++/Fortran, no?
        
           | 01HNNWZ0MV43FF wrote:
           | With computer vision you end up wanting to read and write to
           | huge buffers that aren't practical to serialize and are
           | difficult to share. And even allocating and freeing multi-
           | megabyte framebuffers at 60 FPS can put a little strain on
           | the allocator, so you want to reuse them, which means you
           | have to think about memory safety.
           | 
           | That is probably why his demo was Sobel edge detection with
           | Numpy. Sobel can run fast enough at standard resolution on a
           | CPU, but once that huge buffer needs to be read or written
           | outside of your fast language, things will get tricky.
           | 
           | This also comes up in Tauri, since you have to bridge between
           | Rust and JS. I'm not sure if Electron apps have the same
           | problem or not.
        
             | aeroevan wrote:
             | In the data science/engineering world apache arrow is the
             | bridge between languages, so you don't actually need to
             | serialize into language specific structures which is really
             | nice
        
             | jononor wrote:
             | The "numpy" Sobel code is not that good, unfortunately -
             | all the iteration is done in Python, so there is not much
              | benefit from involving numpy. If one used, say,
              | scipy.signal.convolve2d on a numpy.array, it would be
              | much faster.
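              | 
              | For comparison, a sketch of a Sobel-x pass written with
              | whole-array slicing, so every arithmetic step runs in C
              | over the full image (illustrative; assumes numpy):

```python
import numpy as np

def sobel_x(img: np.ndarray) -> np.ndarray:
    # Horizontal Sobel gradient via whole-array slicing: no Python
    # per-pixel loop, each operation below covers the entire image.
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    return (
        -p[:-2, :-2] + p[:-2, 2:]
        - 2 * p[1:-1, :-2] + 2 * p[1:-1, 2:]
        - p[2:, :-2] + p[2:, 2:]
    )

img = np.arange(25, dtype=np.float64).reshape(5, 5)
g = sobel_x(img)
assert g.shape == img.shape
```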
        
         | pavon wrote:
         | One use of Python as a "glue language" I've seen that actually
         | avoids the performance problems of those bindings is GNU Radio.
         | That is because its architecture basically uses python as a
         | config language that sets up the computation flow-graph at
         | startup, and then the rest of runtime is entirely in compiled
         | code (generally C++). Obviously that approach isn't applicable
         | to all problems, but it really shaped my opinion of when/how a
         | slow glue language is acceptable.
        
           | slt2021 wrote:
           | This. Use python only for control flow, and offload data flow
           | to a library that is better suited for this: written in C,
           | uses packed structs, cache friendly, etc.
           | 
           | if you want multiprocessing, use the multiprocessing library,
           | scatter and gather type computation, etc
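            | 
            | A minimal sketch of that scatter/gather pattern with the
            | stdlib multiprocessing Pool (illustrative only):

```python
from multiprocessing import Pool

def crunch(chunk: list) -> int:
    # CPU-bound work on one chunk, run in a separate process.
    return sum(v * v for v in chunk)

if __name__ == "__main__":
    data = list(range(1000))
    # Scatter: split the data into one chunk per worker.
    chunks = [data[i::4] for i in range(4)]
    with Pool(4) as pool:
        # Gather: collect the partial results back in the parent.
        partials = pool.map(crunch, chunks)
    assert sum(partials) == sum(v * v for v in data)
```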
        
         | IshKebab wrote:
         | And it's not just inefficiency. Even with fancy FFI generators
         | like PyO3 or SWIG, adding FFI adds a ton of work, complexity,
         | makes debugging harder, distribution harder, etc.
         | 
         | In my opinion in most cases where you might want to write a
         | project in two languages with FFI, it's usually better not to
         | and just use one language even if that language isn't optimal.
         | In this case, just write the whole thing in C++ (or Rust).
         | 
         | There are some exceptions but generally FFI is a _huge_ cost
         | and Python doesn 't bring enough to the table to justify its
         | use if you are already using C++.
        
       | robmccoll wrote:
       | Python as a language will likely never have a "fast"
       | implementation and still be Python. It is way too dynamic to be
       | predictable from the code alone or even an execution stream in a
       | way that allows you to simplify the actual code that will be
       | executed at runtime either through AOC or JIT. The language is
       | itself is also quite large in terms of syntax and built-in
       | capability at this point which makes new feature-conplete
       | implementations that don't make major trade offs quite
       | challenging. Given how capable LLMs are at translating code, it
       | seems like the perfect time to build a language with similar
       | syntax, but better scoped behavior, stricter rules around typing,
       | and tooling to make porting code and libraries automated and
       | relatively painless. What would existing candidates be and why
       | won't they work as a replacement?
        
         | pjmlp wrote:
         | Self and Smalltalk enter the room.
         | 
         | As for the language with similar syntax, do you want Nim, Mojo
         | or Scala 3?
        
         | BlackFly wrote:
          | The secret, as stated, is the complexity of a JIT. In
          | practice, that dynamism just isn't used much, particularly
          | in optimization targets. The JIT analyses the code paths,
          | sees that no writes to the target are possible, and so
          | treats it as a constant.
         | 
          | Java has similar levels of dynamism (with invokedynamic
          | especially, but already with dynamic dispatch), yet in
          | practice the JIT monomorphises to a single class, even
          | though classes default to non-final in Java and there may
          | even be multiple implementations known to the JVM when it
          | monomorphises. Such is the strength of the knowledge that a
          | JIT has compared to a local compiler.
        
           | pjmlp wrote:
           | Yes, Java syntax might look like C++, but the execution
           | semantics are closer to Objective-C and Smalltalk, which is
           | why adopting StrongTalk JIT for Java Hotspot was such a win.
        
         | acmj wrote:
          | PyPy is 10x faster and is compatible with most CPython
          | code. IMHO it was a big mistake not to adopt a JIT during
          | the 2-to-3 transition.
        
           | cestith wrote:
           | That "most" is doing a big lift there. At some point you
           | might consider that you're actually programming in the
           | language of Pypy and not pure Python. It's effectively a
           | dialect of the language like Turbo Pascal vs ISO Pascal or
           | RPerl instead of Perl.
        
             | cma wrote:
              | "Most" is more CPython code than Python 3 was compatible
              | with. But the port of the broken code was likely much
              | easier than if it had moved to a JIT at the same time.
        
           | rirze wrote:
           | Isn't there an incoming JIT in 3.14?
        
       | nu11ptr wrote:
       | The primary focus here is good and something I hadn't considered:
        | Python memory being so dynamic leads to poor cache locality.
       | Makes sense. I will leave that to others to dig into.
       | 
       | That aside, I was expecting some level of a pedantic argument,
       | and wasn't disappointed by this one:
       | 
       | "A compiler for C/C++/Rust could turn that kind of expression
       | into three operations: load the value of x, multiply it by two,
       | and then store the result. In Python, however, there is a long
       | list of operations that have to be performed, starting with
       | finding the type of p, calling its __getattribute__() method,
       | through unboxing p.x and 2, to finally boxing the result, which
       | requires memory allocation. None of that is dependent on whether
       | Python is interpreted or not, those steps are required based on
       | the language semantics."
       | 
       | The problem with this argument is the user isn't trying to do
       | these things, they are trying to do multiplication, so the fact
        | that the language has to do all these things in the end DOES
        | mean it is slow. Why? Because if these things weren't done,
        | the end result could still be achieved. They are pure
        | overhead, for no value in this situation. In other words, if
        | Python had a sufficiently
       | intelligent compiler/JIT, these things could be optimized away
       | (in this use case, but certainly not all). The argument is akin
       | to: "Python isn't slow, it is just doing a lot of work". That
       | might be true, but you can't leave it there. You have to ask if
       | this work has value, and in this case, it does not.
       | 
       | By the same argument, someone could say that any interpreted
       | language that is highly optimized is "fast" because the
       | interpreter itself is optimized. But again, this is the wrong way
       | to think about this. You always have to start by asking "What is
       | the user trying to do? And (in comparison to what is considered a
       | fast language) is it fast to compute?". If the answer is "no",
       | then the language isn't fast, even if it meets the expected
       | objectives. Playing games with things like this is why users get
       | confused on "fast" vs "slow" languages. Slow isn't inherently
       | "bad", but call a spade a spade. In this case, I would say the
       | proper way to talk about this is to say: "It has a fast
       | interpreter". The last word tells any developer with sufficient
       | experience what they need to know (since they understand
       | statically compiled/JIT and interpreted languages are in
       | different speed classes and shouldn't be directly compared for
       | execution speed).
        
         | andylei wrote:
         | The previous paragraph is
         | 
         | > Another "myth" is that Python is slow because it is
         | interpreted; again, there is some truth to that, but
         | interpretation is only a small part of what makes Python slow.
         | 
         | He concedes its slow, he's just saying it's not related to how
         | interpreted it is.
        
           | nu11ptr wrote:
           | I would argue this isn't true. It is a big part of what makes
           | it slow. The fastest interpreted languages are one to two
           | orders of magnitude slower than for example C/C++/Rust. If
           | your language does math 20-100 times slower than C, it isn't
           | fast from a user perspective. Full stop. It might, however,
           | have a "fast interpreter". Remember, the user doesn't care if
           | it is a fast for an interpreted language, they are just
           | trying to obtain their objective (aka do math as fast as
           | possible). They can get cache locality perfect, and Python
           | would still be very slow (from a math/computation
           | perspective).
        
             | nyrikki wrote:
              | The 20-100 times slower is a bit cherry-picked, but use
             | case does matter.
             | 
             | Typically from a user perspective, the initial starting
             | time is either manageable or imperceptible in the cases of
             | long running services, although there are other costs.
             | 
             | If you look at examples that make the above claim, they are
             | almost always tiny toy programs where the cost of producing
             | byte/machine code isn't easily amortized.
             | 
             | This quote from the post is an oversimplification too:
             | 
             | > But the program will then run into Amdahl's law, which
             | says that the improvement for optimizing one part of the
             | code is limited by the time spent in the now-optimized code
             | 
             | I am a huge fan of Amdahl's law, but also realize it is
             | pessimistic and most realistic with parallelization.
             | 
             | It runs into serious issues when you are multiprocessing vs
              | parallel processing due to preemption, etc.
             | 
             | Yes you still have the costs of abstractions etc...but in
             | today's world, zero pages on AMD, 16k pages and a large
             | number of mapped registers on arm, barrel shifters etc...
             | make that much more complicated especially with C being
             | forced into trampolines etc...
             | 
             | If you actually trace the CPU operations, the actual
             | operations for 'math' are very similar.
             | 
             | That said modern compilers are a true wonder.
             | 
             | Interpreted language are often all that is necessary and
             | sufficient. Especially when you have Internet, database and
             | other aspects of the system that also restrict the benefits
             | of the speedups due to...Amdahl's law.
        
               | nu11ptr wrote:
               | I'm not so much cherry picking as I am specifically
               | talking compute (not I/O,stdlib) performance. However,
               | when measured for general purpose tasks, that would
               | involve compute and things like I/O, stdlib performance,
               | etc., Python on the whole is typically NOT 20-100x times
               | slower for a given task. Its I/O layer is written in C
               | like many other languages, so the moment you are waiting
               | on I/O you have leveled the playing field. Likewise,
               | Python has a very fast dict implementation in C, so when
                | doing heavy map work, you also amortize the time
               | between the (brutally slow) compute and the very fast
               | maps.
               | 
               | In summary, it depends. I am talking about compute
               | performance, not I/O or general purpose task
               | benchmarking. Yes, if you have a mix of compute and I/O
               | (which admittedly is a typical use case), it isn't going
               | to be 20-100x slower, but more likely "only" 3-20x
               | slower. If it is nearly 100% I/O bound, it might not be
               | any slower at all (or even faster if properly buffered).
               | If you are doing number crunching (w/o a C lib like
               | NumPy), your program will likely be 40-100x slower than
               | doing it in C, and many of these aren't toy programs.
        
               | nyrikki wrote:
               | Even with compute performance it is probably closer than
               | you expect.
               | 
               | Python isn't evaluated line-by-line, even in micropython,
               | which is about the only common implementation that
               | doesn't work in the same way.
               | 
                | The CPython VM compiles to bytecode opcodes, and
                | binary operations just end up popping off a stack, or
                | you can JIT like PyPy.
               | 
                | How efficiently you can keep the pipeline fed is more
                | critical than computation costs.
                | 
                |     int a = 5;
                |     int b = 10;
                |     int sum = a + b;
                | 
                | Is compiled to:
                | 
                |     MOV EAX, 5
                |     MOV EBX, 10
                |     ADD EAX, EBX
                |     MOV [sum_variable], EAX
                | 
                | In the PVM, binary operations remove the top of the
                | stack (TOS) and the second top-most stack item (TOS1)
                | from the stack. They perform the operation and put the
                | result back on the stack.
               | 
               | That pop, pop isn't much more expensive on modern CPUs
               | and some C compilers will use a stack depending on many
               | factors. And even in C you have to use structs of arrays
               | etc... depending on the use case. Stalled pipelines and
               | fetching due to the costs is the huge difference.
               | 
               | It is the setup costs, GC, GIL etc... that makes python
               | slower in many cases.
               | 
                | While I am not suggesting it is as slow as Python, Java
                | is also bytecode, and often its assumptions and design
                | decisions are even better or at least nearly equal to C
               | in the general case unless you highly optimize.
               | 
               | But the actual equivalent computations are almost
               | identical, optimizations that the compilers make differ.
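                | 
                | The stack discipline described above is easy to see
                | with the stdlib dis module:

```python
import dis

def add(a, b):
    return a + b

# The interpreter loads both operands onto the value stack, then a
# single binary-op instruction pops them and pushes the result.
ops = [ins.opname for ins in dis.Bytecode(add)]
# BINARY_ADD before Python 3.11, BINARY_OP afterwards.
assert any("BINARY" in name for name in ops)
```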
        
             | andylei wrote:
             | i'll answer your argument with the initial paragraph you
             | quoted:
             | 
             | > A compiler for C/C++/Rust could turn that kind of
             | expression into three operations: load the value of x,
             | multiply it by two, and then store the result. In Python,
             | however, there is a long list of operations that have to be
             | performed, starting with finding the type of p, calling its
             | __getattribute__() method, through unboxing p.x and 2, to
             | finally boxing the result, which requires memory
             | allocation. None of that is dependent on whether Python is
             | interpreted or not, those steps are required based on the
             | language semantics.
        
               | immibis wrote:
               | Typically a dynamic language JIT handles this by
               | observing what actual types the operation acts on, then
               | hardcoding fast paths for the one type that's actually
               | used (in most cases) or a few different types. When the
               | type is different each time, it has to actually do the
               | lookup each time - but that's very rare.
               | 
               | i.e.
               | 
                | if (a->type != int_type || b->type != int_type)
                |     abort_to_interpreter();
                | 
                | result = ((intval*)a)->val + ((intval*)b)->val;
               | 
               | The CPU does have to execute both lines, but it does them
               | in parallel so it's not as bad as you'd expect. Unless
               | you abort to the interpreter, of course.
        
         | ActivePattern wrote:
         | A "sufficiently smart compiler" can't legally skip Python's
         | semantics.
         | 
         | In Python, p.x * 2 means dynamic lookup, possible descriptors,
         | big-int overflow checks, etc. A compiler can drop that only if
         | it proves they don't matter or speculates and adds guards--
         | which is still overhead. That's why Python is slower on scalar
         | hot loops: not because it's interpreted, but because its
         | dynamic contract must be honored.
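          | 
          | A small illustration (hypothetical classes, not from the
          | article) of why p.x cannot be compiled to a plain load:

```python
class Plain:
    def __init__(self):
        self.x = 21

class Logged(Plain):
    # Overriding __getattribute__ changes what p.x means at runtime,
    # so a compiler cannot assume attribute access is a simple load.
    def __getattribute__(self, name):
        value = super().__getattribute__(name)
        return value + 1 if name == "x" else value

assert Plain().x * 2 == 42
assert Logged().x * 2 == 44  # same expression, different semantics
```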
        
           | pjmlp wrote:
            | In Smalltalk, p x * 2 has that flow as well, and even
            | worse: let's assume the value returned by the p x message
            | send does not understand the * message. It will then break
            | into the debugger; the developer adds the * method to the
            | object via the code browser, hits save, and exits the
            | debugger with redo, ending the execution with success.
           | 
           | Somehow Smalltalk JIT compilers handle it without major
           | issues.
        
             | ActivePattern wrote:
             | Smalltalk JITs make p x * 2 fast by speculating on types
             | and inserting guards, not by skipping semantics. Python
             | JITs do the same (e.g. PyPy), but Python's dynamic features
             | (like __getattribute__, unbounded ints, C-API hooks) make
             | that harder and costlier to optimize away.
             | 
             | You get real speed in Python by narrowing the semantics
             | (e.g. via NumPy, Numba, or Cython) not by hoping the
             | compiler outsmarts the language.
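The speculate-and-guard pattern can be sketched in pure Python (the function names are hypothetical; a real JIT emits machine code, not Python):

```python
import operator

def make_specialized_add(generic_add):
    # speculate that both operands are small exact ints; guard, else fall back
    def specialized(a, b):
        if (type(a) is int and type(b) is int
                and -2**62 < a < 2**62 and -2**62 < b < 2**62):
            return a + b          # fast path: a JIT would emit one machine add
        return generic_add(a, b)  # guard failed: deoptimize to the generic protocol
    return specialized

add = make_specialized_add(operator.add)
assert add(2, 3) == 5         # fast path taken
assert add("a", "b") == "ab"  # guard fails, semantics still honored
```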
        
               | pjmlp wrote:
                | People keep forgetting about image-based
                | development, the debugger, meta-classes, messages
                | like _becomes:_, ...
                | 
                | That is to say, everything dynamic that can be used
                | as an excuse for Python, Smalltalk and Self have it,
                | and doubled up.
        
               | tekknolagi wrote:
               | If I may toot my own horn:
               | https://bernsteinbear.com/blog/typed-python/
        
             | cma wrote:
             | edit and continue is available on lots of JIT-runtime
             | languages
        
           | nu11ptr wrote:
           | First, we need to add the word 'only': "not ONLY because it's
           | interpreted, but because its dynamic contract must be
           | honored." Interpreted languages are slow by design. This
           | isn't bad, it just is a fact.
           | 
           | Second, at most this describes WHY it is slow, not that it
           | isn't, which is my point. Python is slow. Very slow (esp. for
           | computation heavy workloads). And that is okay, because it
           | does what it needs to do.
        
         | rstuart4133 wrote:
         | > The problem with this argument is the user isn't trying to do
         | these things,
         | 
          | I'd argue differently. I'd say the problem isn't that the
          | user is doing those things, it's that the language doesn't
          | know what he's trying to do.
         | 
          | Python's explicit goal was always ergonomics, and it was
          | always ergonomics over speed or annoying compile-time
          | error messages. "Just run the code as written dammit" was
          | always the goal. I remember when the new-style class model
          | was introduced, necessitating the introduction of
          | __getattribute__. My first reaction as a C programmer was
          | "gee, you took a speed hit there". A later reaction was to
          | use it to twist the new system into something its
          | inventors possibly never thought of: an LR(1) parser that
          | let you write grammars as regular Python statements.
         | 
          | While they may not have thought of abusing the language in
          | that particular way, I'm sure the explicit goal was to
          | create a framework that lets any idea be expressed with
          | minimal code. Others also used the hooks they provided to
          | create things like pydantic and spyne. Spyne, for example,
          | lets you express the on-the-wire serialisation formats
          | used by RPC as Python class declarations, and then compile
          | them into JSON, XML, SOAP or whatever. SQLAlchemy lets you
          | express SQL using Python syntax, although in a more
          | straightforward way.
         | 
         | All of them are very clever in how they twist the language.
         | Inside those frameworks, "a = b + c" does not mean "add b to c,
         | and place the result in a". In the LR(1) parser for example it
         | means "there is a production called 'a', that is a 'b' followed
         | by a 'c'". 'a' in that formulation holds references to 'b' and
         | 'c'. Later the LR(1) parser will consume that, compiling it
         | into something very different. The result is a long way from
          | two's complement addition.
         | 
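That kind of repurposed "+" can be sketched in a few lines (illustrative classes, not the actual parser the comment describes):

```python
class Symbol:
    def __init__(self, name):
        self.name = name
    def __add__(self, other):
        # "+" records a sequence of grammar symbols; no arithmetic happens
        return Production(self, other)

class Production:
    def __init__(self, left, right):
        self.parts = (left, right)

b, c = Symbol("b"), Symbol("c")
a = b + c   # "a = b + c" now builds a production object
assert isinstance(a, Production)
assert [s.name for s in a.parts] == ["b", "c"]
```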
          | It is possible to use a more powerful type system in a
          | similar way. For example, I've seen FPGA designs expressed
          | in Scala. However, because Scala's type system insists on
          | knowing what is going on at compile time, Scala had a fair
          | idea of what the programmer was building. The compiled
          | result isn't going to be much slower than any other code.
          | Python achieved the same flexibility by abandoning type
          | checking at compile time almost entirely, pushing it all
          | to run time. Thus the compiler has no idea what is going
          | to be executed in the end (the + operation in the LR
          | parser only gets executed once, for example), which is
          | what I said above: "it's that the language doesn't know
          | what the programmer is trying to do".
         | 
          | You argue that since it's an interpreted language, it's
          | the interpreter's job to figure out what the programmer is
          | trying to do at run time. Surely it can figure out that
          | "a = b + c" really is adding two 32-bit integers that
          | won't overflow. That's true, but that creates a lot of
          | work to do at run time. Which is a roundabout way of
          | saying the same thing as the talk: electing to do it at
          | run time means the language chose flexibility over speed.
         | 
          | You can't always fix this in an interpreter. Javascript
          | has some of the best interpreters around, and they do make
          | the happy path run quickly. But those interpreters come
          | with caveats, usually of the form "if you muck around with
          | the internals of classes, by say replacing function
          | definitions at run time, we abandon all attempts to JIT
          | it". People don't typically do such things in Javascript,
          | but as it happens, Python's design, with its metaclasses,
          | dynamic types created with "type(...)", and "__new__(..)",
          | could almost be said to encourage that coding style. That
          | is, again, a language design choice, and it's one that
          | favours flexibility over speed.
        
       | teo_zero wrote:
       | I don't know Python so well as to propose any meaningful
       | contribution, but it seems to me that most issues would be
       | mitigated by a sort of "final" statement or qualifier, that
       | prohibits any further changes to the underlying data structure,
       | thus enabling all the nice optimizations, tricks and shortcuts
       | that compilers and interpreters can't afford when data is allowed
       | to change shape under their feet.
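Python already has a narrow version of this idea: `__slots__` freezes an instance's attribute layout (though not the class itself). A small demonstration:

```python
class Point:
    __slots__ = ("x", "y")  # fixed layout, no per-instance __dict__
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
blocked = False
try:
    p.z = 3                 # adding unknown attributes is rejected
except AttributeError:
    blocked = True
assert blocked
assert (p.x, p.y) == (1, 2)
```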
        
         | Fraterkes wrote:
          | I assume people dislike those kinds of solutions because
          | the extreme dynamism is used pretty rarely in a lot of
          | meat-and-potatoes Python scripts. So a lot of "regular"
          | Python scripts would have to just plaster "final"
          | everywhere to be as fast as they can be.
         | 
         | At that point youd maybe want to have some sort of broader way
         | to signify which parts of your script are dynamic. But then,
         | youd have a language that can be dynamic even in how dynamic it
         | is...
        
       | game_the0ry wrote:
       | I know I am going to get some hate for this from the "Python-
       | stans" but..."python" and "performance" should never be
       | associated with each other, and same for any
       | scripting/interpreted programming language. Especially if it has
       | a global interpreter lock.
       | 
       | While performance (however you may mean that) is always a worthy
       | goal, you may need to question your choice of language if you
       | start hitting performance ceilings.
       | 
       | As the saying goes - "Use the right tool for the job." Use case
       | should dictate tech choices, with few exceptions.
       | 
       | Ok, now that I have said my piece, now you can down vote me :)
        
         | danielrico wrote:
          | That's used by some people as an excuse to write the most
          | inefficient code.
          | 
          | OK, you are not competing with C++, but you also shouldn't
          | be redoing all the calculations because you haven't
          | figured out the data access pattern.
        
         | ahoka wrote:
         | Have you read the fine article?
        
         | throwaway6041 wrote:
         | > the "Python-stans"
         | 
         | I think the term "Pythonistas" is more widely used
         | 
         | > you may need to question your choice of language if you start
         | hitting performance ceilings.
         | 
         | Developers should also question if a "fast" language like Rust
         | is really needed, if implementing a feature takes longer than
         | it would in Python.
         | 
         | I don't like bloat in general, but sometimes it can be worth
         | spinning up a few extra instances to get to market faster. If
         | Python lets you implement a feature a month earlier, the new
         | sales may even cover the additional infrastructure costs.
         | 
         | Once you reach a certain scale you may need to rewrite parts of
         | your system anyway, because the assumptions you made are often
         | wrong.
        
           | game_the0ry wrote:
           | > Developers should also question if a "fast" language like
           | Rust is really needed...
           | 
           | Agreed.
        
         | wiseowise wrote:
         | Do you get off from bashing on languages or what?
        
         | ActorNightly wrote:
         | >"Use the right tool for the job."
         | 
         | Python + C covers pretty much anything you really ever need to
         | build, unless you are doing something with game engines that
         | require the use of C++/C#. Rust is even more niche.
        
       | crabbone wrote:
       | Again and again, the most important question is "why?" not
       | "how?". Python isn't made to be fast. If you wanted a language
       | that can go fast, you needed to build it into the language from
       | the start: give developers tools to manage memory layout, give
       | developers tools to manage execution flow, hint the compiler
       | about situations that present potential for optimization,
       | restrict dispatch and polymorphism, restrict semantics to fewer
       | interpretations.
       | 
       | Python has none of that. It's a hyper-bloated language with
       | extremely poor design choices all around. Many ways of doing the
       | same thing, many ways of doing stupid things, no way of
       | communicating programmer's intention to the compiler... So why
       | even bother? Why not use a language that's designed by a sensible
       | designer for this specific purpose?
       | 
       | The news about performance improvements in Python just sound to
       | me like spending useful resources on useless goals. We aren't
       | going forward by making Python slightly faster and slightly more
       | bloated, we just make this bad language even harder to get rid
       | of.
        
         | Danmctree wrote:
         | The frustrating thing is that the math and AI support in the
         | python ecosystem is arguably the best. These happen to also be
         | topics where performance is critical and where you want things
         | to be tight.
         | 
         | c++ has great support too but often isn't usable in communities
         | involving researchers and juniors because it's too hard for
         | them. Startup costs are also much higher.
         | 
          | And so you're often stuck with Python.
         | 
         | We desperately need good math/AI support in faster languages
         | than python but which are easier than c++. c#? Java?
        
       | adsharma wrote:
       | The most interesting part of this article is the link to SPy.
       | Attempts to find a subset of python that could be made
       | performant.
        
         | ajross wrote:
         | Honestly that seems Sisyphean to me. The market doesn't want a
         | "performant subset". The market is very well served by
         | performant languages. The market wants Python's expressivity.
         | The market wants duck typing and runtime-inspectable type
         | hierarchies and mutable syntax and decorators. It loves it.
         | It's why Python is successful.
         | 
         | My feeling is that numba has exactly the right tactic here.
         | Don't try to subset python from on high, give developers the
         | tools[1] so that they can limit _themselves_ to the fast
         | subset, for the code they actually want. And let them make the
         | call.
         | 
         | (The one thing numba completely fails on though is that it
         | insists on using its own 150+MB build of LLVM, so it's not
         | nearly as cleanly deployable as you'd hope. Come on folks, if
         | you use the system libc you should be prepared to use the
         | system toolchain.)
         | 
         | [1] Simple ones, even. I mean, to first approximation you just
         | put "@jit" on the stuff you want fast and make sure it sticks
         | to a single numeric type and numpy arrays instead of python
         | data structures, and you're done.
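A sketch of that workflow (`njit` is numba's real no-python-mode decorator; the import fallback is added here so the snippet runs even without numba installed):

```python
try:
    from numba import njit   # numba's no-python-mode JIT decorator
except ImportError:
    def njit(func):          # fallback: run the plain Python version
        return func

import numpy as np

@njit
def total(arr):
    # sticks to one numeric type and a NumPy array: the "fast subset"
    s = 0.0
    for i in range(arr.shape[0]):
        s += arr[i]
    return s

assert total(np.ones(1000)) == 1000.0
```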
        
           | zozbot234 wrote:
           | > The market wants duck typing and runtime-inspectable type
           | hierarchies and mutable syntax and decorators. It loves it.
           | 
           | These features have one thing in common: they're only useful
           | for prototype-quality throwaway code, if at all. Once your
           | needs shift to an increased focus on production use and
           | maintainability, they become serious warts. It's not just
           | about performance (though it's obviously a factor too),
           | there's real reasons why most languages don't do this.
        
             | ajross wrote:
             | > These features have one thing in common: they're only
             | useful for prototype-quality throwaway code, if at all.
             | 
             | As a matter of practice: the python community disagrees
             | strongly. And the python community ate the world.
             | 
             | It's fine to have an opinion, but you're not going to
             | change python.
        
               | Philpax wrote:
               | The existence of several type-checkers and Astral's
               | largely-successful efforts to build tooling that pulls
               | Python out of its muck seems to suggest otherwise.
               | 
               | Better things are possible, and I'm hoping that higher
               | average quality of Python code is one of those things.
        
               | adsharma wrote:
               | That assumes python is one monolithic thing and everyone
               | agrees what it is.
               | 
               | True, the view you express here has strong support in the
               | community and possibly in the steering committee.
               | 
               | But there are differing ideas on what python is and why
               | it's successful.
        
               | ajross wrote:
               | > That assumes python is one monolithic thing and
               | everyone agrees what it is.
               | 
               | It's exactly the opposite! I'm saying that python is _BIG
               | AND DIVERSE_ and that attempts like SPy to invent a new
               | (monolithic!) subset language that everyone should use
                | instead are doomed, because it won't meet the needs
                | of
               | all the yahoos out there doing weird stuff the SPy
               | authors didn't think was important.
               | 
               | It's fine to have "differing ideas on what python is",
               | but if those ideas don't match those of _all_ of the
               | community, and not just what you think are the good
                | parts, it's not really about what "python" is, is
                | it?
        
           | adsharma wrote:
           | My cursory reading is that SPy is generous in what it
           | accepts.
           | 
           | The subset I've been working with is even narrower. Given my
           | stance on pattern matching, it may not even be a subset.
           | 
           | https://github.com/py2many/py2many/blob/main/doc/langspec.md
        
       | pabe wrote:
        | The SPy demo is really good at showing the difference in
        | performance between Python and their derivative. Well done!
        
       | hansvm wrote:
       | In the "dynamic" section, it's much worse than the author
       | outlines. You can't even assume that the constant named "10" will
       | point to a value which behaves like you expect the number 10 to
       | behave.
        
         | zahlman wrote:
         | I guess you mean "N". 10 is a literal, not a name. The part "N
         | cannot be assumed to be ten, because that could be changed
         | elsewhere in the code" implies well enough that the change
         | could be to a non-integer value. (For that matter, writing `N:
         | int = 10` does nothing to fix that.)
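A quick demonstration that the annotation is metadata only:

```python
N: int = 10     # the hint is recorded for tools, not enforced by CPython
N = "ten"       # rebinding to a str raises no error at run time
assert N == "ten"
assert not isinstance(N, int)  # the int annotation did nothing
```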
        
           | hansvm wrote:
           | No, I mean the literal. CPython is more flexible than it has
           | any right to be, and you're free to edit the memory pointed
           | to by the literal 10.
        
             | zahlman wrote:
             | Care to show how you believe this can be achieved, from
             | within Python?
        
               | hansvm wrote:
                | import ctypes
                | 
                | ten = 10
                | addr = id(ten)
                | 
                | class PyLongObject(ctypes.Structure):
                |     _fields_ = [
                |         ("ob_refcnt", ctypes.c_ssize_t),
                |         ("ob_type", ctypes.c_void_p),
                |         ("ob_size", ctypes.c_ssize_t),
                |         ("ob_digit", ctypes.c_uint32 * 1),
                |     ]
                | 
                | long_obj = PyLongObject.from_address(addr)
                | long_obj.ob_digit[0] = 3
                | assert 10 == 3
                | 
                | # using an auxiliary variable to prevent any inlining
                | # done at the interpreter level before actually
                | # querying the value of the literal `10`
                | x = 3
                | assert 10 * x == 9
                | assert 10 + x == 6
        
               | zahlman wrote:
               | Okay, but this is going out of one's way to view the
               | runtime itself as a C program and connect to it with the
               | FFI. For that matter, the notion that the result of `id`
               | (https://docs.python.org/3/library/functions.html#id)
               | could sensibly be passed to `from_address` is an
               | implementation detail. This is one reason the language
               | suffers from not having a formal specification: it's
               | unclear exactly how much of this madness alternative
               | implementations like PyPy are expected to validate
               | against. But I think people would agree that poking at
               | the runtime's own memory cannot be expected to give
               | deterministic results, and thus the implementation should
               | in fact consider itself free to assume that isn't
               | happening. (After all, we could take that further; e.g.
               | what if we had another process do the dirty work?)
        
               | hansvm wrote:
               | Except, that sort of thing is important in places like
               | gevent, pytest, and numba, and that functionality isn't
               | easy to replace without a lot of additional
               | language/stdlib work (no sane developer would reach for
               | it if other APIs sufficed).
               | 
               | The absurd example of overwriting the literal `10` is
               | "obviously" bad, but your assertion that the interpreter
               | should be able to assume nobody is overwriting its memory
               | isn't borne out in practice.
        
               | zahlman wrote:
               | > Except, that sort of thing is important in places like
               | gevent, pytest, and numba
               | 
               | What, mutating the data representation of built-in types
               | documented to be immutable? For what purpose?
        
       | pu_pe wrote:
       | Python and other high-level languages may actually decrease in
       | popularity with better LLMs. If you are not the one programming
       | it, might as well do it in a more performant language from the
       | start.
        
         | richard_todd wrote:
         | In my workflows I already tend to tell LLMs to write scripts in
         | Go instead of python. The LLM doesn't care about the increased
         | tediousness and verbosity that would drive me to Python, and
         | the result will be much faster.
        
           | Philpax wrote:
           | I saw a short post to this effect here:
           | https://solmaz.io/typed-languages-are-better-suited-for-
           | vibe...
        
       | fumeux_fume wrote:
       | Slow or fast ultimately matter in the context for which you need
        | to use it. Perhaps these are only myths and fairy tales for
        | an
       | incredibly small subset of people who value execution speed as
       | the highest priority, but choose to use Python for
       | implementation.
        
       | Mithriil wrote:
       | > His "sad truth" conclusion is that "Python cannot be super-
       | fast" without breaking compatibility.
       | 
       | A decent case of Python 4.0?
       | 
       | > So, maybe, "a JIT compiler can solve all of your problems";
       | they can go a long way toward making Python, or any dynamic
       | language, faster, Cuni said. But that leads to "a more subtle
       | problem". He put up a slide with a trilemma triangle: a dynamic
       | language, speed, or a simple implementation. You can have two of
       | those, but not all three.
       | 
       | This trilemma keeps getting me back towards Julia. It's less
       | simple than Python, but much faster (mitigated by pre-compilation
       | time), and almost as dynamic. I'm glad this language didn't die.
        
         | Alex3917 wrote:
         | > A decent case of Python 4.0?
         | 
         | I definitely agree with this eventually, but for now why not
         | just let developers set `dynamic=False` on objects and make it
         | opt in? This is how Google handles breaking Angular upgrades,
         | and in practice it works great because people have multiple
         | years to prepare for any breaking changes.
        
         | zahlman wrote:
         | > A decent case of Python 4.0?
         | 
         | I think "Python 4.0" is going to have to be effectively a new
         | language by a different team that simply happens to bear strong
         | syntactic similarities. (And at least part of why that isn't
         | already happening is that everyone keeps getting scared off by
         | the scale of the task.)
         | 
         | Thanks for the reminder that I never got around to checking out
         | Julia.
        
           | olejorgenb wrote:
           | Isn't that kinda what Mojo is?
        
             | zahlman wrote:
             | I haven't tried it, but that matches my understanding,
             | yeah.
             | 
             | Personally I'd be more interested in designing from
             | scratch.
        
         | rirze wrote:
          | If Julia fixes its package manager problems (does it
          | still take a while to load imports?), I think it could
          | become popular.
        
         | rybosome wrote:
         | Yeah, this is a case of "horses for courses", as you suggest.
         | 
         | I love Python. It's amazing with uv; I just implemented a
         | simple CLI this morning for analyzing data with inline
         | dependencies that's absolutely perfect for what I need and is
         | extremely easy to write, run, and tweak.
         | 
         | Based on previous experience, I would not suggest Python should
         | be used for an API server where performance - latency,
         | throughput - and scalability of requests is a concern. There's
         | lots of other great tools for that. And if you need to write an
         | API server and it's ok not to have super high performance, then
         | yeah Python is great for that, too.
         | 
         | But it's great for what it is. If they do make a Python 4.0
         | with some breaking changes, I hope they keep the highly
         | interpreted nature such that something like Pydantic continues
         | to work.
        
       | taeric wrote:
        | It is amusing to see the top comment on the site be about
        | how Common Lisp approached this. And it's hard not to agree
        | with it.
       | 
       | I don't understand how we had super dynamic systems decades ago
       | that were easier to optimize than people care to understand.
       | Heaven help folks if they ever get a chance to use Mathematica.
        
       | tuna74 wrote:
       | In computing terms, saying something is "slow" is kind of
       | pointless. Saying something is "effective" or "low latency"
       | provides much more information.
        
       | Redoubts wrote:
       | Wonder if mojo has gotten anywhere further, since they're trying
       | to bring speed while not sacrificing most of the syntax
       | 
       | https://docs.modular.com/mojo/why-mojo/#a-member-of-the-pyth...
        
       | actinium226 wrote:
       | A lot of the examples he gives, like the numpy/calc function, are
       | easily converted to C/C++/Rust. The article sort of dismisses
       | this at the start, and that's fine if we want to focus on the
       | speed of Python itself, but it seems like both the only solution
       | and the obvious solution to many of the problems specified.
        
       | lkirk wrote:
       | For me, in my use of Python as a data analysis language, it's not
       | python's speed that is an annoyance or pain point, it's the
        | concurrency story. Julia's built-in concurrency primitives
        | are
       | much more ergonomic in my opinion.
        
       | 1vuio0pswjnm7 wrote:
       | "He started by asking the audience to raise their hands if they
       | thought "Python is slow or not fast enough";"
       | 
       | Wrong question
       | 
       | Maybe something like, "Python startup time is as fast as other
       | interpreters"
       | 
       | Comparatively, Python (startup time) is slow(er)
        
       | ehsantn wrote:
       | The article highlights important challenges regarding Python
       | performance optimization, particularly due to its highly dynamic
       | nature. However, a practical solution involves viewing Python
       | fundamentally as a Domain Specific Language (DSL) framework,
       | rather than purely as a general-purpose interpreted language.
       | DSLs can effectively be compiled into highly efficient machine
       | code.
       | 
       | Examples such as Numba JIT for numerical computation, Bodo
       | JIT/dataframes for data processing, and PyTorch for deep learning
       | demonstrate this clearly. Python's flexible syntax enables
       | creating complex objects and their operators such as array and
       | dataframe operations, which these compilers efficiently transform
       | into code approaching C++-level performance. DSL operator
       | implementations can also leverage lower-level languages such as
       | C++ or Rust when necessary. Another important aspect not
       | addressed in the article is parallelism, which DSL compilers
       | typically handle quite effectively.
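The operator-as-DSL idea in its simplest form: one "+" on NumPy arrays dispatches to a single compiled elementwise kernel rather than N interpreted iterations:

```python
import numpy as np

a = np.arange(5)
b = np.arange(5)

# the interpreter executes one bytecode operation; the looping
# happens inside NumPy's compiled C kernel
c = a + b
assert c.tolist() == [0, 2, 4, 6, 8]
```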
       | 
       | Given that data science and AI are major use cases for Python,
       | compilers like Numba, Bodo, and PyTorch illustrate how many
       | performance-critical scenarios can already be effectively
       | addressed. Investing further in DSL compilers presents a
       | practical pathway to enhancing Python's performance and
       | scalability across numerous domains, without compromising
       | developer usability and productivity.
       | 
       | Disclaimer: I have previously worked on Numba and Bodo JIT.
        
         | echoangle wrote:
         | Was this comment written by an LLM?
        
       ___________________________________________________________________
       (page generated 2025-08-06 23:01 UTC)