[HN Gopher] Python 3.13 Gets a JIT
       ___________________________________________________________________
        
       Python 3.13 Gets a JIT
        
       Author : todsacerdoti
       Score  : 850 points
       Date   : 2024-01-09 08:35 UTC (14 hours ago)
        
 (HTM) web link (tonybaloney.github.io)
 (TXT) w3m dump (tonybaloney.github.io)
        
       | milliams wrote:
       | Brandt gave a talk about this at the CPython Core Developer
       | Sprint late last year https://www.youtube.com/watch?v=HxSHIpEQRjs
        
       | ageitgey wrote:
       | For the lazy who just want to know if this makes Python faster
       | yet, this is foundational work to enable later improvements:
       | 
       | > The initial benchmarks show something of a 2-9% performance
       | improvement.
       | 
       | > I think that whilst the first version of this JIT isn't going
       | to seriously dent any benchmarks (yet), it opens the door to some
       | huge optimizations and not just ones that benefit the toy
       | benchmark programs in the standard benchmark suite.
        
         | Topfi wrote:
          | Honestly, 2-9% already seems like a very significant
         | improvement, especially since as they mention "remember that
         | CPython is already written in C". Whilst it's great to look at
         | the potential for even greater gains by building upon this
         | work, I feel we shouldn't undersell what's been accomplished.
        
           | adastra22 wrote:
           | What is being accomplished then?
        
             | gray_-_wolf wrote:
             | 2-9%
        
           | bomewish wrote:
           | Also recall that a 50% speed improvement in SQLite was caused
              | by 50-100 different optimisations that each eked out 0.5-1%
           | speedups. On phone now don't have the ref but it all adds up.
        
             | boxed wrote:
              | Many small improvements are the way to go in most
             | situations. It's not great clickbait, but we should
             | remember that we got from a single cell at some time to
             | humans through many small changes. The world would be a lot
             | better if people just embraced the grind of many small
             | improvements...
        
             | toyg wrote:
             | Marginal gains.
             | https://www.bbc.co.uk/news/magazine-34247629
        
             | Akronymus wrote:
             | I tried searching for that article because I vaguely recall
             | it, but can't find it either. But yeah, a lot of small
             | improvements add up. Reminds me of this talk:
             | https://www.youtube.com/watch?v=NZ5Lwzrdoe8
        
               | Topfi wrote:
               | Here is a source for the SQLite case:
               | https://topic.alibabacloud.com/a/sqlite-387-a-large-
               | number-o...
        
               | Akronymus wrote:
               | That looks like blogspam to me, rather than an actual
               | source.
        
               | formerly_proven wrote:
               | https://sqlite-
               | users.sqlite.narkive.com/CVRvSKBs/50-faster-t...
        
             | IshKebab wrote:
             | That's true, and Rust compiler speed has seen similar
             | speedups from lots of 1% improvements.
             | 
             | But even if you can get a 2x improvement from lots of 1%
             | improvements (if you work really really hard), you're never
             | going to get a 10x improvement.
             | 
             | Rust is never going to compile remotely as quickly as Go.
             | 
             | Python is never going to be remotely as fast as Rust, C++,
             | Go, Java, C#, Dart, etc.
        
               | inglor_cz wrote:
               | Does it matter?
               | 
               | Trains are never going to beat jets in pure speed. But in
               | certain scenarios, trains make a lot more sense to use
               | than jets, and in those scenarios, it is usually
               | preferable having a 150 mph train to a 75 mph train.
               | 
               | Looking at the world of railways, high-speed rail has
               | attracted a lot more paying customers than legacy
               | railways, even though it doesn't even try to achieve
               | flight-like speeds.
               | 
               | Same with programming languages, I guess.
        
               | fl0ki wrote:
               | What is the programming analogy here?
               | 
               | Two decades ago, you could (as e.g. Paul Graham did at
               | the time) argue that dynamically typed languages can get
               | your ideas to market faster so you become viable and
               | figure out optimization later.
               | 
               | It's been a long time since that argument held. Almost
               | every dynamic programming language still under active
               | development is adding some form of gradual typing because
               | the maintainability benefits alone are clearly
               | recognized, though such languages still struggle to
               | optimize well. Now there are several statically typed
               | languages to choose from that get those maintainability
               | benefits up-front and optimize very well.
               | 
               | Different languages can still be a better fit for
               | different projects, e.g. Rust, Go, and Swift are all
               | statically typed compiled languages better fit for
               | different purposes, but in your analogy they're all jets
               | designed for different tactical roles, none of them are
               | "trains" of any speed.
               | 
               | Analogies about how different programming languages are
                | like different vehicles, power tools, etc. go way back
               | and have their place, but they have to recognize that
               | sometimes one design approach largely supersedes another
               | for practical purposes. Maybe the analogy would be
               | clearer comparing jets and trains which each have their
               | place, to horse-drawn carriages which still exist but are
               | virtually never chosen for their functional benefits.
        
               | inglor_cz wrote:
               | I cut my teeth on C/C++, and I still develop the same
               | stuff faster in Python, with which I have _less_ overall
               | experience by almost 18 years. Python is also much easier
               | to learn than, say, Rust, or the current standard of C++
               | which is a veritable and intimidating behemoth.
               | 
               | In many domains, it doesn't really matter if the
               | resulting program runs in 0.01 seconds or 0.1 seconds,
               | because the dominant time cost will be in user input, DB
               | connection etc. anyway. But it matters if you can crank
               | out your basic model in a week vs. two.
        
               | fl0ki wrote:
               | > Python is also much easier to learn than, say, Rust
               | 
               | I don't doubt it, but learning is only the first step to
               | using a technology for a series of projects over years or
               | even decades, and that step doesn't last that long.
               | 
               | People report being able to pick up Rust in a few weeks
               | and being very productive. I was one of them, if you
               | already got over the hill that was C++ then it sounds
               | like you would be too. The point is that you and your
               | team stay that productive as the project gets larger,
               | because you can all enforce invariants for yourselves
               | rather than have to carry their cognitive load and make
               | up the extra slack with more testing that would be
               | redundant with types.
               | 
               | Outside of maybe a 3 month internship, when is it
               | worthwhile to penalize years of software maintenance to
               | save a few weeks of once-off up-front learning? And it's
               | not like you save it completely, writing correct Python
               | still takes some learning too, e.g. beginners easily get
               | confused about when mutable data structures are silently
               | being shared and thus modified when they don't expect it.
               | People who are already very comfortable with Python
               | forget this part of their own learning curve, just like
               | people very comfortable with Rust forget their first
                | borrow check head scratcher.
               | 
               | I never made a performance argument in this thread so I'm
               | not sure why 0.01 or 0.1 seconds matters here. Even the
               | software that got you into a commercial market has to be
               | maintained once you get there. Ask Meta how they feel
               | about the PHP they're stuck with, for example.
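The mutable-sharing pitfall mentioned above can be shown in a few lines (a minimal sketch of the aliasing behavior, not tied to any particular codebase):

```python
# Assignment in Python binds a name to the same object; it does not copy.
a = [1, 2, 3]
b = a                        # b and a now refer to one list
b.append(4)
assert a == [1, 2, 3, 4]     # mutation through b is visible through a

# An explicit copy gives an independent list.
c = list(a)
c.append(5)
assert a == [1, 2, 3, 4]     # a is unaffected
assert c == [1, 2, 3, 4, 5]
```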
        
           | spacechild1 wrote:
           | > "remember that CPython is already written in C"
           | 
           | What is this supposed to say? Most scripting language
           | interpreters are written in low level languages (or
           | assembly), but that alone doesn't say anything about the
           | performance of the language itself.
        
             | nertirs wrote:
              | This means that a lot of Python libraries, like Polars or
              | TensorFlow, are not written in Python.
              | 
              | So Python programs that already spend most of their CPU
              | time running those libraries' code won't see much of an
              | impact.
        
               | gh02t wrote:
               | Isn't the point that if pure Python was faster they
               | wouldn't need to be written in other [compiled]
               | languages? Having dealt with Cython it's not bad, but if
               | I could write more of my code in native Python my
               | development experience would be a lot simpler.
               | 
               | Granted we're still very far from that and probably won't
               | ever reach it, but there definitely seems to be a lot of
               | progress.
        
               | __MatrixMan__ wrote:
               | Since Nim compiles to C, a middle step worth being aware
               | of is Nim + nimporter which isn't anywhere near "just
               | python" but is (maybe?) closer than "compile a C binary
               | and call it from python".
               | 
               | Or maybe it's just syntactic sugar around that. But sugar
               | can be nice.
        
             | eequah9L wrote:
             | I think they mean that a lot of runtime of any benchmark is
             | going to be spent in the C bits of the standard library,
             | and therefore not subject to the JIT. Only the glue code
             | and the bookkeeping or whatnot that the benchmark
             | introduces would be improved by the JIT. This reduces the
             | impact that the JIT can make.
        
         | blagie wrote:
         | From the write-up, I honestly don't understand how this paves
         | the way. I don't see an architectural path from a cut-and-paste
         | JIT to something optimizing. That's the whole point of a cut-
         | and-paste JIT.
        
           | lumpa wrote:
           | There's a lot of effort going on to improve CPython
           | performance, with optimization tiers, etc. It seems the JIT
           | is how at least part of that effort will materialize:
           | https://github.com/python/cpython/issues/113710
           | 
           | > We're getting a JIT. Now it's time to optimize the traces
           | to pass them to the JIT.
        
           | guenthert wrote:
            | Isn't it the case that Python has allowed type specifiers
            | (type hints) since 3.5, albeit ones the CPython interpreter
            | ignores? The JIT might take advantage of them, which ought
            | to improve performance significantly for some code.
            | 
            | What makes Python flexible is what makes it slow.
            | Restricting that flexibility where possible offers
            | opportunities to improve performance (and allows tools
            | and humans to spot errors more easily).
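To make the point above concrete (a sketch of current CPython behavior; the JIT-uses-hints part is hypothetical): type hints are recorded on the function object but never enforced at runtime, so any optimizer would have to treat them as unverified hints.

```python
def twice(x: int) -> int:
    return x + x

# CPython stores the annotations on the function...
assert twice.__annotations__ == {"x": int, "return": int}

# ...but never checks them, so calls that violate the hint still succeed.
assert twice("ab") == "abab"
```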
        
             | a-french-anon wrote:
              | Isn't CL a good counter-example to that "dynamism
              | inherently stunts performance" mantra?
        
               | guenthert wrote:
               | To the contrary. In CL some flexibility was given up
               | (compared to other LISP dialects) in favor of enabling
               | optimizing compilers, e.g. the standard symbols cannot be
               | reassigned (also preserving the sanity of human readers).
               | CL also offers what some now call 'gradual typing', i.e.
               | optional type declarations. And remaining flexibility,
               | e.g. around the OO support, limits how well the compiler
               | can optimize the code.
        
               | Joker_vD wrote:
               | But type declarations in Python are not required to be
               | correct, are they? You are allowed to write
               | def twice(x: int) -> int:             return x + x
               | print(twice("nope"))
               | 
               | and it should print "nopenope". Right?
        
               | Ringz wrote:
                | Yep. Therefore it's better to
                | 
                |     def twice(x: int) -> int:
                |         if not isinstance(x, int):
                |             raise TypeError("Expected x to be an int, got "
                |                             + str(type(x)))
                |         return x + x
        
               | jhardy54 wrote:
               | This can have substantial performance implications, not
               | to mention DX considerations.
        
               | Ringz wrote:
               | Of course, this is not a good example of good, high-
               | performance code, only an answer to the specific
               | question... the questioner certainly also knows MyPy.
        
               | Joker_vD wrote:
               | I actually don't know anything about MyPy, only that it
               | exists. Does it run that example correctly, that is, does
               | it print "nopenope"? Because I think it's the correct
               | behaviour, type hints should not actually affect
               | evaluation (well, beyond the fact that they must be names
                | that are visible in the scopes they're used in,
                | obviously), although I could be wrong.
               | 
               | Besides, my point was that one of the reasons why
               | languages with (sound-ish) static types manage to have
                | better performance is that they can omit all of those
               | run-time type checks (and the supporting machinery)
               | because they'd never fail. And if you have to put those
               | explicit checks, then the type hints are actually
               | entirely redundant: e.g. Erlang's JIT ignores type specs,
               | it instead looks at the type guards in the code to
               | generate specialized code for the function bodies.
        
               | sgerenser wrote:
               | Surely this is the job for a linter or code generator (or
               | perhaps even a hypothetical 'checked' mode in the
               | interpreter itself)? Ain't nobody got time to add manual
               | type checks to every single function.
        
               | Ringz wrote:
               | Of course not. That's what MyPy is for. It was only about
               | the answer to exactly this question in this function.
        
               | adhamsalama wrote:
               | Or use mypy.
        
               | VagabundoP wrote:
               | The Python language server in Visual Studio Code will
               | catch this if type checking is turned on, but by default,
               | in CPython, that code will just work.
        
               | kazinator wrote:
               | Standard symbols being reassigned also breaks macros.
        
             | formerly_proven wrote:
              | You can't really rely on type annotations to help
              | interpret the code.
        
             | Difwif wrote:
             | AFAIK good JITs like V8 can do runtime introspection and
             | recompile on the fly if types change. Maybe using the type
             | hints will be helpful but I don't think they are necessary
             | for significant improvement.
        
               | amelius wrote:
               | Are there any benchmarks that give an idea of how much
               | this might improve Python's speed?
        
               | mike_hearn wrote:
               | Well, GraalPython is a Python JIT compiler which can
               | exploit dynamically determined types, and it advertises
               | 4.3x faster, so it's possible to do drastically better
               | than a few percent. I think that's state of the art but
               | might be wrong.
               | 
               | That's for this benchmark:
               | 
               | https://pyperformance.readthedocs.io/
               | 
               | Note that this is with a relatively small investment as
               | these things go, the GraalPython team is about ~3 people
               | I guess, looking at the GH repo. It's an independent
               | implementation so most of the work went into being
               | compatible with Python including native extensions (the
               | hard part).
               | 
               | But this speedup depends a lot on what you're doing. Some
               | types of code can go much faster. Others will be slower
               | even than CPython, for example if you want to sandbox the
               | native code extensions.
        
               | amelius wrote:
               | This is great info, thanks!
        
               | pletnes wrote:
               | Pypy is a different JIT that gives anything from
               | slower/same to 100x speedup depending on the benchmark.
               | They give a geometric mean of 4.8x speedup across their
               | suite of benchmarks. https://speed.pypy.org/
        
               | cuchoi wrote:
               | Doesn't Python already do this?
               | https://www.youtube.com/watch?v=shQtrn1v7sQ
        
             | dataangel wrote:
             | I doubt it with a copy-and-patch JIT, not the way they work
             | now. I'm a serious mypy/python-static-types user and as is
             | they currently wouldn't allow you to do much optimization
             | wise.
             | 
             | - All integers are still big integers
             | 
             | - Use of the typing opt-out 'Any' is very common
             | 
             | - All functions/methods can still be overwritten at runtime
             | 
             | - Fields can still be added and removed from objects at
             | runtime
             | 
             | The combination basically makes it mandatory to not use
             | native arithmetic, allocate everything on the heap, and
             | need multiple levels of indirection for looking up any
             | variable/field/function. CPU perf nightmare. You need a
             | real optimizing JIT to track when integers are in a narrow
             | range and things aren't getting redefined at runtime.
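Each of the dynamic features listed above is trivially exercised at runtime, which is why a simple JIT cannot specialize without guards (an illustrative sketch, not drawn from any particular benchmark):

```python
# Arbitrary-precision ints: no fixed machine-word representation.
assert 2 ** 100 + 1 == 1267650600228229401496703205377

# Functions can be rebound at any time.
def f():
    return 1
assert f() == 1
f = lambda: 2
assert f() == 2

# Fields can be added to and removed from objects after construction.
class Point:
    def __init__(self, x):
        self.x = x

p = Point(1)
p.y = 2            # new field appears at runtime
del p.x            # existing field disappears
assert not hasattr(p, "x") and p.y == 2
```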
        
             | tekknolagi wrote:
             | Sort of! But also not really. If you want to get into this,
             | I wrote a post about this:
             | https://bernsteinbear.com/blog/typed-python/
        
           | Someone wrote:
           | It should be fairly easy to add instruction fusing, where
           | they recognize often-used instruction pairs, combine their C
           | code, and then let the compiler optimize the combined code.
           | Combining _LOAD_CONST_ with the instruction following it if
           | that instruction pops the const from the stack seems an easy
           | win, for example.
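The `dis` module makes such adjacent pairs easy to spot: here `LOAD_CONST` pushes a constant that the very next instruction immediately consumes, exactly the kind of pair a fused superinstruction could combine (exact opcode names vary across CPython versions):

```python
import dis

def add_one(x):
    return x + 1

# List the opcode sequence; LOAD_CONST pushes 1, and the following
# instruction (BINARY_OP or BINARY_ADD, depending on version) pops it.
ops = [i.opname for i in dis.get_instructions(add_one)]
print(ops)
assert "LOAD_CONST" in ops
```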
        
             | ncruces wrote:
             | If it was that easy, you'd do that in the interpreter and
             | proportionally reduce interpretation overhead.
        
               | Someone wrote:
                | In the interpreter, I don't think it would reduce
               | overhead much, if at all. You'd still have to recognize
               | the two byte codes, and your interpreter would spend
               | additional time deciding, for most byte code pairs, that
               | it doesn't know how to combine them.
               | 
               | With a compiler, that part is done once and, potentially,
               | run zillions of times.
        
               | ncruces wrote:
               | If fusing a certain pair would significantly improve
               | performance of most code, you'd just add that fused
               | instruction to your bytecode and let the C compiler
               | optimize the combined code in the interpreter. I have to
                | assume CPython has already done that for all the low
               | hanging fruit.
               | 
               | In fact, for such a fused instruction to be optimized
               | that way on a copy-and-patch JIT it'd need to exist as a
                | new bytecode in the interpreter. A JIT that fuses
               | instructions is no longer a copy-and-patch JIT.
               | 
               | A copy-and-patch JIT reduces interpretation overhead by
               | making sure the branches in the executed machine code are
               | the branches in the code to be interpreted, not branches
               | in the interpreter.
               | 
                | This makes a huge difference in more naive
                | interpreters, not so much in a heavily optimized
                | threaded-code interpreter.
               | 
               | The 10% is great, and nothing to sneeze at for a first
               | commit. But I'd actually like some realistic analysis of
               | next steps for improvement, because I'm skeptical
               | instruction fusing and other things being hand waved are
               | it. Certainly not on a copy-and-patch JIT.
               | 
               | For context: I spent significant effort trying to add
               | such instruction fusing to a simple WASM AOT compiler and
               | got nowhere (the equivalent of constant loading was
               | precisely one of the pairs). Only moving to a much
               | smarter JIT (capable of looking at whole basic blocks of
               | instructions) started making a difference.
        
           | londons_explore wrote:
            | > I don't see an architectural path from a cut-and-paste
           | JIT to something optimizing.
           | 
           | One approach used in V8 is to have a dumb-but-very-fast JIT
           | (ie. this), and keep counters of how often each block of code
           | runs (perhaps actual counters, perhaps using CPU sampling
           | features), and then any block of code running more than a few
           | thousand times run through a far more complex yet slower
           | optimizing jit.
           | 
           | That has the benefit that the 0.2% of your code which uses
           | 95% of the runtime is the only part that has to undergo the
           | expensive optimization passes.
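A toy sketch of that tier-up strategy (illustrative only; real engines like V8 count in the interpreter or via CPU sampling, and hand off to an actual compiler rather than flipping a flag):

```python
HOT_THRESHOLD = 1000  # promote a block after this many executions

class Block:
    """A unit of code with an execution counter and an 'optimized' flag."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.optimized = False

    def run(self):
        self.count += 1
        # Once hot, hand the block to the slower, smarter optimizing tier.
        if not self.optimized and self.count >= HOT_THRESHOLD:
            self.optimized = True

hot, cold = Block("inner loop"), Block("startup code")
for _ in range(1500):
    hot.run()
cold.run()
assert hot.optimized and not cold.optimized
```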
        
             | Sesse__ wrote:
             | Note that V8 didn't have a dumb-but-very-fast JIT
             | (Sparkplug) until 2021; the interpreter (Ignition) did that
             | block counting and sent it straight to the optimizing JIT
             | (TurboFan).
             | 
             | V8 pre-2021 (i.e., only Ignition+TurboFan) was
             | significantly faster than current CPython is, and the full
             | current four-tier bundle
             | (Ignition+Sparkplug+Maglev+TurboFan) only scores roughly
             | twice as good on Speedometer as pure Ignition does.
             | (Ignition+Sparkplug is about 40% faster than Ignition
             | alone; compare that "dumbness" with CPython's 2-9%.) The
             | relevant lesson should be that things like very carefully
             | designed value representation and IR is a much more
             | important piece of the puzzle than having as many tiers of
             | compilation as possible.
        
               | uluyol wrote:
               | In case anyone is interested, V8 pre-ignition/TurboFan
               | had different tiers [1]: full-codegen (dumb and fast) and
               | crankshaft (optimizing). It's interesting to see how
               | these things change over time.
               | 
               | [1]: https://v8.dev/blog/ignition-interpreter
        
             | dataangel wrote:
             | > keep counters of how often each block of code runs ...
             | and then any block of code running more than a few thousand
             | times run through a far more complex yet slower optimizing
             | jit.
             | 
              | That's just all JITs. Sometimes it's counters for going from
             | interpreter -> JIT rather than levels of JITs, but this
             | idea is as old as JITs.
        
           | fulafel wrote:
           | Support for generating machine code at all seems like a
           | necessary building block to me and probably is quite a bit of
           | effort to work on top of a portable interpreter code base.
        
         | lifthrasiir wrote:
          | An important context here is that the same code was reused for
          | the interpreter and JIT implementations (that's a main selling
          | point of copy-and-patch JIT). In other words, this 2-9%
          | improvement mostly represents the core interpreter overhead
          | that the JIT should significantly reduce. It was even possible
          | that the JIT would have had no performance impact at all by
          | itself, so this result is actually very encouraging; any future
          | opcode specialization and refinement should directly translate
          | to a measurable improvement.
        
           | formerly_proven wrote:
           | Copy&patch seems not much worse than compiling pure Python
           | with Cython, which roughly corresponds to "just call whatever
           | CPython API functions the bytecode interpreter would call for
           | this bunch of Python", so that's roughly a baseline for how
            | much overhead you get from the interpreter bit.
        
             | lifthrasiir wrote:
             | There is no reason to use copy-and-patch JIT if that were
             | the case, because the good old threaded interpreter would
             | have been fine. There are other optimization works in
             | parallel with this JIT effort, including finer-grained
             | micro operations (uops) that can replace usual opcodes at
             | higher tiers. Uops themselves can be used without JIT, but
             | the interpreter overhead is proportional to the number of
             | (u)ops executed and would be too large for uops. The hope
                | is that the copy-and-patch JIT combined with uops will be
             | much faster than threaded code.
        
               | ncruces wrote:
               | A threaded interpreter still has one branch per bytecode
               | instruction; a copy-and-patch JIT removes this overhead.
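A minimal interpreter loop shows the overhead in question: every instruction pays the loop test and the dispatch branch, which a copy-and-patch JIT replaces with straight-line machine code (a sketch with a made-up two-opcode bytecode, not CPython's actual loop):

```python
def run(program):
    """Interpret a tiny stack bytecode; note the per-instruction dispatch."""
    stack, pc = [], 0
    while pc < len(program):          # one loop test per instruction...
        op, arg = program[pc]
        if op == "PUSH":              # ...plus one dispatch branch chain
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        pc += 1
    return stack[-1]

assert run([("PUSH", 2), ("PUSH", 3), ("ADD", None)]) == 5
```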
        
         | vanderZwan wrote:
         | You're right, and in this case "foundational work" even
         | undersells how minimal this work really is compared to the
         | results it already gets.
         | 
          | I recommend that people watch Brandt Bucher's _"A JIT Compiler
         | for CPython"_ from last year's CPython Core Developer
         | Sprint[0]. It gives a good impression of the current
         | implementation and its limitations, and some hints at what may
         | or may not work out. It also indirectly gives a glimpse into
         | the process of getting this into Python through the exchanges
         | during the Q&A discussion.
         | 
         | One thing to especially highlight is that this copy-and-patch
         | has a much, much lower implementation complexity for the
         | maintainers, as a lot of the heavy lifting is offloaded to
         | LLVM.
         | 
         | Case in point: as of the talk this was all just Brandt Bucher's
         | work. The implementation at the time was ~700 lines of
         | "complex" Python, ~100 lines of "complex" C, plus of course the
         | LLVM dependency. This produces ~3000 lines of "simple"
         | generated C, requires an additional ~300 lines of "simple"
         | hand-written C to come together, and _no further dependencies_
         | (so no LLVM necessary to run the JIT. Also  "complex" and
         | "simple" qualifiers are Bucher's terms, not mine).
         | 
         | Another thing to note is that these initial performance
         | improvements are _just from getting this first version of the
         | copy-and-patch JIT to work at all_ , without really doing any
         | further fine-tuning or optimization.
         | 
         | This may have changed a bit in the months since, but the
         | situation is probably still comparable.
         | 
         | So if one person can get this up and running in a few klocs,
         | most of which are generated, I think it's reasonable to have
         | good hopes for its future.
         | 
         | [0] https://www.youtube.com/watch?v=HxSHIpEQRjs
        
         | sylware wrote:
          | I removed the SDKs of some big (big for the wrong reasons) open
          | source projects which generate a lot of code using python3
          | scripts.
          | 
          | In those custom SDKs, I generate all the code at the start
          | of the build, which takes a significant amount of time for
          | code generation that is mostly no longer pertinent or was
          | inappropriately done. I will really feel the python3 speed
          | improvement for those builds.
        
         | attractivechaos wrote:
         | I wouldn't be so enthusiastic. Look at other languages that
          | have JIT now: Ruby and PHP. After years of effort, they are
         | still an order of magnitude slower than V8 and even PyPy [1].
         | It seems to me that you need to design a JIT implementation
         | from ground up to get good performance - V8, Dart and LuaJIT
         | are like this; if you start with a pure interpreter, it may be
         | difficult to speed it up later.
         | 
         | [1] https://github.com/attractivechaos/plb2
        
           | vlovich123 wrote:
           | PyPy is designed from the ground up and is still slower
           | than V8 AFAIK. Don't forget that V8 has enormous amounts
           | of investment from professionally paid developers whereas
           | PyPy is funded by government grants. Not sure about Ruby &
           | PHP, and it's entirely possible that the other JIT
           | implementations are choosing simplicity of maintenance
           | over eking out every single bit of performance.
           | 
           | Python also has structural challenges like native extensions
           | (don't exist in JavaScript) where the API forces slow code or
           | massive hacks like avoiding the C API at all costs (if I
           | recall correctly I read that's being worked on) and the GIL.
           | 
           | One advantage Python had is the ability to use multiple
           | cores way before JS, but the JS ecosystem remained single-
           | threaded longer and decided to use message passing
           | instead, building Web Workers, which let the JIT remain
           | fast.
        
             | attractivechaos wrote:
             | PyPy is only twice as slow as V8 and is about an order
             | of magnitude faster than CPython. It is quite an
             | achievement. I would be very happy if CPython could get
             | this performance, but I doubt it will.
        
         | chaxor wrote:
         | Anyone know if there will be any better tools for cross-
         | compiling python projects?
         | 
         | The package management and build tools for python have been
         | so atrociously bad (environments add far too much complexity
         | to the ecosystem) that it turns many developers away from
         | the language altogether. A system like Rust's package
         | management, build tools, and cross-compilation capability is
         | an enormous draw, even without the memory safety. The fact
         | that it _actually works_ (because of the package management
         | and build tools) is the main reason to use the language,
         | really. Python used to do that ~10 years ago. Now
         | _absolutely nothing_ works. It takes weeks to get simple
         | packages working, and then they only run under extremely
         | brittle conditions that nullify the project you're trying to
         | use this other package for, etc.
         | 
         | If python could ever get its act together and make better
         | package management, and allow for cross-compiling, it could
         | make a big difference. (I am aware of the very basic fact
         | that it's interpreted rather than compiled yada yada - there
         | are still ways to make executables, they are just awful.)
         | Since python is data science centric, it would be good to
         | have decent data management capabilities too, but perhaps
         | that could be after the fundamental problems are dealt with.
         | 
         | I tried looking at mojo, but it's not open source, so I'm quite
         | certain that kills any hope of it ever being useful at all to
         | anyone. The fact that I couldn't even install it without making
         | an account made me run away as fast as possible.
        
           | simonw wrote:
           | "It takes weeks to get simple packages working"
           | 
           | Can you expand on what you mean by that? I have trouble
           | imagining a Python packaging problem that takes weeks to
           | resolve - I'd expect them to either be resolvable in
           | relatively short order or for them to prove effectively
           | impossible such that people give up.
        
             | chaxor wrote:
             | - Trying to figure out what versions the scripts used
             | and specifying them in a new poetry project
             | - Realizing some OS-dependent software is needed, so
             | making a Dockerfile/docker-compose.yml
             | - Getting some of it working in the container with a
             | poetry environment
             | - Realizing that _other_ parts of the code work with
             | _other_ versions, so making a different poetry
             | environment for those parts
             | - Trying to tie this package/container as a dependency
             | of another project
             | - Oh actually, this is a dependency of a dependency
             | - How do you call a function from a package running in a
             | container with multiple poetry environments?
             | - What was I doing again?
             | - 2 weeks have passed trying to get this to work,
             | perhaps I'll just do something else
             | 
             | Rinse and repeat.
             | 
             | ¯\_(ツ)_/¯ That's python!
        
           | riperoni wrote:
           | I can't answer your initial question, but I do like to pile
           | onto the package management points.
           | 
           | Package consumption sucks so badly, since the sensible
           | way of using packages is virtual envs, where you copy all
           | dependencies. Then freezing a venv or dumping package
           | versions, so you can port your project to a different
           | system, doesn't consider only the packages actually
           | used/imported in the code; it just dumps everything in the
           | venv. The fact that you need external tools for this is
           | frustrating.
           | 
           | Then there is package creation. Legacy vs modern approach,
           | cryptic __init__ files, multiple packaging backends, endless
           | sections in pyproject.toml, manually specifying dependencies
           | and dev-dependencies, convoluted ways of getting package
           | metadata actually in code without having it in two places
           | (such as CLI programs with --version).
           | 
           | Cross compilation really would be a nice feature, to
           | simply distribute a single-file executable. I haven't
           | tested it, but a Linux system with Wine should in theory
           | be capable of "cross" compiling between Linux and Windows.
           | 
           | Still, like you, as a starting point I would prefer a
           | sensible package management and package creation process.
        
           | mcoliver wrote:
           | Have you taken a look at Nuitka with GitHub actions for cross
           | compilation? https://github.com/Nuitka/Nuitka-Action
        
       | eviks wrote:
       | > The initial benchmarks show something of a 2-9% performance
       | improvement. You might be disappointed by this number, especially
       | since this blog post has been talking about assembly and machine
       | code and nothing is faster than that right?
       | 
       | Indeed, reading the blog post builds much higher expectations
        
         | G3rn0ti wrote:
         | Just running machine code itself does not make a program
         | magically faster. It's all about the amount of work the machine
         | code is doing.
         | 
         | For example, if the JIT compiler realizes the program is adding
         | two integers it could potentially replace the code with two
         | MOVs and a single ADD. However, what about the error handling
         | in the case of an overflow? Python switches to its internal
         | BigInt representation in this case and cannot rely on
         | architecture-specific instructions alone once the result gets
         | too large to fit into a register.
         | 
         | Modern programming languages are all about trading performance
         | for convenience and that is what makes them slow -- not because
         | they are running an interpreter and not compiling to machine
         | code.
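G3rn0ti's overflow point is easy to see from Python itself; a minimal sketch of the int promotion that a JIT-emitted bare ADD would have to account for:

```python
import sys

# CPython ints are arbitrary-precision: adding two word-sized values
# silently promotes to a big-integer representation instead of
# overflowing, so a JIT cannot blindly emit a single machine ADD.
a = 2**62
b = 2**62
c = a + b  # would overflow a signed 64-bit register

print(c == 2**63)       # True: no overflow, exact result
print(c > sys.maxsize)  # True: result no longer fits in a machine word
```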
        
       | fer wrote:
       | The bit everyone wants:
       | 
       | > The initial benchmarks show something of a 2-9% performance
       | improvement.
       | 
       | Which is underwhelming (as mentioned in the article), especially
       | if we look at PyPy[0]. But it's a step forward nonetheless.
       | 
       | [0] https://speed.pypy.org/
        
         | woadwarrior01 wrote:
         | > At the moment, the JIT is only used if the function contains
         | the JUMP_BACKWARD opcode which is used in the while statement
         | but that will change in the future.
         | 
         | It's a bit less underwhelming if you consider that only
         | function objects containing loops are JIT-compiled. NB: for
         | loops in Python also use the JUMP_BACKWARD op.
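This heuristic can be inspected with the `dis` module; a small sketch (opcode names vary by version: JUMP_BACKWARD exists on CPython 3.11+, while older releases emit JUMP_ABSOLUTE for the same loop back-edge):

```python
import dis

def count_up(n):
    # A while loop compiles to a backward jump at the end of its body,
    # which is the opcode the initial JIT keys on.
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1
    return total

ops = {instr.opname for instr in dis.Bytecode(count_up)}
print('JUMP_BACKWARD' in ops or 'JUMP_ABSOLUTE' in ops)  # True
```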
        
         | lifthrasiir wrote:
         | PyPy was never able to get fast enough to replace CPython in
         | spite of its lack of compatible C API. CPython is trying to
         | move fast without breaking C API, and 2--9% improvement is in
         | fact very encouraging for that and other reasons (see my other
         | comment).
        
       | cryptos wrote:
       | I always wondered how Python can be one of the world's most
       | popular languages without any company stepping up and making
       | the runtime as fast as modern JavaScript runtimes.
        
         | tomwphillips wrote:
         | Because enough users find the performance sufficient.
        
         | rfoo wrote:
         | Because the reason why Python is one of the world's most
         | popular languages (a large set of scientific computing C
         | extensions) is bound to every implementation detail of the
         | interpreter itself.
        
         | fer wrote:
         | Easy: the number-crunching libs are optimized in (generally)
         | C.
        
           | PartiallyTyped wrote:
           | and FORTRAN.
        
         | est31 wrote:
         | Python is already fast where it matters: often, it is just
         | used to integrate existing C/C++ libraries like numpy or
         | pytorch. It is more an integration language than one you
         | write your heavy algorithms in.
         | 
         | For JS, during the time that it received its JITs, there was
         | no cross-platform native code equivalent like wasm yet. JS
         | had to compete with plugins written in C/C++, however. There
         | was also competition between browser vendors, which gave the
         | period the name "browser wars". Nowadays the end-user speed
         | improvements from the JIT aren't _that_ great either; Apple
         | even provides a mode to turn off JIT entirely for security.
        
           | ephimetheus wrote:
           | I think usually the term "browser wars" refers to the time
           | when Netscape and Microsoft were struggling for dominance,
           | which concluded in 2001.
           | 
           | JavaScript JITs only emerged around 2008 with SpiderMonkey's
           | TraceMonkey, JavaScriptCore's SquirrelFish Extreme, and V8's
           | original JIT.
        
             | lifthrasiir wrote:
             | There were multiple browser wars, otherwise you wouldn't
             | need -s there ;-)
        
           | nyanpasu64 wrote:
           | Having recently implemented parallel image rendering in
           | corrscope (https://github.com/corrscope/corrscope/pull/450),
           | I can say that friends don't let friends write performance-
           | critical code in Python. Depending on prebuilt C++ libraries
           | hampers flexibility (eg. you can't customize the memory
           | management or rasterization pipeline of matplotlib). Python's
           | GIL inhibits parallelism within a process, and the workaround
           | of multiprocessing and shared memory is awkward, has
           | inconsistencies between platforms, and loses performance (you
           | can't get matplotlib to render directly to an inter-process
           | shared memory buffer, and the alternative of copying data
           | from matplotlib's framebuffer to shared memory wastes CPU
           | time).
           | 
           | Additionally a lot of the libraries/ecosystem around shared
           | memory (https://docs.python.org/3/library/multiprocessing.sha
           | red_mem...) seems poorly conceived. If you pre-open shared
           | memory in a ProcessPoolExecutor's initializer function, you
           | can't close it when the worker process exits (which _might_
           | be fine, nobody knows!), but if you instead open and close a
           | shared memory segment on every executor job, it _measurably_
           | reduces performance, presumably from memory mapping overhead
           | or TLB/page table thrashing.
        
             | mkesper wrote:
             | That's why optional GIL will be so important.
        
             | ngrilly wrote:
             | What would you use instead of Python?
        
               | pas wrote:
               | Cython? :o
        
             | pas wrote:
             | > Depending on prebuilt C++ libraries hampers flexibility
             | (eg. you can't customize the memory management or
             | rasterization pipeline of matplotlib).
             | 
             | But what is the counterfactual? Implementing the whole
             | thing in Python? It seems much more work than
             | forking/fixing matplotlib.
        
             | amelius wrote:
             | > Python's GIL inhibits parallelism within a process, and
             | the workaround of multiprocessing and shared memory is
             | awkward, has inconsistencies between platforms, and loses
             | performance
             | 
             | Well, imho the biggest problem with this approach to
             | parallelism is that you're stepping out of the Python world
             | with gc'ed objects etc. and into a world of ctypes and
             | serialization. It's like you're not even programming Python
             | anymore, but more something closer to C with the speed of
             | an interpreted language.
        
             | zbentley wrote:
             | > If you pre-open shared memory in a ProcessPoolExecutor's
             | initializer functions, you can't close them when the worker
             | process exits
             | 
             | That's quite surprising to learn, as I didn't think the
             | initializer ran in a specialized context (like a
             | pthread_atfork postfork hook in the child).
             | 
             | What happens when you try to close an initializer-allocated
             | SharedMemory object on worker exit?
        
               | nyanpasu64 wrote:
               | ProcessPoolExecutor doesn't let you supply a callback to
               | run on worker process exit, only startup. Perhaps I
               | could've looked for and tried something like atexit
               | (https://docs.python.org/3/library/atexit.html)? In any
               | case I don't want to touch my code at the moment until I
               | regain interest or hear of resource exhaustion, since "it
               | works".
        
         | xiphias2 wrote:
         | Billions of dollars of product decisions use JS benchmark
         | speed as one of the standard benchmarks to base buying
         | decisions on (for a good reason).
         | 
         | For machine learning speed, compiling to the right CUDA /
         | OpenCL kernel is much more crucial, so that's where the
         | money goes.
        
         | CJefferson wrote:
         | A big part of what made Python so successful was how easy it
         | was to extend with C modules. It turns out to be very hard to
         | JIT Python without breaking these, and most people don't want a
         | Python that doesn't support C extension modules.
         | 
         | The JavaScript VMs often break their extension APIs for
         | speed, but their users are more used to this.
        
           | toyg wrote:
           | JS doesn't really have the tradition of external modules that
           | Python has, for a long time it only really existed inside the
           | browser.
        
           | amelius wrote:
           | On the other hand, rewriting the C modules and adapting
           | them to a different C API is very straightforward after
           | you've done one or two such modules. Perhaps it's even
           | something that could be done by training an LLM like
           | Copilot.
        
             | Redoubts wrote:
             | That's breakage you'd have to tread carefully on; and given
             | the 2to3 experience, there would have to be immediate
             | reward to entice people to undertake the conversion. No
             | one's interested in even minor code breakage for minor
             | short-term gain.
        
           | Pxtl wrote:
           | Which is why I'm shocked that Python's big "we're breaking
           | backwards compatibility" release (Python 3) was mostly
           | just for Unicode strings. It seems like the C API and the
           | various __builtins__ introspection API thingies should've
           | been the real focus of breaking backwards compatibility,
           | so that Python would have a better future for improvements
           | like this.
        
         | AndrewDucker wrote:
         | Always interested in replies to this kind of comment, which
         | basically boil down to "Python is so slow that we have to write
         | any important code in C. And this is somehow a good thing."
         | 
         | I mean, it's great that you _can_ write some of your code in
         | C. But wouldn't it be great if you could just write your
         | libraries in Python and have them still be really fast?
        
           | bdd8f1df777b wrote:
           | Yes, but not so good when the JIT-ed Python can no longer
           | reference the fast C code others have written. Every
           | Python JIT project so far has suffered from
           | incompatibility with some C-based Python extension, and
           | users just go back to the slow interpreter in those cases.
        
             | AndrewDucker wrote:
             | "not so good when the JIT-ed Python can no longer
             | reference the fast C code others have written"
             | 
             | I don't see an indication in the article that that's the
             | case. Am I missing something?
        
               | kragen wrote:
               | this was a big obstacle for pypy specifically
               | 
               | https://www.pypy.org/posts/2011/05/numpy-follow-
               | up-692862769...
               | 
               | https://doc.pypy.org/en/latest/faq.html#what-about-numpy-
               | num...
               | 
               | i'm not sure what version they gave up at
        
           | dagw wrote:
           | _But wouldn't it be great if you could just write your
           | libraries in Python_
           | 
           | Everybody obviously wants that. The question is are you
           | willing to lose what you have in order to hopefully,
           | eventually, get there. If Python 3 development stopped and
           | Python 4 came out tomorrow and was 5x faster than python 3
           | and a promise of being 50-100x faster in the future, but you
           | have to rewrite all the libraries that use the C API, it
           | would probably be DOA and kill python. People who want a
           | faster 'almost python' already have several options to choose
           | from, none of which are popular. Or they use Julia.
        
             | AndrewDucker wrote:
             | Why are you assuming that they'd have to rewrite all of
             | their libraries? I don't see anything in the article that
             | says that.
        
               | dagw wrote:
               | The reason this approach is so much slower than some of
               | the other 'fast' pythons out there that have come before
               | is that they are making sure you don't have to rewrite a
               | bunch of existing libraries.
               | 
               | That is the problem with all the fast python
               | implementations that have come before. Yes, they're
               | faster than 'normal' python in many benchmarks, but they
               | don't support the entire current ecosystem. For example
               | Instagram's python implementation is blazing fast for
               | doing exactly what Instagram is using python for, but is
               | probably completely useless for what I'm using python
               | for.
        
               | AndrewDucker wrote:
                 | Aaah, so it's not _this_ approach that you're saying
                 | is an issue, it's the ones that significantly change
                 | Python. Gotcha, that makes sense. Thank you.
        
           | el_oni wrote:
           | It depends what speed is most important to you.
           | 
           | When I was a scientist, speed was getting the code written
           | during my break, and if it took all afternoon to run
           | that's fine, because I was in the lab anyway.
           | 
           | Even as I moved more into the software engineer direction,
           | and started profiling code more, most of the bottlenecks
           | came from things like "creating objects on every
           | invocation rather than pooling them", "blocking IO",
           | "using a bad algorithm" or "using the wrong data structure
           | for the task" - problems that exist in every language.
           | Though "bad algorithm" or "using the wrong data structure"
           | might matter less in a faster language, you're still
           | leaving performance on the table.
           | 
           | > "Python is so slow that we have to write any important code
           | in C. And this is somehow a good thing."
           | 
           | The good thing is that python has a very vibrant ecosystem
           | filled with great libraries, so we don't have to write it
           | in C, because somebody else has. We can just benefit from
           | that when the situation calls for it.
        
           | JodieBenitez wrote:
           | Between writing C code and writing Python code, there is also
           | Cython.
           | 
           | But sure, I'm all for removing build steps and avoiding yet
           | another layer.
        
           | dannymi wrote:
           | >I mean, it's great that you can write some of your code in
           | C. But wouldn't it be great if you could just write your
           | libraries in Python and have them still be really fast?
           | 
           | That really depends.
           | 
           | To make the issue clear, let's think about a similar
           | situation:
           | 
           | bash is nice because you can plug together inputs and outputs
           | of different sub-executables (like grep, sed and so on) and
           | have a big "functional" pipeline deliver the final result.
           | 
           | Your idea would be "wouldn't it be great if you could just
           | write your libraries in bash and have them still be really
           | fast?". Not if you make bash into C, tanking productivity.
           | And _definitely_ not if that new bash can't run the old
           | grep anymore (which is what is usually implied by the
           | proposal in the case of Python).
           | 
           | Also, I'm fine with not writing my search engines, databases
           | and matrix multiplication algorithm implementations in bash,
           | really. So are most other people, I suspect.
           | 
           | Also, many proposals would weaken Python-the-language so it's
           | not as expressive anymore. But I _want_ it to stay as dynamic
           | as it is. It's nice as a scripting language about 30 levels
           | above bash.
           | 
           | As always, there are tradeoffs. Also with this proposal there
           | will be tradeoffs. Are the tradeoffs worth it or not?
           | 
           | For the record, rewriting BLAS in Python (or anything else),
           | even if the result was faster (!), would be a phenomenally
           | bad idea. It would just introduce bugs, waste everyone's
           | time, essentially be a fork of BLAS. There's no upside I can
           | see that justifies it.
        
           | pdpi wrote:
           | Languages don't need to all be good at the same thing. Python
           | currently excels as a glue language you use to write drivers
           | for modules written in lower-level languages, which is a
           | niche that (afaik) nobody else seems to fill right now.
           | 
           | While I'm all for making Python itself faster, it would be a
           | shame to lose the glue language par excellence.
        
           | hot_gril wrote:
           | Pure JS libs are more portable. In Python, portability
           | doesn't matter as much.
        
           | yowlingcat wrote:
           | > basically boil down to "Python is so slow that we have to
           | write any important code in C. And this is somehow a good
           | thing."
           | 
           | I think that's a pretty ignorant interpretation. Python has
           | been built to have a giant ecosystem of useful, feature-
           | complete, stable, well built code that has been used for
           | decades and for which there is no need to reinvent the wheel.
           | If that already describes the universe of libraries that you
           | /need/ to be extremely fast and the rest of your code is IO
           | limited and not CPU limited, why reinvent the wheel?
           | 
           | That makes your comment even more inaccurate because you
           | likely don't need to write any "important" (which you are
           | stretching to mean "fast") code in C -- you utilize existing
           | off the shelf fast libraries that are written in Fortran,
           | CUDA, C, Rust or any other language a pre-existing ecosystem
           | was built in.
           | 
           | Try and think of a language that has mature capabilities for
           | domains as far away as what Django solves for, what pandas
           | solves for, what pytorch solves for, and still has fantastic
           | tooling like jupyter and streamlit. I can't think of any
           | other language that has the combined off the shelf breadth
           | and depth of Python. I don't want to have to write fast code
           | in any language unless forced to, because the vast majority
           | of the time I can customize a great off the shelf package and
           | only write the remaining 1% of glue. I can't see why a
           | professional engineer would, 99% of the time, need to take
           | a remotely different approach.
        
         | yellowstuff wrote:
         | There have been several attempts. For example, Google tried
         | to introduce a JIT starting in 2009 with a project named
         | Unladen Swallow, but that ended up getting abandoned.
        
           | pansa2 wrote:
           | Unladen Swallow was massively over-hyped. It was talked about
           | as though Google had a large team writing "V8 for Python",
           | but IIRC it was really just an internship project.
        
             | kragen wrote:
             | well, there were a couple of guys working on it
        
         | dagw wrote:
         | _anyone (company) stepping up and make the runtime as fast as
         | modern JavaScript runtimes._
         | 
         | There are a lot of faster python runtimes out there. Both
         | Google and Instagram/Meta have done a lot of work on this,
         | mostly to solve internal problems they've been having with
         | python performance. Microsoft has also done work on parallel
         | python. There's PyPy and Pythran and no doubt several others.
         | However none of these attempts have managed to be 100%
         | compatible with the current CPython (and more importantly the
         | CPython C API), so they haven't been considered as
         | replacements.
         | 
         | JavaScript had the huge advantage that there was very little
         | mission critical legacy JavaScript code around they had to take
         | into consideration, and no C libraries that they had to stay
         | compatible with. Meaning that modern JavaScript runtime teams
         | could more or less start from scratch. Also, the JavaScript
         | world at the time was a lot more OK with different
         | JavaScript runtimes not being 100% compatible with each
         | other. If you 'just' want a faster python runtime that
         | supports most of python and many existing libraries, but
         | are OK with having to rewrite some of your existing python
         | code or third-party libraries to make it work on that
         | runtime, then there are several to choose from.
        
           | skriticos2 wrote:
           | JS also had the major advantage of being sandboxed by design,
           | so they could work from there. Most of the technical legacy
           | centered around syntax backwards compatibility, but it's all
           | isolated - so much easier to optimize.
           | 
           | Python with its C API basically gives you the keys to the
           | kingdom at the machine-code level. Modifying something
           | that has an API to connect to essentially anything is not
           | an easy
           | proposition. Of course, it has the advantage that you can
           | make Python faster by performance analysis and moving the
           | expensive parts to optimized C code, if you have the
           | resources.
        
           | mike_hearn wrote:
           | Google/Instagram have done bits, but the company that's done
           | the most serious work on Python performance is actually
           | Oracle. GraalPython is a meaningfully faster JIT (430% faster
           | vs 7% for this JITC!) and most importantly, it can utilize at
           | least some CPython modules.
           | 
           | They test it against the top 500 modules on PyPI and it's
           | currently compatible with about half:
           | 
           | https://www.graalvm.org/python/compatibility/
           | 
           | But investment continues. It has some other neat features too
           | like sandboxing and the ability to make single-binary
           | programs.
           | 
           | The GraalPython guys are working on the HPy effort as well,
           | which is an attempt to give Python a properly specified and
           | engine-neutral extension API.
        
           | Pxtl wrote:
           | Node.js and Python 3 came out at around the same time.
           | Python had its chance to tell all the "mission critical
           | legacy code" that it was time to make hard changes.
        
             | dagw wrote:
             | As much as I would have loved to see some more 'extreme'
             | improvements to python, given how the python community
             | reacted to the relatively minor changes that python 3
             | brought, anything more extreme would very likely have
             | caused a Perl 6 style situation and quite possibly have
             | killed the language.
        
         | albertzeyer wrote:
         | In lots of applications, all the computations already happen
         | inside native libraries, e.g. Numpy, PyTorch, TensorFlow, JAX
         | etc.
         | 
         | And if you have a complicated computation graph, there are
         | already JITs on this level, based on Python code, e.g. see
         | torch.compile, or TF XLA (done by default via tf.function),
         | JAX, etc.
         | 
         | It's also important to do JIT on this level, to really be able
         | to fuse CUDA ops, etc. A generic Python JIT probably cannot
         | really do this, as this is CUDA specific, or TPU specific, etc.
        
         | el_oni wrote:
         | I think the thing with python is that it's always been "fast
         | enough", and if not you can always reach for natively
         | implemented modules. On the flip side, JavaScript was the main
         | language embedded in web browsers.
         | 
         | There has been a lot of competition to make browsers fast.
         | Nowadays there are three main JS engines: V8 backed by Google,
         | JavaScriptCore backed by Apple, and SpiderMonkey backed by
         | Mozilla.
         | 
         | If python had been the language embedded into web browsers,
         | then maybe we would see 3 competing python engines with crazy
         | performance.
         | 
         | The alternative interpreters for Python have always been a bit
         | more niche than CPython, but now that Guido works at Microsoft
         | there has been a bit more of a push to make it faster.
        
         | PartiallyTyped wrote:
         | Meta has actually been doing that -- helping improve python's
         | speed -- with things like [1,2]
         | 
         | [1] https://peps.python.org/pep-0703/
         | 
         | [2] https://news.ycombinator.com/item?id=36643670
        
         | JodieBenitez wrote:
         | Because it's already fast enough for most of us? Anecdote, but
         | I've had my share of slow things in Javascript that are _not_
         | slow in Python. Try to generate a SHA256 checksum for a big
         | file in the browser...
         | 
         | Good to see progress anyways.
        
           | jampekka wrote:
           | Python's SHA256 is written in C. And I'd guess the Web Crypto
           | API for JS is in the same ballpark.
           | 
           | SHA256 in pure Python would be unusably slow. In Javascript
           | it would be at least usably slow.
           | 
           | Javascript is fast. Browsers are fast.
        
             | Scarblac wrote:
             | The point of Python is quickly integrating a very wide
             | range of fast libraries written in other languages though,
             | you can't ignore that performance just because it's not
             | written in Python.
        
             | JodieBenitez wrote:
              | Have you tried to generate a SHA256 checksum for a file in
              | the browser, no matter what crypto lib or API is available
              | to you? Have you tried to generate it using the Python
              | standard lib?
             | 
             | I did, and doing it in the browser was so bad that it was
             | unusable. I suspect that it's not the crypto that's slow
             | but the file reading. But anyway...
             | 
             | > SHA256 in pure Python would be unusably slow
             | 
             | None would do that because:
             | 
             | > Python's SHA256 is written in C
             | 
             | Hence why comparing "pure python" to "pure javascript" is
             | mostly irrelevant for most day to day tasks, like most
             | benchmarks.
             | 
             | > Javascript is fast. Browsers are fast.
             | 
             | Well, no they were not for my use case. Browsers are
             | _really slow_ at generating file checksums.
        
               | adastra22 wrote:
               | The Python standard lib calls out to hand-optimized
               | assembly language versions of the crypto algos. It is of
               | no relevance to a JIT-vs-interpreted debate.
        
               | masklinn wrote:
               | It absolutely _is_ relevant to the  "python is slow reee"
               | nonsense tho, which is the subject. Python-the-language
               | being slow is not relevant for a lot of the users,
               | because even if they don't know they use Python mostly as
               | a convenient interface to huge piles of native code which
               | does the actual work.
               | 
               | And as noted upthread that's a significant part of the
               | uptake of Python in scientific fields, and why pypy
               | despite the heroic work that's gone into it is often a
               | non-entity.
        
               | jampekka wrote:
               | Python is slow, reee.
               | 
               | This is a major problem in scientific fields. Currently
               | there are sort of "two tiers" of scientific programmers:
               | ones who write the fast binary libraries and ones that
               | use these from Python (until they encounter e.g. having
               | to loop and they are SOL).
               | 
               | This is known as the two language problem. It arises from
               | Python being slow to run and compiled languages being bad
               | to write. Julia tries to solve this (but fails due to
               | implementation details). Numba etc try to hack around it.
               | 
               | Pypy is sadly vaporware. The failure from the beginning
               | was not supporting most popular (scientific) Python
               | libraries. It nowadays kind of does, but is brittle and
               | often hard to set up. And anyway Pypy is not very fast
               | compared to e.g. V8 or SpiderMonkey.
               | 
               | Reee.
        
               | pas wrote:
               | The major problem in scientific fields is not this, but
               | the amount of incompetence and the race-to-the-bottom
               | environment which enables it. Grant organizations don't
               | demand rigor and efficiency, they demand shiny papers.
               | And that's what we get. With god awful code and very
               | questionable scientific value.
        
               | jampekka wrote:
               | There are such issues, but I don't think they are a very
               | direct cause of the two language problem.
               | 
               | And even these issues are part of the greater problem of
               | late stage capitalism that in general produces god awful
               | stuff with questionable value. E.g. vast majority of
               | industry code is such.
        
               | JodieBenitez wrote:
               | > Julia tries to solve this (but fails due to
               | implementation details)
               | 
               | Care to list some of those details? (I have zero
               | knowledge of Julia)
        
               | jampekka wrote:
               | This is quite a good intro:
               | https://viralinstruction.com/posts/badjulia/
        
               | affinepplan wrote:
               | fyi: the author of that post is a current Julia user and
               | intended the post as counterpoint to their normally
               | enthusiastic endorsements. so while it is a good intro to
               | some of the shortfalls of the language, I'm not sure the
               | author would agree that Julia has "failed" due to these
               | details
        
               | jampekka wrote:
               | Yes, but it's a good list of the major problems, and
               | laudable for a self-professed "stan" to be upfront about
               | them.
               | 
               | It's my assessment that the problems listed there are a
               | reason why Julia will not take off and we're largely stuck
               | with Python for the foreseeable future.
        
               | adgjlsfhk1 wrote:
               | It is worth noting that the first of the reasons
               | presented is significantly improved in Julia 1.9 and 1.10
               | (released ~8 months and ~1 month ago). The time for
               | `using BioSequences, FASTX` on 1.10 is down to 0.14
               | seconds on my computer (from 0.62 seconds on 1.8 when the
               | blog post was published).
        
               | jampekka wrote:
               | TTFX is indeed getting a lot better. But e.g. "using
               | DynamicalSystems" is still over 5 seconds.
               | 
               | There is something big going on in caching the binaries,
               | so there's a chance the TTFX will get workable.
        
               | adastra22 wrote:
               | There is pleeeenty of mission critical stuff written in
               | Python, for which interpreter speed is a primary concern.
               | This has been true for decades. Maybe not in your
               | industry, but there are other Python users.
        
               | jiripospisil wrote:
               | Just for giggles I tried this and I'm getting ~200ms when
               | reading and hashing a 50MB file in the browser (Chromium
               | based) vs ~120ms using Python 3.11.6.
               | 
               | https://gist.github.com/jiripospisil/1ae8b877b1c728536e38
               | 2fc...
               | 
               | https://jsfiddle.net/yebdnz6x/
        
               | JodieBenitez wrote:
               | Not so bad compared to what I tried a few years ago.
               | Might finally be _usable_ for us...
        
               | jampekka wrote:
               | All major browsers have supported it for over eight
               | years. Maybe the problem was between the seat and the
               | keyboard?
               | 
               | https://caniuse.com/mdn-api_crypto_subtle
        
               | JodieBenitez wrote:
               | Maybe 8 years is not much in a career? Maybe we had to
               | support one of those browsers that did not support it?
               | Maybe your snarky comment is out of place? And even to
               | this day it's still significantly slower than Python
               | stdlib according to the tester. So much for "why python
               | not as fast as js, python is slow, blah blah blah".
        
               | ptx wrote:
               | I thought that perhaps the difference could be due to the
               | JavaScript version having to first read the entire file
               | before getting started on hashing it, whereas the Python
               | version does it incrementally (which the browser API
               | doesn't support [0]). But changing the Python version to
               | work like the JavaScript version doesn't make a big
               | difference: 30 vs 35 ms (with a ~50 MB file) on my
               | machine.
               | 
               | The slowest part in the JavaScript version seems to be
               | reading the file, accounting for 70-80% of the runtime in
               | both Firefox and Chromium.
               | 
               | [0] https://github.com/w3c/webcrypto/issues/73
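
For reference, the two strategies compared above look roughly like this with Python's hashlib (a sketch; the function names `sha256_incremental` and `sha256_one_shot` are made up for illustration):

```python
import hashlib

def sha256_incremental(path, chunk_size=1 << 20):
    # Hash the file in fixed-size chunks: constant memory, any file size.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def sha256_one_shot(path):
    # Read the whole file into memory first, then hash it in one call,
    # which is what a one-shot digest API forces you to do.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```

Both return the same digest; the difference is peak memory and when hashing can begin.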
        
               | lambda_garden wrote:
               | > Have you tried to generate a SHA256 checksum for a file
               | in the browser
               | 
               | Have you tried to do this in Python?
               | 
               | A Node comparison would be more appropriate.
        
         | tgv wrote:
         | Teaching. So many colleges/unis I know teach "Introduction to
         | Programming" with Python these days, especially to non-CS
         | students/pupils.
        
           | bigfishrunning wrote:
           | I think python is very well suited to people who do
           | computation in Excel spreadsheets. For actual CS students,
           | I'd rather see something like scheme be a first language (but
           | maybe I'm just an old person)
        
             | hot_gril wrote:
             | They do both Python and Scheme in the same Berkeley intro
             | to CS class. But I think the point of Scheme is more to
             | expand students' thinking with a very different language.
             | The CS fundamentals are still covered more in the Python
             | part of the course.
        
             | VagabundoP wrote:
              | It's even _in_ Excel nowadays!!
        
         | fractalb wrote:
         | I still scratch my head why it's not installed by default on
         | Windows.
        
         | Yasuraka wrote:
         | You might want to check out Mojo, which is not a runtime but a
         | different language designed to be a superset of Python. Beware
         | though that it's not yet open source; open-sourcing is slated
         | for this Q1.
         | 
         | https://docs.modular.com/mojo/manual/
         | 
         | edit: The main point I forgot to mention - it aims to compete
         | with "low-level" languages like C and Rust in performance
        
         | FergusArgyll wrote:
         | Because it doesn't useGrossCamelCaseAsOften
        
         | IshKebab wrote:
         | Two reasons:
         | 
         | 1. Javascript is a less dynamic language than Python and
         | numbers are all float64 which makes it a lot easier to make
         | fast.
         | 
         | 2. If you want to run fast code on the web you only have one
         | option: make Javascript faster. (Ok we have WASM now but that
         | didn't exist at the time of the Javascript Speed wars.) If you
         | want to run fast code on your desktop you have a _MUCH_ easier
         | option: don't use Python.
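
The float64 point can be seen from Python itself, whose ints are arbitrary precision (a small illustration):

```python
# JS represents every number as an IEEE-754 float64, which cannot
# distinguish consecutive integers above 2**53. Python ints are
# arbitrary precision, so the same arithmetic stays exact.
exact = 2**53 + 1
approx = float(2**53) + 1.0     # what a float64-only language must do

print(exact)        # 9007199254740993
print(int(approx))  # 9007199254740992 -- the +1 was rounded away
```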
        
           | soulbadguy wrote:
           | > Javascript is a less dynamic language than Python
           | 
           | I have seen this mentioned multiple times. Does someone have a
           | good reference explaining what makes Python more dynamic than
           | JS?
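
Not a reference, but a quick sketch of the kind of dynamism usually cited (hypothetical `Point` example): classes, instances, and attributes can all be rewritten while the program runs, and a JIT must guard every assumption against that. JS allows some of this via prototype mutation; Python adds further hooks like `__getattr__` and metaclasses.

```python
class Point:
    def __init__(self, x):
        self.x = x

p = Point(3)

# 1. Methods can be added to (or swapped on) a live class;
#    every existing instance picks up the change immediately.
Point.describe = lambda self: f"x={self.x}"
print(p.describe())      # x=3

# 2. Even an object's class can be reassigned at runtime.
class LoudPoint(Point):
    def describe(self):
        return f"X IS {self.x}!"

p.__class__ = LoudPoint
print(p.describe())      # X IS 3!

# 3. Instance attributes can appear at any time, undeclared anywhere.
p.y = 7
```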
        
         | bjackman wrote:
         | JavaScript has to be fast because its users were traditionally
         | captive on the platform (it was the only language in the
         | browser).
         | 
         | Python's users can always swap out performance critical
         | components to another language. So Python development delivered
         | more when it focussed on improving strengths rather than
         | mitigating weaknesses.
         | 
         | In a way, Python being slow is just a sign of a healthy
         | platform ecosystem allowing comparative advantages to shine.
        
         | hot_gril wrote:
         | New runtimes like NodeJS have expanded JS beyond web, and JS's
         | syntax has improved the past several years. But before that
         | happened, Python on its own was way easier for non-web scripts,
         | web servers, and math/science/ML/etc. Optimized native libs and
         | ecosystems for those things got built a lot earlier around
         | Python, in some cases before NodeJS even existed.
         | 
         | Python's syntax is still nicer for mathy stuff, to the point
         | where I'd go into job coding interviews using Python despite
         | having used more JS lately. And I'm comparing to JS because
         | it's the closest thing, while others like Java are/were far
         | more cumbersome for these uses.
        
       | p4bl0 wrote:
       | This was a fantastic, very clear, write-up on the subject. Thanks
       | for sharing!
       | 
       | If the further optimizations that this change allows, as
       | explained at the end of this post, are covered as well as this
       | one, it promises to be a _very_ interesting series of blog posts.
        
       | jagaerglad wrote:
       | Can someone explain what a JIT compiler means in the case of an
       | interpreted language?
        
         | pjmlp wrote:
         | Basically a JIT (Just In Time), is also known as a dynamic
         | compiler.
         | 
         | It is an approach that traces back to the original Lisp and
         | BASIC systems, among other lesser known ones.
         | 
         | The compiler is part of the language runtime, and code gets
         | dynamically compiled into native code.
         | 
         | Why is this a good approach?
         | 
         | It allows for experiences that are much harder to implement in
         | languages that traditionally compile straight to native code
         | like C (note there are C interpreters).
         | 
         | So you can have an interpreter like experience, and code gets
         | compiled to native code before execution on the REPL, either
         | straight away, or after the execution gets beyond a specific
         | threshold.
         | 
         | Additionally, since dynamic languages by definition can change
         | all the time, a JIT can profit from code instrumentation and
         | generate machine code that takes into account the types
         | actually being used, something that an AOT approach for a
         | dynamic language cannot predict, so such optimizations are
         | hardly an option in most cases.
        
       | LewisVerstappen wrote:
       | Great article, but small typo when the author says "copy-any-
       | patch JIT"
        
         | ericvsmith wrote:
         | That's not a typo, that's the name of the technique.
        
           | kragen wrote:
           | i think it's 'copy-and-patch'
        
             | ericvsmith wrote:
             | D'oh! Of course you're correct. I skipped over "any", and
             | focused on "patch". Sorry about that.
        
               | kragen wrote:
               | no harm done :)
        
       | cqqxo4zV46cp wrote:
       | Unfortunate to see a couple of comments here drive-by pulling out
       | the "x% faster" stat whilst minimising the context. This is a big
       | deal and it's effectively a given that this'll pave the way for
       | further enhancements.
        
         | kragen wrote:
         | maybe, maybe not. time will tell. ahead-of-time compilation is
         | even better known for improving performance and yet perl's
         | compile-to-c backend turned out to fail to do that
        
           | pjmlp wrote:
           | Ahead-of-time compilation is a bad solution for dynamic
           | languages, so that is an expected outcome for Perl.
           | 
           | The base line should be how heavily dynamic languages like my
           | favourite set, Smalltalk, Common Lisp, Dylan, SELF,
           | NewtonScript, ended up gaining from JIT, versus the original
           | interpreters, while being in the genesis of many relevant
           | papers for JIT research.
        
             | kragen wrote:
             | when i wrote ur-scheme one of the surprising things i
             | learned from it was that ahead-of-time compilation worked
             | amazingly well for scheme. scheme is ruthlessly monomorphic
             | but i was still doing a type check on every primitive
             | argument
             | 
             | i didn't realize they ever jitted newtonscript
        
               | pjmlp wrote:
               | NewtonScript 2.0 introduced a mechanism to manually JIT
               | code, functions marked as native get compiled into
               | machine code.
               | 
               | Had the Newton not been canceled, probably there would be
               | an evolution from that support.
               | 
               | See "Compiling Functions for Speed"
               | 
               | https://www.newted.org/download/manuals/NewtonToolkitUser
               | sGu...
        
               | kragen wrote:
               | this is great, thanks! but it sounds like it was an aot
               | compiler, not a jit compiler; for example, it explains
               | that a drawback of compiling functions to native code is
               | that they use more memory, and that the compiler still
               | produces bytecode for the functions it compiles natively,
               | unless you suppress the bytecode compilation in project
               | settings
        
               | pjmlp wrote:
               | Yeah, I guess if one wants to go more technical, I see it
               | as the first step of a JIT that didn't have the
               | opportunity to evolve due to market decisions.
        
               | kragen wrote:
               | i guess if they had, we would know whether a jit made
               | newtonscript faster or slower, but they didn't, so we
               | don't. what we do know is that an aot compiler sometimes
               | made newtonscript faster (though maybe only if you added
               | enough manifest static typing annotations to your source
               | code)
               | 
               | that seems closer to the opposite of what you were saying
               | in the point on which we were in disagreement?
        
               | pjmlp wrote:
               | I guess my recollection regarding NewtonScript wasn't
               | correct, if you prefer that I put it like that; however I
               | am quite certain in regards to the other languages in my
               | list.
        
               | kragen wrote:
               | i agree that the other languages gained a lot for sure
               | 
               | maybe i should have said that up front!
               | 
               | except maybe common lisp; all the implementations i know
               | are interpreted or aot-compiled (sometimes an expression
               | at a time, like sbcl), but maybe there's a jit-compiled
               | one, and i bet it's great
               | 
               | probably with enough work python could gain a similar
               | amount. it's possible that work might get done. but it
               | seems likely that it'll have to give up things like
               | reference-counting, as smalltalk did (which most of the
               | other languages never had)
        
               | lispm wrote:
               | Note that interpreter in the Lisp world by default has a
               | different meaning.
               | 
               | A "Lisp interpreter" runs Lisp source in the form of
               | s-expressions. That's what the first Lisp did.
               | 
               | A "Lisp compiler" compiles Lisp source code to native
               | code, either directly or with the help of a C compiler or
               | an assembler. A Lisp compiler could also compile source
               | code to byte code. In some implementations this byte code
               | can be JIT compiled (ABCL, CLISP, ...).
               | 
               | The first Lisp provided a Lisp to assembly compiler,
               | which compiled Lisp code to assembly code, which then
               | gets compiled to machine code. That machine code could be
               | loaded into Lisp and functions then could be native
               | machine code.
               | 
               | The Newton Toolkit could compile type declared functions
               | to machine code. That's something most Common Lisp
               | compilers do, sometimes by default (SBCL, CCL, ... by
               | default directly compile source code to machine code).
               | 
               | SBCL:
               | 
               |     * (defun add (a b)
               |         (declare (fixnum a b) (optimize (speed 3)))
               |         (+ a b))
               |     ADD
               |     * (disassemble #'add)
               |     ; disassembly for ADD
               |     ; Size: 104 bytes. Origin: #x7006E1789C    ; ADD
               |     ; 89C: 0000018B  ADD NL0, NL0, NL1
               |     ; 8A0: 0A0000AB  ADDS R0, NL0, NL0
               |     ; 8A4: E7010054  BVC L1
               |     ; 8A8: BD2A00B9  STR WNULL, [THREAD, #40]   ; pseudo-atomic-bits
               |     ; 8AC: BC7A47A9  LDP TMP, LR, [THREAD, #112] ; mixed-tlab.{free-pointer, end-addr}
               |     ; 8B0: 8A430091  ADD R0, TMP, #16
               |     ; 8B4: 5F011EEB  CMP R0, LR
               |     ; 8B8: E8010054  BHI L2
               |     ; 8BC: AA3A00F9  STR R0, [THREAD, #112]     ; mixed-tlab
               |     ; 8C0: L0: 8A3F0091  ADD R0, TMP, #15
               |     ; 8C4: 3E2280D2  MOVZ LR, #273
               |     ; 8C8: 9E0300A9  STP LR, NL0, [TMP]
               |     ; 8CC: BF3A03D5  DMB ISHST
               |     ; 8D0: BF2A00B9  STR WZR, [THREAD, #40]     ; pseudo-atomic-bits
               |     ; 8D4: BE2E40B9  LDR WLR, [THREAD, #44]     ; pseudo-atomic-bits
               |     ; 8D8: 5E0000B4  CBZ LR, L1
               |     ; 8DC: 200120D4  BRK #9                     ; Pending interrupt trap
               |     ; 8E0: L1: FB031AAA  MOV CSP, CFP
               |     ; 8E4: 5A7B40A9  LDP CFP, LR, [CFP]
               |     ; 8E8: BF0300F1  CMP NULL, #0
               |     ; 8EC: C0035FD6  RET
               |     ; 8F0: E00120D4  BRK #15                    ; Invalid argument count trap
               |     ; 8F4: L2: 1C0280D2  MOVZ TMP, #16
               |     ; 8F8: 0AFBFF58  LDR R0, #x7006E17858       ; SB-VM::ALLOC-TRAMP
               |     ; 8FC: 40013FD6  BLR R0
               |     ; 900: F0FFFF17  B L0
               |     NIL
               | 
               | I've entered a function and it gets ahead of time
               | compiled to non-generic machine code.
               | 
               | Calling the function ADD with the wrong numeric arguments
               | is an error, which will be detected both at compile time
               | and at runtime:
               | 
               |     * (add 3.0 2.0)
               |     debugger invoked on a TYPE-ERROR @7006E17898 in thread
               |     #<THREAD "main thread" RUNNING {70088224A3}>:
               |       The value
               |         3.0
               |       is not of type
               |         FIXNUM
               |       when binding A
               | 
               | Redefinition of + will do nothing to the code. The
               | addition is inlined machine code.
        
             | lispm wrote:
             | JIT compilation is rare in Common Lisp. I wouldn't think
             | that Dylan implementations used JIT compilation.
             | 
             | Apple's Dylan IDE and compiler was implemented in Macintosh
             | Common Lisp (MCL). MCL then was not a part of the Dylan
             | runtime.
             | 
             | I would think that Open Dylan (the Dylan implementation
             | originally from Harlequin) can also generate LLVM bitcode,
             | but I don't know if that one can be JIT executed.
             | Possibly...
        
               | kragen wrote:
               | are there any cl implementations that use jit? there are
               | a lot of cl implementations so i assumed there must be
               | one
        
               | lispm wrote:
               | ABCL runs on the JVM. It generates JVM byte code, which
               | then can be JIT compiled by the JVM.
               | 
               | CLISP has a byte code machine, for which a JIT can be
               | used.
               | 
               | There might be others.
        
           | cube2222 wrote:
           | > ahead-of-time compilation is even better known for
           | improving performance
           | 
           | Not necessarily, not for dynamic languages.
           | 
           | With very dynamic languages you can make only very limited
           | assumptions about e.g. function argument types, which leads
           | you to compiled functions that have to handle any possible
           | case.
           | 
           | A JIT compiler can notice that the given function is almost
           | always (or always) used to operate on a pair of integers, and
           | do a vastly superior specialized compilation, with guards to
           | fallback on the generic one. With extensive inlining, you can
           | also deduplicate a lot of the guards.
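
The guard-plus-fallback idea described above can be sketched in pure Python (a toy illustration only: real JITs emit machine code, and the names here are invented):

```python
def profile_and_specialize(generic_fn, hot_after=3):
    # Count observed argument-type pairs, like a profiling JIT tier.
    counts = {}

    def call(a, b):
        key = (type(a), type(b))
        counts[key] = counts.get(key, 0) + 1
        if key == (int, int) and counts[key] > hot_after:
            # The guard above proved both operands are ints, so take
            # the "specialized" fast path and skip generic dispatch.
            return a + b
        return generic_fn(a, b)  # generic fallback handles any types

    return call

add = profile_and_specialize(lambda a, b: a + b)
for _ in range(10):
    assert add(2, 3) == 5        # goes hot, hits the int fast path
assert add("a", "b") == "ab"     # guard fails -> generic fallback
```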
        
             | kragen wrote:
             | yes, that is true. but aot compilers never make things
             | _slower_ than interpretation, and they can afford more
             | expensive optimizations
             | 
             | also, even mature jit compilers often only make limited
             | improvements; jython has been stuck at near-parity with
             | cpython's terrible performance for decades, for example,
             | and while v8 was an enormous improvement over old
             | spidermonkey and squirrelfish, after 15 years it's still
             | stuck almost an order of magnitude slower than c
             | https://benchmarksgame-
             | team.pages.debian.net/benchmarksgame/... which is
             | (handwaving) like maybe a factor of 2 or 3 slower than self
             | 
             | typically when i can get something to work using numpy it's
             | only about a factor of 5 slower than optimized c, purely
             | interpretively, which is competitive with v8 in many cases.
             | luajit, by contrast, is goddam alien technology from the
             | future
             | 
             | with respect to your intxint example, if an intxint
             | specialization is actually vastly superior, for example
             | because the operation you're applying is something like +
             | or *, an aot compiler can _also_ insert the guard and
             | inline the single-instruction implementation, and it can
             | _also_ do extensive inlining and even specialization
             | (though that 's rare in aots and common in jits). it can
             | insert the guards because if your monomorphic sends of +
             | are always sending + to a rational instance or something,
             | the performance gain from eliminating megamorphic dispatch
             | is comparatively slight, and the performance _loss_ from
             | inserting a static hardcoded guess of integer math before
             | the megamorphic dispatch is also comparatively slight,
             | though nonzero
             | 
             | this can fall down, of course, when your arithmetic
             | operations are polymorphic over integer and floating-point,
             | or over different types of integers; but it often works far
             | better than it has any right to. in most code, most
             | arithmetic and ordered comparison is integers, most array
             | indexing is arrays, most conditionals are on booleans (and
             | smalltalk actually hardcodes that in its bytecode
             | compiler). this depends somewhat on your language design,
             | of course; python using the same operator for indexing
             | dicts, lists, and even strings hurts it here
             | 
             | meanwhile, back in the stop-hitting-yourself-why-are-you-
             | hitting-yourself department, fucking cpython is allocating
             | its integers on the heap and motherfucking reference-
             | counting them
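
The heap-allocation and refcounting complaint is easy to observe from CPython itself (these are CPython implementation details, not language guarantees):

```python
import sys

# CPython interns the small integers -5..256, so equal small values
# share a single heap object:
a, b = int("256"), int("256")
print(a is b)        # True: both names point at the cached object

# Anything larger is a fresh heap allocation every time it is computed:
x, y = int("257"), int("257")
print(x is y)        # False: two distinct boxed objects, same value

# And every object, integers included, carries a reference count that
# is bumped on each assignment, argument pass, container insert, etc.
print(sys.getrefcount(x))
```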
        
               | ptx wrote:
               | There is already an AOT compiler for Python: Nuitka[0].
               | But I don't think it's much faster.
               | 
               | And then there is mypyc[1] which uses mypy's static type
               | annotations but is only slightly faster.
               | 
               | And various other compilers like Numba and Cython that
               | work with specialized dialects of Python to achieve
               | better results, but then it's not quite Python anymore.
               | 
               | [0] https://nuitka.net/
               | 
               | [1] https://github.com/python/mypy/tree/master/mypyc
        
               | kragen wrote:
               | thanks, i'd forgotten about nuitka and didn't know about
               | mypyc!
        
               | actionfromafar wrote:
               | Check out:
               | 
               | https://shedskin.github.io/
               | 
               | Python to C++ translation
        
               | ngrilly wrote:
               | I so much agree with your comment on memory allocation.
               | Everybody is focusing on JIT, but allocating everything
               | on the heap, with no possibility to pack multiple values
               | contiguously in a struct or array, will still be a
               | problem for performance.
        
               | vanderZwan wrote:
               | > _fucking cpython is allocating its integers on the heap
               | and motherfucking reference-counting them_
               | 
               | And here I thought that it was shocking to learn that v8
               | allocates doubles on the heap recently. (I mean, I'm not
               | a compiler writer, I have no idea how hard it would be to
               | avoid this, but it feels like mandatory boxed floats
               | would hurt performance a lot)
        
               | kragen wrote:
               | nanboxing as used in spidermonkey
               | (https://piotrduperas.com/posts/nan-boxing) is a possible
               | alternative, but i think v8 works pretty hard to not use
               | floats, and i don't think local-variable or temporary
               | floats end up on the heap in v8 the way they do in
               | cpython. i'm not that familiar with v8 tho (but i'm
               | pretty sure it doesn't refcount things)
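The linked NaN-boxing idea can be demonstrated in a few lines of Python (a minimal sketch; real engines also tag pointers and other types within the payload): a quiet NaN's payload bits are big enough to smuggle a small integer through a slot that float-handling code still sees as a double.

```python
import math
import struct

QNAN = 0x7FF8_0000_0000_0000  # quiet-NaN exponent + quiet bit; payload bits free

def box_int(i):
    # stash a 32-bit unsigned integer in the NaN's payload bits
    bits = QNAN | (i & 0xFFFF_FFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def unbox_int(f):
    bits = struct.unpack("<Q", struct.pack("<d", f))[0]
    return bits & 0xFFFF_FFFF

x = box_int(42)
assert math.isnan(x)        # to the FPU it is just a NaN
assert unbox_int(x) == 42   # but the payload round-trips intact
```

Ordinary doubles are stored as themselves, since no valid arithmetic result collides with these NaN bit patterns.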
        
               | vanderZwan wrote:
               | > _i think v8 works pretty hard to not use floats_
               | 
               | Correct, to the point where at work a colleague and I
               | actually have looked into how to force using floats even
               | if we initiate objects with a small-integer number (the
               | idea being that ensuring our objects having the correct
               | hidden class the first time might help the JIT, and
               | avoids wasting time on integer-to-float promotion in
               | tight loops). Via trial and error in Node we figured that
               | using -0 as a number literal works, but (say) 1.0 does
               | not.
               | 
                | > _i don't think local-variable or temporary floats end
               | up on the heap in v8 the way they do in cpython_
               | 
               | This would also make sense - v8 already uses pools to re-
               | use common temporary object shapes in general IIRC, I see
               | no reason why it wouldn't do at least that with heap-
               | allocated doubles too.
        
               | kragen wrote:
               | so then the remaining performance-critical case is where
               | you have a big array of floats you're looping over. in
               | firefox that works fine (one allocation per lowest-level
               | array, not one allocation and unprefetchable pointer
               | dereference per float), but maybe in chrome you'd want to
               | use a typedarray?
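Python's stdlib has a rough analogue of a JS Float64Array for this case: the array module stores doubles unboxed and contiguous in a single buffer (though, unlike a JIT, CPython still boxes each element into a float object when you read it back out).

```python
from array import array

xs = array("d", [1.5, 2.5, 3.5])  # raw doubles, stored inline in one buffer
assert xs.itemsize == 8           # 8 bytes per element, no per-element object
assert sum(xs) == 7.5             # iteration still boxes each value, alas
```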
        
               | vanderZwan wrote:
               | Maybe, at that point it is basically similar to the
               | struct-of-arrays vs array-of-structs trade-off, except
               | with significantly worse ergonomics and less pay-off.
        
               | IainIreland wrote:
               | As I understand it, V8 keeps track of an ElementsKind for
               | each array (or, more precisely, for the elements of every
               | object; arrays are not special in this sense). If an
               | array only contains floats, then they will all be stored
               | unboxed and inline. See here: https://source.chromium.org
               | /chromium/chromium/src/+/main:v8/...
               | 
               | I assume that integers are coerced to floats in this
               | mode, and that there's a performance cliff if you store a
               | non-number in such an array, but in both cases I'm just
               | guessing.
               | 
               | In SpiderMonkey, as you say, we store all our values as
               | doubles, and disguise the non-float values as NaNs.
        
               | kragen wrote:
               | thank you for the correction!
        
         | pjmlp wrote:
         | It is a very big deal, as it will finally shift the mentality
         | regarding:
         | 
         | - "C/C++/Fortran libs are Python"
         | 
         | - "Python is too dynamic", while disregarding Smalltalk, Common
         | Lisp, Dylan, SELF, NewtonScript JIT capabilities, all dynamic
         | languages where anything can change at any given moment
        
           | japanman185 wrote:
            | Disregarding the fact that python is an awful programming
            | language for anything other than jupyter notebooks
        
             | pjmlp wrote:
             | Another one that hasn't seen UNIX scripting in shell
             | languages or Perl, Apache modules, before Python came to
             | be.
        
               | 0x457 wrote:
               | I wrote tons of perl in my life. I would rather keep
               | writing perl than touching python. Every time I see a
               | nice utility and see that it's written in python - tab
               | closed.
        
             | BossingAround wrote:
             | Facts are objective; "Python is awful" is your opinion.
        
             | moffkalast wrote:
             | Ah I'd say the exact opposite, python in general is pretty
             | good but jupyter sucks because the syntax isn't compatible
             | with regular python and I avoid it like the plague.
        
               | smabie wrote:
               | What does a jupyter notebook have to do with python
               | syntax?
        
               | moffkalast wrote:
               | Take the code you find in an average notebook, copy it to
               | a .py text file, run it with python. Does it run? In my
               | experience the answer is usually 'no' because of some
               | extra-ass syntax sugar jupyter has that doesn't exist in
               | python.
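A concrete instance of the claim above: IPython/Jupyter line magics and shell escapes such as %timeit or !pip install are handled by the notebook kernel, not by the Python grammar, so the same cell text fails to parse as plain Python.

```python
import ast

cell = "%timeit sum(range(10))"   # valid in a Jupyter cell, not in a .py file
try:
    ast.parse(cell)
    is_plain_python = True
except SyntaxError:
    is_plain_python = False

assert not is_plain_python        # the magic line is rejected by the parser
```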
        
             | cqqxo4zV46cp wrote:
             | This comment is really just bordering on a rule violation
             | and doesn't add to the conversation at all.
        
           | pdimitar wrote:
           | What do you mean by "it will shift the mentality"? There is
           | no magical JIT that will ever make e.g. the data science
           | Python & C++ amalgamations slower than a pure Python. Likely
           | never happening, too.
           | 
           | Also no mentality shift is expected on the "Python is too
           | dynamic" -- which is a strange thing to say anyway -- because
           | Python is not getting any more static due to these JIT news.
        
             | pjmlp wrote:
             | Python with JIT is faster than Python without JIT.
             | 
             | Having a Python with JIT, in many cases it will be fast
             | enough for most cases.
             | 
             | Data science running CUDA workloads isn't the only use case
             | for Python.
        
               | eesmith wrote:
               | I think Python without a JIT in many cases is already
               | fast enough for most cases.
               | 
               | I don't do data science.
        
               | pjmlp wrote:
                | Sure, for UNIX scripting; for everything else it is
                | painfully slow.
                | 
                | I have known Python since version 1.6, and it is my
                | scripting language in UNIX-like environments. During my
                | time at CERN, I was one of the CMT build infrastructure
                | engineers on the ATLAS team.
                | 
                | It has never been the language I would reach for when not
                | doing OS scripting, and usually when a GNU/Linux GUI
                | application happens to be slow as molasses, it has been
                | written in Python.
        
               | andrewaylett wrote:
               | A Python web service my team maintains, running at a
               | higher request rate and with lower CPU and RAM
               | requirements than most of the Java services I see around
               | us, would like a word with you.
        
               | pjmlp wrote:
               | I guess those Java developers really aren't.
        
               | IggleSniggle wrote:
               | How many requests per second are we talking, ballpark,
               | and what's the workload?
        
               | andrewaylett wrote:
               | ~5k requests/second for the Python service, we tend to go
               | for small instances for redundancy so that's across a few
               | dozen nodes. The workload comparison is unfair to the
               | Java service, if I'm honest :). But we're running Python
               | on single vCPU containers with 2G RAM, and the Java
               | service instances are a _lot_ larger than that.
               | 
               | Flask, gunicorn, low single digit millisecond latency.
               | Definitely optimised for latency over throughput, but not
               | so much that we've replatformed it onto something that's
               | actually designed for low latency :P. Callers all cache
               | heavily with a fairly high hit ratio for interactive
               | callers and a relatively low hit ratio for batch callers.
        
               | sirsinsalot wrote:
               | My teams deploy Python web APIs and yes, it is slow
               | compared to other languages and runtimes.
               | 
               | But on the whole, machines are cheaper than other
               | engineering approaches to scaling.
               | 
               | For us, and many others, fast enough is fast enough.
        
               | eesmith wrote:
               | There's a lot of Django going on in the world.
               | 
                |  _shrug_. If we're talking personal experience, I've
               | been using Python since 1.4. It's been my primary
               | development language since the late 1990s, with of course
               | speed critical portions in C or C++ when needed - and I
               | know a lot of people who also primarily develop in
               | Python.
               | 
               | And there's a bunch of Python development at CERN for
               | tasks other than OS scripting. ("The ease of use and a
               | very low learning curve makes Python a perfect
               | programming language for many physicists and other people
               | without the computer science background. CERN does not
               | only produce large amounts of data. The interesting bits
               | of data have to be stored, analyzed, shared and
               | published. Work of many scientists across various
               | research facilities around the world has to be
               | synchronized. This is the area where Python flourishes" -
               | https://cds.cern.ch/record/2274794)
               | 
               | I simply don't see how a Python JIT is going to make that
               | much of a difference. We already have PyPy for those
               | needing pure Python performance, and Numba for certain
               | types of numeric needs.
               | 
               | PyPy's experience shows we'll not be expecting a 5x boost
               | any time soon from this new JIT framework, while
               | C/C++/Fortran/Rust are significantly faster.
        
               | pjmlp wrote:
               | > There's a lot of Django going on in the world.
               | 
                | Unfortunately.
               | 
               | > And there's a bunch of Python development at CERN for
               | tasks other than OS scripting
               | 
               | Of course there is, CMT was a build tool, not OS
               | scripting.
               | 
                | No need to give me CERN links to show me Python bindings
                | to ROOT, or Jupyter notebooks.
               | 
               | > PyPy's experience shows we'll not be expecting a 5x
               | boost any time soon from this new JIT framework, while
               | C/C++/Fortran/Rust are significantly faster.
               | 
               | I really don't get the attitude that if it doesn't 100%
               | fix all the world problems, then it isn't worth it.
        
               | eesmith wrote:
               | The link wasn't for you - the link was for other HN users
               | who might look at your mention of your use at CERN and
               | mistakenly assume it was a more widespread viewpoint
               | there.
               | 
               | > I really don't get the attitude that if it doesn't 100%
               | fix all the world problems, then it isn't worth it.
               | 
               | Then it's a good thing I'm not making that argument, but
               | rather that "Having a Python with JIT, in many cases it
               | will be fast enough for most cases." has very little
               | information content, because Python without a JIT already
               | meets the consequent.
        
               | formerly_proven wrote:
               | I really wouldn't mind Python being faster than it is and
                | really didn't mind at all getting a practically free
               | ~30% performance increase just by updating to 3.11.
               | There's tons of applications which just passively benefit
               | from these optimizations. Sure, you might argue "but you
               | shouldn't have written that parser or that UI handling a
               | couple thousand items in Python" but lots of people do
               | and did just that.
        
               | eesmith wrote:
               | I wouldn't mind either.
               | 
               | Do you agree with me that Python is already fast enough
               | for most cases, even without a JIT?
               | 
               | If not, how would a 30% boost improve things enough to
               | change the balance?
        
             | superlopuh wrote:
             | I'm fairly certain that this is false, and am working on
             | proving it. In the cases that Numba is optimised for it's
             | already faster than plausible C++ implementations of the
             | same kernels.
             | 
             | https://stackoverflow.com/questions/36526708/comparing-
             | pytho...
        
               | pas wrote:
               | it's not faster, it's about as fast as C++ compiled with
               | O3 optimizations. which is great and also much more
               | likely to be true.
        
               | eigenspace wrote:
               | Numba is basically another language embedded in Python.
               | It (sometimes severely) modifies the semantics of code.
        
           | fnord123 wrote:
           | > it will finally shift the mentality regarding
           | "C/C++/Fortran libs are Python"
           | 
           | But pjmlp, I use Python because it's a wrapper for
           | C/C++/Fortran libs. - Chocolate Giddyup
        
             | pjmlp wrote:
             | Just like Tcl happens to be.
        
             | astrolx wrote:
             | I can dig it!
        
           | lispm wrote:
           | By default any code loaded into something like SBCL gets AOT
           | compiled.
           | 
           | In Common Lisp not anything can change at any moment.
           | Especially not in implementations where one uses AOT
           | compilation like SBCL, ECL, LispWorks, Allegro CL, ... and so
           | on. They have optimizing compilers which gradually can remove
           | dynamic runtime behavior, upto supporting almost no dynamic
           | runtime behavior.
           | 
           | Stuff which is supported: type specific code, inlining, block
           | compilation, removal of development tools, ...
           | 
           | JIT implementations are rare in the Common Lisp world. They
           | are mostly only used in implementations which use a byte-code
           | virtual machine (CLISP, ABCL, ...). Common Lisp
           | implementations mostly compile either directly to native code
           | or via C compilers. The effect is that native AOT compiled
           | code is much faster.
        
         | adonese wrote:
          | is it any different or comparable to Numba or Pyjion? Not
          | following Python closely in recent years, but I recall those
          | two projects having huge potential
        
           | drbaba wrote:
           | I don't know Pyjion, but I have used Numba for real work.
           | It's a great package and can lead to massive speed-ups.
           | 
           | However, last time I used it, it (1) didn't work with many
           | third-party libraries (e.g. SciPy was important for me), and
           | (2) didn't work with object-oriented code (all your @njit
            | code had to be wrapped in functions without classes). Those
            | two have limited the projects in which I could adopt Numba
            | in practice, despite loving it in the cases where it worked.
           | 
           | I don't know what limitations the built-in Python JIT has,
           | but hopefully it might be a more general JIT that works for
           | _all_ Python code.
        
         | benrutter wrote:
         | This is so true!
         | 
         | A JIT compiler is a big deal for performance improvements,
         | especially where it matters (in large repetitive loops).
         | 
         | Anyone cynical about the potential a python JIT offers should
         | take a look at pypy which has a 5x speed up over regular
          | python, mainly through JIT compilation: https://www.pypy.org/
        
         | crabbone wrote:
         | I don't see this as an enhancement.
         | 
         | Not pursuing JIT or efficient compilation in general was a
         | deliberate decision way back when Python made some kind of
         | sense. It was the simplicity of implementation valued over
         | performance gains that motivated this decision.
         | 
         | The mantra Python programmers liked to repeat was that "the
         | performance is good enough, and if you want to go fast, write
         | in C and make a native module".
         | 
         | And if you didn't like that, there was always Java.
         | 
         | Today, Python is getting closer and closer to be "the crappy
         | Java with worse syntax". Except we already have that: it's
         | called Groovy.
        
           | frakt0x90 wrote:
           | What are you talking about? From what I can read here there
           | is no syntax change. Just a framework for faster execution.
           | Plus, Python's usecase has HEAVILY evolved over the last few
            | years since it's now the de facto language for machine
           | learning. It's great that the core devs are keeping up with
           | the time.
           | 
           | The language is definitely getting more complex
           | syntactically, and I'm not a huge fan of some of those
              | changes but it's nowhere near Java or C++ or anything else.
           | You can still write simple Python with all of these changes.
        
             | crabbone wrote:
             | > What are you talking about?
             | 
             | Read it again. It seems you were reading too fast. I'm
             | talking about the future, not the change being discussed
             | right now.
             | 
             | > It's great that the core devs are keeping up with the
             | time.
             | 
             | You mistake the influence of Microsoft and their desire to
             | sell features for progress. Python is actually regressing
             | as a system. It's becoming worse, not better. But it's hard
             | to see the gestalt of it if all you are looking for is the
             | new features.
             | 
             | > it's no where near Java
             | 
             | That is true. Java is a much more simple and regular (not
             | in the automata theory sense) language. Today, if you want
             | a simpler language, you need to choose Java over Python
             | (although neither is very simple, so, preferably, you need
             | a third option).
             | 
             | > You can still write simple Python
             | 
             | I can also write simple C++ if I limit what I use from the
             | language to a very small subset. This says nothing about
             | the simplicity of the language...
        
       | wrd83 wrote:
       | Honestly I don't understand the pessimistic view here. I think
        | every release since Microsoft started funding Python has
        | improved best-case performance by high single digits.
       | 
        | Rather than focusing on the raw number, compare to Python 3.5
        | or so. It's still getting significantly faster.
       | 
       | If they keep doing this steady pace they are slowly saving the
       | planet!
        
         | epcoa wrote:
         | Sorry, but reality bites
         | https://en.wikipedia.org/wiki/Amdahl%27s_law
        
           | vanderZwan wrote:
           | It's not that simple.
           | 
           | Amdahl's Law is about expected speedup/decrease in latency.
           | That actually isn't strongly correlated to "saving the
           | planet" afaik (where I interpret that as reducing direct
           | energy usage, as well as embodied energy usage by reducing
           | the need to upgrade hardware).
           | 
           | If anything, increasing speed and/or decreasing latency of
           | the whole system often involves adding some form of
           | parallelism, which brings extra overhead and requires extra
           | hardware. Note that prefetching/speculative execution kind of
           | counts here as well, since that is essentially doing
            | potentially wasted work in parallel. In the past, boosting
            | the clock rate of the CPU was also a thing, until
            | thermodynamics said no.
           | 
            | OTOH, letting your CPU go to sleep faster should save energy,
            | so repeated single-digit perf improvements via wasting fewer
            | instructions do matter.
           | 
            | But then again, that could lead to the Jevons Paradox [0]
            | (the situation where increasing efficiency encourages more
            | wasteful use than the efficiency gain saves - Wirth's Law
            | but generalized and older, basically).
           | 
           | So I'd say there's too many interconnected dynamics at play
           | to really simply state "optimization good" or "optimization
           | useless". I'm erring on the side of "faster Python probably
           | good".
           | 
           | [0] https://en.wikipedia.org/wiki/Jevons_paradox
        
         | oblio wrote:
         | When did Microsoft start funding Python?
         | 
         | Also, such a shame that it takes sooo long for crucial open
         | source to be funded properly. Kudos to Microsoft for doing it,
         | shame on everyone else for not pitching in sooner.
         | 
         | FYI Python was launched 32 years ago, Python 2 was released 24
         | years ago and Python 3 was released 16 years ago.
        
         | systems wrote:
         | I think the pessimism really comes from a dislike for Python
         | 
          | While very very very popular, Python is, I think, a very
          | disliked language. It doesn't have, and is not built around,
          | the current programming language features that programmers
          | like: it's not functional or immutable by default, it's not
          | fast, the tooling is complex, and it uses indentation for
          | code blocks (a feature that was cool in the 90s, but dreaded
          | since at least 2010).
         | 
          | So I guess if Python becomes faster, this will ensure its
          | continued dominance, and all those hoping that one day it will
          | be replaced by a nicer, faster language are disappointed.
          | 
          | This pessimism is the aching voice of the developers who were
          | hoping for a big Python replacement.
        
           | dbrueck wrote:
           | To each his own, but the things you list are largely
           | subjective/inaccurate, and there are many, many, many
           | developers who use Python because they enjoy it and like it a
           | lot.
        
             | systems wrote:
              | Python is a very widely used language, and like any popular
              | thing, yes, many many many like it, and many many many
              | dislike it. It is that big: Python can be disliked by a
              | million developers and still be a lot more liked than
              | disliked.
              | 
              | But I also think it's true that Python is not, and has not
              | been for a while, considered a modern or technically
              | advanced language.
              | 
              | The hype currently is for typed or gradually typed
              | languages, functional languages, immutable data, systems
              | languages, type-safe languages, languages with advanced
              | parallelism and concurrency support, etc.
              | 
              | Python is old, boring OOP. If you like it, then, like
              | millions of developers, you are not picky about programming
              | languages; you use what works, what pays.
              | 
              | But for devs passionate about programming languages, Python
              | is a relic they hope will vanish.
        
               | dbrueck wrote:
               | > devs passionate about programming languages, python is
               | a relic they hope vanish
               | 
               | Statements like this are obviously untrue for large
               | numbers of people, so I'm not sure of the point you're
               | trying to make.
               | 
               | But certainly it's true that there are both objective and
               | subjective reasons for using a particular tool, so I hope
               | you are in a position to use the tools that you prefer
               | the most. Have a great day!
        
               | dataangel wrote:
               | > the hype currently is for typed or gradually typed
               | languages
               | 
               | So Python with mypy
        
               | Sohcahtoa82 wrote:
               | > but for devs passionate about programming languages,
               | python is a relic they hope vanish
               | 
               | If you asked me what language I would consider to be a
               | relic that I hope would vanish, I'd go with Perl.
        
               | zestyping wrote:
               | Python is designed to be "boring" (in other words,
               | straightforward and easy to understand). It is admittedly
               | less so, now that it has gained many features since the
               | 2.x days, but it is still part of its pedigree that it is
               | supposed to be teachable as a beginner language.
               | 
               | It is still the only beginner language that is also an
               | industrial-strength production language. You can learn
               | Python as your first language and also make an entire
               | career out of it. That can't really be said about the
               | currently "hyped" languages, even though those are very
               | fun and cool and interesting!
        
           | dataangel wrote:
           | > (this feature was cool in the 90s, but dreaded since at
           | least 2010)
           | 
           | LOL this is a dead giveaway you haven't been around long.
           | There have been people kvetching about the whitespace since
           | the beginning. Haskell went on to be the next big thing for
           | reddit/HN/etc for years and it also uses whitespace.
        
         | klyrs wrote:
         | Julia is my source of pessimism. Julia is super fast once it's
         | warmed up, but before it gets there, it's painfully slow. They
         | seem to be making progress on this, but it's been gradual. I
         | understand that Java had similar growing pains, but it's better
         | now. Combined with the boondoggle of py3, I'm worried for the
         | future of my beloved language as it enters another phase of
         | transformation.
        
           | leephillips wrote:
           | Would you say of the latest release (v1.10) that Julia is
           | painfully slow until it gets "warmed up"? If so, what exactly
           | does this mean?
        
             | klyrs wrote:
             | I'm not that up to date on the language, it's been a few
             | years since I did anything nontrivial with it because the
             | experience was so poor. And while that might not seem fair
             | to Julia, it's my honest experience: my concern isn't a
             | pissing match between Julia and the world, it's that bad
             | JIT experience is a huge turnoff and I'm worried about
             | Python's future as it goes down this road.
        
               | leephillips wrote:
               | There has been so much progress in Julia's startup
               | performance in the past "few years" that someone's
               | qualitative impressions from several major releases
               | before the current one are of limited relevance.
        
               | klyrs wrote:
               | You're making this about Julia despite my repeated
               | statements to the contrary. Please reread what I've
               | written, you aren't responding to the actual point I've
               | made twice now. A reminder: I'm talking specifically
               | about my outlook on the future of Python, vis a vis my
               | historical experience with how other JIT languages have
               | developed.
               | 
               | If you wanted to rebut this, you'd need to argue that
               | Julia has always been awesome and that my experience with
               | a slow warmup was atypical. But that would be a lie,
               | right?
               | 
                | And, subtext: when I wrote my first comment in this
               | thread, its highest sibling led with
               | 
               | > I think the pessimism really comes from a dislike for
               | Python
               | 
               | So I weighed in as a Python lover who is pessimistic for
               | reasons other than a bias against the language.
        
               | leephillips wrote:
               | > I'm talking specifically about my outlook on the future
               | of Python, vis a vis my historical experience with how
               | other JIT languages have developed.
               | 
               | But your assessment of the other language you mentioned
               | is several years out of date and made largely irrelevant
               | by the fast pace of progress. Therefore your conclusions
               | about the probable future of Python, which may be
               | correct, nevertheless do not follow.
        
         | vegesm wrote:
         | Because it only increases by high single digits each
         | release. If they keep up the 10% improvement for the next 10
         | releases, we will reach a speedup of around 2.5 times.
         | That's very small, considering that Python is something like
         | 10-20 times slower than JS (not even talking about C or Java
         | like speeds).
        
       | pjmlp wrote:
       | Finally!
       | 
       | Regardless of the work being done in PyPy, Jython, GraalPy and
       | IronPython, having a JIT in CPython seems to be the only way
       | beyond "C/C++/Fortran libs are Python" mindset.
       | 
       | Looking forward to its evolution, from 3.13 onwards.
        
         | dlahoda wrote:
         | Rust libs recently; pydantic, for example.
        
       | Iridescent_ wrote:
       | Wasn't CPython supposed to remain very simple in its codebase,
       | with the heavy optimization left for other implementations to
       | tackle? I seem to remember hearing as much a few years back.
        
         | toyg wrote:
         | That was the original idea, when Python started attracting
         | interest from big corporations. It has however become clear
         | that maintaining alternative implementations is very difficult
         | and resource-intensive; and if you have to maintain
         | compatibility with the wider ecosystem anyway (because that's
         | what users want), you might as well work with upstream to find
         | solutions that work for everyone.
        
         | lifthrasiir wrote:
         | The copy-and-patch approach was explicitly chosen in order to
         | minimize additional impacts on non-JIT-specific code base.
        
         | hangonhn wrote:
         | Does Python even have a language specification? I've been told
         | that CPython IS the specification. I don't know if this is
         | still true. In the Java world there is a specification and a
         | set of conformance tests so it's easier to have
         | alternative implementations of the JVM. If what I said is
         | correct, then I can see how the optimized alternative
         | implementation idea is less likely to happen.
        
           | oskarkk wrote:
           | Well, for Python the language reference in the docs[0] is the
           | specification, and many things there are described as CPython
           | implementation details. Like: "CPython implementation detail:
           | For CPython, id(x) is the memory address where x is stored."
           | And as another example, dicts remembering insertion order was
           | CPython's implementation detail in 3.6, but from 3.7 it's
           | part of the language.
           | 
           | [0] https://docs.python.org/3/reference/index.html
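            | A quick illustration of the two examples above,
            | guaranteed dict ordering versus the CPython-only id()
            | detail (a sketch, not from the thread):

```python
# Dict insertion order was a CPython implementation detail in 3.6 and is
# guaranteed by the language reference since 3.7.
d = {}
d["banana"] = 2
d["apple"] = 1
print(list(d))                  # ['banana', 'apple'] -- insertion order

# id(x) returning a memory address remains a CPython implementation
# detail; the language only guarantees a unique integer for the object's
# lifetime.
x = object()
print(isinstance(id(x), int))   # True
```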
        
           | recursivecaveat wrote:
           | There is a pretty detailed reference that distinguishes
           | between cpython implementation details and language features
           | at least. There was a jvm python implementation even. The
           | problem is more that a lot of the libraries that everyone
           | wants to use are very dependent on cpython's ffi which bleeds
           | a lot of internals.
        
         | ynik wrote:
         | The problem is that:
         | 
         | * CPython is slow, making extension modules written in C(++)
         | very attractive
         | 
         | * The CPython extension API exposes many implementation
         | details
         | 
         | * Making use of those implementation details helps those
         | extension modules be even faster
         | 
         | This resulted in a situation where the ecosystem is locked-in
         | to those implementation details: CPython can't change many
         | aspects of its own implementation without breaking the
         | ecosystem; and other implementations are forced to introduce
         | complex and slow emulation layers if they want to be compatible
         | with existing CPython extension modules.
         | 
         | The end result is that alternative implementations are not
         | viable in practice, as most existing libraries don't work
         | without their CPython extension modules -- users of alternative
         | implementations are essentially stuck in their own tiny
         | ecosystem and cannot make use of the large existing (C)Python
         | ecosystem.
         | 
         | CPython at least is in a position where they can push a
         | breaking change to the extension API and most libraries will be
         | forced to adapt. But there's very little incentive for library
         | authors to add separate code paths for other Python
         | implementations, so I don't think other implementations can
         | become viable until CPython cleans up their API.
        
         | bastawhiz wrote:
         | PyPy was released 17 years ago
         | 
         | Jython was released 22 years ago
         | 
         | IronPython was released 17 years ago
         | 
         | To date, no Python implementation has managed to hit all three:
         | 
         | 1. Stay compatible with any recent, modern CPython version
         | 
         | 2. Maintain performance for general-purpose usage (it's fast
         | enough without a warmup, and doesn't need to be heavily
         | parallelized to see a performance benefit)
         | 
         | 3. Stay alive
         | 
         | Which, frankly, is kind of a shame. But the truth of the matter
         | is that it was a high bar to hit in the first place, and even
         | PyPy (which arguably had the biggest advantages: interest,
         | mindshare, compatibility, meaningful wins) managed to barely
         | crack a fraction of a percent of Python market share.
         | 
         | If you bet on other implementations being the source of
         | performance wins, you're betting on something which essentially
         | doesn't exist at this point.
        
       | vanderZwan wrote:
       | I think it's really cool that Haoran Xu and Fredrik Kjolstad's
       | copy-and-patch technique[0] is catching on, I remember
       | discovering it through Xu's blog posts about his LuaJIT remake
       | project[1][2], where he intends to apply these techniques to Lua
       | (and I probably found those through a post here). I was just
       | blown away by how they "recycled" all these battle-tested
       | techniques and technologies, and used it to synthesize something
       | novel. I'm not a compiler writer but it felt really clever to me.
       | 
       | I highly recommend the blog posts if you're into learning how
       | languages are implemented, by the way. They're incredible deep
       | dives, but he uses the details-element to keep the metaphorical
       | descents into Mariana Trench optional so it doesn't get too
       | overwhelming.
       | 
       | I even had the privilege of congratulating him on the 1000th
       | star of the GH repo[3], where he reassured me and others that
       | he's still working on it despite the long pause since the last
       | blog post, and that it mainly has to do with behind-the-scenes
       | rewrites that make no sense to publish piecemeal.
       | 
       | [0] https://arxiv.org/abs/2011.13127
       | 
       | [1] https://sillycross.github.io/2022/11/22/2022-11-22/
       | 
       | [2] https://sillycross.github.io/2023/05/12/2023-05-12/
       | 
       | [3] https://github.com/luajit-remake/luajit-remake/issues/11
        
         | checker659 wrote:
         | M. Anton Ertl and David Gregg. 2004. Retargeting JIT Compilers
         | by using C-Compiler Generated Executable Code. In Proceedings
         | of the 13th International Conference on Parallel Architectures
         | and Compilation Techniques (PACT '04). IEEE Computer Society,
         | USA, 41-50.
         | 
         | https://dl.acm.org/doi/10.5555/1025127.1025995
        
           | vanderZwan wrote:
           | Anton Ertl! <3
           | 
           | Context: I've been on a concatenative language binge
           | recently, and his work on Forth is awesome. In my defense he
           | doesn't seem to list this paper among his publications[0].
           | Will give this paper a read, thanks for linking it! :)
           | 
           | If they missed the boat on getting credit for their
           | contributions then at least the approach finally starts to
           | catch on I guess?
           | 
           | (I wonder if he got the idea from his work on optimizing
           | Forth somehow?)
           | 
           | [0] https://informatics.tuwien.ac.at/people/anton-ertl
        
           | lifthrasiir wrote:
           | While it bears a significant resemblance, Ertl and Gregg's
           | approach is not automatic: every additional architecture
           | requires a significant understanding of the target
           | architecture---including an ability to ensure that fully
           | relocatable code can be generated and extracted. In
           | comparison, the copy-and-patch approach can be thought of
           | as a simple dynamic linker, and objects generated by
           | unmodified C compilers are far more predictable and need
           | much less architecture-specific information for linking.
        
             | vanderZwan wrote:
             | Does Ertl and Gregg's approach have any "upsides" over
             | copy-and-patch? Or is it a case of _just_ missing those one
             | or two insights (or technologies) that make the whole thing
             | a lot simpler to implement?
        
               | lifthrasiir wrote:
                | I think so, but I can't say this with any more
                | confidence until I get an actual copy of their paper
                | (I relied on other review papers for the main idea
                | instead).
        
             | pierrebai wrote:
              | The copy-and-patch approach also assumes the compiler
              | will generate patchable code. For example, on some
              | architectures, a zero operand might have a smaller or
              | different opcode compared to a more general operand. The
              | same issue arises for relative jumps or offset ranges.
              | It seems the main difference is that the patch approach
              | also patches jumps to absolute addresses instead of
              | requiring instruction-counter-relative code.
        
             | naasking wrote:
             | Full copy of the paper: https://www2.cs.arizona.edu/~collbe
             | rg/Teaching/553/2011/Reso...
             | 
             | There's also this which seems to use the same technique:
             | 
             | Templates-based portable just-in-time compiler,
             | https://dl.acm.org/doi/abs/10.1145/944579.944588
             | 
             | Nice to see there's still room for innovation in the VM
             | space!
        
         | giancarlostoro wrote:
         | Reminds me of David K who is local to me in Florida, or was,
         | last I spoke to him. He has been a Finite State Machine
         | advocate for ages, and it's a well-known concept, but you'd be
         | surprised how useful they can be. He pushes it for front-end a
         | lot, and even implemented a Tic Tac Toe sample using it.
         | 
         | https://twitter.com/DavidKPiano
        
         | bonzini wrote:
         | Copy and patch is a variant of QEMU's original "dyngen" backend
         | by Fabrice Bellard[1][2], with more help from the compiler to
         | avoid the maintainability issues that ultimately led QEMU to
         | use a custom code generator.
         | 
         | [1]
         | https://www.usenix.org/legacy/event/usenix05/tech/freenix/fu...
         | 
         | [2]
         | https://review.gerrithub.io/plugins/gitiles/spdk/qemu/+/5a24...
        
           | twbarr wrote:
           | Ultimately, most good ideas were first implemented by Fabrice
           | Bellard.
        
             | KRAKRISMOTT wrote:
             | Copy and patch goes all the way back to Grace Hopper's
             | original compiler implementation
        
             | darknavi wrote:
             | I am happy to see him working on QuickJS in the last month
             | or so. It could really use some ES2023 love!
        
         | matheusmoreira wrote:
         | Thanks a lot!! I'm something of a beginner language developer
         | and I've been collecting papers, articles, blog posts, anything
         | that provides accessible, high level description of these
         | optimization techniques.
        
       | mg wrote:
       | I love Python and use it for everything other than web
       | development.
       | 
       | One reason is performance. So if Python has a faster future ahead
       | of it: Hurray!
       | 
       | The other reason is that the Python ecosystem moved away from
       | stateless requests like CGI or mod_php use and now is completely
       | set on long running processes.
       | 
       | Does this still mean you have to restart your local web
       | application after any change you made to it? I heard that some
       | developers automate that, so that every time they save a file, the
       | web application is restarted. That seems pretty expensive in
       | terms of resource consumption. And complex as you would have to
       | run some kind of watcher process which handles watching your
       | files and restarting the application?
        
         | onetoo wrote:
         | The restart isn't expensive in absolute terms; on a human
         | level it's practically instant. You would only do this during
         | development, hopefully your local machine isn't the production
         | environment.
         | 
         | It's also very easy, often just adding a CLI flag to your local
         | run command.
         | 
         | edit: Regarding performance, Python today can _easily_ handle
         | at least 1k requests per second. The vast vast vast majority of
         | web applications today don't need anywhere near that kind of
         | performance.
        
           | ubercore wrote:
           | Been working with python for the web for over a decade. This
           | is basically a solved issue, and the performance is a non-
           | issue day to day.
        
           | mg wrote:
           | The thing is, I don't run my applications locally with a
           | "local run command".
           | 
           | I prefer to have a local system set up just like the
           | production server, but in a container.
           | 
           | Maybe using WSGI with MaxConnectionsPerChild=1 could be a
           | solution? But that would start a new (for example) Django
           | instance for every request. Not sure how fast Django starts.
           | 
           | Another option might be to send a HUP signal to Apache:
           | apachectl -k restart
           | 
           | That will only kill the worker threads. And when there are
           | none (because another file save triggered it already), this
           | operation might be almost free in terms of resource usage.
           | This also would require WSGI or similar. Not sure if that is
           | the standard approach for Django+Apache.
        
             | onetoo wrote:
             | I would still recommend running it properly locally, but
             | whatever. Pseudo-devcontainer it is. I assume the code is
             | properly volume mounted.
             | 
             | In production, you would want to run your app through
             | gunicorn/uvicorn/whatever on an internal-only port, and
             | reverse-proxy to it with a public-facing apache or similar.
             | 
             | Set up apache to reverse proxy like you would on prod, and
              | run gunicorn/uvicorn/whatever like you would on prod,
             | except you also add the autoreload flag. E.g.
             | uvicorn main:app --host 0.0.0.0 --port 12345 --reload
             | 
             | If production uses containers, you should keep the python
             | image slim and simple, including only gunicorn/uvicorn and
             | have the reverse proxy in another container. Etc.
        
             | traverseda wrote:
              | Is the problem you're having that you feel the need to
              | expose a WSGI/ASGI interface instead of just a reverse
              | proxy? Take a look at gunicorn, and for serving static
              | files you can use whitenoise.
              | 
              | With those two you can just stand up a Python program in
              | a container that serves HTML, and put it behind whatever
              | reverse proxy you want.
        
           | edgyquant wrote:
           | I hate this argument that "most web apps don't need that kind
           | of performance." For one thing, with responsive apps that are
           | the norm it wouldn't be surprising for a session to begin
           | with multiple requests or to even have multiple requests per
           | second. At that point all it takes is a few hundred active
           | users to hit that 1k limit.
           | 
           | But even leaving that aside, you never know when your
           | application will be linked somewhere or go semi-viral and not
           | being able to serve 1000 users is all it takes for your app
           | to go down and your one shot at a successful company to die a
           | sad death.
        
             | onetoo wrote:
             | I didn't say python can handle <=1K, I was saying >=1K. I
             | feel confident that I am orders of magnitude off the real
             | limit you'd meet.
             | 
             | The specifics of that aside, any unprepared application is
             | going to buckle at a sudden mega-surge of users. The
             | solution remains largely the same, regardless of
             | technology: Make sure everything that can be cached is
             | cached, scale the hardware vertically until it stops
             | helping, optimize your code, scale horizontally until you
             | run out of money. I imagine the DB will be the actual
             | bottleneck, most of the time.
             | 
             | There are other reasons to not choose python for greenfield
             | application, but performance should rarely be one IMO.
        
         | neurostimulant wrote:
         | If you run the debug web server command (e.g. Django's
         | `manage.py runserver`), yes, it has a watcher that will
         | automatically restart the web server process if there are
         | code changes.
         | 
         | Once you deploy it to production, you usually run it using a
         | WSGI/ASGI server such as Gunicorn or Uvicorn and let whatever
         | deployment process you use handle the lifecycle. You usually
         | don't use watcher in production.
         | 
         | Basically similar stuff with nodejs, rails, etc.
        
         | BiteCode_dev wrote:
         | In dev, this is handled mostly by the OS with things like
         | inotify, so it has little perf impact.
         | 
         | In prod, you don't do it. Deployment implies sending a signal
         | like HUP to your app, so that it reloads the code gracefully.
         | 
         | All in all, everybody is moving to this, even PHP. This
         | allows for persistent connections, function memoization,
         | delegation to threadpools, etc.
        
         | gklitz wrote:
         | > That seems pretty expensive in terms of resource consumption.
         | And complex as you would have to run some kind of watcher
         | process which handles watching your files and restarting the
         | application?
         | 
         | What? No, in reality it's just running your app in debug mode
         | (just a cli flag), and when you save the files the next refresh
         | of the browser has the live version of the app. It's neither
         | expensive nor complex.
        
         | swingingFlyFish wrote:
         | Python is amazing and shines for Web development. I'd recommend
         | taking a look at
         | https://www.tornadoweb.org/en/stable/index.html. I use this in
         | production on my pet project at https://www.meecal.co/. Put
         | Nginx in front and you're golden.
         | 
         | Definitely take a look, it's come a long way from ten years
         | ago.
        
           | aftbit wrote:
           | I personally like Quart, which is like Flask, but with
           | asyncio. Django is also incredibly popular and has been
           | around forever, so it is very battle-tested.
        
         | acdha wrote:
         | > Does this still mean you have to restart your local web
         | application after any change you made to it? I heard that some
         | developers automate that, so that every time they save a file,
         | the web application is restarted. That seems pretty expensive
         | in terms of resource consumption.
         | 
         | All of the popular frameworks automatically reload. It's not
         | instantaneous but with e.g. Django it was less than the time I
         | needed to switch windows a decade ago and it hadn't gotten
         | worse. If you're used to things like NextJS it will likely be
         | noticeably faster.
        
           | qwertox wrote:
           | With `reload(module)` you don't even have to restart the
           | server if you structure it properly. Think server.py and
           | server_handlers.py, where server.py contains logic to detect
           | a modification of server_handlers.py (like via inotify) and
           | the base handlers which then call the "modifiable" handlers
           | in server_handlers.py. This is not limited to servers
           | (anything that loops or reacts to events) and can be nested
           | multiple levels deep and is among the top 3 reasons of why i
           | use Python.
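            | A self-contained sketch of that pattern (file and function
            | names are hypothetical, and a real server would trigger
            | the reload from an inotify event rather than on demand):

```python
# A stable outer module keeps running while the "modifiable" handler
# module is edited on disk and hot-swapped with importlib.reload().
import importlib
import pathlib
import sys
import tempfile

workdir = tempfile.mkdtemp()
handler_file = pathlib.Path(workdir) / "server_handlers.py"
handler_file.write_text("def handle(req):\n    return 'v1:' + req\n")

sys.dont_write_bytecode = True        # always recompile from source
sys.path.insert(0, workdir)
import server_handlers

print(server_handlers.handle("ping"))   # v1:ping

# Simulate saving an edited file, then swap it in without a restart.
handler_file.write_text("def handle(req):\n    return 'v2:' + req\n")
importlib.reload(server_handlers)
print(server_handlers.handle("ping"))   # v2:ping
```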
        
         | declaredapple wrote:
         | > The other reason is that the Python ecosystem moved away from
         | stateless requests like CGI or mod_php use and now is
         | completely set on long running processes.
         | 
         | The long-running process is a WSGI/ASGI process that handles
         | spawning the actual code, similar to CGI. The benefit is that
         | it can handle how it spawns the request workers via multiple
         | runtimes, process/threads, etc. It's similar to CGI but instead
         | of nginx handling it, it's a special program that specializes
         | in the different options for python specifically.
         | 
         | > Does this still mean you have to restart your local web
         | application after any change you made to it? I heard that some
         | developers automate that, so that everytime they save a file,
         | the web application is restarted. That seems pretty expensive
         | in terms of resource consumption. And complex as you would have
         | to run some kind of watcher process which handles watching your
         | files and restarting the application?
         | 
         | Only for development!
         | 
         | To update your code in production you first deploy the new code
         | onto the machine, and then you tell the WSGI/ASGI such as
         | Gunicorn to reload. This will cause it to use the new code for
         | new requests, without killing current requests.
         | 
         | It's a graceful reload, with no file watching needed. Just a
         | "systemctl reload gunicorn"
        
       | eterevsky wrote:
       | I'm even more excited for no-GIL in 3.13. I wonder how these
       | two features will play together?
        
       | lucidguppy wrote:
       | If python became fast, there's a chance it may become a language
       | eater.
        
         | csjh wrote:
         | 2-9% isn't changing any language hierarchies
        
           | wiseowise wrote:
           | Groundwork for the future.
        
         | poncho_romero wrote:
         | What languages do you think it could realistically eat (that it
         | hasn't already)?
        
           | politelemon wrote:
           | Javascript, if it becomes viable for web development?
        
           | Sohcahtoa82 wrote:
           | I'd love to see it eat JavaScript and Java for back-end code.
           | 
           | But I doubt that's going to ever happen.
        
             | janalsncm wrote:
             | I like python but I would never choose it for anything more
             | than trivial on the backend. I want to know what types are
             | being passed around from one middleware function to the
             | next. Yes python has annotations but that's not enough.
        
               | acdha wrote:
                | Just wait until you see what enterprise Java
                | developers are passing around as type Object and
                | encoded XML blobs. Type checking is really useful but
                | it can be defeated in any language if you don't have a
                | healthy technical culture.
        
               | Sohcahtoa82 wrote:
                | Is that why I see so much object
                | serialization/deserialization in Java?
               | 
               | They're trying to pass data between layers of middleware,
               | but Java has very strict typing, and the middleware
               | doesn't know what kind of object it will get, so it has
               | to do tons of type introspection and reflection to do
               | anything with the data?
        
         | Sohcahtoa82 wrote:
         | Doubtful.
         | 
         | There are enough people that REALLY hate whitespace-as-syntax.
        
       | matsemann wrote:
       | What is it really JIT-ing? Given it says that it's only relevant
       | for those building CPython. So it's not JIT-ing my Python code,
       | right? And the interpreter is in C. So what is it JIT-ing? Or am
       | I misunderstanding something?
       | 
       | > _A copy-and-patch JIT only requires the LLVM JIT tools be
       | installed on the machine where CPython is compiled from source,
       | and for most people that means the machines of the CI that builds
       | and packages CPython_
        
         | lifthrasiir wrote:
         | Code fragments that implement each opcode in the core
         | interpreter loop are additionally compiled ahead of time,
         | each into a relocatable binary. Once processed in that way,
         | the runtime code generator can join the required fragments by
         | patching relocations, essentially doing the job of a dynamic
         | linker. So it _is_ compiling your Python code, but the
         | compiled result is composed of pre-baked fragments with
         | patches.
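         | The idea can be caricatured in a few lines (a toy sketch with
         | made-up one-byte opcodes, not CPython's real stencils):

```python
# Pre-baked byte templates ("stencils") contain placeholder holes;
# "compiling" a sequence of opcodes means concatenating the templates and
# patching each hole with a runtime operand, much like a dynamic linker
# resolving relocations. Opcodes and encodings here are invented.
HOLE = b"\x00\x00\x00\x00"

STENCILS = {
    "LOAD": b"\x10" + HOLE,        # opcode byte + 4-byte operand hole
    "ADD": b"\x20",                # no hole to patch
}

def jit(ops):
    out = bytearray()
    for name, operand in ops:
        frag = STENCILS[name]
        hole = frag.find(HOLE)
        out += frag                # "copy" the stencil
        if hole != -1:             # "patch" the relocation
            start = len(out) - len(frag) + hole
            out[start:start + 4] = operand.to_bytes(4, "little")
    return bytes(out)

code = jit([("LOAD", 7), ("LOAD", 35), ("ADD", 0)])
```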
        
       | hawk01 wrote:
       | It never hurts for any language to get an uplift in performance.
       | Exciting to see python getting that treatment
        
       | nritchie wrote:
       | At the end of the day, the number of optimizations that even a
       | JIT can do on Python is limited because all variables are boxed
       | (each time the variable is accessed the type of the variable
       | needs to be checked because it could change) and then function
       | dispatches must be chosen based on the type of the variable.
       | Without some mechanism to strictly type variables, the number of
       | optimizations will always be limited.
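       | Concretely, the type check can't be hoisted out of a loop
       | because the same name may hold a different type on every
       | iteration (a minimal illustration, hypothetical workload):

```python
# `x` is rebound to a different type each pass, so `total += x` cannot be
# compiled to a single unboxed machine add; the interpreter (or a JIT
# guard) must re-check type(x) before dispatching the + implementation.
def mixed(xs):
    total = 0
    for x in xs:                 # x may be int, float, or str here
        if isinstance(x, str):
            x = len(x)           # a different code path entirely
        total += x               # dispatch depends on type(x) every time
    return total

print(mixed([1, 2.5, "abc"]))    # 6.5
```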
        
         | vanderZwan wrote:
         | Per the spec all JS values are boxed too (aside from values in
         | TypedArrays). The implementations managed to work their way
         | around that too for the most part.
        
         | johncolanduoni wrote:
         | Couldn't you say the same for e.g. JavaScript? The variables
         | aren't typed there either and prototypes are mutable. I could
         | definitely see things being harder with Python which has a lot
         | of tricky metaprogramming available that other interpreted
         | languages don't but I don't think it's as simple as a lack of
         | explicit types.
        
         | btown wrote:
         | Can't the happy path be branch predicted and speculatively
         | executed, though? AFAIK V8 seems to do this:
         | https://web.dev/articles/speed-v8#the_optimizing_compiler
        
         | Joker_vD wrote:
         | IIRC Instagram's flavour of Python had unboxed primitives
         | (if the types were constrained enough).
        
         | make3 wrote:
         | JavaScript is insanely more optimized but has the same
         | limitations as Python. So there is likely a lot more you can
         | do despite the flexibility, like figuring out in hot code
         | which flexibility features are not used, and optimizing
         | around that.
        
         | crabbone wrote:
         | Don't worry. Python already has syntactical constructs with
         | mandatory type annotations. I will not be surprised if, a few
         | years from now, those type annotations become mandatory in
         | other contexts as well.
        
       | jmdeschamps wrote:
       | Maybe the article should be dated "January 9, 2024" ??? (or is it
       | really a year old article?)
        
       | jmakov wrote:
       | How's this different than running pypy?
        
         | dagw wrote:
         | It supports all your existing python code and python libraries
         | (at the cost of being significantly slower than PyPy).
        
       | jjice wrote:
       | The last two-ish years have been insane for Python
       | performance. Something clicked with the core team; they
       | obviously made this a serious goal, and the results have been
       | incredible to see.
        
         | pletnes wrote:
         | Microsoft are paying core devs to work on it full time, for
         | one.
        
         | eigenvalue wrote:
         | It's because the total dollars of capitalized software deployed
         | in the world using Python has absolutely exploded from AI
         | stuff. Just like how the total dollars of business conducted on
         | the web was a big driver of JS performance earlier.
        
           | dehrmann wrote:
           | But all the AI heavy lifting is done in native code.
        
             | dron57 wrote:
             | AI heavy lifting isn't just model training. There's about a
             | million data pipelines and processes before the training
             | data gets loaded into a PyTorch tensor.
        
               | HDThoreaun wrote:
               | also done in native code
        
               | tomjakubowski wrote:
               | Ehhh... if you're lucky. I've seen (and maybe even
               | written) plenty of we-didn't-have-time-to-write-this-
               | properly-with-dataframes Python data munging code, banged
               | out once and then deployed to production. I'll take
               | performance gains there.
        
               | janalsncm wrote:
               | From personal experience, no. I ended up writing rust
               | bindings to call from python that turned minutes of
               | loading into seconds.
        
         | crabbone wrote:
         | There were no noticeable performance improvements in the course
         | of the last two years. I have no idea what you are talking
         | about.
         | 
         | The major change that's been going on in Python core
         | development team is that Microsoft gets more and more power
         | over what happens to Python. Various PSF authorities had strong
         | links to Microsoft until today the head of PSF is straight up a
         | Microsoft employee. Microsoft doesn't like to advertise this
         | fact, because it rightfully suspects that rebranding Python as
         | "Microsoft Python" would scare off some old-timers at least,
         | but de facto it is "Microsoft Python".
         | 
          | The community has gone from bad to worse. Any real discussion
          | about the language stopped years ago. Today it's a top-down
          | decision-making process where there's no feedback, no
          | criticism is allowed, etc. My guess is that Microsoft doesn't
          | have a plan
         | for the third "E" here, but who knows? Maybe eventually they'll
         | find a way to move Python to CLR and will peddle their version
         | of it? -- I wouldn't be surprised if that happened, actually.
        
           | attractivechaos wrote:
           | > _There were no noticeable performance improvements in the
           | course of the last two years._
           | 
           | In fairness, Python did get faster. Python 3.9 took 82
           | seconds for sudoku solving and 62 seconds for interval query.
           | Python 3.11 took 53 and 43 seconds, respectively [1]. v3.12
           | may be better. That said, whether the speedup is noticeable
           | can be subjective. 10x vs 15x slower than v8 may not make
           | much difference mentally.
           | 
           | [1] https://github.com/attractivechaos/plb2
        
           | mattgruskin wrote:
           | Microsoft already tried Python on the CLR! They didn't stick
           | with it. https://en.wikipedia.org/wiki/IronPython
        
             | crabbone wrote:
             | That's why I said _another_.
             | 
              | It was a different time. Microsoft had a different strategy
              | toward languages not developed by Microsoft. Similar to
              | how there also used to be JScript, but now Node.js is
              | basically a Microsoft pet project.
              | 
              | There are actually plenty of popular Microsoft projects
              | that took even more than two tries. Azure is something
              | like their third attempt at cloud services, IIRC. Credit
              | where credit is due, they learn from mistakes...
              | unfortunately, that only makes them more insidious.
        
       | Pxtl wrote:
       | I still don't get why they didn't reduce the API of the
       | interpreter internals in Python 3 so that things like this would
       | be more achievable.
       | 
       | If you're going to break backwards compatibility, it's not like
       | Unicode was the _only_ foundational problem Python 2 had.
        
         | peterfirefly wrote:
         | They did change the API for Python modules implemented in C.
         | That was actually part of the reason why the 2->3 transition
         | went so badly.
         | 
         | It wasn't realistic to switch to 3.x when the libraries either
         | weren't there or were a lot slower (due to using pure Python
         | instead of C code).
         | 
         | It also wasn't realistic to rewrite the libraries when the
         | users weren't there.
         | 
         | It was in many respects a perfect case study in how not to do
         | version upgrades.
        
       | agounaris wrote:
       | A 2-9% improvement at global scale is insane! This is not a small
       | number by any means.
        
       | hyperman1 wrote:
       | The article presents a copy-and-patch JIT as something new, but I
       | remember DOS's QuickBASIC doing the same thing. It generated very
       | bad assembly code in memory by patching together template
       | assembly blocks with filled-in values, with a lot of INT
       | instructions into the QuickBASIC runtime, but it did compile,
       | not interpret.
        
         | chc4 wrote:
         | Template JITs in general aren't a new technique, but Copy-and-
         | Patch is a specific method of implementing it (leveraging a
         | build time step to generate the templates from C code + ELF
         | relocations).
        
         | PeterisP wrote:
          | TIL! I used QBasic back in school, but I somehow always
          | assumed that these BASICs were interpreters.
        
           | rpeden wrote:
            | QBasic was a slightly cut-down version of QuickBASIC that
            | didn't include the compiler, so your assumption was correct
            | in that case. QBasic was bundled with DOS, but you had to
            | buy QuickBASIC.
        
       | DeathArrow wrote:
       | If JIT is a good thing for Python, why not just compile to Java
       | or .NET bytecode and use their already-optimized infrastructure?
        
         | crabbone wrote:
         | Given how many Microsoft employees today steer the Python
         | decision making process, I'm sure in not so distant future, we
         | might see a new CLR-based Python implementation.
         | 
          | Maybe Microsoft doesn't know yet how to sell this thing, or
          | maybe they are just boiling the frog. Time will tell. But I'm
          | pretty sure your question will be repeated as soon as people
          | get used to the idea of Python with a JIT.
        
           | _bohm wrote:
           | Doesn't this already exist in IronPython?
        
             | Timothycquinn wrote:
             | Correct. I just checked the project yesterday and they are
             | presently at 3.4 :-|
        
             | crabbone wrote:
             | It's a lot about popularity / stigma.
             | 
             | Microsoft developed both JScript and Node.js. They could've
             | continued with JScript, but obviously decided against it
             | because JScript didn't earn the reputation they might have
             | hoped for. Even if they invested efforts into rectifying
             | the flaws of JScript, it would've been just too hard to
             | undo the reputation damage.
             | 
              | Microsoft made multiple attempts to "befriend" Python.
              | IronPython was one of the failures. They also tried to
              | provide editing tools (eg. IntelliSense in MSVS), but kind
              | of gave up on that too (though they succeeded to a large
              | degree with VSCode).
              | 
              | Microsoft's whole long-term strategy is to capture
              | developers and put them on a leash. They won't rest as
              | long as there's a popular language they don't control.
        
           | Gollapalli wrote:
           | "Python code runs 15% faster and and 20% cheaper on azure
           | than aws, thanks to our optimized azurePython runtime. Use it
           | for azure functions and ml training"
           | 
           | Just a guess at the pitch.
        
         | dbrueck wrote:
         | If you're interested in learning more about the challenges and
         | tradeoffs, both Jython (https://www.jython.org/) and IronPython
         | (https://ironpython.net/) have been around for a long time and
         | there's a lot of reading material on that subject.
        
           | jsight wrote:
           | Graal Python exists too: https://www.graalvm.org/python/
           | 
           | It beats Python on performance, supposedly, but compatibility
           | has never been great.
        
             | SpaghettiCthulu wrote:
             | I've found the startup time for Graal Python to be terrible
             | compared with other Graal languages like JS. When I did
             | some profiling, it seemed that the vast majority of the
             | time was spent loading the standard library. If implemented
             | lazily, that should have a negligible performance impact.
        
         | monlockandkey wrote:
         | This is what you are looking for. Python on the GraalVM
         | 
         | https://github.com/oracle/graalpython
        
         | rogerbinns wrote:
         | Python is a convenient friendly syntax for calling code
         | implemented in C. While you can easily re-implement the syntax,
         | you then have to decide how much of that C to re-implement. A
         | few of the builtin types are easy (eg strings and lists), but
         | it soon becomes a mountain of code and interoperability,
         | especially if you want to get the semantics exactly right. And
         | that is just the beginning - a lot of the value of Python is in
         | the extensions, and many popular ones (eg numpy, sqlite3) are
         | implemented in C and need to interoperate with your re-
         | implementation. Trying to bridge from Java or .NET to those
         | extensions will overwhelm any performance advantages you got.
         | 
         | This JIT approach is improving the performance of bits of the
         | interpreter while maintaining 100% compatibility with the rest
         | of the C code base, its object model, and all the extensions.
        
       | jokoon wrote:
       | what are those future optimization he talks about?
       | 
       | he talks about an IL, but what's that IL? does that mean that the
       | future optimization will involve that IL?
        
       | bbojan wrote:
       | > At the moment, the JIT is only used if the function contains
       | the JUMP_BACKWARD opcode which is used in the while statement but
       | that will change in the future.
       | 
       | Isn't this the main reason why it's only a 2-9% improvement? Not
       | much Python code uses the _while_ statement in my experience.
        
       | darrenBaldwin03 wrote:
       | Woah - very interesting!
        
       | haberman wrote:
       | The article describes that the new JIT is a "copy-and-patch JIT"
       | (I've previously heard this called a "splat JIT"). This is a
       | relatively simple JIT architecture where you have essentially
       | pre-compiled blobs of machine code for each interpreter
       | instruction that you patch immediate arguments into by copying
       | over them.
       | 
       | I once wrote an article about very simple JITs, and the first
       | example in my article uses this style:
       | https://blog.reverberate.org/2012/12/hello-jit-world-joy-of-...
       | 
       | I take some issue with this statement, made later in the article,
       | about the pros/cons vs a "full" JIT:
       | 
       | > The big downside with a "full" JIT is that the process of
       | compiling once into IL and then again into machine code is slow.
       | Not only is it slow, but it is memory intensive.
       | 
       | I used to think this was true also, because my main exposure to
       | JITs was the JVM, which is indeed memory-intensive and slow.
       | 
       | But then in 2013, a miraculous thing happened. LuaJIT 2.0 was
       | released, and it was incredibly fast to JIT compile.
       | 
       | LuaJIT is undoubtedly a "full" JIT compiler. It uses SSA form and
       | performs many optimizations
       | (https://github.com/tarantool/tarantool/wiki/LuaJIT-
       | Optimizat...). And yet feels no more heavyweight than an
       | interpreter when you run it. It does not have any noticeable warm
       | up time, unlike the JVM.
       | 
       | Ever since then, I've rejected the idea that JIT compilers have
       | to be slow and heavyweight.
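
The copy-and-patch idea haberman describes can be illustrated with a toy sketch (my own hypothetical example, not CPython's actual stencils): a pre-compiled machine-code template contains a sentinel "hole" that the JIT copies and overwrites with a real immediate value at run time.

```python
import struct

# A hypothetical pre-compiled template ("stencil") for a load-constant
# style op: the 0xDEADBEEF bytes mark a hole where the operand belongs.
HOLE = struct.pack("<I", 0xDEADBEEF)
TEMPLATE = b"\x48\xc7\xc0" + HOLE  # x86-64: mov rax, imm32 (illustrative)

def copy_and_patch(template: bytes, value: int) -> bytes:
    """Copy the template and splat the immediate into its hole."""
    hole_at = template.index(HOLE)
    return (template[:hole_at]
            + struct.pack("<I", value)
            + template[hole_at + len(HOLE):])

code = copy_and_patch(TEMPLATE, 42)
assert code == b"\x48\xc7\xc0\x2a\x00\x00\x00"  # mov rax, 42
```

In the real thing the templates come from compiled C and the holes are located via ELF relocations, but the patching step is this simple, which is why the compile phase is so fast.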
        
         | cbmuser wrote:
         | >>LuaJIT is undoubtedly a "full" JIT compiler.<<
         | 
         | Yes, and it's practically unmaintained. Pull requests to add
         | support for various architectures have remained largely
         | unanswered, including RISC-V.
        
           | dataangel wrote:
           | Doesn't change parent's point, clearly proves it's possible
        
           | peterfirefly wrote:
           | I think Mike Pall has done enough work on LuaJIT for several
           | lifetimes. If nobody else wants to merge pull requests and
           | make sure everything still works then maybe LuaJIT isn't
           | important enough to the world.
        
           | jpfr wrote:
           | The commit history looks pretty active...
           | 
           | https://github.com/LuaJIT/LuaJIT/commits/v2.1/
        
       | kzrdude wrote:
       | How do you access optimizations such as dead code removal and
       | constant propagation using this technique?
        
         | SpaghettiCthulu wrote:
         | I believe a JIT using this technique could eliminate dead code
         | at the Python bytecode level, but not at the machine code
         | level. That seems pretty reasonable to me.
        
           | kzrdude wrote:
           | Not sure, these optimizations multiply in power when used
           | together. Propagate constants and fold constants, after that
           | you can remove things like "if 0 > 0", both the conditional
           | check and the whole block below it, and so on.
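
The multiplying effect kzrdude describes can be sketched on a toy instruction list (a hypothetical IR, not CPython bytecode): once a condition constant-folds to False, the jump and its entire dead block can be dropped together.

```python
# Toy constant folding + dead-branch elimination. A JUMP_IF_FALSE whose
# condition has already folded to a constant lets us delete the jump and,
# if the condition is False, everything up to its target label.
def optimize(instrs):
    out = []
    skip_until = None
    for op, *args in instrs:
        if skip_until is not None:
            if op == "LABEL" and args[0] == skip_until:
                skip_until = None      # reached the dead block's end
            continue
        if op == "JUMP_IF_FALSE" and isinstance(args[0], bool):
            if args[0] is False:
                skip_until = args[1]   # condition folded: drop the block
            continue                   # folded-true jump disappears too
        out.append((op, *args))
    return out

program = [
    ("LOAD", 1),
    ("JUMP_IF_FALSE", 0 > 0, "end"),  # "if 0 > 0" folds to False
    ("CALL", "never_runs"),
    ("LABEL", "end"),
    ("RETURN",),
]
assert optimize(program) == [("LOAD", 1), ("RETURN",)]
```

Neither pass alone removes the CALL; folding feeds the dead-code pass, which is exactly the interaction the comment points at.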
        
       | divbzero wrote:
       | I love the description in the draft PR:
       | 
       |     'Twas the night before Christmas, when all through the code
       |     Not a core dev was merging, not even Guido;
       |     The CI was spun on the PRs with care
       |     In hopes that green check-markings soon would be there;
       | 
       |     ...
       | 
       |     --enable-experimental-jit, then made it,
       |     And away the JIT flew as their "+1"s okay'ed it.
       |     But they heard it exclaim, as it traced out of sight,
       |     "Happy JIT-mas to all, and to all a good night!"
       | 
       | https://github.com/python/cpython/pull/113465
        
       | thetinymite wrote:
       | The PR message with a riff off the Night Before Christmas is
       | gold.
       | 
       | https://github.com/python/cpython/pull/113465
        
       | julienchastang wrote:
       | "by Anthony Shaw, January 9, 2023"
       | 
       | 2024, right?
        
       | EmilStenstrom wrote:
       | It's interesting to see these 2-9% improvements from version to
       | version. They are always talked about with disappointment, as if
       | they are too small, but they also keep coming, with each version
       | being faster than the previous one. I prefer a steady 10% per
       | version over breaking things because you are hoping for bigger
       | numbers. Those percentages add up!
        
         | technocratius wrote:
         | Well I think they even multiply, making it even better news!
        
           | chmod775 wrote:
            | I'd rather they add up. -5% runtime here, another -5%
            | there... Soon enough, Python will be so fast my scripts
           | terminate before I even run them, allowing me to send
           | messages to my past self.
        
           | Qem wrote:
           | log(2)/log(1.1) ~= 7.27, so in principle sustained 10%
           | improvements could double performance every 7 releases. But
           | at some point we're bound to face diminishing returns.
        
         | hartator wrote:
          | Because it took 10 years for Python 3 to get as fast as
          | Python 2 while being more strict. 2-9% means it will be
          | another 10 years before Python 3 is significantly faster.
         | 
         | Ref: https://mail.python.org/pipermail/python-
         | dev/2016-November/1...
        
           | chalst wrote:
           | 5.5% compounded over 5 years is a bit over 30%: not a huge
           | amount but an easily noticeable speed-up. What were you
           | thinking of when you typed "significantly faster"?
        
             | aktenlage wrote:
              | Compounding a decrease works differently than an increase.
             | If something gets 10% faster twice it actually got 19%
             | faster. In other words, the runtime is 90% of 90%, i.e.
             | 81%.
        
               | ummonk wrote:
               | Not if "faster" refers to computation rate rather than
               | runtime, in which case it becomes 100/81 i.e. 23% faster.
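
The arithmetic in this subthread is easy to check directly, along with the doubling estimate upthread:

```python
import math

# Two 10%-faster releases, two readings of "faster":
runtime = 0.90 * 0.90                     # runtime shrinks multiplicatively
assert round(1 - runtime, 2) == 0.19      # 19% less runtime
rate_gain = 1 / runtime - 1               # throughput view: 100/81
assert round(rate_gain, 2) == 0.23        # ~23% higher computation rate

# Sustained 10% gains double throughput every ~7.3 releases:
assert round(math.log(2) / math.log(1.1), 2) == 7.27
```

Both commenters are right; they're just measuring different quantities.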
        
           | bilsbie wrote:
           | What! Why? (I couldn't figure it out from your link)
        
             | Retr0id wrote:
             | The link seems fairly clear to me - One explanation given
             | is that python3 represents _all_ integers in a  "long"
             | type, whereas python2 defaulted to small ints. This gave
             | (gives?) python2 an advantage on tasks involving
             | manipulating lots of small integers. Most real-world python
             | code isn't like this, though.
             | 
             | Interestingly they singled out pyaes as one of the worst
             | offenders. I've also written a pure-python AES
             | implementation, one that deliberately takes advantage of
             | the "long" integer representation, and it beats pyaes by
             | about 2000%.
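
The trick Retr0id alludes to can be illustrated like this (my own sketch, not their implementation): pack many small values into one arbitrary-precision int so a whole-state operation happens in one big-int step instead of a per-byte Python loop.

```python
# XOR a 16-byte AES-style state against a round key two ways.
state = bytes(range(16))
round_key = bytes(range(100, 116))

# Per-byte loop: 16 interpreted iterations.
slow = bytes(s ^ k for s, k in zip(state, round_key))

# One wide XOR on packed "long" ints: a single interpreted operation.
wide = (int.from_bytes(state, "little")
        ^ int.from_bytes(round_key, "little")).to_bytes(16, "little")

assert slow == wide
```

The per-element interpreter overhead dominates in pure Python, so collapsing N small-int operations into one long-int operation is where that kind of speedup comes from.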
        
         | nextaccountic wrote:
          | This is happening mostly because Guido left, right? The take
          | that CPython should be a reference implementation, and thus
          | slow, always aggravated me. No other implementation can
          | compete because every package depends on CPython quirks; so
          | much so that we're now removing the GIL from CPython rather
          | than migrating to PyPy, for example.
        
           | bb88 wrote:
           | Guido is still involved, but he's no longer the BDFL.
        
             | Cupprum wrote:
             | Just to clarify BDFL.
             | 
             | [1]:
             | https://en.wikipedia.org/wiki/Benevolent_dictator_for_life
        
           | korijn wrote:
           | Partly, yes, but do note he is still very much involved with
           | the faster-cpython project via Microsoft. Google faster
           | cpython and van rossum to find some interviews and talks. You
           | can also check out the faster-cpython project on github to
           | read more.
        
           | girvo wrote:
           | It's fascinating to me that this process seems to rhyme with
           | that of the path PHP took, with HHVM being built as a second
           | implementation, proving that PHP could be much faster -- and
           | the main project eventually adopting similar approaches. I
           | wonder if that's always likely to happen when talking about
           | languages as big as these are? Can a new implementation of it
           | ever really compete?
        
         | bilsbie wrote:
         | Someone please compare 3.13 to 2.3! I'd love to see how far
         | we've come.
        
           | chalst wrote:
           | Good idea! It can be done fairly easily by people who are
           | good with changelogs.
           | 
           | FWIW, the most recent changelog is at
           | https://docs.python.org/3.13/whatsnew/3.13.html
        
             | ska wrote:
             | I suspect parent meant a performance comparison...
        
         | matheusmoreira wrote:
         | I _envy_ these small and steady improvements!!
         | 
         | I spent about one week implementing PyPy's storage strategies
         | in my language's collection types. When I finished the vector
         | type modifications, I benchmarked it and saw the ~10% speed up
         | claimed in the paper1. The catch is performance increased
         | _only_ for unusually large vectors, like thousands of elements.
         | Small vectors were actually slowed down by about the same
         | amount. For some reason I decided to press on and implement it
         | on my hash table type too which is used everywhere. That slowed
         | the _entire_ interpreter down by nearly 20%. The branch is
         | still sitting there, unmerged.
         | 
         | I can't imagine how difficult it must have been for these guys
         | to write a compiler and _succeed_ at speeding up the Python
         | interpreter.
         | 
          | [1] https://tratt.net/laurie/research/pubs/html/bolz_diekmann_tr...
        
       | meisel wrote:
       | Why has it taken so much longer for CPython to get a JIT than,
       | say, PyPy? I would imagine the latter has far less engineering
       | effort and funding put into it.
        
         | kstrauser wrote:
         | For the longest time, CPython was deliberately optimized for
         | simplicity. That's a perfectly reasonable choice: it's easier
         | to reason about, easier for new maintainers to learn it, easier
         | to alter, easier to fix when it breaks, etc. Also, CPUs are
         | pretty good at running simple code very quickly.
         | 
         | It's only fairly recently that there's been critical mass of
         | people who thought that performance trumps simplicity, and even
         | then, it's only to a point.
        
           | nomel wrote:
           | > It's only fairly recently that there's been critical mass
           | of people who thought that performance trumps simplicity
           | 
            | This definitely wasn't true from the user perspective. And
            | I'm not even convinced it's some "critical mass" of
            | developers. These changes aren't coming from some mass of
            | developers; they're coming from a few experts with a clear
            | plan, backed by the recognition that languages are meant
            | for the users of the language, not its developers.
        
             | arccy wrote:
             | critical mass of cpython devs
        
       | ya3r wrote:
       | A possibly relevant blog post is
       | https://devblogs.microsoft.com/python/python-311-faster-cpyt...
       | dated Oct 2022. The team behind this and some other recent
       | improvements to Python are at Microsoft.
        
       | t43562 wrote:
       | I wish the money could be spent on PyPy, but PyPy has its
       | problems - you don't get a big boost on small programs that run
       | often, because the warmup time isn't that fabulous.
       | 
       | For larger programs, you sometimes hit some incredibly
       | complicated incompatibility problem. For me, bitbake was one of
       | those - it could REALLY benefit from PyPy, but it didn't work
       | properly and I couldn't fix it.
       | 
       | If this works more reliably or has a faster warmup, then... well,
       | it could help to fill in some gaps.
        
       | GGerome wrote:
       | What a horrible language..
        
       | matheusmoreira wrote:
       | So they compile the C implementation of every opcode into
       | templates and then patch in the actual values from the functions
       | being compiled. That's genius, massive inspiration for me. It's
       | automatically ABI compatible with the rest of CPython too.
       | 
       | Is there a similarly accessible article about the specializing
       | adaptive interpreter? It's mentioned in this article but not much
       | detail is given, only that the JIT builds upon it.
       | 
       | I wonder if I can skip the bytecode compilation phase.
        
       ___________________________________________________________________
       (page generated 2024-01-09 23:00 UTC)