[HN Gopher] Python 3.13 Gets a JIT
___________________________________________________________________
Python 3.13 Gets a JIT
Author : todsacerdoti
Score : 850 points
Date : 2024-01-09 08:35 UTC (14 hours ago)
(HTM) web link (tonybaloney.github.io)
(TXT) w3m dump (tonybaloney.github.io)
| milliams wrote:
| Brandt gave a talk about this at the CPython Core Developer
| Sprint late last year https://www.youtube.com/watch?v=HxSHIpEQRjs
| ageitgey wrote:
| For the lazy who just want to know if this makes Python faster
| yet, this is foundational work to enable later improvements:
|
| > The initial benchmarks show something of a 2-9% performance
| improvement.
|
| > I think that whilst the first version of this JIT isn't going
| to seriously dent any benchmarks (yet), it opens the door to some
| huge optimizations and not just ones that benefit the toy
| benchmark programs in the standard benchmark suite.
| Topfi wrote:
| Honestly, 2-9% already seems like a very significant
| improvement, especially since, as they mention, "remember that
| CPython is already written in C". Whilst it's great to look at
| the potential for even greater gains by building upon this
| work, I feel we shouldn't undersell what's been accomplished.
| adastra22 wrote:
| What is being accomplished then?
| gray_-_wolf wrote:
| 2-9%
| bomewish wrote:
| Also recall that a 50% speed improvement in SQLite was caused
| by 50-100 different optimisations that each eked out 0.5-1%
| speedups. On phone now, don't have the ref, but it all adds up.
| boxed wrote:
| Many small improvements are the way to go in most
| situations. It's not great clickbait, but we should
| remember that we got from a single cell to humans
| through many small changes. The world would be a lot
| better if people just embraced the grind of many small
| improvements...
| toyg wrote:
| Marginal gains.
| https://www.bbc.co.uk/news/magazine-34247629
| Akronymus wrote:
| I tried searching for that article because I vaguely recall
| it, but can't find it either. But yeah, a lot of small
| improvements add up. Reminds me of this talk:
| https://www.youtube.com/watch?v=NZ5Lwzrdoe8
| Topfi wrote:
| Here is a source for the SQLite case:
| https://topic.alibabacloud.com/a/sqlite-387-a-large-
| number-o...
| Akronymus wrote:
| That looks like blogspam to me, rather than an actual
| source.
| formerly_proven wrote:
| https://sqlite-
| users.sqlite.narkive.com/CVRvSKBs/50-faster-t...
| IshKebab wrote:
| That's true, and Rust compiler speed has seen similar
| speedups from lots of 1% improvements.
|
| But even if you can get a 2x improvement from lots of 1%
| improvements (if you work really really hard), you're never
| going to get a 10x improvement.
|
| Rust is never going to compile remotely as quickly as Go.
|
| Python is never going to be remotely as fast as Rust, C++,
| Go, Java, C#, Dart, etc.
| inglor_cz wrote:
| Does it matter?
|
| Trains are never going to beat jets in pure speed. But in
| certain scenarios, trains make a lot more sense to use
| than jets, and in those scenarios, it is usually
| preferable having a 150 mph train to a 75 mph train.
|
| Looking at the world of railways, high-speed rail has
| attracted a lot more paying customers than legacy
| railways, even though it doesn't even try to achieve
| flight-like speeds.
|
| Same with programming languages, I guess.
| fl0ki wrote:
| What is the programming analogy here?
|
| Two decades ago, you could (as e.g. Paul Graham did at
| the time) argue that dynamically typed languages can get
| your ideas to market faster so you become viable and
| figure out optimization later.
|
| It's been a long time since that argument held. Almost
| every dynamic programming language still under active
| development is adding some form of gradual typing because
| the maintainability benefits alone are clearly
| recognized, though such languages still struggle to
| optimize well. Now there are several statically typed
| languages to choose from that get those maintainability
| benefits up-front and optimize very well.
|
| Different languages can still be a better fit for
| different projects, e.g. Rust, Go, and Swift are all
| statically typed compiled languages better fit for
| different purposes, but in your analogy they're all jets
| designed for different tactical roles, none of them are
| "trains" of any speed.
|
| Analogies about how different programming languages are
| like different vehicles or power tools or etc go way back
| and have their place, but they have to recognize that
| sometimes one design approach largely supersedes another
| for practical purposes. Maybe the analogy would be
| clearer comparing jets and trains which each have their
| place, to horse-drawn carriages which still exist but are
| virtually never chosen for their functional benefits.
| inglor_cz wrote:
| I cut my teeth on C/C++, and I still develop the same
| stuff faster in Python, with which I have _less_ overall
| experience by almost 18 years. Python is also much easier
| to learn than, say, Rust, or the current standard of C++
| which is a veritable and intimidating behemoth.
|
| In many domains, it doesn't really matter if the
| resulting program runs in 0.01 seconds or 0.1 seconds,
| because the dominant time cost will be in user input, DB
| connection etc. anyway. But it matters if you can crank
| out your basic model in a week vs. two.
| fl0ki wrote:
| > Python is also much easier to learn than, say, Rust
|
| I don't doubt it, but learning is only the first step to
| using a technology for a series of projects over years or
| even decades, and that step doesn't last that long.
|
| People report being able to pick up Rust in a few weeks
| and being very productive. I was one of them, if you
| already got over the hill that was C++ then it sounds
| like you would be too. The point is that you and your
| team stay that productive as the project gets larger,
| because you can all enforce invariants for yourselves
| rather than have to carry their cognitive load and make
| up the extra slack with more testing that would be
| redundant with types.
|
| Outside of maybe a 3 month internship, when is it
| worthwhile to penalize years of software maintenance to
| save a few weeks of once-off up-front learning? And it's
| not like you save it completely, writing correct Python
| still takes some learning too, e.g. beginners easily get
| confused about when mutable data structures are silently
| being shared and thus modified when they don't expect it.
| People who are already very comfortable with Python
| forget this part of their own learning curve, just like
| people very comfortable with Rust forget their first
| borrow check head scratcher.
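|
| A minimal sketch of that sharing trap, for anyone who
| hasn't hit it:
|
|     row = [0, 0]
|     grid = [row] * 3   # three references to one list, not copies
|     grid[0][0] = 1
|     print(grid)        # [[1, 0], [1, 0], [1, 0]] - every "row" changed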
|
| I never made a performance argument in this thread so I'm
| not sure why 0.01 or 0.1 seconds matters here. Even the
| software that got you into a commercial market has to be
| maintained once you get there. Ask Meta how they feel
| about the PHP they're stuck with, for example.
| spacechild1 wrote:
| > "remember that CPython is already written in C"
|
| What is this supposed to say? Most scripting language
| interpreters are written in low level languages (or
| assembly), but that alone doesn't say anything about the
| performance of the language itself.
| nertirs wrote:
| This means that a lot of Python libraries, like Polars or
| TensorFlow, are not written in Python.
|
| So Python programs that already spend most of their CPU time
| running these libraries' code won't see much of an impact.
| gh02t wrote:
| Isn't the point that if pure Python was faster they
| wouldn't need to be written in other [compiled]
| languages? Having dealt with Cython it's not bad, but if
| I could write more of my code in native Python my
| development experience would be a lot simpler.
|
| Granted we're still very far from that and probably won't
| ever reach it, but there definitely seems to be a lot of
| progress.
| __MatrixMan__ wrote:
| Since Nim compiles to C, a middle step worth being aware
| of is Nim + nimporter which isn't anywhere near "just
| python" but is (maybe?) closer than "compile a C binary
| and call it from python".
|
| Or maybe it's just syntactic sugar around that. But sugar
| can be nice.
| eequah9L wrote:
| I think they mean that a lot of runtime of any benchmark is
| going to be spent in the C bits of the standard library,
| and therefore not subject to the JIT. Only the glue code
| and the bookkeeping or whatnot that the benchmark
| introduces would be improved by the JIT. This reduces the
| impact that the JIT can make.
| blagie wrote:
| From the write-up, I honestly don't understand how this paves
| the way. I don't see an architectural path from a copy-and-
| patch JIT to something optimizing. That's the whole point of a
| copy-and-patch JIT.
| lumpa wrote:
| There's a lot of effort going on to improve CPython
| performance, with optimization tiers, etc. It seems the JIT
| is how at least part of that effort will materialize:
| https://github.com/python/cpython/issues/113710
|
| > We're getting a JIT. Now it's time to optimize the traces
| to pass them to the JIT.
| guenthert wrote:
| Isn't it the case that Python has allowed type specifiers
| (type hints) since 3.5, albeit ones the CPython interpreter
| ignores? The JIT might take advantage of them, which ought to
| improve performance significantly for some code.
|
| What makes Python flexible is what makes it slow.
| Restricting the flexibility where possible offers
| opportunities to improve performance (and allows tools
| and humans to spot errors more easily).
| a-french-anon wrote:
| Isn't CL a good counter-example to that "dynamism
| inherently stunts performance" mantra?
| guenthert wrote:
| To the contrary. In CL some flexibility was given up
| (compared to other LISP dialects) in favor of enabling
| optimizing compilers, e.g. the standard symbols cannot be
| reassigned (also preserving the sanity of human readers).
| CL also offers what some now call 'gradual typing', i.e.
| optional type declarations. And remaining flexibility,
| e.g. around the OO support, limits how well the compiler
| can optimize the code.
| Joker_vD wrote:
| But type declarations in Python are not required to be
| correct, are they? You are allowed to write
|
|     def twice(x: int) -> int:
|         return x + x
|     print(twice("nope"))
|
| and it should print "nopenope". Right?
| Ringz wrote:
| Yep. Therefore it's better to write
|
|     def twice(x: int) -> int:
|         if not isinstance(x, int):
|             raise TypeError("Expected x to be an int, got "
|                             + str(type(x)))
|         return x + x
| jhardy54 wrote:
| This can have substantial performance implications, not
| to mention DX considerations.
| Ringz wrote:
| Of course, this is not a good example of good, high-
| performance code, only an answer to the specific
| question... the questioner certainly also knows MyPy.
| Joker_vD wrote:
| I actually don't know anything about MyPy, only that it
| exists. Does it run that example correctly, that is, does
| it print "nopenope"? Because I think it's the correct
| behaviour, type hints should not actually affect
| evaluation (well, beyond the fact that they must be names
| that are visible in the scopes they're used in,
| obviously), although I could be wrong.
|
| Besides, my point was that one of the reasons why
| languages with (sound-ish) static types manage to have
| better performance is that they can omit all of those
| run-time type checks (and the supporting machinery)
| because they'd never fail. And if you have to put those
| explicit checks, then the type hints are actually
| entirely redundant: e.g. Erlang's JIT ignores type specs,
| it instead looks at the type guards in the code to
| generate specialized code for the function bodies.
| sgerenser wrote:
| Surely this is the job for a linter or code generator (or
| perhaps even a hypothetical 'checked' mode in the
| interpreter itself)? Ain't nobody got time to add manual
| type checks to every single function.
| Ringz wrote:
| Of course not. That's what MyPy is for. It was only about
| the answer to exactly this question in this function.
| adhamsalama wrote:
| Or use mypy.
| VagabundoP wrote:
| The Python language server in Visual Studio Code will
| catch this if type checking is turned on, but by default,
| in CPython, that code will just work.
| kazinator wrote:
| Standard symbols being reassigned also breaks macros.
| formerly_proven wrote:
| You can't really rely on type annotations to help interpret
| the code.
| Difwif wrote:
| AFAIK good JITs like V8 can do runtime introspection and
| recompile on the fly if types change. Maybe using the type
| hints will be helpful but I don't think they are necessary
| for significant improvement.
| amelius wrote:
| Are there any benchmarks that give an idea of how much
| this might improve Python's speed?
| mike_hearn wrote:
| Well, GraalPython is a Python JIT compiler which can
| exploit dynamically determined types, and it advertises
| 4.3x faster, so it's possible to do drastically better
| than a few percent. I think that's state of the art but
| might be wrong.
|
| That's for this benchmark:
|
| https://pyperformance.readthedocs.io/
|
| Note that this is with a relatively small investment as
| these things go, the GraalPython team is about ~3 people
| I guess, looking at the GH repo. It's an independent
| implementation so most of the work went into being
| compatible with Python including native extensions (the
| hard part).
|
| But this speedup depends a lot on what you're doing. Some
| types of code can go much faster. Others will be slower
| even than CPython, for example if you want to sandbox the
| native code extensions.
| amelius wrote:
| This is great info, thanks!
| pletnes wrote:
| Pypy is a different JIT that gives anything from
| slower/same to 100x speedup depending on the benchmark.
| They give a geometric mean of 4.8x speedup across their
| suite of benchmarks. https://speed.pypy.org/
| cuchoi wrote:
| Doesn't Python already do this?
| https://www.youtube.com/watch?v=shQtrn1v7sQ
| dataangel wrote:
| I doubt it with a copy-and-patch JIT, not the way they work
| now. I'm a serious mypy/python-static-types user, and as-is
| they currently wouldn't allow you to do much optimization-
| wise.
|
| - All integers are still big integers
|
| - Use of the typing opt-out 'Any' is very common
|
| - All functions/methods can still be overwritten at runtime
|
| - Fields can still be added and removed from objects at
| runtime
|
| The combination basically makes it mandatory to not use
| native arithmetic, allocate everything on the heap, and
| need multiple levels of indirection for looking up any
| variable/field/function. CPU perf nightmare. You need a
| real optimizing JIT to track when integers are in a narrow
| range and things aren't getting redefined at runtime.
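|
| Each of those is a one-liner to trigger; a minimal sketch
| (class and attribute names made up for illustration):
|
|     class Point:
|         def __init__(self, x: int):
|             self.x = x
|
|     p = Point(1)
|     p.y = "surprise"                        # fields appear at runtime
|     Point.twice = lambda self: 2 * self.x   # methods get added/rebound
|     n = 10 ** 30                            # ints silently outgrow 64 bits
|     print(p.y, p.twice(), n + 1)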
| tekknolagi wrote:
| Sort of! But also not really. If you want to get into this,
| I wrote a post about this:
| https://bernsteinbear.com/blog/typed-python/
| Someone wrote:
| It should be fairly easy to add instruction fusing, where
| they recognize often-used instruction pairs, combine their C
| code, and then let the compiler optimize the combined code.
| Combining _LOAD_CONST_ with the instruction following it if
| that instruction pops the const from the stack seems an easy
| win, for example.
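|
| For a concrete look at such a pair, CPython's own dis module
| shows it (a sketch; opcode names vary by CPython version):
|
|     import dis
|
|     def f(x):
|         return x + 1
|
|     # On CPython 3.11+ this prints LOAD_CONST 1 immediately
|     # followed by the BINARY_OP that pops it - the kind of
|     # pair proposed for fusion above.
|     dis.dis(f)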
| ncruces wrote:
| If it was that easy, you'd do that in the interpreter and
| proportionally reduce interpretation overhead.
| Someone wrote:
| In the interpreter, I don't think it would reduce
| overhead much, if at all. You'd still have to recognize
| the two byte codes, and your interpreter would spend
| additional time deciding, for most byte code pairs, that
| it doesn't know how to combine them.
|
| With a compiler, that part is done once and, potentially,
| run zillions of times.
| ncruces wrote:
| If fusing a certain pair would significantly improve
| performance of most code, you'd just add that fused
| instruction to your bytecode and let the C compiler
| optimize the combined code in the interpreter. I have to
| assume CPython has already done that for all the low-
| hanging fruit.
|
| In fact, for such a fused instruction to be optimized
| that way by a copy-and-patch JIT, it'd need to exist as a
| new bytecode in the interpreter. A JIT that fuses
| instructions is no longer a copy-and-patch JIT.
|
| A copy-and-patch JIT reduces interpretation overhead by
| making sure the branches in the executed machine code are
| the branches in the code to be interpreted, not branches
| in the interpreter.
|
| This makes a huge difference in more naive
| interpreters, not so much in a heavily optimized
| threaded-code interpreter.
|
| The 10% is great, and nothing to sneeze at for a first
| commit. But I'd actually like some realistic analysis of
| next steps for improvement, because I'm skeptical
| instruction fusing and other things being hand waved are
| it. Certainly not on a copy-and-patch JIT.
|
| For context: I spent significant effort trying to add
| such instruction fusing to a simple WASM AOT compiler and
| got nowhere (the equivalent of constant loading was
| precisely one of the pairs). Only moving to a much
| smarter JIT (capable of looking at whole basic blocks of
| instructions) started making a difference.
| londons_explore wrote:
| > I don't see an architectural path from a copy-and-patch
| JIT to something optimizing.
|
| One approach used in V8 is to have a dumb-but-very-fast JIT
| (ie. this), and keep counters of how often each block of code
| runs (perhaps actual counters, perhaps using CPU sampling
| features), and then any block of code running more than a few
| thousand times run through a far more complex yet slower
| optimizing jit.
|
| That has the benefit that the 0.2% of your code which uses
| 95% of the runtime is the only part that has to undergo the
| expensive optimization passes.
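|
| As a toy sketch of that counting scheme (the threshold and
| names are invented for illustration, nothing V8-specific):
|
|     import functools
|
|     HOT_THRESHOLD = 1000  # hypothetical promotion point
|
|     def tiered(slow_fn, fast_fn):
|         # Run slow_fn until it proves hot, then swap in fast_fn,
|         # which stands in for the output of an optimizing JIT.
|         calls = 0
|         current = slow_fn
|
|         @functools.wraps(slow_fn)
|         def wrapper(*args, **kwargs):
|             nonlocal calls, current
|             calls += 1
|             if calls == HOT_THRESHOLD:
|                 current = fast_fn  # "recompile" the hot code
|             return current(*args, **kwargs)
|
|         return wrapper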
| Sesse__ wrote:
| Note that V8 didn't have a dumb-but-very-fast JIT
| (Sparkplug) until 2021; the interpreter (Ignition) did that
| block counting and sent it straight to the optimizing JIT
| (TurboFan).
|
| V8 pre-2021 (i.e., only Ignition+TurboFan) was
| significantly faster than current CPython is, and the full
| current four-tier bundle
| (Ignition+Sparkplug+Maglev+TurboFan) only scores roughly
| twice as good on Speedometer as pure Ignition does.
| (Ignition+Sparkplug is about 40% faster than Ignition
| alone; compare that "dumbness" with CPython's 2-9%.) The
| relevant lesson should be that things like very carefully
| designed value representation and IR is a much more
| important piece of the puzzle than having as many tiers of
| compilation as possible.
| uluyol wrote:
| In case anyone is interested, V8 pre-ignition/TurboFan
| had different tiers [1]: full-codegen (dumb and fast) and
| crankshaft (optimizing). It's interesting to see how
| these things change over time.
|
| [1]: https://v8.dev/blog/ignition-interpreter
| dataangel wrote:
| > keep counters of how often each block of code runs ...
| and then any block of code running more than a few thousand
| times run through a far more complex yet slower optimizing
| jit.
|
| That's just all JITs. Sometimes it's counters for going from
| interpreter -> JIT rather than between levels of JITs, but
| this idea is as old as JITs.
| fulafel wrote:
| Support for generating machine code at all seems like a
| necessary building block to me and probably is quite a bit of
| effort to work on top of a portable interpreter code base.
| lifthrasiir wrote:
| An important piece of context here is that the same code was
| reused for the interpreter and JIT implementations (that's a
| main selling point of copy-and-patch JIT). In other words,
| this 2-9% improvement mostly represents the core interpreter
| overhead that the JIT should significantly reduce. It was even
| possible that the JIT might have had no performance impact at
| all, so this result is actually very encouraging; any future
| opcode specialization and refinement should directly translate
| to a measurable improvement.
| formerly_proven wrote:
| Copy&patch seems not much worse than compiling pure Python
| with Cython, which roughly corresponds to "just call whatever
| CPython API functions the bytecode interpreter would call for
| this bunch of Python", so that's roughly a baseline for how
| much overhead you get from the interpreter bit.
| lifthrasiir wrote:
| There would be no reason to use a copy-and-patch JIT if
| that were the case, because the good old threaded
| interpreter would have been fine. There is other
| optimization work in parallel with this JIT effort,
| including finer-grained micro operations (uops) that can
| replace the usual opcodes at higher tiers. Uops themselves
| can be used without the JIT, but the interpreter overhead
| is proportional to the number of (u)ops executed and would
| be too large for uops. The hope is that the copy-and-patch
| JIT combined with uops will be much faster than threaded
| code.
| ncruces wrote:
| A threaded interpreter still has one branch per bytecode
| instruction; a copy-and-patch JIT removes this overhead.
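|
| A toy dispatch loop makes that overhead concrete
| (illustrative Python only; CPython's interpreter is C):
|
|     # One dict lookup and one indirect call per instruction;
|     # a copy-and-patch JIT stitches the handler bodies
|     # together so this per-opcode dispatch disappears.
|     def run(bytecode, stack):
|         handlers = {
|             "PUSH1": lambda s: s.append(1),
|             "ADD": lambda s: s.append(s.pop() + s.pop()),
|         }
|         for op in bytecode:
|             handlers[op](stack)
|         return stack
|
|     print(run(["PUSH1", "PUSH1", "ADD"], []))  # [2]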
| vanderZwan wrote:
| You're right, and in this case "foundational work" even
| undersells how minimal this work really is compared to the
| results it already gets.
|
| I recommend that people watch Brandt Bucher's _"A JIT Compiler
| for CPython"_ from last year's CPython Core Developer
| Sprint[0]. It gives a good impression of the current
| implementation and its limitations, and some hints at what may
| or may not work out. It also indirectly gives a glimpse into
| the process of getting this into Python through the exchanges
| during the Q&A discussion.
|
| One thing to especially highlight is that this copy-and-patch
| has a much, much lower implementation complexity for the
| maintainers, as a lot of the heavy lifting is offloaded to
| LLVM.
|
| Case in point: as of the talk this was all just Brandt Bucher's
| work. The implementation at the time was ~700 lines of
| "complex" Python, ~100 lines of "complex" C, plus of course the
| LLVM dependency. This produces ~3000 lines of "simple"
| generated C, requires an additional ~300 lines of "simple"
| hand-written C to come together, and _no further dependencies_
| (so no LLVM necessary to run the JIT. Also "complex" and
| "simple" qualifiers are Bucher's terms, not mine).
|
| Another thing to note is that these initial performance
| improvements are _just from getting this first version of the
| copy-and-patch JIT to work at all_, without really doing any
| further fine-tuning or optimization.
|
| This may have changed a bit in the months since, but the
| situation is probably still comparable.
|
| So if one person can get this up and running in a few klocs,
| most of which are generated, I think it's reasonable to have
| good hopes for its future.
|
| [0] https://www.youtube.com/watch?v=HxSHIpEQRjs
| sylware wrote:
| I removed the SDKs of some big (big for the wrong reasons) open
| source projects which generate a lot of code using python3
| scripts.
|
| In those custom SDKs, I generate all the code at the start of
| the build, which takes a significant amount of time for code
| generation that is mostly no longer pertinent or
| inappropriately done. I will really feel the python3 speed
| improvement in those builds.
| attractivechaos wrote:
| I wouldn't be so enthusiastic. Look at other languages that
| have JITs now: Ruby and PHP. After years of effort, they are
| still an order of magnitude slower than V8 and even PyPy [1].
| It seems to me that you need to design a JIT implementation
| from ground up to get good performance - V8, Dart and LuaJIT
| are like this; if you start with a pure interpreter, it may be
| difficult to speed it up later.
|
| [1] https://github.com/attractivechaos/plb2
| vlovich123 wrote:
| PyPy is designed from the ground up and is still slower than
| V8 AFAIK. Don't forget that v8 has enormous amounts of
| investment from professionally paid developers whereas PyPy
| is funded by government grants. Not sure about Ruby & PHP and
| it's entirely possible that the other JIT implementations are
| choosing simplicity of maintenance over eking out every
| single bit of performance.
|
| Python also has structural challenges, like native extensions
| (which don't exist in JavaScript), where the API forces slow
| code or massive hacks like avoiding the C API at all costs (if
| I recall correctly that's being worked on), and the GIL.
|
| One advantage Python had is the ability to use multiple cores
| way before JS, but the JS ecosystem remained single-threaded
| longer and decided to use message passing (WebWorkers)
| instead, which let the JIT remain fast.
| attractivechaos wrote:
| PyPy is only twice as slow as v8 and is about an order of
| magnitude faster than CPython. It is quite an achievement.
| I would be very happy if CPython could reach this performance,
| but I doubt it will.
| chaxor wrote:
| Anyone know if there will be any better tools for cross-
| compiling python projects?
|
| The package management and build tools for python have been so
| atrociously bad (environments add far too much complexity to
| the ecosystem) that it turns many developers away from the
| language altogether. A system like Rust's package management,
| build tools, and cross compilation capability is an enormous
| draw, even without the memory safety. The fact that it
| _actually works_ (because of the package management and build
| tools) is the main reason to use the language really. Python
| used to do that ~10 years ago. Now _absolutely nothing_ works.
| It takes weeks to get simple packages working; you can only
| do anything under extremely brittle conditions that nullify
| the project you're trying to use the other package for, etc.
|
| If python could ever get its act together and make better
| package management, and allow for cross-compiling, it could
| make a big difference. (I am aware of the very basic fact that
| it's interpreted rather than compiled, yada yada - there are
| still ways to make executables, they are just awful.) Since
| python is data-science centric, it would be good to have
| decent data management capabilities too, but perhaps that
| could come after the fundamental problems are dealt with.
|
| I tried looking at mojo, but it's not open source, so I'm quite
| certain that kills any hope of it ever being useful at all to
| anyone. The fact that I couldn't even install it without making
| an account made me run away as fast as possible.
| simonw wrote:
| "It takes weeks to get simple packages working"
|
| Can you expand on what you mean by that? I have trouble
| imagining a Python packaging problem that takes weeks to
| resolve - I'd expect them to either be resolvable in
| relatively short order or for them to prove effectively
| impossible such that people give up.
| chaxor wrote:
| - Trying to figure out what versions the scripts used and
|   specifying them in a new poetry project
| - Realizing some OS-dependent software is needed, so making a
|   dockerfile/docker-compose.yml
| - Getting some of it working in the container with a poetry
|   environment
| - Realizing that _other_ parts of the code work with _other_
|   versions, so making a different poetry environment for
|   those parts
| - Trying to tie this package/container as a dependency of
|   another project
| - Oh actually, this is a dependency of a dependency
| - How do you call a function from a package running in a
|   container with multiple poetry environments in a package?
| - What was I doing again?
| - 2 weeks have passed trying to get this to work, perhaps
|   I'll just do something else
|
| Rinse and repeat.
|
| ¯\_(ツ)_/¯ That's python!
| riperoni wrote:
| I can't answer your initial question, but I do like to pile
| onto the package management points.
|
| Package consumption sucks so bad, since the sensible way of
| using packages is virtual envs where you copy all
| dependencies. Then freezing venvs or dumping package versions,
| so you can port your project to a different system, doesn't
| consider only the packages actually used/imported in the code;
| it just dumps everything in the venv. The fact that you need
| external tools for this is frustrating.
|
| Then there is package creation. Legacy vs modern approach,
| cryptic __init__ files, multiple packaging backends, endless
| sections in pyproject.toml, manually specifying dependencies
| and dev-dependencies, convoluted ways of getting package
| metadata actually in code without having it in two places
| (such as CLI programs with --version).
|
| Cross compilation really would be a nice feature, to simply
| distribute a single-file executable. I haven't tested it, but
| a Linux system with Wine should in theory be capable of
| "cross" compiling between Linux and Windows.
|
| Still, like you, as a beginning I would prefer a sensible
| package management and package creation process.
| mcoliver wrote:
| Have you taken a look at Nuitka with GitHub actions for cross
| compilation? https://github.com/Nuitka/Nuitka-Action
| eviks wrote:
| > The initial benchmarks show something of a 2-9% performance
| improvement. You might be disappointed by this number, especially
| since this blog post has been talking about assembly and machine
| code and nothing is faster than that right?
|
| Indeed, reading the blog post built much higher expectations.
| G3rn0ti wrote:
| Just running machine code itself does not make a program
| magically faster. It's all about the amount of work the machine
| code is doing.
|
| For example, if the JIT compiler realizes the program is adding
| two integers it could potentially replace the code with two
| MOVs and a single ADD. However, what about the error handling
| in the case of an overflow? Python switches to its internal
| BigInt representation in this case and cannot rely on
| architecture specific instructions alone once the result gets
| too large to fit into a register.
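|
| A minimal illustration of why a bare machine ADD isn't
| enough:
|
|     a = b = 2 ** 62
|     print(a + b)                 # 9223372036854775808, one past
|                                  # the signed 64-bit maximum
|     print((a + b).bit_length())  # 64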
|
| Modern programming languages are all about trading performance
| for convenience and that is what makes them slow -- not because
| they are running an interpreter and not compiling to machine
| code.
| fer wrote:
| The bit everyone wants:
|
| > The initial benchmarks show something of a 2-9% performance
| improvement.
|
| Which is underwhelming (as mentioned in the article), especially
| if we look at PyPy[0]. But it's a step forward nonetheless.
|
| [0] https://speed.pypy.org/
| woadwarrior01 wrote:
| > At the moment, the JIT is only used if the function contains
| the JUMP_BACKWARD opcode which is used in the while statement
| but that will change in the future.
|
| It's a bit less underwhelming if you consider that only
| function objects with loops are being JITed. nb: for loops in
| Python also use the JUMP_BACKWARD op.
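|
| This is easy to check with the dis module (a sketch;
| requires CPython 3.11+ for the JUMP_BACKWARD opcode):
|
|     import dis
|
|     def count(n):
|         total = 0
|         for i in range(n):
|             total += i
|         return total
|
|     # The loop body ends with a JUMP_BACKWARD, which is what
|     # currently makes a function eligible for the JIT.
|     dis.dis(count)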
| lifthrasiir wrote:
| PyPy was never able to displace CPython, in spite of being
| unencumbered by a compatible C API. CPython is trying to
| move fast without breaking the C API, and the 2-9%
| improvement is in fact very encouraging for that and other
| reasons (see my other comment).
| cryptos wrote:
| I always wondered how Python can be one of the world's most
| popular languages without any company stepping up and making
| the runtime as fast as modern JavaScript runtimes.
| tomwphillips wrote:
| Because enough users find the performance sufficient.
| rfoo wrote:
| Because the reason why Python is one of the world's most
| popular languages (a large set of scientific computing C
| extensions) is bound to every implementation detail of the
| interpreter itself.
| fer wrote:
| Easy: the number-crunching libs are already optimized in
| (generally) C.
| PartiallyTyped wrote:
| and FORTRAN.
| est31 wrote:
| Python is already fast where it matters: often, it is just used
| to integrate existing C/C++ libraries like numpy or pytorch. It
| is more an integration language than one you write your
| heavy algorithms in.
|
| For JS, during the time that it received its JITs, there was no
| cross platform native code equivalent like wasm yet. JS had to
| compete with plugins written in C/C++ however. There was also
| competition between browser vendors, which gave the period the
| name "browser wars". Nowadays at least, the speed improvements
| for the end user thanks to the JIT aren't _that_ great either;
| Apple provides a mode to turn off the JIT entirely for
| security.
| ephimetheus wrote:
| I think usually the term "browser wars" refers to the time
| when Netscape and Microsoft were struggling for dominance,
| which concluded in 2001.
|
| JavaScript JITs only emerged around 2008 with SpiderMonkey's
| TraceMonkey, JavaScriptCore's SquirrelFish Extreme, and V8's
| original JIT.
| lifthrasiir wrote:
| There were multiple browser wars, otherwise you wouldn't
| need -s there ;-)
| nyanpasu64 wrote:
| Having recently implemented parallel image rendering in
| corrscope (https://github.com/corrscope/corrscope/pull/450),
| I can say that friends don't let friends write performance-
| critical code in Python. Depending on prebuilt C++ libraries
| hampers flexibility (eg. you can't customize the memory
| management or rasterization pipeline of matplotlib). Python's
| GIL inhibits parallelism within a process, and the workaround
| of multiprocessing and shared memory is awkward, has
| inconsistencies between platforms, and loses performance (you
| can't get matplotlib to render directly to an inter-process
| shared memory buffer, and the alternative of copying data
| from matplotlib's framebuffer to shared memory wastes CPU
| time).
|
| Additionally a lot of the libraries/ecosystem around shared
| memory (https://docs.python.org/3/library/multiprocessing.sha
| red_mem...) seems poorly conceived. If you pre-open shared
| memory in a ProcessPoolExecutor's initializer functions, you
| can't close them when the worker process exits (which _might_
| be fine, nobody knows!), but if you instead open and close a
| shared memory segment on every executor job, it _measurably_
| reduces performance, presumably from memory mapping overhead
| or TLB/page table thrashing.
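|
| For reference, the pre-open pattern in question looks roughly
| like this (a sketch with made-up names; note there is no hook
| symmetrical to the initializer that runs at worker exit):
|
|     from concurrent.futures import ProcessPoolExecutor
|     from multiprocessing import shared_memory
|
|     _shm = None  # per-worker handle, opened once per process
|
|     def _init_worker(name):
|         global _shm
|         _shm = shared_memory.SharedMemory(name=name)
|
|     def _job(offset):
|         return _shm.buf[offset]
|
|     if __name__ == "__main__":
|         shm = shared_memory.SharedMemory(create=True, size=1024)
|         shm.buf[0] = 42
|         with ProcessPoolExecutor(initializer=_init_worker,
|                                  initargs=(shm.name,)) as ex:
|             print(list(ex.map(_job, [0])))  # [42]
|         shm.close()
|         shm.unlink()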
| mkesper wrote:
| That's why optional GIL will be so important.
| ngrilly wrote:
| What would you use instead of Python?
| pas wrote:
| Cython? :o
| pas wrote:
| > Depending on prebuilt C++ libraries hampers flexibility
| (eg. you can't customize the memory management or
| rasterization pipeline of matplotlib).
|
| But what is the counterfactual? Implementing the whole
| thing in Python? It seems much more work than
| forking/fixing matplotlib.
| amelius wrote:
| > Python's GIL inhibits parallelism within a process, and
| the workaround of multiprocessing and shared memory is
| awkward, has inconsistencies between platforms, and loses
| performance
|
| Well, imho the biggest problem with this approach to
| parallelism is that you're stepping out of the Python world
| with gc'ed objects etc. and into a world of ctypes and
| serialization. It's like you're not even programming Python
| anymore, but something closer to C with the speed of an
| interpreted language.
| zbentley wrote:
| > If you pre-open shared memory in a ProcessPoolExecutor's
| initializer functions, you can't close them when the worker
| process exits
|
| That's quite surprising to learn, as I didn't think the
| initializer ran in a specialized context (like a
| pthread_atfork postfork hook in the child).
|
| What happens when you try to close an initializer-allocated
| SharedMemory object on worker exit?
| nyanpasu64 wrote:
| ProcessPoolExecutor doesn't let you supply a callback to
| run on worker process exit, only startup. Perhaps I
| could've looked for and tried something like atexit
| (https://docs.python.org/3/library/atexit.html)? In any
| case I don't want to touch my code at the moment until I
| regain interest or hear of resource exhaustion, since "it
| works".
| xiphias2 wrote:
| Billions of dollars of product decisions use JS benchmark
| speed as one of the standard benchmarks to base buying
| decisions on (for a good reason).
|
| For machine learning, compiling to the right CUDA / OpenCL
| kernel is much more crucial for speed, so that's where the
| money goes.
| CJefferson wrote:
| A big part of what made Python so successful was how easy it
| was to extend with C modules. It turns out to be very hard to
| JIT Python without breaking these, and most people don't want a
| Python that doesn't support C extension modules.
|
| The JavaScript VMs often break their extension APIs for speed,
| but their users are more used to this.
| toyg wrote:
| JS doesn't really have the tradition of external modules that
| Python has, for a long time it only really existed inside the
| browser.
| amelius wrote:
| On the other hand, rewriting the C modules and adapting them
| to a different C API is very straightforward after you've
| done 1 or 2 of such modules. Perhaps it's even something that
| could be done by training an LLM like Copilot.
| Redoubts wrote:
| That's breakage you'd have to tread carefully on; and given
| the 2to3 experience, there would have to be immediate
| reward to entice people to undertake the conversion. No
| one's interested in even minor code breakage for minor
| short-term gain.
| Pxtl wrote:
| Which is why I'm shocked that Python's big "we're breaking
| backwards compatibility" release (Python 3) was mostly just
| for Unicode strings. It seems like the C API and the various
| __builtins__ introspection API thingies should've been the
| real focus on breaking backwards compatibility so that Python
| would have a better future for improvements like this.
| AndrewDucker wrote:
| Always interested in replies to this kind of comment, which
| basically boil down to "Python is so slow that we have to write
| any important code in C. And this is somehow a good thing."
|
| I mean, it's great that you _can_ write some of your code in C.
| But wouldn't it be great if you could just write your
| libraries in Python and have them still be really fast?
| bdd8f1df777b wrote:
| Yes, but not so good when the JIT-ed Python can no longer
| reference the fast C code others have written. Every Python
| JIT project so far has suffered from incompatibility with
| some C-based Python extension, and users just go back to the
| slow interpreter in those cases.
| AndrewDucker wrote:
| "not so good when the JIT-ed Python can no longer reference
| those fast C code others have written"
|
| I don't see an indication in the article that that's the
| case. Am I missing something?
| kragen wrote:
| this was a big obstacle for pypy specifically
|
| https://www.pypy.org/posts/2011/05/numpy-follow-
| up-692862769...
|
| https://doc.pypy.org/en/latest/faq.html#what-about-numpy-
| num...
|
| i'm not sure what version they gave up at
| dagw wrote:
| _But wouldn't it be great if you could just write your
| libraries in Python_
|
| Everybody obviously wants that. The question is: are you
| willing to lose what you have in order to, hopefully,
| eventually get there? If Python 3 development stopped and
| Python 4 came out tomorrow, and it was 5x faster than Python 3
| with a promise of being 50-100x faster in the future, but you
| had to rewrite all the libraries that use the C API, it
| would probably be DOA and kill Python. People who want a
| faster 'almost Python' already have several options to choose
| from, none of which are popular. Or they use Julia.
| AndrewDucker wrote:
| Why are you assuming that they'd have to rewrite all of
| their libraries? I don't see anything in the article that
| says that.
| dagw wrote:
| The reason this approach is so much slower than some of
| the other 'fast' pythons out there that have come before
| is that they are making sure you don't have to rewrite a
| bunch of existing libraries.
|
| That is the problem with all the fast python
| implementations that have come before. Yes, they're
| faster than 'normal' python in many benchmarks, but they
| don't support the entire current ecosystem. For example
| Instagram's python implementation is blazing fast for
| doing exactly what Instagram is using python for, but is
| probably completely useless for what I'm using python
| for.
| AndrewDucker wrote:
| Aaah, so it's not _this_ approach that you're saying is
| an issue, it's the ones that significantly change Python.
| Gotcha, that makes sense. Thank you.
| el_oni wrote:
| It depends what speed is most important to you.
|
| When I was a scientist, speed was getting the code written
| during my break, and if it took all afternoon to run that's
| fine because I was in the lab anyway.
|
| Even as I moved more in the software engineering direction,
| and started profiling code more, most of the bottlenecks came
| from things like "creating objects on every invocation rather
| than pooling them", "blocking IO", "using a bad algorithm" or
| "using the wrong data structure for the task" - problems that
| exist in every language. Though "bad algorithm" or "using the
| wrong data structure" might matter less in a faster language,
| you're still leaving performance on the table.
|
| > "Python is so slow that we have to write any important code
| in C. And this is somehow a good thing."
|
| The good thing is that python has a very vibrant ecosystem
| filled with great libraries, so we don't have to write it in
| C, because somebody else has. We can just benefit from that
| when the situation calls for it.
| JodieBenitez wrote:
| Between writing C code and writing Python code, there is also
| Cython.
|
| But sure, I'm all for removing build steps and avoiding yet
| another layer.
| dannymi wrote:
| >I mean, it's great that you can write some of your code in
| C. But wouldn't it be great if you could just write your
| libraries in Python and have them still be really fast?
|
| That really depends.
|
| To make the issue clear, let's think about a similar
| situation:
|
| bash is nice because you can plug together inputs and outputs
| of different sub-executables (like grep, sed and so on) and
| have a big "functional" pipeline deliver the final result.
|
| Your idea would be "wouldn't it be great if you could just
| write your libraries in bash and have them still be really
| fast?". Not if you make bash into C, tanking productivity.
| And _definitely_ not if that new bash can't run the old grep
| anymore (which is what usually is implied by the proposal in
| the case of Python).
|
| Also, I'm fine with not writing my search engines, databases
| and matrix multiplication algorithm implementations in bash,
| really. So are most other people, I suspect.
|
| Also, many proposals would weaken Python-the-language so it's
| not as expressive anymore. But I _want_ it to stay as dynamic
| as it is. It's nice as a scripting language about 30 levels
| above bash.
|
| As always, there are tradeoffs. Also with this proposal there
| will be tradeoffs. Are the tradeoffs worth it or not?
|
| For the record, rewriting BLAS in Python (or anything else),
| even if the result was faster (!), would be a phenomenally
| bad idea. It would just introduce bugs, waste everyone's
| time, essentially be a fork of BLAS. There's no upside I can
| see that justifies it.
| pdpi wrote:
| Languages don't need to all be good at the same thing. Python
| currently excels as a glue language you use to write drivers
| for modules written in lower-level languages, which is a
| niche that (afaik) nobody else seems to fill right now.
|
| While I'm all for making Python itself faster, it would be a
| shame to lose the glue language par excellence.
| hot_gril wrote:
| Pure JS libs are more portable. In Python, portability
| doesn't matter as much.
| yowlingcat wrote:
| > basically boil down to "Python is so slow that we have to
| write any important code in C. And this is somehow a good
| thing."
|
| I think that's a pretty ignorant interpretation. Python has
| been built to have a giant ecosystem of useful, feature-
| complete, stable, well built code that has been used for
| decades and for which there is no need to reinvent the wheel.
| If that already describes the universe of libraries that you
| /need/ to be extremely fast and the rest of your code is IO
| limited and not CPU limited, why reinvent the wheel?
|
| That makes your comment even more inaccurate because you
| likely don't need to write any "important" (which you are
| stretching to mean "fast") code in C -- you utilize existing
| off the shelf fast libraries that are written in Fortran,
| CUDA, C, Rust or any other language a pre-existing ecosystem
| was built in.
|
| Try and think of a language that has mature capabilities for
| domains as far away as what Django solves for, what pandas
| solves for, what pytorch solves for, and still has fantastic
| tooling like jupyter and streamlit. I can't think of any
| other language that has the combined off the shelf breadth
| and depth of Python. I don't want to have to write fast code
| in any language unless forced to, because the vast majority
| of the time I can customize a great off the shelf package and
| only write the remaining 1% of glue. I can't see why a
| professional engineer would, 99% of the time, need to
| take a remotely different approach.
| yellowstuff wrote:
| There have been several attempts. For example, Google tried to
| introduce a JIT in 2011 with a project named Unladen Swallow,
| but that ended up getting abandoned.
| pansa2 wrote:
| Unladen Swallow was massively over-hyped. It was talked about
| as though Google had a large team writing "V8 for Python",
| but IIRC it was really just an internship project.
| kragen wrote:
| well, there were a couple of guys working on it
| dagw wrote:
| _anyone (company) stepping up and make the runtime as fast as
| modern JavaScript runtimes._
|
| There are a lot of faster python runtimes out there. Both
| Google and Instagram/Meta have done a lot of work on this,
| mostly to solve internal problems they've been having with
| python performance. Microsoft has also done work on parallel
| python. There's PyPy and Pythran and no doubt several others.
| However none of these attempts have managed to be 100%
| compatible with the current CPython (and more importantly the
| CPython C API), so they haven't been considered as
| replacements.
|
| JavaScript had the huge advantage that there was very little
| mission-critical legacy JavaScript code around that they had
| to take into consideration, and no C libraries that they had
| to stay compatible with. Meaning that modern JavaScript
| runtime teams
| could more or less start from scratch. Also the JavaScript
| world at the time were a lot more OK with different JavaScript
| runtimes not being 100% compatible with each other. If you
| 'just' want a faster python runtime that supports most of
| python and many existing libraries, but are OK with having to
| rewrite some of your existing python code or third-party
| libraries
| to make it work on that runtime, then there are several to
| choose from.
| skriticos2 wrote:
| JS also had the major advantage of being sandboxed by design,
| so they could work from there. Most of the technical legacy
| centered around syntax backwards compatibility, but it's all
| isolated - so much easier to optimize.
|
| Python with its C API basically gives you the keys to the
| kingdom on a machine code level. Modifying something that has
| an API to connect to essentially anything is not an easy
| proposition. Of course, it has the advantage that you can
| make Python faster by performance analysis and moving the
| expensive parts to optimized C code, if you have the
| resources.
| mike_hearn wrote:
| Google/Instagram have done bits, but the company that's done
| the most serious work on Python performance is actually
| Oracle. GraalPython is a meaningfully faster JIT (430% faster
| vs 7% for this JITC!) and most importantly, it can utilize at
| least some CPython modules.
|
| They test it against the top 500 modules on PyPI and it's
| currently compatible with about half:
|
| https://www.graalvm.org/python/compatibility/
|
| But investment continues. It has some other neat features too
| like sandboxing and the ability to make single-binary
| programs.
|
| The GraalPython guys are working on the HPy effort as well,
| which is an attempt to give Python a properly specified and
| engine-neutral extension API.
| Pxtl wrote:
| Node.js and Python 3 came out at around the same time. Python
| had their chance to tell all the "mission critical legacy
| code" that it was time to make hard changes.
| dagw wrote:
| As much as I would have loved to see some more 'extreme'
| improvements to python, given how the python community
| reacted to the relatively minor changes that python 3
| brought, anything more extreme would very likely have
| caused a Perl 6 style situation and quite possibly have
| killed the language.
| albertzeyer wrote:
| In lots of applications, all the computations already happen
| inside native libraries, e.g. Numpy, PyTorch, TensorFlow, JAX
| etc.
|
| And if you have a complicated computation graph, there are
| already JITs on this level, based on Python code, e.g. see
| torch.compile, or TF XLA (done by default via tf.function),
| JAX, etc.
|
| It's also important to do JIT on this level, to really be able
| to fuse CUDA ops, etc. A generic Python JIT probably cannot
| really do this, as this is CUDA specific, or TPU specific, etc.
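|
| For example, assuming PyTorch 2.x, the graph-level JIT is one
| call away (a sketch):
|
|     import torch
|
|     def f(x):
|         return torch.sin(x) ** 2 + torch.cos(x) ** 2
|
|     # torch.compile traces f on first call; on a CUDA backend
|     # the elementwise ops can be fused into a single kernel.
|     f_opt = torch.compile(f)
|     print(f_opt(torch.randn(8)))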
| el_oni wrote:
| I think the thing with python is that it's always been "fast
| enough" and if not you can always reach out to natively
| implemented modules. On the flipside javascript was the main
| language embedded in web browsers.
|
| There has been a lot of competition to make browsers fast.
| Nowadays there are 3 main JS engines, V8 backed by google,
| JavaScriptCore backed by apple, and spidermonkey backed by
| mozilla.
|
| If python had been the language embedded into web browsers,
| then maybe we would see 3 competing python engines with crazy
| performance.
|
| The alternative interpreters for python have always been a bit
| more niche than CPython, but now that Guido works at Microsoft
| there has been a bit more of a push to make it faster.
| PartiallyTyped wrote:
| Meta has actually been doing that -- helping improve python's
| speed -- with things like [1,2]
|
| [1] https://peps.python.org/pep-0703/
|
| [2] https://news.ycombinator.com/item?id=36643670
| JodieBenitez wrote:
| Because it's already fast enough for most of us? Anecdote, but
| I've had my share of slow things in Javascript that are _not_
| slow in Python. Try to generate a SHA256 checksum for a big
| file in the browser...
|
| Good to see progress anyways.
| jampekka wrote:
| Python's SHA256 is written in C. And I'd guess the Web Crypto
| API for JS is in the same ballpark.
|
| SHA256 in pure Python would be unusably slow. In Javascript
| it would be at least usably slow.
|
| Javascript is fast. Browsers are fast.
| Scarblac wrote:
| The point of Python is quickly integrating a very wide
| range of fast libraries written in other languages, though;
| you can't ignore that performance just because it's not
| written in Python.
| JodieBenitez wrote:
| Have you tried to generate a SHA256 checksum for a file in
| the browser, no matter what crypto lib or API is available
| to you? Have you tried to generate it using the Python
| standard lib?
|
| I did, and doing it in the browser was so bad that it was
| unusable. I suspect that it's not the crypto that's slow
| but the file reading. But anyway...
|
| > SHA256 in pure Python would be unusably slow
|
| None would do that because:
|
| > Python's SHA256 is written in C
|
| Hence why comparing "pure python" to "pure javascript" is
| mostly irrelevant for most day to day tasks, like most
| benchmarks.
|
| > Javascript is fast. Browsers are fast.
|
| Well, no they were not for my use case. Browsers are
| _really slow_ at generating file checksums.
| adastra22 wrote:
| The Python standard lib calls out to hand-optimized
| assembly-language versions of the crypto algos. It is of
| no relevance to a JIT-vs-interpreted debate.
| masklinn wrote:
| It absolutely _is_ relevant to the "python is slow reee"
| nonsense tho, which is the subject. Python-the-language
| being slow is not relevant for a lot of the users,
| because even if they don't know they use Python mostly as
| a convenient interface to huge piles of native code which
| does the actual work.
|
| And as noted upthread that's a significant part of the
| uptake of Python in scientific fields, and why pypy
| despite the heroic work that's gone into it is often a
| non-entity.
| jampekka wrote:
| Python is slow, reee.
|
| This is a major problem in scientific fields. Currently
| there are sort of "two tiers" of scientific programmers:
| ones who write the fast binary libraries and ones that
| use these from Python (until they encounter e.g. having
| to loop and they are SOL).
|
| This is known as the two language problem. It arises from
| Python being slow to run and compiled languages being bad
| to write. Julia tries to solve this (but fails due to
| implementation details). Numba etc try to hack around it.
|
| Pypy is sadly vaporware. The failure from the beginning
| was not supporting most popular (scientific) Python
| libraries. It nowadays kind of does, but is brittle and
| often hard to set up. And anyway Pypy is not very fast
| compared to e.g. V8 or SpiderMonkey.
|
| Reee.
| pas wrote:
| The major problem in scientific fields is not this, but
| the amount of incompetence and the race-to-the-bottom
| environment which enables it. Grant organizations don't
| demand rigor and efficiency, they demand shiny papers.
| And that's what we get. With god awful code and very
| questionable scientific value.
| jampekka wrote:
| There are such issues, but I don't think they are a very
| direct cause of the two language problem.
|
| And even these issues are part of the greater problem of
| late stage capitalism that in general produces god awful
| stuff with questionable value. E.g. vast majority of
| industry code is such.
| JodieBenitez wrote:
| > Julia tries to solve this (but fails due to
| implementation details)
|
| Care to list some of those details ? (I have zero
| knowledge in Julia)
| jampekka wrote:
| This is quite a good intro:
| https://viralinstruction.com/posts/badjulia/
| affinepplan wrote:
| fyi: the author of that post is a current Julia user and
| intended the post as counterpoint to their normally
| enthusiastic endorsements. so while it is a good intro to
| some of the shortfalls of the language, I'm not sure the
| author would agree that Julia has "failed" due to these
| details
| jampekka wrote:
| Yes, but it's a good list of the major problems, and
| laudable for a self-professed "stan" to be upfront about
| them.
|
| It's my assessment that the problems listed there are a
| reason why Julia will not take off and we're largely stuck
| with Python for the foreseeable future.
| adgjlsfhk1 wrote:
| It is worth noting that the first of the reasons
| presented is significantly improved in Julia 1.9 and 1.10
| (released ~8 months and ~1 month ago). The time for
| `using BioSequences, FASTX` on 1.10 is down to 0.14
| seconds on my computer (from 0.62 seconds on 1.8 when the
| blog post was published).
| jampekka wrote:
| TTFX is indeed getting a lot better. But e.g. "using
| DynamicalSystems" is still over 5 seconds.
|
| There is something big going on in caching the binaries,
| so there's a chance the TTFX will get workable.
| adastra22 wrote:
| There is pleeeenty of mission critical stuff written in
| Python, for which interpreter speed is a primary concern.
| This has been true for decades. Maybe not in your
| industry, but there are other Python users.
| jiripospisil wrote:
| Just for giggles I tried this and I'm getting ~200ms when
| reading and hashing 50MB file in the browser (Chromium
| based) vs ~120ms using Python 3.11.6.
|
| https://gist.github.com/jiripospisil/1ae8b877b1c728536e38
| 2fc...
|
| https://jsfiddle.net/yebdnz6x/
| JodieBenitez wrote:
| Not so bad compared to what I tried a few years ago.
| Might finally be _usable_ for us...
| jampekka wrote:
| All major browsers have supported it for over eight
| years. Maybe the problem was between the seat and the
| keyboard?
|
| https://caniuse.com/mdn-api_crypto_subtle
| JodieBenitez wrote:
| Maybe 8 years is not much in a career? Maybe we had to
| support one of those browsers that did not support it?
| Maybe your snarky comment is out of place? And even to
| this day it's still significantly slower than Python
| stdlib according to the tester. So much for "why python
| not as fast as js, python is slow, blah blah blah".
| ptx wrote:
| I thought that perhaps the difference could be due to the
| JavaScript version having to first read the entire file
| before getting started on hashing it, whereas the Python
| version does it incrementally (which the browser API doesn't
| support [0]). But changing the Python version to work
| like the JavaScript version doesn't make a big
| difference: 30 vs 35 ms (with a ~50 MB file) on my
| machine.
|
| The slowest part in the JavaScript version seems to be
| reading the file, accounting for 70-80% of the runtime in
| both Firefox and Chromium.
|
| [0] https://github.com/w3c/webcrypto/issues/73
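|
| For reference, a sketch of the two strategies being
| compared here (illustrative only; the chunk size is made
| up, and the one-shot variant mimics WebCrypto's one-shot
| digest() while the incremental one uses what hashlib
| allows):
|
|     import hashlib
|
|     def sha256_incremental(path, chunk_size=1 << 20):
|         h = hashlib.sha256()
|         with open(path, "rb") as f:
|             while chunk := f.read(chunk_size):
|                 h.update(chunk)  # hash while reading
|         return h.hexdigest()
|
|     def sha256_one_shot(path):
|         with open(path, "rb") as f:
|             data = f.read()      # read everything first
|         return hashlib.sha256(data).hexdigest()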
| lambda_garden wrote:
| > Have you tried to generate a SHA256 checksum for a file
| in the browser
|
| Have you tried to do this in Python?
|
| A Node comparison would be more appropriate.
| tgv wrote:
| Teaching. So many colleges/unis I know teach "Introduction to
| Programming" with Python these days, especially to non-CS
| students/pupils.
| bigfishrunning wrote:
| I think python is very well suited to people who do
| computation in Excel spreadsheets. For actual CS students,
| I'd rather see something like scheme be a first language (but
| maybe I'm just an old person)
| hot_gril wrote:
| They do both Python and Scheme in the same Berkeley intro
| to CS class. But I think the point of Scheme is more to
| expand students' thinking with a very different language.
| The CS fundamentals are still covered more in the Python
| part of the course.
| VagabundoP wrote:
| It's even _in_ Excel nowadays!!
| fractalb wrote:
| I still scratch my head why it's not installed by default on
| Windows.
| Yasuraka wrote:
| You might want to check out Mojo, which is not a runtime but a
| different language, one also designed to be a superset of
| Python. Beware though that it's not yet open source; open-
| sourcing is slated for this Q1
|
| https://docs.modular.com/mojo/manual/
|
| edit: The main point I forgot to mention - it aims to compete
| with "low-level" languages like C and Rust in performance
| FergusArgyll wrote:
| Because it doesn't useGrossCamelCaseAsOften
| IshKebab wrote:
| Two reasons:
|
| 1. Javascript is a less dynamic language than Python and
| numbers are all float64 which makes it a lot easier to make
| fast.
|
| 2. If you want to run fast code on the web you only have one
| option: make Javascript faster. (Ok we have WASM now but that
| didn't exist at the time of the Javascript Speed wars.) If you
| want to run fast code on your desktop you have a _MUCH_ easier
| option: don't use Python.
| soulbadguy wrote:
| > Javascript is a less dynamic language than Python
|
| I have seen this mentioned multiple times. Does someone have a
| good reference explaining what makes Python more dynamic than JS?
| bjackman wrote:
| JavaScript has to be fast because its users were traditionally
| captive on the platform (it was the only language in the
| browser).
|
| Python's users can always swap out performance critical
| components to another language. So Python development delivered
| more when it focussed on improving strengths rather than
| mitigating weaknesses.
|
| In a way, Python being slow is just a sign of a healthy
| platform ecosystem allowing comparative advantages to shine.
| hot_gril wrote:
| New runtimes like NodeJS have expanded JS beyond web, and JS's
| syntax has improved the past several years. But before that
| happened, Python on its own was way easier for non-web scripts,
| web servers, and math/science/ML/etc. Optimized native libs and
| ecosystems for those things got built a lot earlier around
| Python, in some cases before NodeJS even existed.
|
| Python's syntax is still nicer for mathy stuff, to the point
| where I'd go into job coding interviews using Python despite
| having used more JS lately. And I'm comparing to JS because
| it's the closest thing, while others like Java are/were far
| more cumbersome for these uses.
| p4bl0 wrote:
| This was a fantastic, very clear, write-up on the subject. Thanks
| for sharing!
|
| If the further optimizations that this change allows, as
| explained at the end of this post, are covered as well as this
| one, it promises to be a _very_ interesting series of blog posts.
| jagaerglad wrote:
| Can someone explain what a JIT compiler means in the case of an
| interpreted language?
| pjmlp wrote:
| Basically, a JIT (Just In Time) compiler is also known as a
| dynamic compiler.
|
| It is an approach that traces back to the original Lisp and
| BASIC systems, among other lesser-known ones.
|
| The compiler is part of the language runtime, and code gets
| dynamically compiled into native code.
|
| Why is this a good approach?
|
| It allows for experiences that are much harder to implement in
| languages that traditionally compile straight to native code
| like C (note there are C interpreters).
|
| So you can have an interpreter-like experience, where code
| gets compiled to native code before execution on the REPL,
| either straight away or after execution gets beyond a
| specific threshold.
|
| Additionally, since dynamic languages by definition can change
| all the time, a JIT can profit from code instrumentation and
| generate machine code that takes into account the types
| actually being used, something an AOT compiler for a dynamic
| language cannot predict, so for AOT many optimizations are
| hardly an option.
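|
| As a toy sketch of the "compile once hot" part, in pure
| Python (illustrative only: the threshold is made up, and the
| "compiled" stand-in is just another Python callable where a
| real JIT would emit native code):
|
|     HOT_THRESHOLD = 100              # made-up tipping point
|
|     def jit_when_hot(func):
|         calls = 0
|         compiled = None
|
|         def wrapper(*args):
|             nonlocal calls, compiled
|             if compiled is not None:
|                 return compiled(*args)   # "native" fast path
|             calls += 1
|             if calls >= HOT_THRESHOLD:
|                 # A real JIT would generate machine code here.
|                 compiled = func
|             return func(*args)           # still interpreting
|         return wrapper
|
|     @jit_when_hot
|     def add(a, b):
|         return a + b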
| LewisVerstappen wrote:
| Great article, but small typo when the author says "copy-any-
| patch JIT"
| ericvsmith wrote:
| That's not a typo, that's the name of the technique.
| kragen wrote:
| i think it's 'copy-and-patch'
| ericvsmith wrote:
| D'oh! Of course you're correct. I skipped over "any", and
| focused on "patch". Sorry about that.
| kragen wrote:
| no harm done :)
| cqqxo4zV46cp wrote:
| Unfortunate to see a couple of comments here drive-by pulling out
| the "x% faster" stat whilst minimising the context. This is a big
| deal and it's effectively a given that this'll pave the way for
| further enhancements.
| kragen wrote:
| maybe, maybe not. time will tell. ahead-of-time compilation is
| even better known for improving performance and yet perl's
| compile-to-c backend turned out to fail to do that
| pjmlp wrote:
| Ahead-of-time compilation is a bad solution for dynamic
| languages, so that is an expected outcome for Perl.
|
| The baseline should be how much heavily dynamic languages
| like my favourite set (Smalltalk, Common Lisp, Dylan, SELF,
| NewtonScript) ended up gaining from JITs versus their
| original interpreters, while being at the genesis of many
| relevant papers for JIT research.
| kragen wrote:
| when i wrote ur-scheme one of the surprising things i
| learned from it was that ahead-of-time compilation worked
| amazingly well for scheme. scheme is ruthlessly monomorphic
| but i was still doing a type check on every primitive
| argument
|
| i didn't realize they ever jitted newtonscript
| pjmlp wrote:
| NewtonScript 2.0 introduced a mechanism to manually JIT
| code, functions marked as native get compiled into
| machine code.
|
| Had the Newton not been canceled, probably there would be
| an evolution from that support.
|
| See "Compiling Functions for Speed"
|
| https://www.newted.org/download/manuals/NewtonToolkitUser
| sGu...
| kragen wrote:
| this is great, thanks! but it sounds like it was an aot
| compiler, not a jit compiler; for example, it explains
| that a drawback of compiling functions to native code is
| that they use more memory, and that the compiler still
| produces bytecode for the functions it compiles natively,
| unless you suppress the bytecode compilation in project
| settings
| pjmlp wrote:
| Yeah, I guess if one wants to go more technical, I see it
| as the first step of a JIT that didn't have the
| opportunity to evolve due to market decisions.
| kragen wrote:
| i guess if they had, we would know whether a jit made
| newtonscript faster or slower, but they didn't, so we
| don't. what we do know is that an aot compiler sometimes
| made newtonscript faster (though maybe only if you added
| enough manifest static typing annotations to your source
| code)
|
| that seems closer to the opposite of what you were saying
| in the point on which we were in disagreement?
| pjmlp wrote:
| I guess my recollection regarding NewtonScript wasn't
| correct, if you prefer that I put it like that; however, I
| am quite certain in regard to the other languages in my
| list.
| kragen wrote:
| i agree that the other languages gained a lot for sure
|
| maybe i should have said that up front!
|
| except maybe common lisp; all the implementations i know
| are interpreted or aot-compiled (sometimes an expression
| at a time, like sbcl), but maybe there's a jit-compiled
| one, and i bet it's great
|
| probably with enough work python could gain a similar
| amount. it's possible that work might get done. but it
| seems likely that it'll have to give up things like
| reference-counting, as smalltalk did (which most of the
| other languages never had)
| lispm wrote:
| Note that "interpreter" in the Lisp world by default has a
| different meaning.
|
| A "Lisp interpreter" runs Lisp source in the form of
| s-expressions. That's what the first Lisp did.
|
| A "Lisp compiler" compiles Lisp source code to native
| code, either directly or with the help of a C compiler or
| an assembler. A Lisp compiler could also compile source
| code to byte code. In some implementations this byte code
| can be JIT compiled (ABCL, CLISP, ...).
|
| The first Lisp provided a Lisp to assembly compiler,
| which compiled Lisp code to assembly code, which then
| gets compiled to machine code. That machine code could be
| loaded into Lisp and functions then could be native
| machine code.
|
| The Newton Toolkit could compile type declared functions
| to machine code. That's something most Common Lisp
| compilers do, sometimes by default (SBCL, CCL, ... by
| default directly compile source code to machine code).
|
| SBCL:
|
|     * (defun add (a b)
|         (declare (fixnum a b) (optimize (speed 3)))
|         (+ a b))
|     ADD
|     * (disassemble #'add)
|     ; disassembly for ADD
|     ; Size: 104 bytes. Origin: #x7006E1789C     ; ADD
|     ; 89C: 0000018B ADD NL0, NL0, NL1
|     ; 8A0: 0A0000AB ADDS R0, NL0, NL0
|     ; 8A4: E7010054 BVC L1
|     ; 8A8: BD2A00B9 STR WNULL, [THREAD, #40]    ; pseudo-atomic-bits
|     ; 8AC: BC7A47A9 LDP TMP, LR, [THREAD, #112] ; mixed-tlab.{free-pointer, end-addr}
|     ; 8B0: 8A430091 ADD R0, TMP, #16
|     ; 8B4: 5F011EEB CMP R0, LR
|     ; 8B8: E8010054 BHI L2
|     ; 8BC: AA3A00F9 STR R0, [THREAD, #112]      ; mixed-tlab
|     ; 8C0: L0: 8A3F0091 ADD R0, TMP, #15
|     ; 8C4: 3E2280D2 MOVZ LR, #273
|     ; 8C8: 9E0300A9 STP LR, NL0, [TMP]
|     ; 8CC: BF3A03D5 DMB ISHST
|     ; 8D0: BF2A00B9 STR WZR, [THREAD, #40]      ; pseudo-atomic-bits
|     ; 8D4: BE2E40B9 LDR WLR, [THREAD, #44]      ; pseudo-atomic-bits
|     ; 8D8: 5E0000B4 CBZ LR, L1
|     ; 8DC: 200120D4 BRK #9                      ; Pending interrupt trap
|     ; 8E0: L1: FB031AAA MOV CSP, CFP
|     ; 8E4: 5A7B40A9 LDP CFP, LR, [CFP]
|     ; 8E8: BF0300F1 CMP NULL, #0
|     ; 8EC: C0035FD6 RET
|     ; 8F0: E00120D4 BRK #15                     ; Invalid argument count trap
|     ; 8F4: L2: 1C0280D2 MOVZ TMP, #16
|     ; 8F8: 0AFBFF58 LDR R0, #x7006E17858        ; SB-VM::ALLOC-TRAMP
|     ; 8FC: 40013FD6 BLR R0
|     ; 900: F0FFFF17 B L0
|     NIL
|
| I've entered a function and it gets ahead-of-time
| compiled to non-generic machine code.
|
| Calling the function ADD with the wrong numeric arguments
| is an error, which will be detected both at compile time
| and at runtime.
|
|     * (add 3.0 2.0)
|     debugger invoked on a TYPE-ERROR @7006E17898 in thread
|     #<THREAD "main thread" RUNNING {70088224A3}>:
|       The value 3.0 is not of type FIXNUM when binding A
|
| Redefinition of + will do nothing to the code. The
| addition is inlined machine code.
| lispm wrote:
| JIT compilation is rare in Common Lisp. I wouldn't think
| that Dylan implementations used JIT compilation.
|
| Apple's Dylan IDE and compiler was implemented in Macintosh
| Common Lisp (MCL). MCL then was not a part of the Dylan
| runtime.
|
| I would think that Open Dylan (the Dylan implementation
| originally from Harlequin) can also generate LLVM bitcode,
| but I don't know if that one can be JIT executed.
| Possibly...
| kragen wrote:
| are there any cl implementations that use jit? there are
| a lot of cl implementations so i assumed there must be
| one
| lispm wrote:
| ABCL runs on the JVM. It generates JVM byte code, which
| then can be JIT compiled by the JVM.
|
| CLISP has a byte code machine, for which a JIT can be
| used.
|
| There might be others.
| cube2222 wrote:
| > ahead-of-time compilation is even better known for
| improving performance
|
| Not necessarily, not for dynamic languages.
|
| With very dynamic languages you can make only very limited
| assumptions about e.g. function argument types, which leads
| you to compiled functions that have to handle any possible
| case.
|
| A JIT compiler can notice that the given function is almost
| always (or always) used to operate on a pair of integers, and
| do a vastly superior specialized compilation, with guards to
| fallback on the generic one. With extensive inlining, you can
| also deduplicate a lot of the guards.
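|
| A sketch of that guard-and-fallback shape in pure Python
| (illustrative only; in a real JIT the guarded branch would
| be specialized machine code rather than more Python):
|
|     def generic_add(a, b):       # handles any possible case
|         return a + b
|
|     def specialized_add(a, b):
|         # Guard: only valid for the observed int/int case.
|         if type(a) is int and type(b) is int:
|             return a + b         # stand-in for tight int code
|         return generic_add(a, b) # guard failed: generic path
|
|     specialized_add(2, 3)        # fast, guarded path
|     specialized_add(2.0, 3)      # falls back to the generic one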
| kragen wrote:
| yes, that is true. but aot compilers never make things
| _slower_ than interpretation, and they can afford more
| expensive optimizations
|
| also, even mature jit compilers often only make limited
| improvements; jython has been stuck at near-parity with
| cpython's terrible performance for decades, for example,
| and while v8 was an enormous improvement over old
| spidermonkey and squirrelfish, after 15 years it's still
| stuck almost an order of magnitude slower than c
| https://benchmarksgame-
| team.pages.debian.net/benchmarksgame/... which is
| (handwaving) like maybe a factor of 2 or 3 slower than self
|
| typically when i can get something to work using numpy it's
| only about a factor of 5 slower than optimized c, purely
| interpretively, which is competitive with v8 in many cases.
| luajit, by contrast, is goddam alien technology from the
| future
|
| with respect to your intxint example, if an intxint
| specialization is actually vastly superior, for example
| because the operation you're applying is something like +
| or *, an aot compiler can _also_ insert the guard and
| inline the single-instruction implementation, and it can
| _also_ do extensive inlining and even specialization
| (though that 's rare in aots and common in jits). it can
| insert the guards because if your monomorphic sends of +
| are always sending + to a rational instance or something,
| the performance gain from eliminating megamorphic dispatch
| is comparatively slight, and the performance _loss_ from
| inserting a static hardcoded guess of integer math before
| the megamorphic dispatch is also comparatively slight,
| though nonzero
|
| this can fall down, of course, when your arithmetic
| operations are polymorphic over integer and floating-point,
| or over different types of integers; but it often works far
| better than it has any right to. in most code, most
| arithmetic and ordered comparison is integers, most array
| indexing is arrays, most conditionals are on booleans (and
| smalltalk actually hardcodes that in its bytecode
| compiler). this depends somewhat on your language design,
| of course; python using the same operator for indexing
| dicts, lists, and even strings hurts it here
|
| meanwhile, back in the stop-hitting-yourself-why-are-you-
| hitting-yourself department, fucking cpython is allocating
| its integers on the heap and motherfucking reference-
| counting them
| ptx wrote:
| There is already an AOT compiler for Python: Nuitka[0].
| But I don't think it's much faster.
|
| And then there is mypyc[1] which uses mypy's static type
| annotations but is only slightly faster.
|
| And various other compilers like Numba and Cython that
| work with specialized dialects of Python to achieve
| better results, but then it's not quite Python anymore.
|
| [0] https://nuitka.net/
|
| [1] https://github.com/python/mypy/tree/master/mypyc
| kragen wrote:
| thanks, i'd forgotten about nuitka and didn't know about
| mypyc!
| actionfromafar wrote:
| Check out:
|
| https://shedskin.github.io/
|
| Python to C++ translation
| ngrilly wrote:
| I so much agree with your comment on memory allocation.
| Everybody is focusing on JIT, but allocating everything
| on the heap, with no possibility to pack multiple values
| contiguously in a struct or array, will still be a
| problem for performance.
| vanderZwan wrote:
| > _fucking cpython is allocating its integers on the heap
| and motherfucking reference-counting them_
|
| And here I thought it was shocking to learn recently that
| v8 allocates doubles on the heap. (I mean, I'm not a
| compiler writer, I have no idea how hard it would be to
| avoid this, but it feels like mandatory boxed floats would
| hurt performance a lot)
| kragen wrote:
| nanboxing as used in spidermonkey
| (https://piotrduperas.com/posts/nan-boxing) is a possible
| alternative, but i think v8 works pretty hard to not use
| floats, and i don't think local-variable or temporary
| floats end up on the heap in v8 the way they do in
| cpython. i'm not that familiar with v8 tho (but i'm
| pretty sure it doesn't refcount things)
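|
| for a rough feel of the trick, here's a nan-boxing sketch in
| python (illustration only; real engines do this on raw 64-bit
| machine words, and the tag value below is made up):
|
|     import struct
|
|     TAG_INT = 0xFFF1_0000_0000_0000  # made-up NaN-space tag
|
|     def f2bits(x):
|         return struct.unpack("<Q", struct.pack("<d", x))[0]
|
|     def bits2f(bits):
|         return struct.unpack("<d", struct.pack("<Q", bits))[0]
|
|     def box(v):
|         if isinstance(v, int):    # hide int in a NaN payload
|             assert 0 <= v < 2**32
|             return TAG_INT | v
|         return f2bits(v)          # floats stored as themselves
|
|     def unbox(bits):
|         if (bits & 0xFFFF_0000_0000_0000) == TAG_INT:
|             return bits & 0xFFFF_FFFF  # recover the int payload
|         return bits2f(bits)
|
|     assert unbox(box(42)) == 42
|     assert unbox(box(3.5)) == 3.5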
| vanderZwan wrote:
| > _i think v8 works pretty hard to not use floats_
|
| Correct, to the point where at work a colleague and I
| actually have looked into how to force using floats even
| if we initiate objects with a small-integer number (the
| idea being that ensuring our objects having the correct
| hidden class the first time might help the JIT, and
| avoids wasting time on integer-to-float promotion in
| tight loops). Via trial and error in Node we figured that
| using -0 as a number literal works, but (say) 1.0 does
| not.
|
| > _i don't think local-variable or temporary floats end
| up on the heap in v8 the way they do in cpython_
|
| This would also make sense - v8 already uses pools to re-
| use common temporary object shapes in general IIRC, I see
| no reason why it wouldn't do at least that with heap-
| allocated doubles too.
| kragen wrote:
| so then the remaining performance-critical case is where
| you have a big array of floats you're looping over. in
| firefox that works fine (one allocation per lowest-level
| array, not one allocation and unprefetchable pointer
| dereference per float), but maybe in chrome you'd want to
| use a typedarray?
| vanderZwan wrote:
| Maybe, at that point it is basically similar to the
| struct-of-arrays vs array-of-structs trade-off, except
| with significantly worse ergonomics and less pay-off.
| IainIreland wrote:
| As I understand it, V8 keeps track of an ElementsKind for
| each array (or, more precisely, for the elements of every
| object; arrays are not special in this sense). If an
| array only contains floats, then they will all be stored
| unboxed and inline. See here: https://source.chromium.org
| /chromium/chromium/src/+/main:v8/...
|
| I assume that integers are coerced to floats in this
| mode, and that there's a performance cliff if you store a
| non-number in such an array, but in both cases I'm just
| guessing.
|
| In SpiderMonkey, as you say, we store all our values as
| doubles, and disguise the non-float values as NaNs.
| kragen wrote:
| thank you for the correction!
| pjmlp wrote:
| It is a very big deal, as it will finally shift the mentality
| regarding:
|
| - "C/C++/Fortran libs are Python"
|
| - "Python is too dynamic", while disregarding Smalltalk, Common
| Lisp, Dylan, SELF, NewtonScript JIT capabilities, all dynamic
| languages where anything can change at any given moment
| japanman185 wrote:
| Disregarding the fact that python is an awful programming
| language for anything other than jupyter notebooks
| pjmlp wrote:
| Another one who hasn't seen UNIX scripting in shell
| languages or Perl, or Apache modules, before Python came
| to be.
| 0x457 wrote:
| I wrote tons of perl in my life. I would rather keep
| writing perl than touching python. Every time I see a
| nice utility and see that it's written in python - tab
| closed.
| BossingAround wrote:
| Facts are objective; "Python is awful" is your opinion.
| moffkalast wrote:
| Ah I'd say the exact opposite, python in general is pretty
| good but jupyter sucks because the syntax isn't compatible
| with regular python and I avoid it like the plague.
| smabie wrote:
| What does a jupyter notebook have to do with python
| syntax?
| moffkalast wrote:
| Take the code you find in an average notebook, copy it to
| a .py text file, run it with python. Does it run? In my
| experience the answer is usually 'no' because of some
| extra-ass syntax sugar jupyter has that doesn't exist in
| python.
| cqqxo4zV46cp wrote:
| This comment is really just bordering on a rule violation
| and doesn't add to the conversation at all.
| pdimitar wrote:
| What do you mean by "it will shift the mentality"? There is
| no magical JIT that will ever make e.g. the data science
| Python & C++ amalgamations slower than a pure Python. Likely
| never happening, too.
|
| Also no mentality shift is expected on the "Python is too
| dynamic" -- which is a strange thing to say anyway -- because
| Python is not getting any more static due to these JIT news.
| pjmlp wrote:
| Python with JIT is faster than Python without JIT.
|
| Having a Python with JIT, in many cases it will be fast
| enough for most cases.
|
| Data science running CUDA workloads isn't the only use case
| for Python.
| eesmith wrote:
| I think Python without a JIT in many cases is already
| fast enough for most cases.
|
| I don't do data science.
| pjmlp wrote:
| Sure, for UNIX scripting; for everything else it is
| painfully slow.
|
| I have known Python since version 1.6, and it is my
| scripting language in UNIX-like environments; during my
| time at CERN, I was one of the CMT build infrastructure
| engineers on the ATLAS team.
|
| It has never been the language I would reach for when not
| doing OS scripting, and usually when a GNU/Linux GUI
| application happens to be slow as molasses, it has been
| written in Python.
| andrewaylett wrote:
| A Python web service my team maintains, running at a
| higher request rate and with lower CPU and RAM
| requirements than most of the Java services I see around
| us, would like a word with you.
| pjmlp wrote:
| I guess those Java developers really aren't.
| IggleSniggle wrote:
| How many requests per second are we talking, ballpark,
| and what's the workload?
| andrewaylett wrote:
| ~5k requests/second for the Python service, we tend to go
| for small instances for redundancy so that's across a few
| dozen nodes. The workload comparison is unfair to the
| Java service, if I'm honest :). But we're running Python
| on single vCPU containers with 2G RAM, and the Java
| service instances are a _lot_ larger than that.
|
| Flask, gunicorn, low single digit millisecond latency.
| Definitely optimised for latency over throughput, but not
| so much that we've replatformed it onto something that's
| actually designed for low latency :P. Callers all cache
| heavily with a fairly high hit ratio for interactive
| callers and a relatively low hit ratio for batch callers.
| sirsinsalot wrote:
| My teams deploy Python web APIs and yes, it is slow
| compared to other languages and runtimes.
|
| But on the whole, machines are cheaper than other
| engineering approaches to scaling.
|
| For us, and many others, fast enough is fast enough.
| eesmith wrote:
| There's a lot of Django going on in the world.
|
| _shrug_. If we're talking personal experience, I've
| been using Python since 1.4. It's been my primary
| development language since the late 1990s, with of course
| speed critical portions in C or C++ when needed - and I
| know a lot of people who also primarily develop in
| Python.
|
| And there's a bunch of Python development at CERN for
| tasks other than OS scripting. ("The ease of use and a
| very low learning curve makes Python a perfect
| programming language for many physicists and other people
| without the computer science background. CERN does not
| only produce large amounts of data. The interesting bits
| of data have to be stored, analyzed, shared and
| published. Work of many scientists across various
| research facilities around the world has to be
| synchronized. This is the area where Python flourishes" -
| https://cds.cern.ch/record/2274794)
|
| I simply don't see how a Python JIT is going to make that
| much of a difference. We already have PyPy for those
| needing pure Python performance, and Numba for certain
| types of numeric needs.
|
| PyPy's experience shows we'll not be expecting a 5x boost
| any time soon from this new JIT framework, while
| C/C++/Fortran/Rust are significantly faster.
| pjmlp wrote:
| > There's a lot of Django going on in the world.
|
| Unfortunately.
|
| > And there's a bunch of Python development at CERN for
| tasks other than OS scripting
|
| Of course there is, CMT was a build tool, not OS
| scripting.
|
| No need to give me CERN links to show me Python bindings
| to ROOT, or Jupyter notebooks.
|
| > PyPy's experience shows we'll not be expecting a 5x
| boost any time soon from this new JIT framework, while
| C/C++/Fortran/Rust are significantly faster.
|
| I really don't get the attitude that if it doesn't 100%
| fix all the world problems, then it isn't worth it.
| eesmith wrote:
| The link wasn't for you - the link was for other HN users
| who might look at your mention of your use at CERN and
| mistakenly assume it was a more widespread viewpoint
| there.
|
| > I really don't get the attitude that if it doesn't 100%
| fix all the world problems, then it isn't worth it.
|
| Then it's a good thing I'm not making that argument, but
| rather that "Having a Python with JIT, in many cases it
| will be fast enough for most cases." has very little
| information content, because Python without a JIT already
| meets the consequent.
| formerly_proven wrote:
| I really wouldn't mind Python being faster than it is and
| I really didn't mind at all getting a practically free
| ~30% performance increase just by updating to 3.11.
| There's tons of applications which just passively benefit
| from these optimizations. Sure, you might argue "but you
| shouldn't have written that parser or that UI handling a
| couple thousand items in Python" but lots of people do
| and did just that.
| eesmith wrote:
| I wouldn't mind either.
|
| Do you agree with me that Python is already fast enough
| for most cases, even without a JIT?
|
| If not, how would a 30% boost improve things enough to
| change the balance?
| superlopuh wrote:
| I'm fairly certain that this is false, and am working on
| proving it. In the cases that Numba is optimised for it's
| already faster than plausible C++ implementations of the
| same kernels.
|
| https://stackoverflow.com/questions/36526708/comparing-
| pytho...
| pas wrote:
| it's not faster, it's about as fast as C++ compiled with
| O3 optimizations. which is great and also much more
| likely to be true.
| eigenspace wrote:
| Numba is basically another language embedded in Python.
| It (sometimes severely) modifies the semantics of code.
| fnord123 wrote:
| > it will finally shift the mentality regarding
| "C/C++/Fortran libs are Python"
|
| But pjmlp, I use Python because it's a wrapper for
| C/C++/Fortran libs. - Chocolate Giddyup
| pjmlp wrote:
| Just like Tcl happens to be.
| astrolx wrote:
| I can dig it!
| lispm wrote:
| By default any code loaded into something like SBCL gets AOT
| compiled.
|
| In Common Lisp, not everything can change at any moment.
| Especially not in implementations where one uses AOT
| compilation like SBCL, ECL, LispWorks, Allegro CL, ... and so
| on. They have optimizing compilers which can gradually remove
| dynamic runtime behavior, up to supporting almost no dynamic
| runtime behavior.
|
| Stuff which is supported: type specific code, inlining, block
| compilation, removal of development tools, ...
|
| JIT implementations are rare in the Common Lisp world. They
| are mostly only used in implementations which use a byte-code
| virtual machine (CLISP, ABCL, ...). Common Lisp
| implementations mostly compile either directly to native code
| or via C compilers. The effect is that native AOT compiled
| code is much faster.
| adonese wrote:
| is it any different or comparable to numba or pyjion? Not
| following python closely in recent years but I recall those
| two being projects with huge potential
| drbaba wrote:
| I don't know Pyjion, but I have used Numba for real work.
| It's a great package and can lead to massive speed-ups.
|
| However, last time I used it, it (1) didn't work with many
| third-party libraries (e.g. SciPy was important for me), and
| (2) didn't work with object-oriented code (all your @njit
| code had to be wrapped in functions without classes). Those
| two have limited which projects I could adopt Numba for in
| practice, despite loving it in the cases where it worked.
|
| I don't know what limitations the built-in Python JIT has,
| but hopefully it might be a more general JIT that works for
| _all_ Python code.
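|
| For a flavour of what the happy path looks like (a toy
| example; exact library and class support varies by Numba
| version):
|
|     import numpy as np
|     from numba import njit
|
|     @njit                  # compiled to machine code on first call
|     def total(xs):
|         acc = 0.0
|         for x in xs:       # this loop runs as native code
|             acc += x
|         return acc
|
|     print(total(np.arange(1_000_000, dtype=np.float64)))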
| benrutter wrote:
| This is so true!
|
| A JIT compiler is a big deal for performance improvements,
| especially where it matters (in large repetitive loops).
|
| Anyone cynical about the potential a python JIT offers should
| take a look at pypy, which has a 5x speed-up over regular
| python, mainly through JIT compilation: https://www.pypy.org/
| crabbone wrote:
| I don't see this as an enhancement.
|
| Not pursuing JIT or efficient compilation in general was a
| deliberate decision way back when Python made some kind of
| sense. It was the simplicity of implementation valued over
| performance gains that motivated this decision.
|
| The mantra Python programmers liked to repeat was that "the
| performance is good enough, and if you want to go fast, write
| in C and make a native module".
|
| And if you didn't like that, there was always Java.
|
| Today, Python is getting closer and closer to being "the
| crappy Java with worse syntax". Except we already have that:
| it's called Groovy.
| frakt0x90 wrote:
| What are you talking about? From what I can read here there
| is no syntax change. Just a framework for faster execution.
| Plus, Python's use case has HEAVILY evolved over the last few
| years since it's now the de facto language for machine
| learning. It's great that the core devs are keeping up with
| the time.
|
| The language is definitely getting more complex
| syntactically, and I'm not a huge fan of some of those
| changes but it's nowhere near Java or C++ or anything else.
| You can still write simple Python with all of these changes.
| crabbone wrote:
| > What are you talking about?
|
| Read it again. It seems you were reading too fast. I'm
| talking about the future, not the change being discussed
| right now.
|
| > It's great that the core devs are keeping up with the
| time.
|
| You mistake the influence of Microsoft and their desire to
| sell features for progress. Python is actually regressing
| as a system. It's becoming worse, not better. But it's hard
| to see the gestalt of it if all you are looking for is the
| new features.
|
| > it's nowhere near Java
|
| That is true. Java is a much simpler and more regular (not
| in the automata-theory sense) language. Today, if you want
| a simpler language, you need to choose Java over Python
| (although neither is very simple, so, preferably, you need
| a third option).
|
| > You can still write simple Python
|
| I can also write simple C++ if I limit what I use from the
| language to a very small subset. This says nothing about
| the simplicity of the language...
| wrd83 wrote:
| Honestly I don't understand the pessimistic view here. I think
| every release since Microsoft started funding Python has
| delivered high-single-digit best-case performance gains.
|
| Rather than focusing on the raw number, compare to Python 3.5
| or so. It's still getting significantly faster.
|
| If they keep up this steady pace they are slowly saving the
| planet!
| epcoa wrote:
| Sorry, but reality bites
| https://en.wikipedia.org/wiki/Amdahl%27s_law
| vanderZwan wrote:
| It's not that simple.
|
| Amdahl's Law is about expected speedup/decrease in latency.
| That actually isn't strongly correlated to "saving the
| planet" afaik (where I interpret that as reducing direct
| energy usage, as well as embodied energy usage by reducing
| the need to upgrade hardware).
|
| If anything, increasing speed and/or decreasing latency of
| the whole system often involves adding some form of
| parallelism, which brings extra overhead and requires extra
| hardware. Note that prefetching/speculative execution kind of
| counts here as well, since that is essentially doing
| potentially wasted work in parallel. In the past, boosting
| the clock rate of the CPU was also a thing, until
| thermodynamics said no.
|
| OTOH, letting your CPU go to sleep faster should save energy,
| so repeated single-digit perf improvements via wasting fewer
| instructions do matter.
|
| But then again, that could lead to Jevons Paradox [0] (the
| situation where increasing the efficiency encourages more
| wasteful use than the increase in efficiency saves - Wirth's
| Law but generalized and older, basically).
|
| So I'd say there are too many interconnected dynamics at play
| to really simply state "optimization good" or "optimization
| useless". I'm erring on the side of "faster Python probably
| good".
|
| [0] https://en.wikipedia.org/wiki/Jevons_paradox
| oblio wrote:
| When did Microsoft start funding Python?
|
| Also, such a shame that it takes sooo long for crucial open
| source to be funded properly. Kudos to Microsoft for doing it,
| shame on everyone else for not pitching in sooner.
|
| FYI Python was launched 32 years ago, Python 2 was released 24
| years ago and Python 3 was released 16 years ago.
| systems wrote:
| I think the pessimism really comes from a dislike for Python
|
| While very very very popular, Python is, I think, a very
| disliked language. It doesn't have, and is not built around,
| the programming language features that programmers currently
| like: it's not functional or immutable by default, it's not
| fast, the tooling is complex, and it uses indentation for
| code blocks (a feature that was cool in the 90s, but dreaded
| since at least 2010).
|
| So I guess if Python becomes faster, this will ensure its
| continued dominance, and all those hoping that one day it
| will be replaced by a nicer, faster language are
| disappointed.
|
| This pessimism is the aching voice of the developers who
| were hoping for a big Python replacement.
| dbrueck wrote:
| To each his own, but the things you list are largely
| subjective/inaccurate, and there are many, many, many
| developers who use Python because they enjoy it and like it a
| lot.
| systems wrote:
| Python is a very widely used language, and like any popular
| thing, yes, many many many like it, and many many many
| dislike it. It is that big: python can be disliked by a
| million developers and still be a lot more liked than
| disliked.
|
| But I also think it's true that python is not, and has not
| been for a while, considered a modern or technically
| advanced language.
|
| The hype currently is for typed or gradually typed
| languages, functional languages, immutable data, systems
| languages, type-safe languages, languages with advanced
| parallelism and concurrency support, etc.
|
| Python is old, boring OOP. If you like it, then like
| millions of developers you are not picky about programming
| languages; you use what works, what pays.
|
| but for devs passionate about programming languages, python
| is a relic they hope vanish
| dbrueck wrote:
| > devs passionate about programming languages, python is
| a relic they hope vanish
|
| Statements like this are obviously untrue for large
| numbers of people, so I'm not sure of the point you're
| trying to make.
|
| But certainly it's true that there are both objective and
| subjective reasons for using a particular tool, so I hope
| you are in a position to use the tools that you prefer
| the most. Have a great day!
| dataangel wrote:
| > the hype currently is for typed or gradually typed
| languages
|
| So Python with mypy
| Sohcahtoa82 wrote:
| > but for devs passionate about programming languages,
| python is a relic they hope vanish
|
| If you asked me what language I would consider to be a
| relic that I hope would vanish, I'd go with Perl.
| zestyping wrote:
| Python is designed to be "boring" (in other words,
| straightforward and easy to understand). It is admittedly
| less so, now that it has gained many features since the
| 2.x days, but it is still part of its pedigree that it is
| supposed to be teachable as a beginner language.
|
| It is still the only beginner language that is also an
| industrial-strength production language. You can learn
| Python as your first language and also make an entire
| career out of it. That can't really be said about the
| currently "hyped" languages, even though those are very
| fun and cool and interesting!
| dataangel wrote:
| > (this feature was cool in the 90s, but dreaded since at
| least 2010)
|
| LOL this is a dead giveaway you haven't been around long.
| There have been people kvetching about the whitespace since
| the beginning. Haskell went on to be the next big thing for
| reddit/HN/etc for years and it also uses whitespace.
| klyrs wrote:
| Julia is my source of pessimism. Julia is super fast once it's
| warmed up, but before it gets there, it's painfully slow. They
| seem to be making progress on this, but it's been gradual. I
| understand that Java had similar growing pains, but it's better
| now. Combined with the boondoggle of py3, I'm worried for the
| future of my beloved language as it enters another phase of
| transformation.
| leephillips wrote:
| Would you say of the latest release (v1.10) that Julia is
| painfully slow until it gets "warmed up"? If so, what exactly
| does this mean?
| klyrs wrote:
| I'm not that up to date on the language, it's been a few
| years since I did anything nontrivial with it because the
| experience was so poor. And while that might not seem fair
| to Julia, it's my honest experience: my concern isn't a
| pissing match between Julia and the world, it's that a bad
| JIT experience is a huge turnoff and I'm worried about
| Python's future as it goes down this road.
| leephillips wrote:
| There has been so much progress in Julia's startup
| performance in the past "few years" that someone's
| qualitative impressions from several major releases
| before the current one are of limited relevance.
| klyrs wrote:
| You're making this about Julia despite my repeated
| statements to the contrary. Please reread what I've
| written, you aren't responding to the actual point I've
| made twice now. A reminder: I'm talking specifically
| about my outlook on the future of Python, vis a vis my
| historical experience with how other JIT languages have
| developed.
|
| If you wanted to rebut this, you'd need to argue that
| Julia has always been awesome and that my experience with
| a slow warmup was atypical. But that would be a lie,
| right?
|
| And, subtext: when I wrote my first comment in this
| thread, its highest sibling led with
|
| > I think the pessimism really comes from a dislike for
| Python
|
| So I weighed in as a Python lover who is pessimistic for
| reasons other than a bias against the language.
| leephillips wrote:
| > I'm talking specifically about my outlook on the future
| of Python, vis a vis my historical experience with how
| other JIT languages have developed.
|
| But your assessment of the other language you mentioned
| is several years out of date and made largely irrelevant
| by the fast pace of progress. Therefore your conclusions
| about the probable future of Python, which may be
| correct, nevertheless do not follow.
| vegesm wrote:
| Because it only increases by high single digits each release.
| If they keep up the 10% improvement for the next 10 releases,
| we will reach a speedup of around 2.5 times (1.1^10 ≈ 2.59).
| That's very small, considering how Python is like 10-20 times
| slower than JS (not even talking about C- or Java-like
| speeds).
| pjmlp wrote:
| Finally!
|
| Regardless of the work being done in PyPy, Jython, GraalPy and
| IronPython, having a JIT in CPython seems to be the only way
| past the "C/C++/Fortran libs are Python" mindset.
|
| Looking forward to its evolution, from 3.13 onwards.
| dlahoda wrote:
| Rust libs recently. Pydantic, for example.
| Iridescent_ wrote:
| Wasn't CPython supposed to remain very simple in its codebase,
| with the heavy optimization left for other implementations to
| tackle? I seem to remember hearing as much a few years back.
| toyg wrote:
| That was the original idea, when Python started attracting
| interest from big corporations. It has however become clear
| that maintaining alternative implementations is very difficult
| and resource-intensive; and if you have to maintain
| compatibility with the wider ecosystem anyway (because that's
| what users want), you might as well work with upstream to find
| solutions that work for everyone.
| lifthrasiir wrote:
| The copy-and-patch approach was explicitly chosen in order to
| minimize additional impact on the non-JIT-specific code base.
| hangonhn wrote:
| Does Python even have a language specification? I've been told
| that CPython IS the specification. I don't know if this is
| still true. In the Java world there is a specification and a
| set of tests to test for conformation so it's easier to have
| alternative implementations of the JVM. If what I said is
| correct, then I can see how the optimized alternative
| implementation idea is less likely to happen.
| oskarkk wrote:
| Well, for Python the language reference in the docs[0] is the
| specification, and many things there are described as CPython
| implementation details. Like: "CPython implementation detail:
| For CPython, id(x) is the memory address where x is stored."
| And as another example, dicts remembering insertion order was
| CPython's implementation detail in 3.6, but from 3.7 it's
| part of the language.
|
| [0] https://docs.python.org/3/reference/index.html
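|
| The dict example is easy to check (runnable on any Python >=
| 3.7, where insertion order became a language guarantee rather
| than a CPython implementation detail):
|
|     d = {}
|     d["b"] = 1
|     d["a"] = 2
|     assert list(d) == ["b", "a"]  # insertion order, not sorted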
| recursivecaveat wrote:
| There is a pretty detailed reference that distinguishes
| between cpython implementation details and language features
| at least. There was a jvm python implementation even. The
| problem is more that a lot of the libraries that everyone
| wants to use are very dependent on cpython's ffi which bleeds
| a lot of internals.
| ynik wrote:
| The problem is that:
|
| * CPython is slow, making extension modules written in C(++)
| very attractive
|
| * The CPython extension API exposes many implementation
| details
|
| * Making use of those implementation details helps those
| extension modules be even faster
|
| This resulted in a situation where the ecosystem is locked-in
| to those implementation details: CPython can't change many
| aspects of its own implementation without breaking the
| ecosystem; and other implementations are forced to introduce
| complex and slow emulation layers if they want to be compatible
| with existing CPython extension modules.
|
| The end result is that alternative implementations are not
| viable in practice, as most existing libraries don't work
| without their CPython extension modules -- users of alternative
| implementations are essentially stuck in their own tiny
| ecosystem and cannot make use of the large existing (C)Python
| ecosystem.
|
| CPython at least is in a position where they can push a
| breaking change to the extension API and most libraries will be
| forced to adapt. But there's very little incentive for library
| authors to add separate code paths for other Python
| implementations, so I don't think other implementations can
| become viable until CPython cleans up their API.
| bastawhiz wrote:
| PyPy was released 17 years ago
|
| Jython was released 22 years ago
|
| IronPython was released 17 years ago
|
| To date, no Python implementation has managed to hit all three:
|
| 1. Stay compatible with any recent, modern CPython version
|
| 2. Maintain performance for general-purpose usage (it's fast
| enough without a warmup, and doesn't need to be heavily
| parallelized to see a performance benefit)
|
| 3. Stay alive
|
| Which, frankly, is kind of a shame. But the truth of the matter
| is that it was a high bar to hit in the first place, and even
| PyPy (which arguably had the biggest advantages: interest,
| mindshare, compatibility, meaningful wins) managed to barely
| crack a fraction of a percent of Python market share.
|
| If you bet on other implementations being the source of
| performance wins, you're betting on something which essentially
| doesn't exist at this point.
| vanderZwan wrote:
| I think it's really cool that Haoran Xu and Fredrik Kjolstad's
| copy-and-patch technique[0] is catching on, I remember
| discovering it through Xu's blog posts about his LuaJIT remake
| project[1][2], where he intends to apply these techniques to Lua
| (and I probably found those through a post here). I was just
| blown away by how they "recycled" all these battle-tested
| techniques and technologies, and used them to synthesize something
| novel. I'm not a compiler writer but it felt really clever to me.
|
| I highly recommend the blog posts if you're into learning how
| languages are implemented, by the way. They're incredible deep
| dives, but he uses the details-element to keep the metaphorical
| descents into the Mariana Trench optional so it doesn't get too
| overwhelming.
|
| I even had the privilege of congratulating him on the 1000th
| star of the GH repo[3], where he reassured me and others that
| he's still working on it despite the long pause after the last
| blog post, and that this mainly has to do with behind-the-
| scenes rewrites that make no sense to publish piecemeal.
|
| [0] https://arxiv.org/abs/2011.13127
|
| [1] https://sillycross.github.io/2022/11/22/2022-11-22/
|
| [2] https://sillycross.github.io/2023/05/12/2023-05-12/
|
| [3] https://github.com/luajit-remake/luajit-remake/issues/11
| checker659 wrote:
| M. Anton Ertl and David Gregg. 2004. Retargeting JIT Compilers
| by using C-Compiler Generated Executable Code. In Proceedings
| of the 13th International Conference on Parallel Architectures
| and Compilation Techniques (PACT '04). IEEE Computer Society,
| USA, 41-50.
|
| https://dl.acm.org/doi/10.5555/1025127.1025995
| vanderZwan wrote:
| Anton Ertl! <3
|
| Context: I've been on a concatenative language binge
| recently, and his work on Forth is awesome. In my defense he
| doesn't seem to list this paper among his publications[0].
| Will give this paper a read, thanks for linking it! :)
|
| If they missed the boat on getting credit for their
| contributions then at least the approach finally starts to
| catch on I guess?
|
| (I wonder if he got the idea from his work on optimizing
| Forth somehow?)
|
| [0] https://informatics.tuwien.ac.at/people/anton-ertl
| lifthrasiir wrote:
| While it bears a significant resemblance, Ertl and Gregg's
| approach is not automatic, and every additional architecture
| requires a significant understanding of the target
| architecture, including the ability to ensure that fully
| relocatable code can be generated and extracted. In
| comparison, the copy-and-patch approach can be thought of as
| a simple dynamic linker, and objects generated by unmodified
| C compilers are far more predictable and need much less
| architecture-specific information for linking.
| vanderZwan wrote:
| Does Ertl and Gregg's approach have any "upsides" over
| copy-and-patch? Or is it a case of _just_ missing those one
| or two insights (or technologies) that make the whole thing
| a lot simpler to implement?
| lifthrasiir wrote:
| I think so, but I can't say this any more confidently until
| I get an actual copy of their paper (I used other review
| papers to get the main idea instead).
| pierrebai wrote:
| The copy-and-patch also assumes the compiler will generate
| patchable code. For example, on some architectures, a
| zero operand might have a smaller or different opcode
| compared to a more general operand. The same issue applies
| to relative jumps or offset ranges. It seems the main
| difference is that the patch approach also patches jumps to
| absolute addresses instead of requiring instruction-counter-
| relative code.
| naasking wrote:
| Full copy of the paper: https://www2.cs.arizona.edu/~collbe
| rg/Teaching/553/2011/Reso...
|
| There's also this which seems to use the same technique:
|
| Templates-based portable just-in-time compiler,
| https://dl.acm.org/doi/abs/10.1145/944579.944588
|
| Nice to see there's still room for innovation in the VM
| space!
| giancarlostoro wrote:
| Reminds me of David K who is local to me in Florida, or was,
| last I spoke to him. He has been a Finite State Machine
| advocate for ages, and its a well known concept, but you'd be
| surprised how useful they can be. He pushes it for front-end a
| lot, and even implemented a Tic Tac Toe sample using it.
|
| https://twitter.com/DavidKPiano
| bonzini wrote:
| Copy and patch is a variant of QEMU's original "dyngen" backend
| by Fabrice Bellard[1][2], with more help from the compiler to
| avoid the maintainability issues that ultimately led QEMU to
| use a custom code generator.
|
| [1]
| https://www.usenix.org/legacy/event/usenix05/tech/freenix/fu...
|
| [2]
| https://review.gerrithub.io/plugins/gitiles/spdk/qemu/+/5a24...
| twbarr wrote:
| Ultimately, most good ideas were first implemented by Fabrice
| Bellard.
| KRAKRISMOTT wrote:
| Copy and patch goes all the way back to Grace Hopper's
| original compiler implementation
| darknavi wrote:
| I am happy to see him working on QuickJS in the last month
| or so. It could really use some ES2023 love!
| matheusmoreira wrote:
| Thanks a lot!! I'm something of a beginner language developer
| and I've been collecting papers, articles, blog posts, anything
| that provides accessible, high level description of these
| optimization techniques.
| mg wrote:
| I love Python and use it for everything other than web
| development.
|
| One reason is performance. So if Python has a faster future ahead
| of it: Hurray!
|
| The other reason is that the Python ecosystem moved away from
| stateless request handling like CGI or mod_php and is now
| completely set on long-running processes.
|
| Does this still mean you have to restart your local web
| application after any change you made to it? I heard that some
| developers automate that, so that everytime they save a file, the
| web application is restarted. That seems pretty expensive in
| terms of resource consumption. And complex as you would have to
| run some kind of watcher process which handles watching your
| files and restarting the application?
| onetoo wrote:
| The restart isn't expensive in absolute terms; on a human
| level it's practically instant.
| development, hopefully your local machine isn't the production
| environment.
|
| It's also very easy, often just adding a CLI flag to your local
| run command.
|
| edit: Regarding performance, Python today can _easily_ handle
| at least 1k requests per second. The vast vast vast majority of
| web applications today don't need anywhere near that kind of
| performance.
| ubercore wrote:
| Been working with python for the web for over a decade. This
| is basically a solved issue, and the performance is a non-
| issue day to day.
| mg wrote:
| The thing is, I don't run my applications locally with a
| "local run command".
|
| I prefer to have a local system set up just like the
| production server, but in a container.
|
| Maybe using WSGI with MaxConnectionsPerChild=1 could be a
| solution? But that would start a new (for example) Django
| instance for every request. Not sure how fast Django starts.
|
| Another option might be to send a HUP signal to Apache:
| apachectl -k restart
|
| That will only kill the worker threads. And when there are
| none (because another file save triggered it already), this
| operation might be almost free in terms of resource usage.
| This also would require WSGI or similar. Not sure if that is
| the standard approach for Django+Apache.
| onetoo wrote:
| I would still recommend running it properly locally, but
| whatever. Pseudo-devcontainer it is. I assume the code is
| properly volume mounted.
|
| In production, you would want to run your app through
| gunicorn/uvicorn/whatever on an internal-only port, and
| reverse-proxy to it with a public-facing apache or similar.
|
| Set up apache to reverse proxy like you would on prod, and
| run gunicorn/uvicorn/whatever like you would on prod,
| except you also add the autoreload flag. E.g.
| uvicorn main:app --host 0.0.0.0 --port 12345 --reload
|
| If production uses containers, you should keep the python
| image slim and simple, including only gunicorn/uvicorn and
| have the reverse proxy in another container. Etc.
| traverseda wrote:
| Is the problem you're having that you feel the need to
| expose a WSGI/ASGI interface instead of just a reverse
| proxy? Take a look at gunicorn, and for serving static files
| you can use whitenoise.
|
| With those two you can just stand up an python program in a
| container that serves html, and put it behind whatever
| reverse proxy you want.
| edgyquant wrote:
| I hate this argument that "most web apps don't need that kind
| of performance." For one thing, with responsive apps being
| the norm, it wouldn't be surprising for a session to begin
| with multiple requests or to even have multiple requests per
| second. At that point all it takes is a few hundred active
| users to hit that 1k limit.
|
| But even leaving that aside, you never know when your
| application will be linked somewhere or go semi-viral and not
| being able to serve 1000 users is all it takes for your app
| to go down and your one shot at a successful company to die a
| sad death.
| onetoo wrote:
| I didn't say python can handle <=1K, I was saying >=1K. I
| feel confident that I am orders of magnitude off the real
| limit you'd meet.
|
| The specifics of that aside, any unprepared application is
| going to buckle at a sudden mega-surge of users. The
| solution remains largely the same, regardless of
| technology: Make sure everything that can be cached is
| cached, scale the hardware vertically until it stops
| helping, optimize your code, scale horizontally until you
| run out of money. I imagine the DB will be the actual
| bottleneck, most of the time.
|
| There are other reasons to not choose python for greenfield
| application, but performance should rarely be one IMO.
| neurostimulant wrote:
| If you run the debug web server command (e.g. Django's
| `manage.py runserver`), yes, it has a watcher that will
| automatically restart the web server process if there is a
| code change.
|
| Once you deploy it to production, you usually run it using a
| WSGI/ASGI server such as Gunicorn or Uvicorn and let whatever
| deployment process you use handle the lifecycle. You usually
| don't use a watcher in production.
|
| Basically similar stuff with nodejs, rails, etc.
| BiteCode_dev wrote:
| In dev, this is handled mostly by the OS with things like
| inotify, so it has little perf impact.
|
| In prod, you don't do it. Deployment implies sending a signal
| like HUP to your app, so that it reloads the code gracefully.
|
| All in all, everybody is moving to this, even PHP. This
| allows for persistent connections, function memoization,
| delegation to thread pools, etc.
| gklitz wrote:
| > That seems pretty expensive in terms of resource consumption.
| And complex as you would have to run some kind of watcher
| process which handles watching your files and restarting the
| application?
|
| What? No, in reality it's just running your app in debug mode
| (just a cli flag), and when you save the files the next refresh
| of the browser has the live version of the app. It's neither
| expensive nor complex.
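|
| A minimal sketch of that workflow, using Flask as the
| example framework (the other popular frameworks have an
| equivalent switch):
|
|     from flask import Flask
|
|     app = Flask(__name__)
|
|     @app.route("/")
|     def index():
|         return "edit me, save, and just refresh the browser"
|
|     if __name__ == "__main__":
|         # debug=True turns on the reloader: saving a source
|         # file restarts the dev server before the next request.
|         app.run(debug=True)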
| swingingFlyFish wrote:
| Python is amazing and shines for Web development. I'd recommend
| taking a look at
| https://www.tornadoweb.org/en/stable/index.html. I use this in
| production on my pet project at https://www.meecal.co/. Put
| Nginx in front and you're golden.
|
| Definitely take a look, it's come a long way from ten years
| ago.
| aftbit wrote:
| I personally like Quart, which is like Flask, but with
| asyncio. Django is also incredibly popular and has been
| around forever, so it is very battle-tested.
| acdha wrote:
| > Does this still mean you have to restart your local web
| application after any change you made to it? I heard that some
| developers automate that, so that everytime they save a file,
| the web application is restarted. That seems pretty expensive
| in terms of resource consumption.
|
| All of the popular frameworks automatically reload. It's not
| instantaneous but with e.g. Django it was less than the time I
| needed to switch windows a decade ago and it hadn't gotten
| worse. If you're used to things like NextJS it will likely be
| noticeably faster.
| qwertox wrote:
| With `reload(module)` (importlib.reload in Python 3) you
| don't even have to restart the server if you structure it
| properly. Think server.py and server_handlers.py, where
| server.py contains the logic to detect a modification of
| server_handlers.py (e.g. via inotify) plus the base
| handlers, which then call the "modifiable" handlers in
| server_handlers.py. This isn't limited to servers (anything
| that loops or reacts to events works), can be nested
| multiple levels deep, and is among the top 3 reasons why I
| use Python.
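|
| A minimal sketch of that pattern, polling the file's mtime
| as a cheap stand-in for inotify (assumes server_handlers.py
| defines a handle() function, as in the example above):
|
|     # server.py
|     import importlib
|     import os
|
|     import server_handlers
|
|     _mtime = os.path.getmtime(server_handlers.__file__)
|
|     def handle(request):
|         global _mtime
|         m = os.path.getmtime(server_handlers.__file__)
|         if m != _mtime:  # file changed on disk
|             importlib.reload(server_handlers)
|             _mtime = m
|         return server_handlers.handle(request)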
| declaredapple wrote:
| > The other reason is that the Python ecosystem moved away from
| stateless requests like CGI or mod_php use and now is
| completely set on long running processes.
|
| The long-running process is a WSGI/ASGI server that handles
| spawning the actual application code, similar to CGI. The
| benefit is that it controls how the request workers are
| spawned: multiple runtimes, processes/threads, etc. It's
| similar to CGI, but instead of nginx handling it, it's a
| dedicated program specializing in the different options for
| Python specifically.
|
| > Does this still mean you have to restart your local web
| application after any change you made to it? I heard that some
| developers automate that, so that everytime they save a file,
| the web application is restarted. That seems pretty expensive
| in terms of resource consumption. And complex as you would have
| to run some kind of watcher process which handles watching your
| files and restarting the application?
|
| Only for development!
|
| To update your code in production, you first deploy the new
| code onto the machine and then tell the WSGI/ASGI server,
| such as Gunicorn, to reload. It will use the new code for
| new requests without killing in-flight ones.
|
| It's a graceful reload, with no file watching needed. Just a
| "systemctl reload gunicorn".
| eterevsky wrote:
| I'm even more excited about noGIL in 3.13. I wonder how
| these two features will play together?
| lucidguppy wrote:
| If python became fast, there's a chance it may become a language
| eater.
| csjh wrote:
| 2-9% isn't changing any language hierarchies
| wiseowise wrote:
| Groundwork for the future.
| poncho_romero wrote:
| What languages do you think it could realistically eat (that it
| hasn't already)?
| politelemon wrote:
| Javascript, if it becomes viable for web development?
| Sohcahtoa82 wrote:
| I'd love to see it eat JavaScript and Java for back-end code.
|
| But I doubt that's going to ever happen.
| janalsncm wrote:
| I like python but I would never choose it for anything more
| than trivial on the backend. I want to know what types are
| being passed around from one middleware function to the
| next. Yes python has annotations but that's not enough.
| acdha wrote:
| Just wait until you see what enterprise Java developers are
| passing around as type Object and encoded XML blobs. Type
| checking is really useful, but it can be defeated in any
| language if you don't have a healthy technical culture.
| Sohcahtoa82 wrote:
| Is that why I see so much object
| serialization/deserialization in Java?
|
| They're trying to pass data between layers of middleware,
| but Java has very strict typing, and the middleware
| doesn't know what kind of object it will get, so it has
| to do tons of type introspection and reflection to do
| anything with the data?
| Sohcahtoa82 wrote:
| Doubtful.
|
| There are enough people that REALLY hate whitespace-as-syntax.
| matsemann wrote:
| What is it really JIT-ing? Given it says that it's only relevant
| for those building CPython. So it's not JIT-ing my Python code,
| right? And the interpreter is in C. So what is it JIT-ing? Or am
| I misunderstanding something?
|
| > _A copy-and-patch JIT only requires the LLVM JIT tools be
| installed on the machine where CPython is compiled from source,
| and for most people that means the machines of the CI that builds
| and packages CPython_
| lifthrasiir wrote:
| The code fragments that implement each opcode in the core
| interpreter loop are additionally compiled so that each
| fragment becomes a relocatable binary. Once processed that
| way, the runtime code generator can join the required
| fragments by patching their relocations, essentially doing
| the job of a dynamic linker. So it _is_ compiling your
| Python code, but the compiled result is composed of
| pre-baked fragments with patches.
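|
| A toy illustration of that idea in pure Python (the stencil
| bytes are made-up placeholders, not real machine code):
|
|     HOLE = b"\xde\xad\xbe\xef"  # 4-byte slot to patch
|
|     # pre-baked per-opcode templates ("stencils")
|     STENCILS = {
|         "LOAD_CONST": b"\x90\x90" + HOLE,
|         "JUMP_BACKWARD": b"\x90" + HOLE,
|     }
|
|     def emit(trace):
|         code = bytearray()
|         for opcode, operand in trace:
|             # "copy" the template, then "patch" the hole,
|             # like a linker resolving a relocation
|             code += STENCILS[opcode].replace(
|                 HOLE, operand.to_bytes(4, "little"))
|         return bytes(code)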
| hawk01 wrote:
| It never hurts for any language to get an uplift in performance.
| Exciting to see python getting that treatment
| nritchie wrote:
| At the end of the day, the number of optimizations that even
| a JIT can do on Python is limited, because all variables are
| boxed (each time a variable is accessed its type must be
| checked, since it could have changed) and function dispatch
| must then be chosen based on the variable's type. Without
| some mechanism to strictly type variables, the room for
| optimization will always be limited.
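|
| A concrete illustration of why unboxing is hard here:
|
|     def f(flag):
|         x = 1
|         if flag:
|             x = "one"  # same variable, now a str
|         # int + int or str + str? Only decidable at
|         # runtime, so x stays boxed and type-checked.
|         return x + x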
| vanderZwan wrote:
| Per the spec all JS values are boxed too (aside from values in
| TypedArrays). The implementations managed to work their way
| around that too for the most part.
| johncolanduoni wrote:
| Couldn't you say the same for e.g. JavaScript? The variables
| aren't typed there either and prototypes are mutable. I could
| definitely see things being harder with Python which has a lot
| of tricky metaprogramming available that other interpreted
| languages don't but I don't think it's as simple as a lack of
| explicit types.
| btown wrote:
| Can't the happy path be branch predicted and speculatively
| executed, though? AFAIK V8 seems to do this:
| https://web.dev/articles/speed-v8#the_optimizing_compiler
| Joker_vD wrote:
| IIRC Instagram's flavour of Python had unboxed primitives
| (if the types were constrained enough).
| make3 wrote:
| JavaScript is insanely more optimized yet has the same
| limitations as Python. So there is likely a lot more you can
| do despite the flexibility, like figuring out which
| flexibility features hot code doesn't actually use, and
| optimizing around that.
| crabbone wrote:
| Don't worry. Python already has syntactic constructs with
| mandatory type annotations. I will not be surprised if a few
| years from now those type annotations become mandatory in
| other contexts as well.
| jmdeschamps wrote:
| Maybe the article should be dated "January 9, 2024" ??? (or is it
| really a year old article?)
| jmakov wrote:
| How's this different than running pypy?
| dagw wrote:
| It supports all your existing python code and python libraries
| (at the cost of being significantly slower than PyPy).
| jjice wrote:
| The last two-ish years have been insane for Python
| performance. Something clicked with the core team: they
| obviously made this a serious goal, and the results have
| been incredible to see.
| pletnes wrote:
| Microsoft are paying core devs to work on it full time, for
| one.
| eigenvalue wrote:
| It's because the total dollars of capitalized software deployed
| in the world using Python has absolutely exploded from AI
| stuff. Just like how the total dollars of business conducted on
| the web was a big driver of JS performance earlier.
| dehrmann wrote:
| But all the AI heavy lifting is done in native code.
| dron57 wrote:
| AI heavy lifting isn't just model training. There's about a
| million data pipelines and processes before the training
| data gets loaded into a PyTorch tensor.
| HDThoreaun wrote:
| also done in native code
| tomjakubowski wrote:
| Ehhh... if you're lucky. I've seen (and maybe even
| written) plenty of we-didn't-have-time-to-write-this-
| properly-with-dataframes Python data munging code, banged
| out once and then deployed to production. I'll take
| performance gains there.
| janalsncm wrote:
| From personal experience, no. I ended up writing rust
| bindings to call from python that turned minutes of
| loading into seconds.
| crabbone wrote:
| There were no noticeable performance improvements in the course
| of the last two years. I have no idea what you are talking
| about.
|
| The major change that's been going on in the Python core
| development team is that Microsoft gets more and more power
| over what happens to Python. Various PSF authorities have
| had strong links to Microsoft, until today the head of the
| PSF is straight up a Microsoft employee. Microsoft doesn't
| like to advertise this fact, because it rightfully suspects
| that rebranding Python as "Microsoft Python" would scare off
| at least some old-timers, but de facto it is "Microsoft
| Python".
|
| The community has gone from bad to worse. Any real
| discussion about the language stopped years ago. Today it's
| a pretty top-down decision-making process where there's no
| feedback, no criticism is allowed, etc. My guess is that
| Microsoft doesn't have a plan
| for the third "E" here, but who knows? Maybe eventually they'll
| find a way to move Python to CLR and will peddle their version
| of it? -- I wouldn't be surprised if that happened, actually.
| attractivechaos wrote:
| > _There were no noticeable performance improvements in the
| course of the last two years._
|
| In fairness, Python did get faster. Python 3.9 took 82
| seconds for sudoku solving and 62 seconds for interval query.
| Python 3.11 took 53 and 43 seconds, respectively [1]. v3.12
| may be better. That said, whether the speedup is noticeable
| can be subjective. 10x vs 15x slower than v8 may not make
| much difference mentally.
|
| [1] https://github.com/attractivechaos/plb2
| mattgruskin wrote:
| Microsoft already tried Python on the CLR! They didn't stick
| with it. https://en.wikipedia.org/wiki/IronPython
| crabbone wrote:
| That's why I said _another_.
|
| It was a different time. Microsoft had a different strategy
| towards languages not developed by Microsoft. Similar to how
| there also used to be JScript, whereas now Node.js is
| basically a Microsoft pet project.
|
| There are actually plenty of popular Microsoft projects that
| took even more than two tries. Azure is like their third
| attempt at cloud services, iirc. Credit where credit is due,
| they learn from mistakes... unfortunately, that only makes
| them more insidious.
| Pxtl wrote:
| I still don't get why they didn't reduce the API surface of
| the interpreter internals in Python 3 so that things like
| this would be more achievable.
|
| If you're going to break backwards compatibility, it's not like
| Unicode was the _only_ foundational problem Python 2 had.
| peterfirefly wrote:
| They did change the API for Python modules implemented in C.
| That was actually part of the reason why the 2->3 transition
| went so badly.
|
| It wasn't realistic to switch to 3.x when the libraries either
| weren't there or were a lot slower (due to using pure Python
| instead of C code).
|
| It also wasn't realistic to rewrite the libraries when the
| users weren't there.
|
| It was in many respects a perfect case study in how not to do
| version upgrades.
| agounaris wrote:
| A 2-9% improvement at global scale is insane! This is not a
| small number by any means.
| hyperman1 wrote:
| The article presents the copy-and-patch JIT as something
| new, but I remember DOS's QuickBASIC doing the same thing.
| It generated very bad assembly code in memory by patching
| together template assembly blocks with filled-in values,
| with a lot of INT instructions calling into the QuickBASIC
| runtime, but it did compile, not interpret.
| chc4 wrote:
| Template JITs in general aren't a new technique, but Copy-and-
| Patch is a specific method of implementing it (leveraging a
| build time step to generate the templates from C code + ELF
| relocations).
| PeterisP wrote:
| TIL! I had used QBasic back in school, but I somehow always
| assumed that these BASICs were interpreters.
| rpeden wrote:
| QBasic was a slightly cut-down version of QuickBASIC that
| didn't include the compiler, so your assumption was correct
| in that case. QBasic was bundled with DOS, but you had to
| buy QuickBASIC.
| DeathArrow wrote:
| If a JIT is a good thing for Python, why not just compile to
| Java or .NET bytecode and use their already optimized
| infrastructure?
| crabbone wrote:
| Given how many Microsoft employees today steer the Python
| decision-making process, I'm sure that in the not-so-distant
| future we might see a new CLR-based Python implementation.
|
| Maybe Microsoft don't know yet how to sell this thing, or
| maybe they are just boiling the frog. Time will tell. But
| I'm pretty sure your question will be repeated as soon as
| people get used to the idea of Python on a JIT.
| _bohm wrote:
| Doesn't this already exist in IronPython?
| Timothycquinn wrote:
| Correct. I just checked the project yesterday and they are
| presently at 3.4 :-|
| crabbone wrote:
| It's a lot about popularity / stigma.
|
| Microsoft developed both JScript and Node.js. They could've
| continued with JScript, but obviously decided against it
| because JScript didn't earn the reputation they might have
| hoped for. Even if they invested efforts into rectifying
| the flaws of JScript, it would've been just too hard to
| undo the reputation damage.
|
| Microsoft made multiple attempts to "befriend" Python.
| IronPython was one of the failures. They also tried to
| provide editing tools (eg. intellisense in MSVS), but kind
| of gave up on that too (though they succeeded to a large
| degree with VSCode).
|
| Microsoft's whole long-term strategy is to capture
| developers and put them on a leash. They won't rest while
| there's a popular language they don't control.
| Gollapalli wrote:
| "Python code runs 15% faster and and 20% cheaper on azure
| than aws, thanks to our optimized azurePython runtime. Use it
| for azure functions and ml training"
|
| Just a guess at the pitch.
| dbrueck wrote:
| If you're interested in learning more about the challenges and
| tradeoffs, both Jython (https://www.jython.org/) and IronPython
| (https://ironpython.net/) have been around for a long time and
| there's a lot of reading material on that subject.
| jsight wrote:
| Graal Python exists too: https://www.graalvm.org/python/
|
| It beats Python on performance, supposedly, but compatibility
| has never been great.
| SpaghettiCthulu wrote:
| I've found the startup time for Graal Python to be terrible
| compared with other Graal languages like JS. When I did
| some profiling, it seemed that the vast majority of the
| time was spent loading the standard library. If implemented
| lazily, that should have a negligible performance impact.
| monlockandkey wrote:
| This is what you are looking for. Python on the GraalVM
|
| https://github.com/oracle/graalpython
| rogerbinns wrote:
| Python is a convenient friendly syntax for calling code
| implemented in C. While you can easily re-implement the syntax,
| you then have to decide how much of that C to re-implement. A
| few of the builtin types are easy (eg strings and lists), but
| it soon becomes a mountain of code and interoperability,
| especially if you want to get the semantics exactly right. And
| that is just the beginning - a lot of the value of Python is in
| the extensions, and many popular ones (eg numpy, sqlite3) are
| implemented in C and need to interoperate with your re-
| implementation. Trying to bridge from Java or .NET to those
| extensions will overwhelm any performance advantages you got.
|
| This JIT approach is improving the performance of bits of the
| interpreter while maintaining 100% compatibility with the rest
| of the C code base, its object model, and all the extensions.
| jokoon wrote:
| What are those future optimizations he talks about?
|
| He talks about an IL, but what's that IL? Does that mean the
| future optimizations will involve that IL?
| bbojan wrote:
| > At the moment, the JIT is only used if the function contains
| the JUMP_BACKWARD opcode which is used in the while statement but
| that will change in the future.
|
| Isn't this the main reason why it's only a 2-9% improvement?
| Not much Python code uses the _while_ statement, in my
| experience.
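|
| You can see the opcode for yourself with dis (a quick
| sketch; on CPython 3.11+ loops compile to JUMP_BACKWARD):
|
|     import dis
|
|     def countdown(n):
|         while n > 0:
|             n -= 1
|
|     # look for JUMP_BACKWARD in the output
|     dis.dis(countdown)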
| darrenBaldwin03 wrote:
| Woah - very interesting!
| haberman wrote:
| The article describes that the new JIT is a "copy-and-patch JIT"
| (I've previously heard this called a "splat JIT"). This is a
| relatively simple JIT architecture where you have essentially
| pre-compiled blobs of machine code for each interpreter
| instruction that you patch immediate arguments into by copying
| over them.
|
| I once wrote an article about very simple JITs, and the first
| example in my article uses this style:
| https://blog.reverberate.org/2012/12/hello-jit-world-joy-of-...
|
| I take some issue with this statement, made later in the article,
| about the pros/cons vs a "full" JIT:
|
| > The big downside with a "full" JIT is that the process of
| compiling once into IL and then again into machine code is slow.
| Not only is it slow, but it is memory intensive.
|
| I used to think this was true also, because my main exposure to
| JITs was the JVM, which is indeed memory-intensive and slow.
|
| But then in 2013, a miraculous thing happened. LuaJIT 2.0 was
| released, and it was incredibly fast to JIT compile.
|
| LuaJIT is undoubtedly a "full" JIT compiler. It uses SSA form and
| performs many optimizations
| (https://github.com/tarantool/tarantool/wiki/LuaJIT-
| Optimizat...). And yet it feels no more heavyweight than an
| interpreter when you run it. It does not have any noticeable
| warm-up time, unlike the JVM.
|
| Ever since then, I've rejected the idea that JIT compilers have
| to be slow and heavyweight.
| cbmuser wrote:
| >>LuaJIT is undoubtedly a "full" JIT compiler.<<
|
| Yes, and it's practically unmaintained. Pull requests to add
| support for various architectures have remained largely
| unanswered, including RISC-V.
| dataangel wrote:
| Doesn't change the parent's point; it clearly proves it's
| possible.
| peterfirefly wrote:
| I think Mike Pall has done enough work on LuaJIT for several
| lifetimes. If nobody else wants to merge pull requests and
| make sure everything still works then maybe LuaJIT isn't
| important enough to the world.
| jpfr wrote:
| The commit history looks pretty active...
|
| https://github.com/LuaJIT/LuaJIT/commits/v2.1/
| kzrdude wrote:
| How do you achieve optimizations such as dead code removal
| and constant propagation using this technique?
| SpaghettiCthulu wrote:
| I believe a JIT using this technique could eliminate dead code
| at the Python bytecode level, but not at the machine code
| level. That seems pretty reasonable to me.
| kzrdude wrote:
| Not sure; these optimizations multiply in power when used
| together. Propagate and fold constants, and after that you
| can remove things like "if 0 > 0", both the conditional
| check and the whole block below it, and so on.
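|
| A conceptual before/after (expensive_path is a placeholder
| name):
|
|     def before():
|         x = 0
|         # constant propagation: the test becomes 0 > 0;
|         # folding makes it False; dead-code elimination
|         # can then drop the whole branch
|         if x > 0:
|             expensive_path()
|         return x
|
|     def after():  # what the optimized code amounts to
|         return 0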
| divbzero wrote:
| I love the description in the draft PR:
|
|     'Twas the night before Christmas, when all through the code
|     Not a core dev was merging, not even Guido;
|     The CI was spun on the PRs with care
|     In hopes that green check-markings soon would be there;
|     ...
|     --enable-experimental-jit, then made it,
|     And away the JIT flew as their "+1"s okay'ed it.
|     But they heard it exclaim, as it traced out of sight,
|     "Happy JIT-mas to all, and to all a good night!"
|
| https://github.com/python/cpython/pull/113465
| thetinymite wrote:
| The PR message with a riff off the Night Before Christmas is
| gold.
|
| https://github.com/python/cpython/pull/113465
| julienchastang wrote:
| "by Anthony Shaw, January 9, 2023"
|
| 2024, right?
| EmilStenstrom wrote:
| It's interesting to see these 2-9% improvements from version to
| version. They are always talked about with disappointment, as if
| they are too small, but they also keep coming, with each version
| being faster than the previous one. I prefer a steady 10% per
| version over breaking things because you are hoping for bigger
| numbers. Those percentages add up!
| technocratius wrote:
| Well I think they even multiply, making it even better news!
| chmod775 wrote:
| I'd rather they add up. Shave 5% off the runtime here,
| another 5% there... Soon enough, python will be so fast my
| scripts terminate before I even run them, allowing me to
| send messages to my past self.
| Qem wrote:
| log(2)/log(1.1) ~= 7.27, so in principle sustained 10%
| improvements could double performance every 7 releases. But
| at some point we're bound to face diminishing returns.
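|
| Quick sanity check of that arithmetic:
|
|     import math
|
|     # releases for sustained +10% gains to compound to 2x
|     print(math.log(2) / math.log(1.1))  # ~7.27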
| hartator wrote:
| Because it took 10 years for Python 3 to get as fast as
| Python 2 while being more strict. 2-9% means it will be
| another 10 years before Python 3 is significantly faster.
|
| Ref: https://mail.python.org/pipermail/python-
| dev/2016-November/1...
| chalst wrote:
| 5.5% compounded over 5 years is a bit over 30%: not a huge
| amount but an easily noticeable speed-up. What were you
| thinking of when you typed "significantly faster"?
| aktenlage wrote:
| Compounding a decrease works differently from an increase.
| If something gets 10% faster twice, it actually got 19%
| faster. In other words, the runtime is 90% of 90%, i.e.
| 81%.
| ummonk wrote:
| Not if "faster" refers to computation rate rather than
| runtime, in which case it becomes 100/81 i.e. 23% faster.
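|
| The same numbers in code:
|
|     runtime = 0.9 * 0.9  # two 10% runtime cuts -> 0.81
|     rate = 1 / runtime   # ~1.235, i.e. ~23% faster
|     print(runtime, rate)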
| bilsbie wrote:
| What! Why? (I couldn't figure it out from your link)
| Retr0id wrote:
| The link seems fairly clear to me - one explanation given
| is that python3 represents _all_ integers in a "long"
| type, whereas python2 defaulted to small ints. This gave
| (gives?) python2 an advantage on tasks involving
| manipulating lots of small integers. Most real-world python
| code isn't like this, though.
|
| Interestingly they singled out pyaes as one of the worst
| offenders. I've also written a pure-python AES
| implementation, one that deliberately takes advantage of
| the "long" integer representation, and it beats pyaes by
| about 2000%.
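|
| To give a flavour of that trick (my own illustration, not
| the actual code of either implementation): big ints let you
| process a whole block in one operation:
|
|     a = int.from_bytes(bytes(range(16)), "big")
|     b = int.from_bytes(bytes(range(16, 32)), "big")
|
|     # XOR a 16-byte block in one big-int operation
|     # instead of looping over 16 individual bytes
|     block = (a ^ b).to_bytes(16, "big")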
| nextaccountic wrote:
| This is happening mostly because Guido left, right? The take
| that CPython should be a reference implementation, and thus
| slow, always aggravated me (because no other implementation
| can compete when every package depends on CPython quirks, to
| the point that we're now removing the GIL from CPython
| rather than migrating to PyPy, for example).
| bb88 wrote:
| Guido is still involved, but he's no longer the BDFL.
| Cupprum wrote:
| Just to clarify BDFL.
|
| [1]:
| https://en.wikipedia.org/wiki/Benevolent_dictator_for_life
| korijn wrote:
| Partly, yes, but do note he is still very much involved with
| the faster-cpython project via Microsoft. Google faster
| cpython and van rossum to find some interviews and talks. You
| can also check out the faster-cpython project on github to
| read more.
| girvo wrote:
| It's fascinating to me that this process seems to rhyme with
| that of the path PHP took, with HHVM being built as a second
| implementation, proving that PHP could be much faster -- and
| the main project eventually adopting similar approaches. I
| wonder if that's always likely to happen when talking about
| languages as big as these are? Can a new implementation of it
| ever really compete?
| bilsbie wrote:
| Someone please compare 3.13 to 2.3! I'd love to see how far
| we've come.
| chalst wrote:
| Good idea! It can be done fairly easily by people who are
| good with changelogs.
|
| FWIW, the most recent changelog is at
| https://docs.python.org/3.13/whatsnew/3.13.html
| ska wrote:
| I suspect parent meant a performance comparison...
| matheusmoreira wrote:
| I _envy_ these small and steady improvements!!
|
| I spent about one week implementing PyPy's storage strategies
| in my language's collection types. When I finished the vector
| type modifications, I benchmarked it and saw the ~10% speed up
| claimed in the paper [1]. The catch is that performance
| increased
| _only_ for unusually large vectors, like thousands of elements.
| Small vectors were actually slowed down by about the same
| amount. For some reason I decided to press on and implement it
| on my hash table type too which is used everywhere. That slowed
| the _entire_ interpreter down by nearly 20%. The branch is
| still sitting there, unmerged.
|
| I can't imagine how difficult it must have been for these guys
| to write a compiler and _succeed_ at speeding up the Python
| interpreter.
|
| 1
| https://tratt.net/laurie/research/pubs/html/bolz_diekmann_tr...
| meisel wrote:
| Why has it taken so much longer for CPython to get a JIT than,
| say, PyPy? I would imagine the latter has far less engineering
| effort and funding put into it.
| kstrauser wrote:
| For the longest time, CPython was deliberately optimized for
| simplicity. That's a perfectly reasonable choice: it's easier
| to reason about, easier for new maintainers to learn it, easier
| to alter, easier to fix when it breaks, etc. Also, CPUs are
| pretty good at running simple code very quickly.
|
| It's only fairly recently that there's been critical mass of
| people who thought that performance trumps simplicity, and even
| then, it's only to a point.
| nomel wrote:
| > It's only fairly recently that there's been critical mass
| of people who thought that performance trumps simplicity
|
| This definitely wasn't true from the user perspective. And
| I'm not even convinced it's some "critical mass" of
| developers. These changes aren't coming from some mass of
| developers; they're coming from a few experts who had a
| clear plan, backed by the sane recognition that languages
| are actually meant for the users of the language, not its
| developers.
| arccy wrote:
| critical mass of cpython devs
| ya3r wrote:
| A possibly relevant blog post:
| https://devblogs.microsoft.com/python/python-311-faster-cpyt...
| dated Oct 2022. The team behind this and some other recent
| improvements to Python is at Microsoft.
| t43562 wrote:
| I wish the money could be spent on PyPy, but PyPy has its
| problems - you don't get a big boost on small programs that
| run often, because the warmup time isn't that fabulous.
|
| For larger programs, you sometimes hit some incredibly
| complicated incompatibility problem. For me bitbake was one
| of those - it could REALLY benefit from PyPy but didn't work
| properly, and I couldn't fix it.
|
| If this works more reliably or has a faster warmup then...
| well, it could help to fill in some gaps.
| GGerome wrote:
| What a horrible language...
| matheusmoreira wrote:
| So they compile the C implementation of every opcode into
| templates and then patch in the actual values from the functions
| being compiled. That's genius, massive inspiration for me. It's
| automatically ABI compatible with the rest of CPython too.
|
| Is there a similarly accessible article about the
| specializing adaptive interpreter? It's mentioned in this
| article, but not much detail is given, only that the JIT
| builds upon it.
|
| I wonder if I can skip the bytecode compilation phase.
___________________________________________________________________
(page generated 2024-01-09 23:00 UTC)