Following up on the Python JIT

By Jake Edge
July 14, 2025

PyCon US

Performance of Python programs has been a major focus of development for the language over the last five years or so; the Faster CPython project has been a big part of that effort. One of its subprojects is to add an experimental just-in-time (JIT) compiler to the language; at last year's PyCon US, project member Brandt Bucher gave an introduction to the copy-and-patch JIT compiler. At PyCon US 2025, he followed that up with a talk on "What they don't tell you about building a JIT compiler for CPython" to describe some of the things he wishes he had known when he set out to work on that project. There was something of an elephant in the room, however, in that Microsoft dropped support for the project and laid off most of its Faster CPython team a few days before the talk. Bucher only alluded to that event in the talk, and elsewhere has made it clear that he intends to continue working on the JIT compiler whatever the fallout.

When he gave the talk back in May, he said that he had been working with Python for around eight years, as a core developer for six, and as part of the Microsoft CPython performance-engineering team for four; he has been working on the JIT compiler for the last two years. While the team at Microsoft is often equated with the Faster CPython project, it is really just a part of it; "our team collaborates with lots of people outside of Microsoft".

Faster CPython results

[Brandt Bucher]

The project has seen some great results over the last few Python releases. Its work first appeared in 2022 as part of Python 3.11, which averaged 25% faster than 3.10, depending on the workload; "no need to change your code, you just upgrade Python and everything works". In the years since, there have been further improvements: Python 3.12 was 4% faster than 3.11, and 3.13 improved by 7% over 3.12. Python 3.14, which is due in October, will be around 8% faster than its predecessor. In aggregate, that means Python has gotten nearly 50% faster in less than four years, he said.

Around 93% of the benchmarks that the project uses have improved their performance over that time; nearly half (46%) are more than 50% faster, and 20% of the benchmarks are more than 100% faster. Those are not simply micro-benchmarks; they represent real workloads. Pylint has gotten 100% faster, for example. All of those increases have come without the JIT; they come from all of the other changes that the team has been working on, while "taking a kind of holistic approach to improving Python performance". Those changes have a meaningful impact on performance and were done in such a way that the community can maintain them. "This is what happens when companies fund Python core development", he said; "it's a really special thing".
On his slides, that was followed by the crying emoji accompanied by an uncomfortable laugh.

Moving on, he gave a "duck typing" example that he would refer to throughout the talk. It revolved around a duck simulator that would take an iterator of ducks and "quack" each one, then print the sound. As an additional feature, if a duck has an "echo" attribute that evaluates to true, it would double the sound:

    def simulate_ducks(ducks):
        for duck in ducks:
            sound = duck.quack()
            if duck.echo:
                sound += sound
            print(sound)

That was coupled with two classes that produced different sounds:

    class Duck:
        echo = False

        def quack(self):
            return "Quack!"

    class RubberDuck:
        echo = True

        def __init__(self, loud):
            self.loud = loud

        def quack(self):
            if self.loud:
                return "SQUEAK!"
            return "Squeak!"

He stepped through an example execution of the loop in simulate_ducks(). He showed the bytecode for the stack-based Python virtual machine that was generated by the interpreter and stepped through one iteration of the loop, describing the changes to the stack and to the duck and sound local variables. That process is largely unchanged "since Python was first created".

Specialization

The 3.11 interpreter added specialized bytecode into the mix, where some of the bytecode operations are changed to assume they are operating on a specific type, chosen based on observing the execution of the code a few times. Python is a dynamic language, so the interpreter always needs to be able to fall back to, say, looking up the proper binary operator for the types. But, after running the loop a few times, it can assume that "sound += sound" will be operating on strings, so it can switch to a bytecode with a fast path for that explicit operation. "You actually have bytecode that can still handle anything, but has inlined fast paths for the shape of your actual objects and data structures and memory layout." (This specialization can be observed from Python itself; see the sketch at the end of this section.)

All of that underlies the JIT compiler, which uses the specialized bytecode interpreter, and can be viewed as being part of the same pipeline, Bucher said. The JIT compiler is not enabled by default in any build of Python, however. As he described in last year's talk, the specialized bytecode instructions get further broken down into micro-ops, which are "smaller units of work within an individual bytecode instruction". The translation to micro-ops is completely automatic because the bytecodes are defined in terms of them, "so this translation step is machine-generated and very very fast", he said.

The micro-ops can be optimized; that is basically the whole point of generating them, he said. Observing the different types and values that are being encountered when executing the micro-ops will show optimizations that can be applied. Some micro-ops can be replaced with more efficient versions; others can be eliminated because they "are doing work that is entirely redundant and that we can prove we can remove without changing the semantics". He showed a slide full of micro-ops that corresponded to the duck loop and slowly replaced and eliminated something approaching 25% of them, which corresponds to what the 3.14 version of the JIT does.

The JIT will then translate the micro-ops into machine code one-by-one, but it does so using the copy-and-patch mechanism. The machine-code templates for each of the micro-ops are generated at CPython compile time; it is somewhat analogous to the way the micro-ops themselves are generated in a table-driven fashion.
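As an aside, the specialization stage that feeds this whole pipeline can be watched from pure Python: the standard dis module grew an adaptive flag in 3.11 that shows the bytecode as it currently exists, specialized forms included. Here is a minimal sketch; the exact specialized instruction names vary between releases, so treat the BINARY_OP_ADD_UNICODE mentioned in the comments as an assumption rather than a guarantee:

    import dis

    def concat(a, b):
        return a + b

    # Run the function enough times for the adaptive interpreter to
    # observe its operand types and specialize the bytecode.
    for _ in range(1000):
        concat("quack", "quack")

    # With adaptive=True, dis shows the quickened bytecode actually in
    # place; on 3.11 and later, the generic BINARY_OP here typically
    # appears as a specialized form such as BINARY_OP_ADD_UNICODE.
    dis.dis(concat, adaptive=True)

It is this specialized form that the micro-op translation and, ultimately, the copy-and-patch templates build on.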
Since the templates are not hand-written, fixing bugs in the micro-ops for the rest of the interpreter also fixes them for the JIT; that helps with the maintainability of the JIT, but also helps lower the barrier to entry for working on it, Bucher said.

Region selection

With that background out of the way, he moved on to some "interesting parts of working on a JIT compiler" that are often overlooked, starting with region selection. Earlier, he had shown a sequence of micro-ops that needed to be turned into machine code, but he did not describe how that list was generated; "how did we get there in the first place?" The JIT compiler does not start off with such a sequence; it starts with code like that in his duck simulation.

There are several questions that need to be answered about that code based on its run-time activity. The first is: "what do we want to compile?" If something is running only a few times, it is not a good candidate for JIT compilation, but something that is running a lot is. Another question is where it should be compiled: a function can be compiled in isolation, or it can be inlined into its callers and those can be compiled instead. When should the code be compiled? There is a balance to be struck between compiling things too early, wasting that effort because the code is not actually running all that much, and too late, which may not actually make the program any faster. The final question is "why?", he said; it only makes sense to compile code if it is clear that compiling will make the code more efficient. "If they are using really dynamic code patterns or doing weird things that we don't actually compile well, then it's probably not worth it."

One approach that can be taken is to compile entire functions, which is known as "method at a time" or a "method JIT". It "maps naturally to the way we think about compilers" because it is the way that many ahead-of-time compilers work. So, when the JIT looks at simulate_ducks(), it can just compile the entire function (the for loop) wholesale, but there are some other opportunities for optimization. If it recognizes that most of the time the loop operates on Duck objects, it can inline the quack() method:

    for duck in ducks:
        if duck.__class__ is Duck:
            sound = "Quack!"
        else:
            sound = duck.quack()
        ...

If there are lots of RubberDuck objects too, that class's quack() method could be inlined as well. Likewise, the attribute lookup for duck.echo could be inlined for one or both cases, but that all starts to get somewhat complicated, he said; "it's not always super-easy to reason about, especially for something that is running while you are compiling it".

Meanwhile, what if ducks is not a list, but is instead a generator? In simple cases, with a single yield expression, it is not that much different from the list case, but, with multiple yield expressions and loops in the generator, it also becomes hard to reason about. That creates a kind of optimization barrier, and that kind of code is not uncommon, especially in asynchronous-programming contexts.

Another technique, and the one that is currently used in the CPython JIT, is a "tracing JIT" instead of a method JIT. The technique takes linear traces of the program's execution, so it can use that information to make optimization decisions. If the first duck is a Duck, the code can be optimized as it was earlier, with a guard based on the class and inlining the sound assignment.
Next up is a lookup for duck.echo, but the code in the guarded branch has perfect type information; it already knows that it is processing a Duck, so it knows that echo is false, and that if can be removed, leaving:

    for duck in ducks:
        if duck.__class__ is Duck:
            sound = "Quack!"
            print(sound)

"This is pretty efficient. If you have just a list of Ducks, you're going to be doing kind of the bare minimum amount of work to actually quack all those ducks." The code still needs to handle the case where the duck is not a Duck, but it does not need to compile that piece; it can, instead, just send execution back to the interpreter if the class guard is false.

If the code is also handling RubberDuck objects, though, eventually that else branch will get "hot" because it is being taken frequently. At that point, the tracing can be turned back on to see what the code is doing. If we assume that it mostly has non-loud RubberDuck objects, the resulting code might look like:

    elif duck.__class__ is RubberDuck:
        if self.loud:
            ...
        sound = "Squeak!Squeak!"
        print(sound)
    else:
        ...

The two branches that are not specified would simply return to the regular interpreter when they are executed. Since the tracing has perfect type information, it knows that echo is true, so the sound should be doubled, but there is no need to actually use "+=" to get the result. So, now the function has the minimum necessary code to quack either a Duck or a non-loud RubberDuck. If those other branches start getting hot at some point, tracing can once again be used to optimize further.

One downside of the tracing-JIT approach is that it can compile duplicates of the same code, as with "print(sound)". In "very branchy code", Bucher said, "some things near the tail of those traces can be duplicated quite a bit". There are ways to reduce that duplication, but it is a downside to the technique.

Another technique for selecting regions is called "meta tracing", but he did not have time to go into it. He suggested that attendees ask their LLM of choice "about the 'first Futamura projection' and don't misspell it like me, it's not 'Futurama'", Bucher said to some chuckles around the room.

Memory management

JIT compilers "do really weird things with memory". C programmers are familiar with readable (or read-only) data, such as a const array, while data that is both readable and writable is the normal case. Memory can be dynamically allocated using malloc(), but that kind of memory cannot be executed; since a JIT compiler needs memory that it can read, write, and execute, it requires "the big guns": mmap(). "If you know the right magic incantation, you can whisper to this thing with all these secret flags and numbers" to get memory that is readable, writable, and executable:

    char *data = mmap(NULL, 4096,
                      PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

One caveat is that memory from mmap() comes in page-sized chunks, which is 4KB on most systems but can be larger. If the JIT code is, say, four bytes in length, that can be wasteful, so the memory needs to be managed carefully.

Once you have that memory, he asked, how do you actually execute it? It turns out that "C lets us do crazy things":

    typedef int (*function)(int);

    ((function)data)(42);

The first line creates a type definition named "function", which is a pointer to a function that takes an integer argument and returns an integer. The second line casts the data pointer to that type and then calls the function with an argument of 42 (and ignores the return value).
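The same stunt can even be reproduced from Python itself using only the standard library, which makes for a handy way to experiment with what the JIT is doing under the hood. The sketch below assumes an x86-64 Unix system; hardened systems that forbid writable-and-executable mappings will refuse it (on those, the mprotect() dance described next is the fix):

    import ctypes
    import mmap

    # x86-64 System V machine code for:  int f(int x) { return x + x; }
    #   89 f8    mov eax, edi   ; copy the first integer argument
    #   01 c0    add eax, eax   ; double it
    #   c3       ret            ; return the result in eax
    CODE = bytes.fromhex("89f801c0c3")

    # An anonymous page that is readable, writable, and executable:
    # the "magic incantation" from the C example above.
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(CODE)

    # The Python spelling of "((function)data)(42)": build a function-
    # pointer type and instantiate it at the buffer's address.
    function = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    print(function(addr)(42))   # prints 84

As in the C version, the cast-and-call at the end is exactly the kind of "executable data" that Bucher warned about next.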
"`It's weird, but it works.'" He noted that the term "executable data" should be setting off alarm bells in people's heads; "`if you're a Rust programmer, this is what we call 'unsafe code''" he said to laughter. Being able to write to memory that can be executed is "`a scary thing; at best you shoot yourself in the foot, at worst it is a major security vulnerability'". For this reason, operating systems often require that memory not be in that state. He said that the memory should be mapped readable and writable, then filled in, and switched to readable and executable using mprotect(); if there is a need to modify the data later, it can be switched back and forth between the two states. Debugging and profiling When code is being profiled using one of the Python profilers, code that has been compiled should call all of the same profiling hooks. The easiest way to do that, at least for now, is to not JIT code that has profiler hooks installed. In recent versions of Python, profiling is implemented by using the specializing adaptive interpreter to change certain bytecodes to other, instrumented versions of them, which will call the profiler hooks. If the tracing encounters one of these instrumented bytecodes, it can shut the JIT down for that part of the code, but it can still run in other, non-profiled parts of the code. A related problem occurs when someone enables profiling for code that has already been JIT-compiled. In that case, Python needs to get out of the JIT code as quickly as possible. That is handled by placing special _CHECK_VALIDITY micro-ops just before "`known safe points'" where it can jump out of the JIT code and back to the interpreter. That micro-op checks a one-bit flag; if it is set, the execution bails out of the JIT code. That bit gets set when profiling is enabled, but it is also used when code executes that could change the JIT optimizations (e.g. a change of class attributes). Something that just kind of falls out of that is the ability to support "`the weirder features of Python debuggers'". The JIT code is created based on what the tracing has seen, but someone running pdb could completely upend that state in various ways (e.g. "duck = Goose ()"). The validity bit can be used to avoid problems of that sort as well. For native profilers and debuggers, such as perf and GDB, there is a need to unwind the stack through JIT frames, and interact with JIT frames, but "`the short answer is that it's really really complicated'". There are lots of tools of this sort, for various platforms, that all work differently and each has its own APIs for registering debug information in different formats. The project members are aware of the problem, but are trying to determine which tools need to be supported and what level of support they actually need. Looking ahead The current Python release is 3.13; the JIT can be built into it by using the --enable-experimental-jit flag. For Python 3.14, which is out in beta form and will be released in October, the Windows and macOS builds have the JIT built-in, but it must be enabled by setting PYTHON_JIT=1 in the environment. He does not recommend enabling it for production code, but the team would love to hear about any results from using it: dramatic improvements or slowdowns, bugs, crashes, and so on. Other platforms, or people creating their own binaries, can enable the JIT with the same flag as for 3.13. 
For 3.15, which is in a pre-alpha stage at this point, there are two GitHub issues they are focusing on: "Supporting stack unwinding in the JIT compiler" and "Make the JIT thread-safe". The first he had mentioned earlier with regard to support for native debuggers and profilers. The second is important since the free-threaded build of CPython seems to be working out well and is moving toward becoming the default; see PEP 779 ("Criteria for supported status for free-threaded Python"), which was recently accepted by the steering council. The Faster CPython developers think that making the JIT thread-safe can be done without too much trouble; "it's going to take a little bit of work and there's kind of a long tail of figuring out what optimizations are actually still safe to do in a free-threaded environment". Both of those issues are outside of his domain of expertise, however, so he hoped that others who have those skills would be willing to help out. In addition, there is a lot of ongoing performance work that is going into the 3.15 branch, of course. He noted, pointedly, that fast progress, especially on larger projects, will depend on the availability of resources; the words on his slide saying that changed to bold and he gave a significant cough to further emphasize the point.

As he wrapped up, he suggested PEP 659 ("Specializing Adaptive Interpreter") and PEP 744 ("JIT Compilation") for further information. For those who would rather watch something instead of reading about it, he recommended videos of his talks (covered by LWN and linked above) from 2023 on the specializing adaptive interpreter and from 2024 on adding a JIT compiler. The YouTube video of this year's talk is available as well.

[Thanks to the Linux Foundation for its travel sponsorship that allowed me to travel to Pittsburgh for PyCon US.]


Comments

Exciting changes for python
Posted Jul 14, 2025 9:59 UTC (Mon) by Niflmir (subscriber, #175249)

I'm really happy to see all these performance-related enhancements to Python. I always found it strange that the answer to Python performance problems was to design your application in a very specific (multiprocessing) manner, rewrite parts of it in a different language when necessary, and hope that you never needed shared-memory concurrency after choosing multiprocessing. These were the sorts of project risks that made me consider other languages as better suited for large-scale engineering, but Python had so many good points of its own that it didn't deserve to be relegated to being some niche language.

Exciting changes for python
Posted Jul 14, 2025 12:29 UTC (Mon) by ballombe (subscriber, #9523)

Python is a good beginner language, but once you master it you should learn another performance-minded language.

Exciting changes for python
Posted Jul 14, 2025 14:55 UTC (Mon) by anselm (subscriber, #2796)

> but once you master it you should learn another performance-minded language

Python performance is perfectly adequate for a large number of use cases, especially with the recent improvements.

Exciting changes for python
Posted Jul 14, 2025 16:48 UTC (Mon) by npws (subscriber, #168248)

The problem is more that it is adequate until it isn't. And then you are in for a lot of pain.
Exciting changes for python
Posted Jul 14, 2025 17:01 UTC (Mon) by anselm (subscriber, #2796)

> The problem is more that it is adequate until it isn't. And then you are in for a lot of pain.

Perhaps. Perhaps not. I'm with Donald E. Knuth: "Premature optimization is the root of all evil".

Exciting changes for python
Posted Jul 14, 2025 17:47 UTC (Mon) by pizza (subscriber, #46)

> Perhaps. Perhaps not.

The _only_ reason Python has any "performance" chops is because any serious workload calls into non-Python libraries to do the actual work.

Exciting changes for python
Posted Jul 14, 2025 21:49 UTC (Mon) by anselm (subscriber, #2796)

Frankly, I don't know what it is with you people. I've been programming in Python, mostly web stuff, for many years now, and its performance is usually not something I seem to need to worry a lot about - certainly not to a point where I would want to ditch Python for some compiled language. YMMV, of course.

Exciting changes for python
Posted Jul 14, 2025 22:32 UTC (Mon) by ballombe (subscriber, #9523)

I wrote "you should learn another performance-minded language", not that you should stop using Python.

Exciting changes for python
Posted Jul 14, 2025 22:41 UTC (Mon) by pizza (subscriber, #46)

> Frankly, I don't know what it is with you people.

Which "people" are those, exactly?

> I've been programming in Python, mostly web stuff, for many years now, and its performance is usually not something I seem to need to worry a lot about

There are many, many more workloads than yours. Here's an example. Several years ago I inherited a bit of Python 2 code that fired off a bunch of external commands and parsed the output to generate a data file that was consumed by a PHP dashboard-type web application. It typically took about 30s to run. When self-respecting Linux distributions finally dropped Python 2, it needed to be ported to something more modern. Unfortunately, the input was not Unicode-clean, which made Python 3 very unhappy, so in frustration I said "screw this" and rewrote it in Perl. Despite being algorithmically identical, the runtime dropped to about 3s - a literal order of magnitude faster.

This vast improvement in runtime performance allowed me to move to a synchronous invocation instead of asynchronous (with the state tracking and other complexity that entailed), resulting in an overall system that was simpler, more robust, and more performant.

Python definitely has its strengths. But it also has its weaknesses.

Exciting changes for python
Posted Jul 17, 2025 8:06 UTC (Thu) by Niflmir (subscriber, #175249)

I tried to pre-address your question. In order to have reasonable performance, you need to choose multiprocessing as your concurrency paradigm (you might have a workload where async/await can help, but better to just choose multiprocessing). So this corners your design. Now you need to do resource sharing in a third-party application (like PgBouncer) that isn't written in Python; if you are working with a protocol (say IMAP) where there isn't an existing resource pool available, well, you will have to write it yourself or give up on resource pooling. These are GIL problems. For the lack of a JIT, you will see performance problems if you do any sort of computation in Python.
That is why so many Python workloads are actually backed by Fortran, not even C. But why can't a pure-Python code base compete with that performance? Because the work hasn't been done until now. Moreover, the polyglot code base raises a bunch of packaging issues that an interpreted language shouldn't face. This isn't an issue of compiled or not: JVM bytecode is interpreted but JITted, and Python on the JVM doesn't have these issues either. PyPy fixed some of them. This is an issue of the CPython runtime.

I know and am comfortable with a bunch of languages and, all else being equal, these risks are reasons not to choose Python; trying to argue that all else isn't equal is just the realm of language flame wars.

Exciting changes for python
Posted Jul 14, 2025 22:35 UTC (Mon) by jmalcolm (subscriber, #8876)

While I agree with your comment, and I am in no position to question Knuth, I am not sure that your response makes sense here. Are we sure this is what Knuth even meant? What does Knuth mean by "premature optimization"?

I would think he means getting too clever with your algorithms or adding too many layers of design. I doubt he means skipping the vetting of whether your core architectural approach or data structures are even remotely valid or, more on topic, whether you have selected a language with an appropriate set of properties for the task.

Sure, I can choose Python to build a real-time operating system kernel. But it is a poor choice. I am going to be getting to that non-evil optimization stuff pretty quickly. The same is true in the opposite direction. I can choose C to write a 6-page web application with some simple CRUD logic. But it is a poor choice.

As Facebook (Meta) taught us, you can always create your own compiler after you have written your planet-scale social-media platform in an interpreted language like PHP. But that is a pretty big task to have to take on, and probably best avoided if you can. If you knew you were setting out to make Facebook to begin with, I think PHP (or Python) would be a poor choice. I mean, I guess the benefit would be that the rest of us might get a fast Python JIT compiler after you are forced to build one. And writing compilers is fun. But that is not the point we are making.

I mostly agree with your point. I agree because few of our projects are going to grow to the scale of Facebook. And, if they do, we can afford a team of compiler writers. But I hope those compiler writers do not try to choose Python as the language for the compiler while telling me "premature optimization is the root of all evil".

Python is fast enough for most of what most of us need to do. And it can call into something faster for the few things that really do need to be faster, if we run into bottlenecks. Way too often we reach for Rust, and containers, and K8s, and WASM because we want to be able to go Google-scale. More often, we should probably use Python (or something with similar productivity/performance trade-offs).

So, again, I mostly agree with you. However, "it is adequate until it isn't" also contains some wisdom. Python is not for everything. It is not what I would reach for when I go to design my next 3D game engine.

Exciting changes for python
Posted Jul 14, 2025 23:25 UTC (Mon) by anselm (subscriber, #2796)

> Way too often we reach for Rust, and containers, and K8s, and WASM because we want to be able to go Google-scale.
> More often, we should probably use Python (or something with similar productivity/performance trade-offs). So, again, I mostly agree with you.

+1

As far as I'm concerned, starting out by deliberately picking a programming language that is more inconvenient to develop with than Python, on the vague off-chance that Python may in the end possibly turn out not to be fast enough for the task at hand, is a form of premature optimisation that should be avoided. There are usually various things one can do to speed up a Python application short of rewriting all of it in a "performance" language - but, as always when doing optimisation work, one should work from actual performance data, not gut feelings.

Of course, this does not detract from the fact that there are application areas where it is pretty clear from the start that Python is probably not the best choice, as in your 3D game engine example. Picking something else in such a situation is obviously reasonable.

Exciting changes for python
Posted Jul 15, 2025 7:08 UTC (Tue) by taladar (subscriber, #68407)

On the other hand, is Python really convenient to develop with? I find its model of checking almost nothing before runtime quite inconvenient, because it means most testing during development has to be done by actually running the program instead of relying on a compiler to catch my mistakes early.

Convenience of developing in Python
Posted Jul 15, 2025 10:53 UTC (Tue) by farnz (subscriber, #17727)

Python has a very fast edit/run cycle (especially compared to a language with an adequate-to-good type system, like C++, Rust, or Idris), which makes it great for the sort of development where you don't yet know what the logic of the system should look like - effectively, you're using Python as your design language, instead of trying to come up with a design before writing code.

In theory, you then end up with a program that works for the happy path, and you "just" need to fix up all the failure-handling logic; in practice, it's often simpler to rewrite into another language at that point.

Convenience of developing in Python
Posted Jul 15, 2025 12:08 UTC (Tue) by Wol (subscriber, #4433)

> in practice, it's often simpler to rewrite into another language at that point.

"Always plan to throw the first one away - it's almost inevitable you will" :-)

Cheers,
Wol

Convenience of developing in Python
Posted Jul 16, 2025 8:00 UTC (Wed) by taladar (subscriber, #68407)

But Rust has a fast edit/don't-even-need-to-run cycle, where in Python you often need to test-run your application at that point. Not to mention the fact that you need to write Python with Python conventions, and those are just not compatible with, e.g., Rust with Rust conventions if you start in one language and then rewrite in another.

Convenience of developing in Python
Posted Jul 16, 2025 8:41 UTC (Wed) by farnz (subscriber, #17727)

You miss the point - until you run the code and see what it does, you do not know whether what it does is useful or not. You have to run the code to find out whether you're on the right track or not.

Rust makes it a lot easier to take a rough design and implement a quality program; Python makes it easier to go from a poorly written spec ("make me something that helps me make cool posters.
I'll tell you if the things you're doing are helpful or not if you show me a running program") to a rough design.

Convenience of developing in Python
Posted Jul 18, 2025 7:53 UTC (Fri) by taladar (subscriber, #68407)

I would disagree with the assessment that Python makes it easier to get from a poorly written spec to a running program; that was pretty much my entire point. The often-claimed "advantage" of dynamic languages, a faster iteration cycle, relies entirely on the lack of compilation time, while ignoring the fact that you can cut short the iterations in strict, compiled languages at the point where the compiler tells you where you made a mistake.

And that is entirely ignoring that a stricter type system is especially useful in the early design phase, where you want to figure out which assumptions and invariants the spec misses. For evidence of that, look at how broken many OpenAPI files or similar schemas are that resulted from dynamic-language projects, because accidental sloppiness of the dynamic type system slipped into the output (e.g. a field that sometimes has one type, sometimes another, is sometimes left out, and is sometimes null, ...).

Convenience of developing in Python
Posted Jul 18, 2025 9:03 UTC (Fri) by farnz (subscriber, #17727)

I do not see how you cut out the iterations with a stricter language - the specification you're working to is "show me it running, I'll tell you what I like and do not like". You don't yet have assumptions and invariants at all. You have an oracle (a human being) who will tell you if what you're showing them is closer to or further from what they imagined.

Python is excellent at this stage, because of the speed with which you can effectively query the oracle; once you've built up a picture of what the human wants, you may be able to go faster in another language, but you cannot go faster in the early stages, since you literally do not have a spec - you're working with an oracle, not a document.

Convenience of developing in Python
Posted Jul 21, 2025 7:43 UTC (Mon) by taladar (subscriber, #68407)

In case you are talking about the customer, I would avoid showing anything running to them at all unless I plan to support it in the long term, because it is extremely hard to convince customers that something still needs work once they have seen something running (or even just something that visually looks like it is running, even if most of the underlying business logic is missing).

Convenience of developing in Python
Posted Jul 21, 2025 9:03 UTC (Mon) by farnz (subscriber, #17727)

Then you've just breached the contract with the customer, and you're fired from the job. They've agreed that you'll show them something every week, and that you'll take feedback each week, and you're now refusing to do so. Remember that this is in the context of a spec that looks like "there's some annoying things about my job; make the computer do them the way I would", not a decent spec that you can work from. If you don't show where you are to the customer regularly, you simply don't know what they want.

And it's entirely possible in any programming language to only show the things that are complete, and to not even have missing business logic - if it's visible in the demo, it's ready for the customer to accept, or for them to request changes to.
This is where languages with a fast edit/run cycle have an advantage: if the customer sees the working (and complete, ready-to-ship) version you're showing them this week, but wants one small change (where you define small, not the customer), you can make the change on the fly and show them the result of making that change. Customers, being human, then often decide that actually the change was a bad idea, and want a different change, but that's also easy to do.

Convenience of developing in Python
Posted Jul 22, 2025 7:23 UTC (Tue) by taladar (subscriber, #68407)

But that is exactly my point: at that point you would have to ship the Python version, the one people earlier claimed was "just for prototyping". That is why I wouldn't want to show that one to the customer, because at that point I am stuck supporting that language, which is completely unmaintainable.

Convenience of developing in Python
Posted Jul 22, 2025 9:33 UTC (Tue) by farnz (subscriber, #17727)

You can ship the Python version as 1.0, and rewrite (using PyO3 or similar if you want to use Rust, or pybind11 if you want to use C++) for 1.1.

The other thing you can do to reduce your Python maintenance burden (and move it to other languages) is port stable areas of the code into your other language as you go along. Once you're confident that you've extracted all the requirements relating to a given area of the codebase, you can wrap it up for Python to access, and stop maintaining the Python version. You don't have to wait until the project as a whole is done to do this to components.

Convenience of developing in Python
Posted Jul 22, 2025 9:38 UTC (Tue) by anselm (subscriber, #2796)

> that language that is completely unmaintainable

Speak for yourself. Where I work, we make a pretty good living out of maintaining and extending Python code that has been around - at least in part - for a very long time. IMHO, there are popular programming languages that are way less maintainable than Python.

Convenience of developing in Python
Posted Jul 21, 2025 12:32 UTC (Mon) by mathstuf (subscriber, #69389)

One tactic there is to make things that are undone *look* undone. For example, instead of a polished icon from the designer (that you may indeed already have), use a crayon-like representation in the meantime (probably better if the developers make it themselves at that point).

Convenience of developing in Python
Posted Jul 18, 2025 8:36 UTC (Fri) by Wol (subscriber, #4433)

> You miss the point - until you run the code and see what it does, you do not know whether what it does is useful or not. You have to run the code to find out whether you're on the right track or not.

So how come my ex-boss (we're talking 50 years ago) spent his first six months programming without a computer, and, when the computer turned up and the program was typed in (by secretaries - those people who were very good at making perfect copies first time round), it worked perfectly?

The main reason you need to run and test today is that we rely on too much third-party code where you cannot trust the authors to have either (a) documented it properly, or (b) checked it properly for bugs. It's amazing the difference you can make to a program just by printing it out, READING the code CAREFULLY, and running a linter/compiler-with-warnings-at-max over it.
Any decent programmer should do that as a matter of course, but the amount of code I work with, even today, where that clearly hasn't been done is amazing. And depressing.

Cheers,
Wol

Convenience of developing in Python
Posted Jul 18, 2025 8:59 UTC (Fri) by farnz (subscriber, #17727)

No - the main reason you need to run and test today is that the spec you're given now is "show me it doing something, and I'll tell you if it's the right thing or not". When the spec is literally "make a program that does something that $boss likes", you can't do decent software engineering; you're running code to convert "show me a program that does something I like" into a more formal spec, from where you can start the process your ex-boss did.

What's changed is that people see the computer as a "magic" machine that can do anything they can imagine, rather than as a tool to calculate things.

Convenience of developing in Python
Posted Jul 18, 2025 9:25 UTC (Fri) by anselm (subscriber, #2796)

> No - the main reason you need to run and test today is that the spec you're given now is "show me it doing something, and I'll tell you if it's the right thing or not".

This is because the people who want you to write software for them generally find it difficult to explain to you, precisely and unambiguously enough, up front, what they want that software to actually do in the end, and how. Software development would be so much easier if all one had to do was write some code based on a pre-existing complete, correct, and unambiguous specification of the job to be done. The problem is that writing that type of specification in the first place is about as difficult, time-consuming, expensive, and error-prone as writing the code itself, and therefore this tends to be attempted only in exceptional circumstances.

In the meantime, the rest of the software-development world has adopted "agile" development practices, which usually involve iteratively building increasingly refined versions of the code until the customer is satisfied, at which point the result may or may not have anything to do with what the customer would have been able to describe meaningfully at the start.

Convenience of developing in Python
Posted Jul 18, 2025 10:20 UTC (Fri) by interalia (subscriber, #26615)

It seems unlikely that everyone was just like your ex-boss and got it right on the first go, every time. His competence and that of all the secretaries and other programmers across the industry: did it follow a bell curve, or was it really a vertical line at 100% correctness?

As to your implied point that people back then took more care before running code: how weird that people back then adapted and adjusted their behaviour to match the limitations and running costs of their computers. It's about as weird as the reason why people in the 1800s didn't fly on a plane when travelling to other countries. They couldn't do it, so they didn't.

Convenience of developing in Python
Posted Jul 18, 2025 13:23 UTC (Fri) by Wol (subscriber, #4433)

> Seems unlikely that everyone was just like your ex-boss and got it right first go, every time. His competence and that of all the secretaries and other programmers across the industry, did it follow a bell curve or was it really a vertical line at 100% correctness?

I suspect it's what I call "the Word effect".
Word caused a massive crash in professional literacy, because it enabled managers (and other workers) to write their own letters, but they didn't have all the skills in layout and what actually makes a letter readable. That era was awful for all the stuff that was a nightmare to read. (It's still naff, but nowhere near as bad. MOST people have a better feel, but they still don't know the rules and mess up ...)

ALL programmers of that era were more than competent at doing that sort of stuff. I remember my fellow pupils at school taking stacks of punched cards up to the local Oceanographic Institute to run on its computer when it had spare capacity. Yes, you're right, they did it because they had to. BUT ... I suspect it's a case of that same law that khim mentioned. All these modern programming tools actually *hinder* productivity, but they fool the less experienced/competent (and even the competent) into thinking that these tools help.

Even farnz's example - I'm sure spending half an hour with the boss saying "what are you trying to achieve?" would save many hours of waterfall. But the boss views your time as less valuable than his own and won't spend that little bit of time with you, despite the fact that it will probably cost him hours viewing and rejecting your misunderstanding. And yes, I've had that happen to me: a boss who couldn't accept that I didn't understand what he wanted, and I wasn't prepared to waste MY time getting it wrong ...

At the end of the day, if the boss doesn't know what he wants, how on earth does he expect you to know? Yes, I know it's difficult dealing with an incompetent boss, but that's what it boils down to ...

Cheers,
Wol

Getting 30 minutes with the big boss
Posted Jul 18, 2025 14:23 UTC (Fri) by farnz (subscriber, #17727)

Thing is, getting 30 minutes of uninterrupted time from the boss is expensive and difficult. You add quite a bit more value if you can do the right thing from an off-the-cuff idea and a two-minute demo session (with you changing the code live as the boss makes suggestions in the demo session) every week, even if it takes 30 weeks to get to a final result rather than 10 weeks with a two-minute demo session at the eight-week mark. This happens because the boss's time gets more expensive to schedule the more of it you want; a two-minute demo session can be squeezed in between other meetings, where a 30-minute session needs the boss to commit to a full meeting, replacing some other work they'd be doing with that time.

And that's ignoring cases where the boss literally needs you to tell them what the computer is, and is not, good at. In my experience, there are plenty of people where it's not worth having the 30-minute up-front conversation, because they have no sense of what's reasonable to ask for, and expect that, if it's easy to state the problem, it'll be easy to solve. With these people, keeping the time down focuses them on asking for relatively small things (since they don't have long to state the problem), and giving them an example of what can be done tends to focus their ideas on things like what they've just seen, rather than on all the things they could be asking for.

And yes, this is people management; in the end, it's about managing the person with the money such that they are happy to pay you to solve the problem, rather than choosing to pay someone else.
It may well be easier on you to demand more from the person with money, but if that leads to them deciding that they're not paying you, but paying someone else, that's a problem for you, even if you can produce results quicker than the person they're paying.

Exciting changes for python
Posted Jul 16, 2025 8:40 UTC (Wed) by interalia (subscriber, #26615)

I added type hints to my Python code when using an IDE and found that it really helps a lot. The IDE can flag when I've passed the wrong variable to a function, made a typo, or tried to access an attribute/method that doesn't exist on an object of that type. The fact that type hints are an optional feature can make them sort of a helpful compromise between full static typing and the convenience/ease of dynamic typing.

I find sometimes it's nice, when prototyping a new function, to be able to omit types and get a feel for whether the basic approach works without having to fully specify everything the way I'd have to in order to make a C/C++/Rust compiler happy. A bit later, when the new Python function is more settled, I can formalise it by adding the final type hints so that the IDE can check existing and future callers for me. The fact that type hints are optional also means you can add them to an existing code base piecemeal, by starting with hints on some targeted functions but not all, so you don't have to convert the whole thing at once to start getting some benefits.

I still probably wouldn't write a large code base in Python, but with type hints one is probably at least tractable, and they made my medium-size scripts better and more maintainable.

Exciting changes for python
Posted Jul 15, 2025 11:16 UTC (Tue) by khim (subscriber, #9252)

> Are we sure this is what Knuth even meant?

We can be 100% sure Knuth didn't mean what most perusers of that quote mean. You may start by actually reading the article that gave us that quote. Even the name of the article should give you a hint: "Structured programming with go to statements". And these "premature optimizations"? They are about tricks that are incompatible with "normal" structured programming, and things like manual loop unrolling.

Plus, you may find another quote in this exact article:

> In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering.

Or, maybe, the more expanded form:

> The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies.

I would say that using that quote to justify Python is as far from what Knuth had in mind as one may imagine.
We are not talking about something that leaves a 12% improvement on the table, but about something that makes things tens or, maybe (if we think about multicore CPUs), hundreds of times slower!

Exciting changes for python
Posted Jul 15, 2025 11:48 UTC (Tue) by excors (subscriber, #95769)

> Are we sure this is what Knuth even meant? What does Knuth mean by "pre-mature optimization"?

What he actually says (https://dl.acm.org/doi/10.1145/356635.356640) is:

> The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. [...] I don't want to restrict myself to tools that deny me such efficiencies.
>
> There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We _should_ forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
>
> Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.

He's not saying performance doesn't matter - he explicitly says it shouldn't be ignored. He's arguing for a more thoughtful approach: most code doesn't need to be heavily optimised, but some code does, and it's important to have the tools to identify those bottlenecks and then to optimise them. Premature optimisation is when you skip the identification step and optimise regardless, but the other extreme is bad too.

In that paper he's specifically talking about optimisations like unrolling loops and turning 'while' into 'go to' to reduce the number of control-flow instructions. You should "start with a well-structured program and then use well-understood transformations that can be applied mechanically". Nowadays most languages have optimising compilers that can apply those transformations automatically, so we don't have to sacrifice the source code's readability for performance in that way.

I think that's a significant weakness of Python: once you identify your program's bottlenecks, they cannot be straightforwardly transformed (by either human or compiler) into efficient code, as demonstrated by the struggles with implementing a decent JIT. You have to rewrite the bottlenecks in a different programming language, which is far from ideal.

It sounds like Python also makes it hard to identify those bottlenecks: this article says enabling the Python profiler will disable the JIT, and implies that native profilers won't work well with the JIT any time soon, so you can't measure the program's actual performance. And the non-deterministic nature of JIT compilation makes profiling hard even in the best case.
Compiled languages are generally much better both for identifying bottlenecks and for optimising them with incremental changes, because there's a more direct correspondence between source code and machine code; I think that's more important than their baseline performance for unoptimised code.

But also, Knuth is talking about optimisations on the scale of 12%. The performance cost of using Python instead of a compiled language is regularly >1000% - I imagine he'd be shocked that a good software engineer would even consider that. (At least, I imagine he would have been when he wrote this 50 years ago. A modern desktop PC is maybe 1,000,000% faster than a supercomputer from back then, so perhaps that changes one's perspective on the tradeoffs.)

Exciting changes for python
Posted Jul 15, 2025 13:13 UTC (Tue) by anselm (subscriber, #2796)

> Knuth is talking about optimisations on the scale of 12%. The performance cost of using Python instead of a compiled language is regularly >1000%

Knuth is talking about "a 12% improvement, easily obtained" (my emphasis). I suspect that whether rewriting a Python program in a compiled language leads to "easily obtained" performance improvements (of 12% or 1000% or anything in between) is debatable, and also depends on the specific circumstances, including, e.g., how often it will be needed. For something that is only supposed to be run once, spending a day or two to make it run in 5 seconds in something like Rust, instead of a couple of hours to make it run in a minute in Python (just for the sake of an example, to get the 1000% in), may not be worthwhile; but if it is something that is supposed to be run via cron every minute, every day, then very probably yes.

Exciting changes for python
Posted Jul 16, 2025 8:06 UTC (Wed) by taladar (subscriber, #68407)

I doubt Python development is faster to the degree claimed, especially compared to a modern language with good error messages and tooling like Rust (as opposed to the mess that is C/C++ build systems and errors). But even if it were, never spending time on learning Python would be a pretty big time saver that probably made up for those few hours on the few programs small enough to make Python viable.

Exciting changes for python
Posted Jul 16, 2025 9:36 UTC (Wed) by farnz (subscriber, #17727)

My lived experience is that Python is valuable when you're working with someone who "doesn't program", but where the spec for the program is "make this colleague happy". You're not going to teach them Rust - but they may well be trainable to the point where they hack around at the part of the Python program that doesn't work for them (breaking the rest into the bargain - they're not going to fix bugs, because that's programming to them) so that they can show you what they meant.

The resulting Python code is not debuggable; it's full of quirks and bugs, mixed together so that you can't tell if a given bit of code is purely a bug, or if it has side effects that are necessary to make another piece of code do the right thing, and with a very healthy helping of bad naming.
Once you've gone through this to a point where they're happy with what the program does when it doesn't crash, you've got a pile of code that needs rewriting (whether you're sticking with Python or changing language); but, at this point, you have a better spec than "make my colleague happy with what the program does", because the spec is now "do what the Python program did, but without the bugs".

Exciting changes for python
Posted Jul 16, 2025 11:33 UTC (Wed) by anselm (subscriber, #2796)

Python code pays for my salary and that of my colleagues. We get stuff done. Our customers are happy. New ones are coming in all the time. Our management loves our team because we perform way better than the budget projections say we should, and have done so for several years. As far as we're concerned, Python is just fine.

Exciting changes for python
Posted Jul 17, 2025 4:01 UTC (Thu) by raven667 (subscriber, #5198)

Good description; this is something I do too in my work, where many of my colleagues are not software engineers, but they can do scripting in shell and Perl. I encourage them to write code to solve their problems, which immediately provides value to them, and, if that needs to be promoted into an ongoing system, then I can use their working prototype as a starting point to refactor using our house style and software-engineering practices.

Premature optimization
Posted Jul 15, 2025 14:01 UTC (Tue) by marcH (subscriber, #57642)

> He's arguing for a more thoughtful approach: most code doesn't need to be heavily optimised, but some code does, and it's important to have the tools to identify those bottlenecks and then to optimise them. Premature optimisation is when you skip the identification step and optimise regardless, but the other extreme is bad too.

BTW, "premature optimization" is just one particular case of: many developers don't like testing. They just want to (re-)write code. "Oh, look: this code is not optimal. I can easily make it faster!" Never mind that it's miles away from any critical path and that the rewrite will yield zero user-visible improvement. Sometimes the developer will not even microbenchmark their rewrite in isolation...

I think most people I ever discussed Knuth's quote with understood it accurately. Whether that actually stopped them from the selfish pleasure of (re-)writing code is a different question :-)

And yes: this is also why a lot of Python software does not need to be converted to a different language - or even JIT-compiled. The critical paths are already in a different language (in an ideal world, it would be easier for anyone to mix different languages).

Premature optimization
Posted Jul 15, 2025 15:35 UTC (Tue) by farnz (subscriber, #17727)

This is also why tools like PyO3 for Rust, and Boost.Python, pybind11, or PyCXX for C++ are so useful. It's common for a 10% speedup in 20% of your code base to have more impact on users than zeroing out the runtime of the other 80% of the code; why rewrite all of your code into some other language, when you can rewrite the 20% that has a significant impact on your users and get over half the benefits?
Exciting changes for python
Posted Jul 15, 2025 14:03 UTC (Tue) by marcH (subscriber, #57642)

> Compiled languages are generally much better for both identifying bottlenecks and then optimising them with incremental changes, because there's a more direct correspondence between source code and machine code;

Errr... are you really sure about that? For sure, debugging optimized C code is a lost cause.

Exciting changes for python
Posted Jul 16, 2025 0:34 UTC (Wed) by Paf (subscriber, #91811)

Yes; as someone who writes C for a living in a high-performance setting, the correspondence to machine code is only rarely relevant to performance unless I'm staring at some really tight bit of code. More often it's much more about approach - *what* you are doing, or perhaps could avoid doing, not the fine details of how the machine is doing it. And I say that as someone who finds that kind of optimization fun. But it doesn't come up much.

Exciting changes for python
Posted Jul 16, 2025 22:21 UTC (Wed) by raven667 (subscriber, #5198)

> More often it's much more about approach - *what* you are doing, or perhaps could avoid doing, not the fine details of how the machine is doing it.

I've found that to be very true. The first draft of code often lives for years if it works, but it can be very meandering, because the initial developer figured out how to solve the problem as they went. Once you know what the output is supposed to be, refactoring can be as simple as replacing many loops through a data set with one loop checking multiple conditions, or a hash-key lookup replacing a complex set of expressions, sometimes made easier by a simple reformatting of the input data to make it easier to look up. The end result is asking the computer to do much less work, which is almost always faster and, as you say, doesn't require deep microarchitectural knowledge - just a basic sense of proportion.
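To make raven667's point concrete, here is a small, hypothetical before-and-after sketch in Python (the data and the names are invented for illustration): the refactor needs no microarchitectural knowledge; it simply asks the computer to do less work.

    records = [("alice", 3), ("bob", 7), ("alice", 2), ("carol", 5)]

    # Meandering first draft: one full scan of the data per question asked.
    def total_for(name):
        return sum(n for (who, n) in records if who == name)

    # Refactored: a single pass builds an index; every later lookup
    # is then a constant-time dictionary access.
    totals = {}
    for who, n in records:
        totals[who] = totals.get(who, 0) + n

    assert total_for("alice") == totals["alice"] == 5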
Exciting changes for python
Posted Jul 17, 2025 8:29 UTC (Thu) by Wol (subscriber, #4433)

lol! At my second job, my boss came to me and said "we have this job. It's a six-week deadline. Can you do it?", and I responded "I'll give it a damn good try". I gave myself a five-week deadline, to give the rest of the team time to do the remaining work once the program spat out the results. Four weeks into the job, the program ran. Estimated run time to completion? SIX WEEKS! Cue three days of panic as I optimised the hell out of it - I finally handed the program over on the Wednesday morning of week 5 (and went sick :-). The rest of the team made the deadline. And I basically did pretty much exactly what you said...

Cheers,
Wol

Exciting changes for python
Posted Jul 22, 2025 21:38 UTC (Tue) by anton (subscriber, #25547)

> If you knew you were setting out to make Facebook to begin with, I think PHP (or Python) would be a poor choice.

Judging by the fact that Facebook/Meta has been enormously successful in a highly competitive field, it may have been a good choice. It may have given them the flexibility to determine the requirements, while competitors who used a language like C++ or Java might have been too slow in adapting to their users and consequently never gained the critical mass.

Exciting changes for python
Posted Jul 23, 2025 9:51 UTC (Wed) by paulj (subscriber, #341)

Note that Facebook long ago rewrote the PHP runtime and extended the PHP language to add typing. It first had a PHP-to-C++ transpiler, then wrote its own PHP JIT VM (HHVM) somewhere around 2010, along with support for an in-house dialect of PHP that adds static typing, called "Hack" (open sourced around 2014, it seems).

Exciting changes for python
Posted Jul 15, 2025 21:34 UTC (Tue) by Bluehorn (subscriber, #17484)

I can confirm that there are scenarios where you get into a lot of pain with Python. I bought the conventional wisdom, saying:

1. Premature optimization is the root of all evil.
2. Use Python for the high-level glue code, and drop into C/C++ for the speed-critical pieces.

This might work for number-crunching applications or games, where there is just a time-critical core that can be optimized. In the application I am working on, we don't have such pieces. We don't compute much; we just keep data organized for the user. So there is no single bottleneck - we can only optimize peaks of 7% at most from the flame graph. I wish I hadn't fallen for that advice. We don't really have the time to port to another language, and the main critique of our software is about performance.

Exciting changes for python
Posted Jul 15, 2025 23:39 UTC (Tue) by marcH (subscriber, #57642)

> We don't compute much, we just keep data organized for the user. So there is no single bottleneck, we can only optimize peaks of 7% at most from the flame graph.

If you don't "compute" much, then why is the programming language the bottleneck? All languages are equal when waiting for storage, network, databases, user input, etc. In that case, optimizing requires re-architecting data and caching, and that's generally not language-specific. Except that higher-level languages make it much easier, safer, and faster to experiment with different designs and strategies.

Exciting changes for python
Posted Jul 16, 2025 8:10 UTC (Wed) by taladar (subscriber, #68407)

I wouldn't say Python makes it safe or easy to change designs and strategies. That is precisely where you want a compiler that tells you all the spots where you accidentally broke something while refactoring.

Exciting changes for python
Posted Jul 16, 2025 15:14 UTC (Wed) by marcH (subscriber, #57642)

I wrote "experiment". Obviously, interpreted languages catch far fewer bugs at compile time and require a lot more test coverage. You absolutely need a ton of test coverage when refactoring, and it tends never to be enough. Introducing regressions in less-usual cases is bad indeed, but that does not get in the way when experimenting with new designs to address performance bottlenecks in _common_ use cases. It only comes back and bites you later; but that's not specific to performance, that's just the nature of all interpreted languages, and it's the price to pay for a much faster development and prototyping loop.

When I wrote "safer" I had memory corruption and C/C++ specifically in mind - those are always the most time-consuming bugs, by several orders of magnitude. There are indeed many other options besides Python and C/C++, and some may be better than either, depending on the use case.
Compared to lower-level languages, Python saves prototyping time not just because you write less code, but also because there are very high-level libraries and approaches to choose from, especially for I/O and concurrency.

Exciting changes for python
Posted Jul 16, 2025 18:24 UTC (Wed) by mathstuf (subscriber, #69389)

> especially for I/O and concurrency.

Except for inotify, for some reason. Granted, this was in 2015, when the Great Python 3 Disruption was still in full swing, but the only inotify libraries I could find were either wrapped up in way-too-large frameworks (e.g., Twisted) or abandoned by their maintainers on Python 2 (possibly until asyncio was established?).

Exciting changes for python
Posted Jul 14, 2025 18:51 UTC (Mon) by mb (subscriber, #50428)

Python is a good language and I enjoy programming in it. But all of my bigger projects eventually ran into performance and/or threading problems. No exceptions. That's the main reason why I don't use Python today for anything that has the potential to outgrow 1kLOC.

And performance improvements of 50% or 100% don't really help, because compiled languages are *so* much faster. I have reached almost-compiled performance with Cython in certain projects. But why would I want to write code in Cython, if much better compiled languages with much better type systems exist?

Alternatives to Python
Posted Jul 15, 2025 9:53 UTC (Tue) by farnz (subscriber, #17727)

One thing I've done in the past, with PyCXX when I used C++, and with PyO3 now that I use Rust most of the time, is to rewrite the "critical path" code from Python into something more amenable to high performance. The core idea is to use cProfile to find the bits of Python code that are bottlenecking you, then work out what architecture changes are needed to move the bottleneck code into a different language - which, of course, can include moving a chunk of non-bottleneck code across, too - leaving just glue code behind in Python.
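As a minimal sketch of that workflow, assuming nothing beyond the standard library (the function names are invented): profile first, let the numbers identify the critical path, and only then decide what crosses the language boundary.

    import cProfile
    import pstats

    def hot_inner_loop(data):
        # Hypothetical bottleneck; the profile will attribute time here.
        return sorted(x * x for x in data)

    def main():
        data = list(range(100_000))
        for _ in range(50):
            hot_inner_loop(data)

    # Sort by cumulative time to see which functions dominate; those are
    # the candidates to move behind a PyO3/pybind11 boundary.
    cProfile.run("main()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)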
Exciting changes for python
Posted Jul 23, 2025 16:04 UTC (Wed) by danchev (guest, #151356)

Once you are proficient in English and can read medical literature, you are pretty much an MD and you should try something else.

P.S. I have yet to meet anyone who can genuinely claim to have fully mastered a major programming language.

Exciting changes for python
Posted Jul 14, 2025 12:35 UTC (Mon) by khim (subscriber, #9252)

> python had so many good points of its own, it didn't deserve to be relegated as some niche language.

It's strange to call one of the most popular languages a "niche language". It's not a "niche language", but more of a "glue language": you don't use it for anything where many developers may touch the same code with no dedicated owner (there are statically typed languages for that), but it's very nice if the code is not supposed to be supported by more than one person.

Exciting changes for python
Posted Jul 14, 2025 18:53 UTC (Mon) by rjones (subscriber, #159862)

Describing one of the most popular programming languages that ever existed as "niche" is pretty silly. If Python is niche, then what adjectives would one use to describe Golang or C#? Toys or hobby languages?

Exciting changes for python
Posted Jul 15, 2025 10:57 UTC (Tue) by khim (subscriber, #9252)

If we go by popularity, then "toy" or "non-professional" things are always the more popular ones. Everywhere. There are many more toy cars in existence than real cars, for example. Thus, if anything, the popularity of Python puts it squarely in the "toy" area: serious languages couldn't win by sheer numbers, as with any other thing. "Professional" instruments are much less common than "consumer" ones, after all.

Exciting changes for python
Posted Jul 17, 2025 7:46 UTC (Thu) by rjones (subscriber, #159862)

By that logic, brainfuck must be one of the most serious languages ever created.

Exciting changes for python
Posted Jul 17, 2025 11:44 UTC (Thu) by khim (subscriber, #9252)

Nope. "A implies B" is not the same as "B implies A". If tools are only used by professionals in professional settings, then they are, normally, less popular than toys. But that doesn't mean that something that's rarely used is, by necessity, a professional tool. There are many other things, besides professional tools, that are rare, after all.

How to handle dynamic content?
Posted Jul 14, 2025 10:51 UTC (Mon) by claudex (subscriber, #92510)

I understand that the examples are not actual code. But how does the JIT handle changes to a type? For example, with this implementation:

    for duck in ducks:
        if duck.__class__ is Duck:
            sound = "Quack!"
        else:
            sound = duck.quack()
        ...

What happens if, after a few iterations, I change the definition of the quack method with something like:

    Duck.quack = RubberDuck.quack

How will the JIT detect that it should revert back to the interpreter?

How to handle dynamic content?
Posted Jul 14, 2025 12:30 UTC (Mon) by khim (subscriber, #9252)

> How will the JIT detect that it should revert back to the interpreter?

In the exact same way it does in JavaScript? It's not rocket science, just a simple if_class_is_still_the_same check added to the generated code - with near-zero cost, because in most programs it is predicted 100% correctly, always.

How to handle dynamic content?
Posted Jul 14, 2025 16:41 UTC (Mon) by Cyberax (supporter, #52523)

> What happens if after a few iterations, I change the definition of the quack method with something like

The generated code has a guard flag that switches execution back to interpretation, and the class assignments have (generated) setters for this flag. This was first pioneered in the Self compilers in the early '90s, then Java used it in its HotSpot JIT for de-virtualization, and finally the V8 JavaScript JIT extended it for fully dynamic JS.

How to handle dynamic content?
Posted Jul 14, 2025 22:18 UTC (Mon) by claudex (subscriber, #92510)

Thanks, that's the explanation I was missing.
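To spell out what such a guard has to protect, here is a pure-Python sketch building on claudex's example (the "Honk!" rebinding is invented): a method rebound mid-loop must take effect on the very next call, no matter what specialized code was generated from earlier iterations. When a guard of the kind described above notices the class has changed, execution falls back to the interpreter, preserving exactly this behavior.

    class Duck:
        def quack(self):
            return "Quack!"

    ducks = [Duck() for _ in range(4)]
    for i, duck in enumerate(ducks):
        if i == 2:
            # Rebind the method on the class. Note: no parentheses --
            # we assign the function object itself, not a call's result.
            Duck.quack = lambda self: "Honk!"
        print(duck.quack())
    # Prints: Quack! Quack! Honk! Honk! -- the rebinding is visible
    # immediately, which is what any guard scheme must preserve.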
Tracing JITs have their downsides
Posted Jul 14, 2025 17:46 UTC (Mon) by kleptog (subscriber, #1183)

The comment about how "branchy code" has issues with tracing JITs rather glosses over the fact that some common use cases do exactly that. PyPy also uses a tracing JIT, and we had problematic experiences there. The example we had was code that was parsing mailbox files. Basically, we were looping over the email headers and doing stuff as we went, and the tracing JIT went along and started creating specialisations for every order in which the headers could appear. The resulting JIT-generated code was much faster, but the memory usage grew even faster, so that for a fixed amount of memory we were better off just running more non-JIT Python interpreters in parallel.

This same problem applies to any kind of interpreter or parser where you're scanning through an input and selecting one of many functions based on it. What we really wanted was a special STOP_JIT_HERE marker to place at the beginning/end of the loop, to stop the JIT compiler from tracing across it and producing many traces that were rarely used. IIRC, at the time the PyPy developers were not interested in such a hack, but maybe in the meantime the state of the art has advanced to solve this. Raw performance is important, but in some contexts performance per GB of memory used is also relevant.

(Though, thinking on it now, the issue is exacerbated by the GIL meaning that parallel processing must happen in separate processes. If you could process in parallel within a single Python process using threads, then the JIT could share the rarely used traces across all the threads, and the memory usage would probably be much less problematic. Still, I think I'd prefer to sacrifice some performance for known, bounded memory usage.)

Tracing JITs have their downsides
Posted Jul 14, 2025 20:54 UTC (Mon) by roc (subscriber, #30627)

Mozilla's SpiderMonkey invested big in tracing and eventually had to switch to more traditional approaches because of the trace-explosion problem.

However, the performance expectations for JS are much, much higher than for Python: JS has much more competition, and you can't ever be much slower than your competitors. The situation in Python is very different; it's never going to be fast, and users can't easily switch to a better alternative implementation, so eking out small gains by tracing in limited situations might be the way to go.

Tracing JITs have their downsides
Posted Jul 15, 2025 8:38 UTC (Tue) by Sesse (subscriber, #53779)

I believe that, indeed, it was someone from TraceMonkey who pointed out why tracing JITs don't work that well in practice: "You fall off the trace". But fundamentally, what people want is "faster Python", not "a Python JIT", and even though one is used to JITs giving massive speed boosts, the latter is not necessarily the most productive way to get to the former.

Hyperblock scheduling?
Posted Jul 15, 2025 18:13 UTC (Tue) by DemiMarie (subscriber, #164188)

Apparently hyperblock scheduling is a solution to the trace-explosion problem, but to the best of my knowledge no production compiler has implemented it.

Hyperblock scheduling?
Posted Jul 15, 2025 21:07 UTC (Tue) by roc (subscriber, #30627)

The references to "hyperblock scheduling" that I see online all mean "turn conditional basic blocks into predication", which only makes sense to apply selectively - and, on traditional CPU architectures, *very* selectively, because of their limited predication support. It's not going to solve all trace-explosion problems.
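For a feel of the shape of code kleptog describes, here is a hypothetical Python sketch (the header names and handlers are invented): the branch taken varies from message to message, so a tracing JIT that compiles one trace per observed path through the loop can keep generating new, rarely reused traces.

    HANDLERS = {
        "From": lambda v: v.lower(),
        "Subject": lambda v: v.strip(),
        "Date": lambda v: v[:10],
    }

    def process(headers):
        # Each iteration dispatches on the header seen; over a whole
        # mailbox, the sequence of branches taken rarely repeats.
        out = []
        for name, value in headers:
            handler = HANDLERS.get(name)
            out.append(handler(value) if handler else value)
        return out

    print(process([("From", "A@EXAMPLE.COM"), ("Date", "2025-07-14 17:46 UTC")]))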
Use Julia
Posted Jul 16, 2025 4:13 UTC (Wed) by thomas.poulsen (subscriber, #22480)

I think these efforts to optimize Python are obsolete now that Julia has reached maturity. It "walks like Python and runs like C": https://julialang.org/. It is not only great for scientific computing, but also has a great package system (1), plotting libraries (2), and web frameworks (3). The latency problems of early Julia are mostly a thing of the past now. The composability made possible by multiple dispatch promotes a very modular package ecosystem.

1. https://docs.julialang.org/en/v1/stdlib/Pkg/
2. https://makie.org/website/
3. https://genieframework.com/

Use Julia
Posted Jul 16, 2025 7:11 UTC (Wed) by Sesse (subscriber, #53779)

Given the amount of Python code out there, even if people completely stopped writing Python in favor of Julia tomorrow (which isn't going to happen, sorry), making Python faster would still have value. I mean, C is obsolete now that we have OCaml, but somehow people are still interested in faster C compilers :-P

Use Julia
Posted Jul 16, 2025 16:43 UTC (Wed) by azumanga (subscriber, #90158)

I find Julia nowhere near as friendly as Python. The most obvious problem I hit early on (admittedly, I was looking at it for teaching a maths class) is that the default integer type is Int64, so you get all the usual silent-truncation issues that causes. I feel a language that is aiming to be a user-friendly Python replacement shouldn't be silently wrapping integers at 2^63. You can get big ints, but why not default the other way around, where you get big ints by default and can choose 64-bit ints if you need them?

Use Julia
Posted Jul 18, 2025 3:02 UTC (Fri) by thomas.poulsen (subscriber, #22480)

One of the cool features of Julia is the REPL modes. Here's an example of a mode that does all arithmetic in arbitrary precision: https://github.com/MasonProtter/ReplMaker.jl?tab=readme-o...

Function pointers are not always just a pointer to the instruction
Posted Jul 16, 2025 19:28 UTC (Wed) by jrtc27 (subscriber, #107748)

> One caveat is that memory from mmap() comes in page-sized chunks, which is 4KB on most systems but can be larger. If the JIT code is, say, four bytes in length, that can be wasteful, so it needs to be managed carefully. Once you have that memory, he asked, how do you actually execute it? It turns out that "C lets us do crazy things":
>
>     typedef int (*function)(int);
>     ((function)data)(42);
>
> That first line creates a type definition named "function", which is a pointer to a function that takes an integer argument and returns an integer. The second line casts the data pointer to that type and then calls the function with an argument of 42 (and ignores the return value). "It's weird, but it works."

This isn't always true. Most of the time it is, but there are a couple of cases where it's not.

Firstly, on some architectures, the low bits of the address for indirect jumps, and thus function pointers, are used to indicate which execution mode the processor should use. For example, on 32-bit Arm, the LSB is 1 for T32/Thumb and 0 for A32/Arm, and on 32-bit MIPS it similarly distinguishes between MIPS32 and microMIPS32 (or the earlier MIPS16e, which microMIPS32 replaced).

Secondly, some ABIs use function descriptors to represent language-level function pointers.
Here, the function pointer is not a pointer to the instructions to execute, but a pointer to a structure that contains such a pointer alongside one or more other pointers, typically some kind of per-library global pointer. This is the case on PA-RISC, Itanium, and 64-bit PowerPC when using version 1 of its ELF ABI (version 2 drops this, and is what most modern distributions use for 64-bit PowerPC). In embedded contexts, several other ISAs also have an ABI variant (sometimes called "FDPIC", for "function descriptor position-independent code") that does the same, since it allows a single copy of library code to be shared between processes on no-MMU systems.

Function pointers are not always just a pointer to the instruction
Posted Jul 22, 2025 21:31 UTC (Tue) by anton (subscriber, #25547)

Your post makes me appreciate our use of goto * (instead of C function calls) to enter generated machine code in Gforth. As for ARM, we have used the address of the first byte of the code as the target, and that has worked whether the code used T32 or A32. Are you sure that the mode stuff applies to T32 (Thumb2) and not just to Thumb1?

Function pointers are not always just a pointer to the instruction
Posted Jul 23, 2025 9:45 UTC (Wed) by farnz (subscriber, #17727)

The mode stuff on ARM applies to all processors that have both T32 and A32 modes. See the BX instruction. If you're jumping to a register (absolute address), not to an inline offset, the bottom bit determines whether the destination is T32 or A32 code.

This happens not to be a problem for jumps to intentionally generated machine code, because T32 instructions must be 16-bit aligned and A32 instructions must be 32-bit aligned.

Function pointers are not always just a pointer to the instruction
Posted Jul 23, 2025 10:01 UTC (Wed) by excors (subscriber, #95769)

Thumb-2 is the same. When you use an interworking branch instruction (`bx`, `blx`, `ldr pc`, `pop {pc}`, etc.), the LSB determines whether the CPU switches into A32 or T32 mode. When you use non-interworking branches (`b`, `bl`, `mov pc`, etc.), the CPU remains in its current mode, and the bottom 1-2 LSBs of the address are replaced with 0.

In C, function pointers to Thumb functions will have the LSB set to 1 (so they're not the actual address of the instruction in memory), and the compiler will emit `blx` instructions. From some quick testing in GCC and Clang, it looks like computed-goto pointers *don't* have the LSB set. If the function is in Thumb mode, then the compiler will emit either `orr r0, #1; bx r0` (setting the LSB before an interworked branch) or `mov pc, r0` (a non-interworked branch). You're not allowed to computed-goto between different functions, and I don't think a single function can use a mixture of A32 and T32 instructions (except with inline assembly, etc.), so it's safe for the compiler to assume it's not going to switch mode.

Function pointers are not always just a pointer to the instruction
Posted Jul 23, 2025 15:08 UTC (Wed) by anton (subscriber, #25547)

What I actually see in code compiled with -mthumb (apparently -marm is the default for gcc-10, at least on Debian; I have seen Thumb2 code produced by default in earlier times) is:

    orr.w r3, r1, #1
    bx    r3

Yes, if gcc produced values with a set LSB when you take &&mylabel in a Thumb-compiled function, it could then avoid the orr.w instruction in the code for goto *.
However, Gforth uses the values produced by &&mylabel to determine where the code snippets (for code-copying) start and end, so if the value for the label pointed one byte later, we would need to add a workaround (e.g., force the function to compile to A32).

Function pointers are not always just a pointer to the instruction
Posted Jul 23, 2025 16:01 UTC (Wed) by anton (subscriber, #25547)

Correction: gcc-10 on Debian still compiles to T32 by default (I had overlooked a -marm in Gforth).