[HN Gopher] How often does Python allocate?
___________________________________________________________________
How often does Python allocate?
Author : ingve
Score : 76 points
Date : 2025-11-01 22:34 UTC (5 days ago)
(HTM) web link (zackoverflow.dev)
(TXT) w3m dump (zackoverflow.dev)
| zahlman wrote:
| A caveat applies to the entire analysis: CPython may be the
| reference implementation, but it's still just one implementation.
| This sort of thing may work totally differently in PyPy, and
| especially in implementations that make use of another garbage-
| collecting runtime, such as Jython.
|
| > Let's take out the print statement and see if it's just the
| addition:
|
| Just FWIW: the assignment is _not_ required to prevent optimizing
| out the useless addition. It isn't doing any static analysis, so
| it doesn't know that `range` is the builtin, and thus doesn't
| know that `i` is an integer, and thus doesn't know that `+` will
| be side-effect-free.
|
| > Nope, it seems there is a pre-allocated list of objects for
| integers in the range of -5 -> 1025. This would account for 1025
| iterations of our loop but not for the rest.
|
| 1024 iterations, because the check is for numbers strictly less
| than `_PY_NSMALLPOSINTS` and the value computed is `i + 1` (so,
| `1` on the first iteration).
|
| Interesting. I knew of them only ranging up to 256
| (https://stackoverflow.com/questions/306313).
|
| It turns out
| (https://github.com/python/cpython/commit/7ce25edb8f41e527ed4...)
| that the change is barely a month old in the repository; so it's
| not in 3.14
| (https://github.com/python/cpython/blob/3.14/Include/internal...)
| and won't show up until 3.15.
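|
| On current releases you can still see the 256 boundary directly.
| A quick, CPython-specific check (going through int() so the
| compiler's constant folding doesn't hand both names the same
| compile-time object):
|
|     a = int("256")
|     b = int("256")
|     print(a is b)    # True: both come from the small-int cache
|     c = int("257")
|     d = int("257")
|     print(c is d)    # typically False today: two separate heap objects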
|
| > Our script appears to actually be reusing most of the
| PyLongObject objects!
|
| The interesting part is that it can somehow do this even though
| the values are increasing throughout the loop (i.e., to values
| not seen on previous iterations), and it also doesn't need to
| allocate for _the value of `i`_ retrieved from the `range`.
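|
| A rough way to watch the recycling happen is to compare id()
| values, which in CPython are just object addresses; this is an
| implementation detail, so treat it as illustrative only:
|
|     import sys
|
|     a = sys.maxsize % 100_000   # 75807: computed at run time,
|                                 # well above the small-int cache
|     addr = id(a)
|     del a                       # the PyLongObject behind `a` is freed
|     b = sys.maxsize % 10_000    # 5807: a brand-new int object...
|     print(id(b) == addr)        # ...often lands in the recycled
|                                 # memory (not guaranteed)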
|
| > But realistically the majority of integers in a program are
| going to be less than 2^30 so why not introduce a fast path which
| skips this complicated code entirely?
|
| This is the sort of thing where PRs to CPython are always
| welcome, to my understanding. It probably isn't a priority, or
| something that other devs have thought of, because that
| allocation presumably isn't a big deal compared to the time taken
| for the actual conversion, which in turn is normally happening
| because of some kind of I/O request. (Also, real programs
| probably do simple arithmetic on small numbers much more often
| than they string-format them.)
| petters wrote:
| > that make use of another garbage-collecting runtime, such as
| Jython
|
| I think that is mostly of historical interest. For example, it
| still does not support Python 3 and has not been updated in a
| very long time.
| zahlman wrote:
| Ah, sorry to hear it. I'd lost track. It looks like
| IronPython is still active, but way behind. Or rather, maybe
| they have no _interest_ in implementing Python 3 features
| beyond 3.4.
| cogman10 wrote:
| Graalpy [1] is where it's at if you want python on a JVM.
|
| [1] https://www.graalvm.org/python/
| sushibowl wrote:
| With respect to tagged pointers, there seems to be some recent
| movement on that front in CPython:
| https://github.com/python/cpython/issues/132509
| notatallshaw wrote:
| Unfortunately that was posted 1 month before the Faster CPython
| project was disbanded by Microsoft, so I imagine things have
| slowed.
| nu11ptr wrote:
| I admit it may just be because I'm a PL nerd, but I thought it
| was general knowledge that pretty much EVERYTHING in Python is an
| object, and an object in Python is always heap allocated AFAIK.
| This goes deeper than just integers. Things most people think of as
| declarative (like classes, modules, etc.) are also objects, etc.
| etc. It is both the best thing (for dynamism and fun/tinkering)
| and worst thing (performance optimization) about Python.
|
| If you've never done it, I recommend using the `dir` function in
| a REPL, finding interesting things inside your objects, doing
| `dir` on those, and keeping the recursion going. It is a very
| eye-opening experience as to just how deep the objects in Python go.
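|
| For instance (output trimmed; the exact lists vary by version):
|
|     x = 42
|     print(dir(x)[:5])          # even a plain int has dozens of attributes
|     print(dir(x.__add__)[:5])  # each attribute is itself an object
|     print(dir(type(x))[:5])    # and its class is an object too,
|                                # all the way down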
| zbentley wrote:
| Yeah, spelunking around this is a great way to learn a
| language!
|
| That said, "is an object" is a bit of an overloaded term in
| this context. Most things in Python can be _semantically
| addressed_ as objects (they have fields and methods, "dir"
| interrogates them, and so on), but not everything is
| necessarily _backed by the same kind of heap-allocated memory
| object_. Some builtins and bytecode-level optimizations,
| similar to (but not quite the same as) the ones discussed in
| TFA, result in things being less object-like (that is,
| PyObject*-ish structs with refcounting and/or pointer chasing
| on field access) in some cases.
|
| And that's an important difference, not a nitpick! It means
| that performance-interested folks can't always use the same
| intuition for allocation/access behavior that they do when
| determining what fields are available on a syntactic object.
|
| (And before the inevitable "if you want performance don't write
| Python/use native libraries" folks pipe up: people still
| regularly have to care about making plain Python code fast. You
| might wish it weren't so, but it is.)
| nu11ptr wrote:
| Yeah, I probably should have been careful there. Good call
| out that not every "object" is necessarily a heap allocated
| entity. I learned something, thx.
| winstonewert wrote:
| > but not everything is necessarily backed by the same kind
| of heap-allocated memory object.
|
| Do you have an example? I thought literally everything in
| Python did trace to a PyObject*.
| zahlman wrote:
| Quite frankly, as someone with 20+ years of experience I have
| no idea what the basis is for what you're saying here.
|
| > not everything is necessarily backed by the same kind of
| heap-allocated memory object.
|
| `None`, `True` and `False` are; integers are; functions,
| bound methods, properties, classes and modules are... what
| sort of "everything" do you have in mind? Primitive objects
| don't get stored in a special way that can avoid the per-
| object memory overhead, and such objects exist for things
| that can't be used as values at all (e.g. passed to or
| returned from a function) in many other languages.
|
| Some use fields like `tp_slots` in special ways; the
| underlying implementation of their methods is in C and the
| method attributes aren't looked up in the normal way. But
| there is still a PyObject* there.
|
| ... On further investigation (considering the values reported
| by `id`) I suppose that (at least in modern versions) the
| pre-created objects may be in static storage... ? Perhaps
| that has something to do with the implementation of
| https://peps.python.org/pep-0683/ ?
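|
| One visible symptom of that change, assuming 3.12 or later: the
| reported refcount of immortal objects is a fixed sentinel rather
| than a live count.
|
|     import sys
|     print(sys.getrefcount(None))  # a huge constant on 3.12+:
|                                   # None is immortal (PEP 683)
|     print(sys.getrefcount(42))    # the cached small ints are
|                                   # immortal there too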
|
| > Some builtins and bytecode-level optimizations
|
| String concatenation may mutate a string object that's
| presented to Python as immutable, so long as there aren't
| other references to it. But it's _still a heap-allocated
| object_. Similarly, Python pre-allocates objects for small
| integer values, but they're still stored on the heap. Very
| short strings may, along with those integers, go in a special
| memory pool, but that pool is still on the heap and the
| allocation still contains all the standard PyObject fields.
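|
| The per-object overhead is easy to see with sys.getsizeof (rough
| numbers from a 64-bit build; exact values vary by version):
|
|     import sys
|     print(sys.getsizeof(0))        # ~28 bytes: the full object header
|     print(sys.getsizeof(2 ** 100)) # bigger ints just add 30-bit digits
|     print(sys.getsizeof("ab"))     # a two-character string still
|                                    # carries the object header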
|
| > people still regularly have to care about making plain
| Python code fast.
|
| Can you give examples of the people involved, or explain why
| their use cases are not satisfied by native libraries?
| rogerbinns wrote:
| Heap allocation does also allow other things. For example stack
| frames (PyFrame) are also heap allocated. When there is an
| exception, the C stack gets unwound but the heap allocated
| frames are retained forming the traceback. You can then examine
| the values of the local variables at each level of the
| traceback.
|
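| A small illustration (the names here are made up, but the
| mechanism is the one described above):
|
|     def inner():
|         secret = 42
|         raise ValueError("boom")
|
|     try:
|         inner()
|     except ValueError as exc:
|         tb = exc.__traceback__
|         while tb.tb_next:             # walk to the innermost frame,
|             tb = tb.tb_next           # still alive on the heap
|         print(tb.tb_frame.f_locals)   # {'secret': 42}: the locals
|                                       # survived the unwind
|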
| And it also allows async functions, since state is held off the
| C stack, so frames can be easily switched when returning to the
| event loop.
|
| The other thing made easy is C extension authoring. You compile
| CPython without free lists and with an address sanitizer, and
| getting reference counting wrong shows up.
| godshatter wrote:
| C gets a lot of crap, sometimes for good reason, but one thing I
| like about it is that the question of whether C is allocating
| something is easy to answer, at least for your own code.
| manwe150 wrote:
| Python is entirely a C program, ergo by this article, this
| seems like one of those fallacies C programmers believe
| justifies using C.
| alt187 wrote:
| Yeah, if you consider the Python interpreter "your own code".
| bee_rider wrote:
| It is nice.
|
| Although, there are also modern, beautiful, user friendly
| languages where allocation is mostly obvious. Like Fortran.
| mr_mitm wrote:
| Oh my. You don't see Fortran being called modern very often.
| cozzyd wrote:
| Writing C like Fortran (no heap) is even better when you can!
| pklausler wrote:
| This comment is 35 years out of date.
| mgkuhn wrote:
| There are reasons why the same program in Julia can be 60x faster
| than in Python, see e.g. slide 5 in
| https://www.cl.cam.ac.uk/teaching/2526/TeX+Julia/julia-slide...
| for an example.
| jakobnissen wrote:
| The function on that slide is dominated by the call to rand,
| which uses quite different implementations in Julia and Python,
| so may not be the best example.
|
| Julia is compiled, and for simple code like that example it
| will have performance on par with C, Rust, etc.
| boothby wrote:
| Once upon a time, I wanted to back a stack with a linked list in
| Python. I had been reading a lot of compiled bytecode, and had
| recently learned that CPython's interpreter is stack-based and
| capable of unpacking and popping tuples as single bytecode
| instructions. I also learned about the freelist.
|
| I ended up with the notation:
|
|     Initialization:  head = ()
|     Push:            head = data, head
|     Safe Pop:        if head: data, head = head
|     Safe Top:        head[0] if head else None
|
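| In use it looks something like this (a minimal sketch of the
| same pattern):
|
|     head = ()                 # empty stack
|     for item in (1, 2, 3):
|         head = item, head     # push
|     while head:
|         data, head = head     # pop
|         print(data)           # prints 3, 2, 1
|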
| And for many stack-based algorithms, I've found this to be quite
| optimal in part because the length-2 tuples get recycled (also
| due to a lack of function calls, member accesses, etc). But I'm
| rather embarrassed to put it into a codebase due to others'
| expectations that Python should be beautiful and this seems
| weird.
| plant-ian wrote:
| I guess the only downside might be if you need to push a bunch
| of things at once. But maybe just a loop is faster than calling
| extend on a list? This is cool.
| boothby wrote:
| Yep, that's the main exception when I said "many stack-based
| algorithms". It's worth mentioning that examining the stack
| while debugging is annoying because of the number of parens
| involved.
| whilenot-dev wrote:
| A bit beside the point, but this caught my eye:
|
| > Integers are likely the most used data type of any program,
| that means a lot of heap allocations.
|
| I would guess strings come first, then floats, then booleans, and
| then integers. Are there any data available on that?
| nemothekid wrote:
| Given that most string implementations have both a `len` and a
| `cap` field (or even just `len`), every string value carries at
| least one more integer variable along with it.
|
| So integer definitely gets used more than strings, and I'd
| argue way more than floats in most programs.
| zahlman wrote:
| In CPython, a string's length is stored internally as a
| machine integer (`Py_ssize_t`); when the `len` builtin is
| called, the corresponding integer object is created (or more
| likely, retrieved from cache) on the fly.
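|
| A quick way to see both halves of that:
|
|     s = "x" * 300
|     print(len(s) is len(s))        # typically False on current
|                                    # releases: 300 is above the cache,
|                                    # so each call yields its own object
|     print(len("hi") is len("hi"))  # True: 2 comes from the cache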
| ragnese wrote:
| Ints probably get a big boost in languages where the only
| built-in for-loop syntax involves incrementing an index
| variable, like C. And, speaking of C, specifically, even the
| non-int types are actually ints or isomorphic to ints: enums,
| bools, char, _pointers_ , _arrays_ (which are just pointers if
| you squint), etc...
|
| But, otherwise, I'd agree that strings probably win, globally.
| tialaramex wrote:
| Actually, because of provenance, C _pointers_ are its only
| type which isn't just basically the machine integers
| again.
|
| A char is just a machine integer with implementation-specified
| signedness (crazy), bools are just machine integers
| which aren't supposed to have values other than 0 or 1, and
| the floating point types are just integers reinterpreted as
| binary fractions in a strange way.
|
| Addresses are just machine integers of course, but pointers
| have _provenance_ which means that it matters why you have
| the pointer, whereas for the machine integers their value is
| entirely determined by the bits making them up.
| ragnese wrote:
| It's been a long time since I've done C/C++, but I'm not
| sure what you're saying with regard to provenance. I was
| pretty sure that you were able to cast an arbitrary integer
| value into a pointer, and it really didn't have to "come
| from" anywhere. So, all I'm saying is that, under-the-hood,
| a C pointer really is just an integer. Saying that a
| pointer _means_ something beyond the bits that make up the
| value is no more relevant than saying a bool means
| something other than its integer value, which is also true.
| tialaramex wrote:
| Start your journey here:
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm
|
| That's defect report #260 against the C language. One
| option for WG14 was to say "Oops, yeah, that should work
| in our language" and then modern C compilers have to be
| modified and many C programs are significantly slower.
| This gets the world you (and you're far from alone among
| C programmers) thought you lived in, though probably your
| C programs are slower than you expected now 'cos your
| "pointers" are now just address integers and you've never
| thought about the full consequences.
|
| But they didn't; instead they wrote "They may also treat
| pointers based on different origins as distinct even
| though they are bitwise identical" because by then that
| is in fact how C compilers work. That's them saying
| pointers have provenance, though they do not describe
| (and neither does their ISO document) how that works.
|
| There is currently a TR (I think, maybe wrong letters)
| which explains PNVI-ae-udi, Provenance Not Via Integers,
| Addresses Exposed, User Disambiguates which is the
| current preferred model for how this could possibly work.
| Compilers don't implement that properly either, but they
| _could_ in principle, so that's why it is seen as a
| reasonable goal for the C language. That TR is not part
| of the ISO standard but in principle one day it could be.
| Until then, provenance is just a vague shrug in C. What
| you said is wrong, and er... yeah that is awkward for
| everybody.
|
| Rust does specify how this works. But the bad news (I
| guess?) for you is that it too says that provenance is a
| thing, so you cannot just go around claiming the address
| you dredged up from who knows where is a valid pointer,
| it ain't. Or rather, in Rust you can write out explicitly
| that you do want to do this, but the specification is
| clear that you get a pointer but not necessarily a
| _valid_ pointer even if you expected otherwise.
| Joker_vD wrote:
| Shouldn't having a freelist and pushing and popping blocks of
| memory to and from it for re-use count as a form of memory
| allocation?
| jonhohle wrote:
| I've tried optimizing some python that's in the hot path of a
| build system and a few dozen operations out of over 2K nodes in
| the ninja graph account for 25-30% of the total build time.
|
| I've found python optimization to be nearly intractable. I've
| spent a significant amount of time over the past two decades
| optimizing C, Java, Swift, Ruby, SQL and I'm sure more. The
| techniques are largely the same. In Python, however, everything
| seems expensive. Field lookup on an object, dynamic dispatch,
| string/array concatenation. After optimization, the code is no
| longer "pythonic" (which has come to mean slow, in my
| vernacular).
|
| Are there any good resources on optimizing python performance
| while keeping idiomatic?
| ragnese wrote:
| > Are there any good resources on optimizing python performance
| while keeping idiomatic?
|
| At the risk of sounding snarky and/or unhelpful, in my
| experience, the answer is that you don't try to optimize Python
| code beyond fixing your algorithm to have better big-O
| properties, followed by calling out to external code that isn't
| written in Python (e.g., NumPy, etc).
|
| But, I'm a hater. I spent several years working with Python and
| hated almost every minute of it for various reasons. Very few
| languages repulse me the way Python does: I hate the syntax,
| the semantics, the difficulty of distribution, and the
| performance (memory and CPU, and is the GIL disabled by default
| yet?!)...
| CraigJPerry wrote:
| >> optimizing python performance while keeping idiomatic?
|
| That's impossible[1].
|
| I think it is impossible because when I identify a slow
| function using cProfile and then use dis.dis() on it to view
| the instructions executed, most of the overhead - by that I
| mean time spent doing something other than the calculation the
| code describes - is spent determining what each "thing" is.
| It's all trails of calls trying to determine "this thing can't
| be __add__() to that thing, but maybe that thing can be
| __radd__() to this thing instead". Long way to say: most of
| the time-wasting instructions I see can be attacked by
| providing ctypes or some other approach like that (mypyc,
| maybe cython, etc.) - but at this point you're well beyond
| "idiomatic".
|
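| For example, even a trivial function compiles down to a single
| BINARY_OP, but at run time that one instruction still has to
| discover the operand types and try __add__/__radd__ before any
| arithmetic happens:
|
|     import dis
|
|     def add(a, b):
|         return a + b
|
|     dis.dis(add)   # roughly: RESUME, LOAD_FAST (x2), BINARY_OP (+),
|                    # RETURN_VALUE on 3.11 and later
|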
| [1] I'm really curious to know the answer to your question so
| I'll post a confident (but hopefully wrong) answer so that
| someone feels compelled to correct me :-)
| kingstnap wrote:
| You can't really. The slow parts of computers are memory and
| branches.
|
| Branches cannot be reasoned about since they are all in the
| CPython state machine.
|
| Memory performance is about being efficient with allocations,
| cache locality, and pointer chasing.
|
| Python has huge amounts of allocation and pointer chasing. It's
| slow by design.
| jerf wrote:
| "Are there any good resources on optimizing python performance
| while keeping idiomatic?"
|
| It is essentially impossible, "essentially" here meaning not the
| modern sense of "mostly" but "essential", as in _baked into the
| essence of the language_. It has been the advice in the Python
| community pretty much since the beginning that the solution is
| to go to another language in that case. There are a number of
| solutions to the problem, ranging from trying PyPy,
| implementing an API in another language, Cython/Pyrex, up to
| traditional embedding of a C/C++ program into a Python module.
|
| However this is one of those cases where there are a lot of
| solutions precisely because none of them are quite perfect and
| all have some sort of serious downside. Depending on what you
| are doing, you may find one whose downside you don't care
| about. But there's no simple, bullet-proof cookbook answer for
| "what do I do when Python is too slow even after basic
| optimization".
|
| Python is fundamentally slow. Too many people still hear that
| as an attack on the language, rather than an engineering fact
| that needs to be kept in mind. Speed isn't everything, and as
| such, Python is suitable for a wide variety of tasks even so.
| But it is, still, fundamentally slow, and those who read that
| as an attack rather than an engineering assessment are more
| likely to find themselves in quite a pickle (pun somewhat
| intended) one day when they have a mass of Python code that
| isn't fast enough and no easy solutions for the problem than
| those who understand the engineering considerations in choosing
| Python.
| its-summertime wrote:
| Of the solutions, Numba and Nuitka are probably also worth
| mentioning.
|
| Then just looking at those, I now know of Shedskin and
| ComPyler.
|
| I do feel like one nice thing about so many people working on
| solutions to every problem in python, is that it means that
| when you do encounter a serious downside, you have a lot more
| flexibility to move forwards along a different path.
| franktankbank wrote:
| Python is a superior bash that still wants a basic shell to
| run in.
| shoo wrote:
| I agree with this. People advocate for "pick a better
| algorithm", and sometimes that can help dramatically, but at
| times the best algorithm implemented in Python is still too
| slow, yet can be made 500x faster by reimplementing the exact
| same algorithm in Cython or C or Fortran or so on.
|
| Python is a great language for rapidly bashing out
| algorithmic code, glue scripts, etc. Unfortunately, due to how
| dynamic it is, it fundamentally doesn't translate well to
| operations CPUs can perform efficiently. Hardly any Python
| programs ever need to be as dynamic as what the language
| allows.
|
| I've had very good experiences applying cython to python
| programs that need to do some kind of algorithmic number
| crunching, where numpy alone doesn't get the job done.
|
| With cython you start with your python code and incrementally
| add static typing that reduces the layers of python
| interpreter abstractions and wrappings necessary. Cython has
| a very useful and amusing output where it spits out an HTML
| report of annotated Cython source code, with lines
| highlighted in yellow in proportion to the amount of Python
| overhead. You click on any line of Python in that report and
| it expands to show how many CPython API operations are
| required to implement it, and then you add more static type
| hints and recompile until the yellow goes away and the
| compute-heavy kernel of your script is a C program that
| compiles to operations that real-world CPUs can execute
| efficiently.
|
| Downside of cython is the extra build toolchain and
| deployment concerns it drags in - if you previously had a
| pure python module, now you've got native modules so you need
| to bake platform specific wheels for each deployment target.
|
| For Python that's being used as a glue scripting language,
| not number crunching, worth considering rewriting the script
| in something like Go. Go has a pretty good standard library
| to handle many tasks without needing to install many 3rd
| party packages, and the build, test, deploy story is very
| nice.
| Figs wrote:
| Depending on the nature of your code, just throwing PyPy at it
| instead of CPython can be a _huge_ win. This very much depends
| on what you're doing though. For example, I have a graphics
| editor I wrote that is _unbearably_ slow with CPython; with
| PyPy and no other changes it's usable.
|
| If what you're trying to do involves tasks that can be done in
| parallel, the multithreading (if I/O bound) or multiprocessing
| (if compute bound) libraries can be very useful.
|
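| For the compute-bound case the shape is usually something like
| this (a minimal sketch; under the GIL threads wouldn't help
| here, but separate processes do):
|
|     from multiprocessing import Pool
|
|     def work(n):
|         return sum(i * i for i in range(n))   # CPU-bound work
|
|     if __name__ == "__main__":
|         with Pool() as pool:   # one worker process per CPU by default
|             print(pool.map(work, [1_000_000] * 8))
|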
| If what you're doing isn't conducive to either, you probably
| need to rewrite at least the critical parts in something else.
| thomasmg wrote:
| Pypy is great for performance. I'm writing my own programming
| language (that transpiles to C) and for this purpose
| converted a few benchmarks to some popular languages (C,
| Java, Rust, Swift, Python, Go, Nim, Zig, V). Most languages
| have similar performance, except for Python, which is about
| 50 times slower [1]. But with PyPy, performance is much
| better. I don't know the limitations of PyPy because these
| algorithms are very simple.
|
| But even though Python is very slow, it is still very
| popular. So the language itself must be very good in my view,
| otherwise fewer people would use it.
|
| [1] https://github.com/thomasmueller/bau-
| lang/blob/main/doc/perf...
| pphysch wrote:
| > I've found python optimization to be nearly intractable.
|
| Move your performance critical kernels outside Python. Numpy,
| Pytorch, external databases, Golang microservice, etc.
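|
| The NumPy version of that idea, as a rough sketch: push the
| loop into one vectorized call instead of a million interpreted
| iterations.
|
|     import numpy as np
|
|     xs = np.arange(1_000_000, dtype=np.int64)
|     total = int((xs * xs).sum())   # one C loop, no per-element objects
|     print(total)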
|
| It's in fact an extremely successful paradigm. Python doesn't
| need to be fast to be used in every domain (though async/nogil
| is still worth advancing to avoid idle CPUs).
| tialaramex wrote:
| However note that (and this isn't just true for Python) if
| your "performance critical kernel" is "the whole program"
| then this does have the implication you think it does: you
| should not write Python.
|
| If your Python software is spending 99% of its time doing X,
| you can rewrite X in Rust and now the Python software using
| that is faster, but if your Python software is spending 7% of
| its time doing X, and 5% doing Y, and 3% doing Z then even if
| you rewrite X, and Y and Z you've shaved off no more than 15%
| and so it's probably time to stop writing Python at all.
|
| This is part of why Google moved to Go as I understand it.
| paulddraper wrote:
| > In Python, however, everything seems expensive.
|
| And that wasn't true in Ruby(?!)
|
| The first thing to do when optimizing Python is to use a JIT
| (PyPy). Like, yes Python is slow if you don't allow a JIT.
|
| Alternatively, go the other, JIT-less direction and use C (an
| AOT language).
| th0ma5 wrote:
| I spent a decade optimizing Python data science projects for a
| Fortune 100. I agree with the other sentiments that you can't
| really optimize Python itself because of these weird behaviors
| that seem to change the more you look at them... Mostly all of
| my optimization work was generic computer science stuff across
| the entire system and, in the data science world, this usually
| meant optimizing the database queries or the io throughput, or
| the overall shape of the kind of work that they were doing in
| order to parallelize it, or figure out how to simply do way
| less, etc. If someone actually came up to me with an actual
| problem with Python code and it was the Python code itself that
| was not performing well, I would pretty immediately say let's
| use something else. I think the efforts to make a more
| efficient Python are really great, but I think the extreme
| utility of Python as a light glue to pull together lots of
| other things that are performing well in other computer
| languages is a widely recognized strength of the Python
| ecosystem.
| weinzierl wrote:
| Not sure how well-known this is, but you can run Python with a
| different allocator by LD_PRELOADing it. We had good results
| using jemalloc this way.
| melenaboija wrote:
| > I will never forget Rust for...
|
| Fuck, this made me feel old. It blows my mind that the learning
| path Rust then Python even exists.
|
| When I started school in 2000, it was C, then C++, and then Java.
| Oh, and then compilers. It's impossible for me not to think about
| allocation when doing anything, it's just rooted in my mind.
| zozbot234 wrote:
| Rust is a lot _simpler_ than C++ and arguably Python itself, so
| it's not a huge surprise that it turns out to be viable as a
| first programming language. Especially since some parts of it
| can be taught as pretty much a pure functional language at the
| outset (actually this works quite well, right up until you
| really need interior mutability - which is not that often),
| which makes for a good intuition of what keeps the borrow
| checker happy.
___________________________________________________________________
(page generated 2025-11-06 23:01 UTC)