[HN Gopher] Jemalloc Postmortem
___________________________________________________________________
Jemalloc Postmortem
Author : jasone
Score : 687 points
Date : 2025-06-13 01:37 UTC (21 hours ago)
(HTM) web link (jasone.github.io)
(TXT) w3m dump (jasone.github.io)
| gdiamos wrote:
| Congrats on the great run and the future. Jemalloc was an
| inspiration to many memory allocators.
| kstrauser wrote:
| I was using FreeBSD back when jemalloc came along, and it blew
| my mind to imagine swapping out just that one (major) part of
| its libc. Honestly, it hadn't occurred to me, and made me wonder
| what else we could wholesale replace.
| Twirrim wrote:
| Oh that's interesting. jemalloc is the memory allocator used by
| redis, among other projects. Wonder what the performance impact
| will be if they have to change allocators.
| dpe82 wrote:
| Why would they have to change? Sometimes software development
| is largely "done" and there isn't much more you need to do to a
| library.
| jeffbee wrote:
| For an example of why an allocator is a maintenance
| treadmill, consider that C++ recently (relatively) added
| sized delete, and Linux recently gained transparent huge
| pages.
| Twirrim wrote:
| It's been 14 years since THP got added to the kernel[1],
| surely we're past calling that "recent" :)
|
| https://www.kernelconfig.io/config_transparent_hugepage
| jeffbee wrote:
| But if they'd declared the allocators "done" _15_ years
| ago, then you wouldn't have it.
| senderista wrote:
| Another example is rseq (which was originally implemented
| for tcmalloc).
| dymk wrote:
| Technology marches on, and in some number of years other
| allocators will exist that outperform/outfeature jemalloc.
| jcelerier wrote:
| Depending on your allocation profile, that number of
| years could easily be something like -10. New allocators
| constantly crop up.
| edflsafoiewq wrote:
| Presumably then the performance impact of any switch will
| be positive.
| Analemma_ wrote:
| Memory allocators are something I expect to rapidly degrade
| in the absence of continuous updates as the world changes
| underneath you. Changing page sizes, new ucode latencies, new
| security features etc. all introduce either outright breakage
| or at least changing the optimum allocation strategy and
| making your old profiling obsolete. Not to mention the
| article already pointed out one instance where a software
| stack (KDE, in that case) used allocation profiles that broke
| an earlier version completely. Even though that's fixed now,
| any language runtime update or new feature could introduce a
| new allocation style that grinds you down.
|
| As much as it's nice to think software can be done, I think
| something so closely tied to the kernel _and_ hardware _and_
| the application layer, which all change constantly, never can
| be.
| binary132 wrote:
| "Software is just done sometimes" is a common refrain I see
| repeated among communities where irreplaceable software
| projects are often abandoned. The community consensus has a
| tendency to become "it is reliable and good enough, it must
| be done".
| poorman wrote:
| Jemalloc is used as an easy performance boost probably by
| every major Ruby on Rails server.
| Twirrim wrote:
| While I certainly wish that more software would reach a
| "done" stage, I don't think jemalloc is necessarily there
| yet. Unfortunately I'm aware of there being bugs in the
| current version of jemalloc, when run in certain environment
| configurations, including memory leaks. I know the folks that
| found it were looking to report it, but I guess that won't
| happen now.
|
| Even from a quick look at the open issues, I can see
| https://github.com/jemalloc/jemalloc/issues/2838, and
| https://github.com/jemalloc/jemalloc/issues/2815 as two
| examples, but there's a fair number of issues still open
| against the repository.
|
| So that'll leave projects like redis & valkey with some
| decisions to make.
|
| 1) Keep jemalloc and accept things like memory leak bugs
|
| 2) Fork and maintain their own version of jemalloc.
|
| 3) Spend time replacing it entirely.
|
| 4) Hope someone else picks it up?
| senderista wrote:
| jemalloc is used enough at Amazon that it would make sense
| for them to maintain it, but that's not really their style.
| burnt-resistor wrote:
| Some people believe everything must always be constantly
| tweaked, redone, broken and fixed, and churned for no reason.
| The only things that need to be fixed in mature, working
| software are bugs and security issues. It doesn't magically
| stop working or get "stale" unless dependencies, the OS, or
| build tools break.
| almostgotcaught wrote:
| > Sometimes software development is largely "done"
|
| Lol absolutely not
| spookie wrote:
| Firefox as well.
| perbu wrote:
| Back in 2008-2009 I remember the Varnish project struggled with
| what looked very much like a memory leak. Because of the
| somewhat complex way memory was used, replacing the Glibc
| malloc with jemalloc was an immediate improvement and removed
| the leak-like behavior.
| technion wrote:
| From years of looking at Ruby on Rails performance, I know a
| commonly cited quick win was to run with jemalloc.
| swinglock wrote:
| Last I checked Redis used their own fork of jemalloc. It may
| not even be updated to the latest release.
| jeffbee wrote:
| The article mentioned the influence of large-scale profiling on
| both jemalloc and tcmalloc, but doesn't mention mimalloc. I
| consider mimalloc to be on par with these others, and now I am
| wondering whether Microsoft also used large scale profiling to
| develop theirs, or if they just did it by dead reckoning.
| bch wrote:
| How does mimalloc telemetry compare to jemalloc?
| poorman wrote:
| How cool would it be to see Doug Lea pick up the torch and create
| a modern day multi-threaded dlmalloc2!?
| ecshafer wrote:
| dl is just an observer on the OpenJDK governance board now, so
| he might have enough time.
| nevon wrote:
| I very recently used jemalloc to resolve a memory fragmentation
| issue that caused a service to OOM every few days. While jemalloc
| as it is will continue to work, same as it does today, I wonder
| what allocator I should reach for in the future. Does anyone have
| any experiences to share regarding tcmalloc or other allocators
| that aim to perform better than stock glibc?
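| (A quick way to trial an alternate allocator on Linux is to
| preload it; a minimal sketch, assuming Debian/Ubuntu's jemalloc
| package path and a hypothetical ./my-service binary:)

```shell
# Try jemalloc without rebuilding: preload it for one process.
# The library path is distro-dependent; this one is Debian/Ubuntu's.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./my-service

# Optionally tune decay so freed pages return to the OS sooner.
MALLOC_CONF=background_thread:true,dirty_decay_ms:10000 \
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./my-service
```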
| kev009 wrote:
| snmalloc
| sanxiyn wrote:
| mimalloc is a good choice. CPython recently switched to
| mimalloc.
| beyonddream wrote:
| Try mimalloc. I prototyped a feature on top of mimalloc and,
| while that effort was a dead end, the code (this was around
| 2020) was nicely written and well maintained, and it was fun
| to hack on. When I swapped jemalloc in our system for
| mimalloc, it was on par with, if not better than, jemalloc in
| terms of fragmentation growth control and heap usage.
| meisel wrote:
| I believe there's no other allocator besides jemalloc that can
| seamlessly override macOS malloc/free like people do with
| LD_PRELOAD on Linux (at least as of ~2020). jemalloc has a very
| nice zone-based way of making itself the default, and manages to
| accommodate Apple's odd requirements for an allocator that have
| tripped other third-party allocators up when trying to override
| malloc/free.
| adgjlsfhk1 wrote:
| I believe mimalloc works here (but might be wrong).
| glandium wrote:
| Note this requires hackery that relies on Apple not changing
| things in its system allocator, which has happened at least
| twice IIRC.
| chubot wrote:
| Nice post -- so does Facebook no longer use jemalloc at all? Or
| is it maintenance mode?
|
| Or I wonder if they could simply use tcmalloc or another
| allocator these days?
|
| _Facebook infrastructure engineering reduced investment in core
| technology, instead emphasizing return on investment._
| Svetlitski wrote:
| As of when I left Meta nearly two years ago (although I would
| be absolutely shocked if this isn't still the case) Jemalloc is
| _the_ allocator, and is statically linked into every single
| binary running at the company.
|
| > Or I wonder if they could simply use tcmalloc or another
| allocator these days?
|
| Jemalloc is very deeply integrated there, so this is a lot
| harder than it sounds. From the telemetry being plumbed through
| in Strobelight, to applications using every highly Jemalloc-
| specific extension under the sun (e.g. manually created arenas
| with custom extent hooks), to the convergent evolution of
| applications being written in ways such that they perform
| optimally with respect to Jemalloc's exact behavior.
| charcircuit wrote:
| Meta has a fork that they still are working on, where
| development is continuing.
|
| https://github.com/facebook/jemalloc
| burnt-resistor wrote:
| They take everything FLOSS and ruin it with bureaucracy,
| churn, breakage, and inconsideration to external use. They
| may claim FOSS broadly but it's mostly FOSS-washed, unusable
| garbage except for a few popular things.
| umanwizard wrote:
| React, PyTorch, and RocksDB are all extremely significant.
| Not to mention them being one of the biggest contributors
| to the Linux kernel.
| nh2 wrote:
| The point of the blog post is that repo is over-focused on
| Facebook's needs instead of "general utility":
|
| > as a result of recent changes within Meta we no longer have
| anyone shepherding long-term jemalloc development with an eye
| toward general utility
|
| > we reached a sad end for jemalloc in the hands of
| Facebook/Meta
|
| > Meta's needs stopped aligning well with those of external
| uses some time ago, and they are better off doing their own
| thing.
| nh2 wrote:
| But I'd like to know exactly what that means.
|
| How can I find out if Facebook's focus is aligned with my
| own needs?
| anonymoushn wrote:
| The big recent change is that jemalloc no longer has any of
| its previous long-term maintainers. But it is receiving more
| attention from Facebook than it has in a long time, and I am
| somewhat optimistic that, after some recent drama in which
| some of that attention was aimed in a counterproductive
| direction, the company can aim the rest of it in directions
| that Qi and Jason would agree with, and that are well aligned
| with the needs of external users.
| kstrauser wrote:
| I've wondered about this before but never when around people who
| might know. From my outsider view, jemalloc looked like a strict
| improvement over glibc's malloc, according to all the benchmarks
| I'd seen when the subject came up. So, why isn't it the default
| allocator?
| sanxiyn wrote:
| As far as I know there is no technical reason why jemalloc
| shouldn't be the default allocator. In fact, as pointed out in
| the article, it IS the default allocator on FreeBSD. My
| understanding is it is largely political.
| kstrauser wrote:
| Now that I think about it, I could easily imagine it being
| left out of glibc because it doesn't build on Hurd or
| something.
| lloeki wrote:
| > I could easily imagine it being left out of glibc because
| [...]
|
| ... its license is BSD-2-Clause ;)
|
| hence "political"
| vkazanov wrote:
| Huh? Bsd-style licenses are fully compatible with gpl.
|
| The problem is exactly this: Facebook becomes the
| upstream of a key part of your system.
|
| And Facebook can just walk away from the project. Like it
| did just now.
| lloeki wrote:
| They are compatible but that's not the point.
|
| If it were included it would instantly become a LGPL
| hard-fork because of any subsequently added line of code,
| if not by "virality" of the glibc license, at least
| because any glibc author code addition would be LGPL, per
| GNU project policy/ideology.
|
| Also, this would be a hard bar to pass:
| https://sourceware.org/glibc/wiki/CopyrightFSForDisclaim
|
| As I recall this is what prevented Apple from
| contributing C blocks+ back to upstream GCC.
|
| + https://github.com/lloeki/cblocks-clobj
| vkazanov wrote:
| What prevents apple from working with gpl-style licenses
| is strict hatred towards code that they can't use without
| opensourcing it. So this is what prevents them from
| contributing to gpl projects: the need to control access
| to code.
|
| LLVM is OK for them from this point of view: upstream is
| open but they can maintain and distribute their
| proprietary fork.
| lloeki wrote:
| > What prevents apple from working with gpl-style
| licenses is strict hatred towards code that they can't
| use without opensourcing it.
|
| Specifically regarding the C blocks feature introduced in
| Snow Leopard, as I recall, Apple wrote implementations
| for _both_ clang and gcc, attempted to upstream the gcc
| patchset, said gcc patchset was obviously under a GPL
| license, but the GCC team threw a fit because it wanted
| the code copyright to be attributed to the FSF, and that
| ended up as a stalemate.
|
| If there was any hatred they could literally have skipped
| the whole gcc implementation + patchset upstreaming
| attempt altogether. Also they did have patchsets of
| various sizes on other projects, whose code ends up
| obviously being GPL as well.
|
| The "hatred" came later with the GPLv3 family and the
| patent clause, which is a legal landmine, the FSF stating
| that signing apps is incompatible with the GPLv3, and
| getting hung up on copyright transfer.
|
| From https://lwn.net/Articles/405417/
|
| > Apple's motives may not be pure, but it has published
| the code under the license required and it's the FSF's
| own copyright assignment policies that block the
| inclusion. The code is available and licensed
| appropriately for the version of GCC that Apple adopted.
| It might be nice if Apple took the further step of
| assigning copyright to the FSF, but the GPLv3 was not
| part of the bargain that Apple agreed to when it first
| started contributing to GCC.
|
| The intent behind such copyright transfer is generally so
| that the recipient of the transfer can relicense without
| having to ask all contributors. Essentially, as a
| contributor, agreeing to a transfer means ceding control over
| the license that you initially contributed under.
|
| Read another way:
|
| - the FSF says "this code is GPLv2"
|
| - someone contributes under that GPLv2 promise, cedes
| copyright to the FSF because it's "the process"
|
| - the FSF says "this code is now GPLv3 exclusively"
|
| - that someone says "but that was not the deal!"
|
| - the FSF says "I am altering the deal, pray I don't
| alter it any further."
| Y_Y wrote:
| Big evil FSF, always trying to extract value and increase
| their stock price.
| lloeki wrote:
| That was tongue-in-cheek, I thought Darth Vader's voice
| was enough of a cue.
|
| License switches do happen though, and are the source of
| outrage. Cue redis.
|
| The cause of transferring copyright is often practical
| (hard to track down + reach out to + gather answers from
| all authors-slash-contributors which hampers some
| critical decisions down the road); for the FSF it's
| ideological (GCC source code must remain under sole FSF
| control).
|
| The consequence of the transfer though is not well
| understood by authors forfeiting their copyright: they
| essentially agree to have worked for free for whatever
| license the codebase ends up under in the future, including
| possibly becoming entirely closed source.
|
| Think of it next time you sign a CLA!
| Y_Y wrote:
| Apologies. I am generally against CLAs, and I think it's
| shitty of GNU/FSF to use them, even if they promise to
| only do good and free things.
| vkazanov wrote:
| Fsf is a non-profit organisation that proved numerous
| times that the point of its existence is making sure that
| I and we and you have freedom to change things we own.
|
| I contribute and transfer copyright to them for my
| contributions for this sole reason.
|
| Apple is not about freedoms at all.
|
| That's OK. I mean, these are the meanings of what both
| orgs do. Understanding the system, I'd rather spend my
| free time on fsf cause (and make money in a commercial
| organisation).
| jeffbee wrote:
| These allocators often have higher startup cost. They are
| designed for high performance in the steady state, but they can
| be worse in workloads that start a million short-lived
| processes in the unix style.
| kstrauser wrote:
| Oh, interesting. If that's the case, I can see why that'd be
| a bummer for short-lived command line tools. "Makes ls run
| 10x slower" would not be well received. OTOH, FreeBSD uses it
| by default, and it's not known for being a sluggish OS.
| favorited wrote:
| Disclaimer: I'm not an allocator engineer, this is just an
| anecdote.
|
| A while back, I had a conversation with an engineer who
| maintained an OS allocator, and their claim was that custom
| allocators tend to make one process's memory allocation faster
| at the expense of the rest of the system. System allocators are
| less able to make allocation fair holistically, because one
| process isn't following the same patterns as the rest.
|
| Which is why you see it recommended so frequently with
| services, where there is generally one process that you want to
| get preferential treatment over everything else.
| jeffbee wrote:
| I don't think that's really a position that can be defended.
| Both jemalloc and tcmalloc evolved and were refined in
| antagonistic multitenant environments without one
| overwhelming application. They are optimal for that exact
| thing.
| favorited wrote:
| It's possible that they were referring to something
| specific about their platform and its system allocator, but
| like I said it was an anecdote about one engineer's
| statement. I just remember thinking it sounded fair at the
| time.
| vlovich123 wrote:
| The "system" allocator is managing memory within a
| process boundary. The kernel is responsible for managing
| it across processes. Claiming that a user space allocator
| is greedily inefficient is voodoo reasoning that suggests
| the person making the claim has a poor grasp of
| architecture.
| jdsully wrote:
| The "greedy" part is likely not releasing pages back to
| the OS in a timely manner.
| nicoburns wrote:
| That seems odd though, seeing as this is one of the main
| criticisms of glibc's allocator.
| jeffbee wrote:
| In the containerized environments where these allocators
| were mainly developed, it is all but totally pointless to
| return memory to the kernel. You might as well keep
| everything your container is entitled to use, because
| it's not like the other containers can use it. Someone or
| some automatic system has written down how much memory
| the container is going to use.
| toast0 wrote:
| Returning no longer used anonymous memory is not without
| benefits.
|
| Returning pages allows them to be used for disk cache.
| They can be zeroed in the background by the kernel which
| may save time when they're needed again, or zeroing can
| be avoided if the kernel uses them as the destination of
| a full page DMA write.
|
| Also, returning no longer used pages helps get closer to
| a useful memory used measurement. Measuring memory usage
| is pretty difficult of course, but making the numbers a
| little more accurate helps.
| jeffbee wrote:
| There are shared resources involved though, for example
| one process can cause a lot of traffic in khugepaged.
| However I would point out that is an endemic risk of
| Linux's overall architecture. Any process can cause chaos
| by dirtying pages, or otherwise triggering reclaim.
| favorited wrote:
| For context, the "allocator engineer" I was talking to
| was a kernel engineer - they have an extremely solid
| grasp of their platform's architecture.
|
| The whole advantage of being the platform's system
| allocator is that you can have a tighter relationship
| between the library function and the kernel
| implementation.
| lmm wrote:
| > Both jemalloc and tcmalloc evolved and were refined in
| antagonistic multitenant environments without one
| overwhelming application. They are optimal for that exact
| thing.
|
| They were mostly optimised on Facebook/Google server-side
| systems, which were likely one application per VM, no?
| (Unlike desktop usage where users want several applications
| to run cooperatively). Firefox is a different case but
| apparently mainline jemalloc never matched Firefox
| jemalloc, and even then it's entirely plausible that
| Firefox benefitted from a "selfish" allocator.
| jeffbee wrote:
| Google runs dozens to hundreds of unrelated workloads in
| lightweight containers on a single machine, in "borg".
| Facebook has a thing called "tupperware" with the same
| property.
| mort96 wrote:
| The only way I can see that this would be true is if a custom
| allocator is worse about unmapping unused memory than the
| system allocator. After all, processes aren't sharing one
| heap, it's not like fragmentation in one process's address
| space is visible outside of that process... The only aspects
| of one process's memory allocation that's visible to other
| processes is, "that process uses N pages worth of resident
| memory so there's less available for me". But one of the
| common criticisms against glibc is that it's often really bad
| at unmapping its pages, so I'd think that most custom
| allocators are _nicer_ to the system?
|
| I'd be interested in hearing their thoughts directly. I'm
| also not an allocator engineer, and someone who maintains an
| OS allocator probably knows wayyy more about this stuff than
| me. I'm sure there's some missing nuance or context which
| would've made it make sense.
| o11c wrote:
| For a long time, one of the major problems with alternate
| allocators is that they would _never_ return free memory back
| to the OS, just keep the dirty pages in the process. This did
| eventually change, but it remains a strong indicator of
| different priorities.
|
| There's also the fact that ... a lot of processes only ever
| have a single thread, or at most have a few background threads
| that do very little of interest. So all these "multi-threading-
| first allocators" aren't actually buying anything of value, and
| they do have a lot of overhead.
|
| Semi-related: one thing that most people never think about: it
| is exactly the same amount of work for the kernel to zero a
| page of memory (in preparation for a future mmap) as for a
| userland process to zero it out (for its own internal reuse).
| vlovich123 wrote:
| That's actually particular to alternate allocators and not
| true for glibc, if I recall correctly (it's much worse at
| returning memory).
| senderista wrote:
| > Semi-related: one thing that most people never think about:
| it is exactly the same amount of work for the kernel to zero
| a page of memory (in preparation for a future mmap) as for a
| userland process to zero it out (for its own internal reuse)
|
| Possibly more work since the kernel can't use SIMD
| LtdJorge wrote:
| Why is that? Doesn't Linux use SIMD for the crypto
| operations?
| dwattttt wrote:
| Allowing SIMD instructions to be used arbitrarily in
| kernel actually has a fair penalty to it. I'm not sure
| what Linux does specifically, but:
|
| When a syscall is made, the kernel has to backup the user
| mode state of the thread, so it can restore it later.
|
| If any kernel code could use SIMD registers, you'll have
| to backup and restore that too, and those registers get
| big. You could easily be looking at adding a 1kb copy to
| every syscall, and most of the time it wouldn't be
| needed.
| kstrauser wrote:
| Why is that? Couldn't there be push_simd()/pop_simd()
| that the syscall itself uses around its SIMD calls?
|
| If no syscalls use SIMD today, I'd think we're starting
| from a safe position.
| durrrrrrrrrrrrr wrote:
| push_simd/pop_simd exist and are called
| kernel_fpu_begin/kernel_fpu_end. Their use is practically
| prohibited in most areas and iiuc not available on all
| archs, but it's available if needed.
| kstrauser wrote:
| Today I learned. Thanks!
| durrrrrrrrrrrrr wrote:
| It's not so much that you can't ever use it, it's more a
| you really shouldn't. It's more expensive, harder to use
| and rarely worth it. Main users currently are crypto and
| raid checksumming.
|
| https://www.kernel.org/doc/html/next/core-api/floating-
| point...
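| (The pattern described above can be sketched as kernel-module
| code; this is illustrative only, not runnable userspace code,
| and scalar_checksum/simd_checksum are hypothetical helpers.
| kernel_fpu_begin/kernel_fpu_end and irq_fpu_usable are the
| real x86 Linux APIs:)

```c
/* Illustrative kernel-module sketch: SIMD use in the Linux kernel
 * must be explicitly bracketed so the FPU/vector state is saved
 * and restored only around the region that needs it. */
#include <asm/fpu/api.h>

static void checksum_block(const void *buf, size_t len)
{
        if (!irq_fpu_usable()) {
                /* SIMD not safe in this context: take the scalar path. */
                scalar_checksum(buf, len);
                return;
        }
        kernel_fpu_begin();        /* save FPU/SIMD state, disable preemption */
        simd_checksum(buf, len);   /* vector path may clobber SIMD registers */
        kernel_fpu_end();          /* restore state */
}
```

| This is why it's costly: the begin/end pair has to spill and
| reload register state that can run to a kilobyte or more.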
| toast0 wrote:
| It is on FreeBSD. :P Change your malloc, change your life? May
| as well change your libc while you're there and use FreeBSD
| libc too, and that'll be easier if you also adopt the FreeBSD
| kernel.
|
| I will say, the Facebook people were very excited to share
| jemalloc with us when they acquired my employer, but we were
| using FreeBSD so we already had it and thought it was normal.
| :)
| b0a04gl wrote:
| jemalloc's been battle tested in prod at scale, its license is
| permissive, and performance wins are known. so what exactly are
| we protecting by clinging to glibc malloc? ideological purity?
| legacy inertia? who's actually benefiting from this status quo,
| and why do we still pretend it's about "compatibility"?
| skeptrune wrote:
| Kind of nuts that he worked on Jemalloc for over a decade
| while having a personal preference for garbage collection.
| I'm surprised he doesn't have more regret.
| kstrauser wrote:
| Why are those two mutually exclusive? I'd think that a high
| performance allocator would be especially crucial in the
| implementation of a fast garbage collected language. For
| example, in Python you can't alloc(n * sizeof(obj)) to reserve
| that much contiguous space for n objects. Instead, you use the
| builtins which isolate you from that low-level bookkeeping.
| Those builtins have to be pretty fast or performance would be
| terrible.
| fermentation wrote:
| A job is a job
| dikei wrote:
| I still remember the day when I used jemalloc's debug
| features to triage and resolve some nasty memory bloat issues
| in our code that uses RocksDB.
|
| Good times.
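| (That kind of triage can be sketched with jemalloc's heap
| profiler; this assumes a jemalloc built with --enable-prof and
| a hypothetical ./rocksdb-service binary:)

```shell
# Dump a heap profile when the process exits. lg_prof_sample
# trades accuracy for overhead (here ~2^19 bytes between samples).
MALLOC_CONF=prof:true,prof_final:true,lg_prof_sample:19 \
  LD_PRELOAD=/usr/lib/libjemalloc.so.2 ./rocksdb-service

# jeprof ships with jemalloc; attribute live bytes to call stacks.
jeprof --text ./rocksdb-service jeprof.*.heap
```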
| userbinator wrote:
| A bad choice of title, as "postmortem" made me think there was
| some severe outage caused by jemalloc.
| chrisweekly wrote:
| Well, that's not the only meaning of "postmortem". The fine
| article does open with,
|
| _" The jemalloc memory allocator was first conceived in early
| 2004, and has been in public use for about 20 years now. Thanks
| to the nature of open source software licensing, jemalloc will
| remain publicly available indefinitely. But active upstream
| development has come to an end. This post briefly describes
| jemalloc's development phases, each with some success/failure
| highlights, followed by some retrospective commentary."_
| stingraycharles wrote:
| I think this implies your understanding of the term "post-
| mortem" is incorrect, rather than the title.
| drysine wrote:
| Or maybe not
| runevault wrote:
| postmortem is looking back after an event. That can be a
| security event/outage, it can also be the completion of a
| project (see: game studios often do postmortems once their game
| is out to look back on what went wrong and right between
| preproduction, production, and post launch).
| gilgoomesh wrote:
| It's weird that we use "postmortem" in those cases since the
| word literally means "after death"; kind of implying
| something bad happened. I get that most of these postmortems
| are done after major development ceases, so it kind of is
| "dead" but still.
|
| Surely a "retrospective" would be a better word for a look
| back. It even means "look back".
| simonask wrote:
| It gets even better. Some companies use "mid-mortems",
| which are evaluation and reflection processes in the middle
| of a project...
| meepmorp wrote:
| sounds like an appropriate way to talk about death march
| projects, tbh
| bmacho wrote:
| The last part is unfortunate. However, it is a perfectly fine
| choice of title, as it does not make the majority of us think
| that there was an outage caused by jemalloc. You should
| update how you think of the word and align it with the
| majority usage.
| Svetlitski wrote:
| I understand the decision to archive the upstream repo; as of
| when I left Meta, we (i.e. the Jemalloc team) weren't really in a
| great place to respond to all the random GitHub issues people
| would file (my favorite was the time someone filed an issue
| because our test suite didn't pass on Itanium lol). Still, it
| makes me sad to see. Jemalloc is still IMO the best-performing
| general-purpose malloc implementation that's easily usable;
| TCMalloc is great, but is an absolute nightmare to use if you're
| not using bazel (this has become _slightly_ less true now that
| bazel 7.4.0 added cc_static_library so at least you can somewhat
| easily export a static library, but broadly speaking the point
| still stands).
|
| I've been meaning to ask Qi if he'd be open to cutting a final
| 6.0 release on the repo before re-archiving.
|
| At the same time it'd be nice to modernize the default settings
| for the final release. Disabling the (somewhat confusingly
| backwardly-named) "cache oblivious" setting by default so that
| the 16 KiB size-class isn't bloated to 20 KiB would be a major
| improvement. This isn't to disparage your (i.e. Jason's) original
| choice here; IIRC when I last talked to Qi and David about this
| they made the point that at the time you chose this default,
| typical TLB associativity was much lower than it is now. On a
| similar note, increasing the default "page size" from 4 KiB to
| something larger (probably 16 KiB), which would correspondingly
| increase the large size-class cutoff (i.e. the point at which the
| allocator switches from placing multiple allocations onto a slab,
| to backing individual allocations with their own extent directly)
| from 16 KiB up to 64 KiB would be pretty impactful. One of the
| last things I looked at before leaving Meta was making this
| change internally for major services, as it was worth a several
| percent CPU improvement (at the cost of a minor increase in RAM
| usage due to increased fragmentation). There's a few other things
| I'd tweak (e.g. switching the default setting of metadata_thp
| from "disabled" to "auto", changing the extent-sizing for slabs
| from using the nearest exact multiple of the page size that fits
| the size-class to instead allowing ~1% guaranteed wasted space in
| exchange for reducing fragmentation), but the aforementioned
| settings are the biggest ones.
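| (For reference, the settings described above map onto today's
| knobs roughly as follows; this is a sketch, not an official
| recommendation, and ./my-app is a stand-in binary:)

```shell
# Build time: cache-oblivious layout off, 16 KiB logical page size.
./configure --disable-cache-oblivious --with-lg-page=14
make

# Run time: transparent huge pages for allocator metadata set to "auto".
MALLOC_CONF=metadata_thp:auto ./my-app
```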
| kstrauser wrote:
| Stuff like this is what keeps me coming back here. Thanks for
| posting this!
|
| What's hard about using TCMalloc if you're not using bazel?
| (Not asking to imply that it's not, but because I'm genuinely
| curious.)
| Svetlitski wrote:
| It's just a huge pain to build and link against. Before the
| bazel 7.4.0 change your options were basically:
|
| 1. Use it as a dynamically linked library. This is not great
| because you're taking at a minimum the performance hit of
| going through the PLT for every call. The forfeited
| performance is even larger if you compare against statically
| linking with LTO (i.e. so that you can inline calls to
| malloc, get the benefit of FDO, etc.). Not to mention all
| the deployment headaches associated with shared libraries.
|
| 2. Painfully _manually_ create a static library. I've done
| this, it's awful; especially if you want to go the extra mile
| to capture as much performance as possible and at least get
| _partial_ LTO (i.e. of TCMalloc independent of your
| application code, compiling all of TCMalloc's compilation
| units together to create a single object file).
|
| When I was at Meta I imported TCMalloc to benchmark against
| (to highlight areas where we could do better in Jemalloc) by
| painstakingly hand-translating its bazel BUILD files to
| buck2 because there was legitimately no better option.
|
| As a consequence of being so hard to use outside of Google,
| TCMalloc has many more unexpected (sometimes problematic)
| behaviors than Jemalloc when used as a general purpose
| allocator in other environments (e.g. it basically assumes
| that you are using a certain set of Linux configuration
| options [1] and behaves rather poorly if you're not).
|
| [1] https://google.github.io/tcmalloc/tuning.html#system-
| level-o...
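Option 2 above (hand-building a static library with at least partial LTO) looks roughly like the following sketch. Paths and flags are illustrative only; the real TCMalloc build has many more compilation units plus Abseil dependencies, which is exactly why it's painful:

```shell
# Compile each TCMalloc translation unit (illustrative paths/flags;
# adding -flto here and using an LTO-aware linker is what gets the
# real cross-TU optimization within TCMalloc).
clang++ -O2 -c tcmalloc/*.cc

# Merge every object into one relocatable object with ld -r, so the
# allocator ships as a single unit.
ld -r -o tcmalloc_combined.o *.o

# Archive it so applications can link -ltcmalloc statically.
ar rcs libtcmalloc.a tcmalloc_combined.o
```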
| kstrauser wrote:
| Wow. That does sound quite unpleasant.
|
| Thanks again. This is far outside my regular work, but it
| fascinates me.
| prpl wrote:
| I've successfully used LLMs to migrate Makefiles to bazel,
| more or less. I've not tried the reverse but suspect (2)
| isn't so bad these days. YMMV, of course, but food for
| thought.
| rfoo wrote:
| Dunno why you got downvoted, but I've also tried to let
| Claude translate a bunch of BUILD files to equivalent
| CMakeLists.txt. It worked. The resulting CMakeLists.txt
| looks super terrible, but so is 95% of CMakeLists.txt in
| this world, so why bother, it's doomed anyway.
| mort96 wrote:
| They got downvoted because 1) comments of the form "I
| gave a chat bot a toy example of a task and it managed
| it" are tired and uninformative, and 2) nobody was
| talking about anything which would make translating a
| Makefile into Bazel relevant; nobody here has a
| Makefile which we wish was Bazel, we wish Google code
| was easier to work with.
| jeffbee wrote:
| The person above was saying they did a tedious manual
| port of tcmalloc to buck. Since tcmalloc provides both
| bazel and cmake builds, it seems relevant that in these
| days a person could have potentially forced a robot to do
| the job of writing the buck file given the cmake or bazel
| files.
| prpl wrote:
| People are discussing things that are tedious work. I
| think the conversion to Bazel from a makefile is much
| more tedious and error prone than the reverse, in part
| because of Bazel sandboxing although that shouldn't make
| much of a difference for a well-defined collection of
| Makefiles of a C library.
|
| The reverse should be much easier, which was the point of
| the post. Pointing it out as a capability (translation of
| build systems) that is handled well, is, well,
| informative. The future isn't evenly distributed and
| people aren't always aware of capabilities, even on HN
| mort96 wrote:
| What's really tedious is the constant chat bot spam.
| benced wrote:
| Yep I've done something similar. This is the only way I
| managed to compile Google's C++ S2 library (spatial
| indexing) which depends on absl and OpenSSL.
|
| (I managed to avoid infecting my project with boringSSL)
| MaskRay wrote:
| Thanks for sharing the insight!
|
| As I observed when I was at Google: tcmalloc didn't have
| a dedicated team; it was a project driven by server
| performance optimization engineers aiming to improve the
| performance of
| important internal servers. Extracting it to
| github.com/google/tcmalloc was complex due to intricate
| dependencies (https://abseil.io/blog/20200212-tcmalloc).
| As internal performance priorities demanded more focus,
| less time was available for maintaining the CMake build
| system. Maintaining the repo could at best be described as
| a community contribution activity.
|
| > Meta's needs stopped aligning well with those of external
| uses some time ago, and they are better off doing their own
| thing.
|
| I think Google's diverged from external uses even longer
| ago :) (For a long time the google3 and gperftools
| tcmalloc implementations were quite different.)
| mort96 wrote:
| Everything from Google is an absolute pain to work with
| unless you're in Google using their systems, FWIW. Anything
| from the Chromium project is deeply entangled with
| everything else from the Chromium project as part of one
| gigantic Chromium source tree with all dependencies and
| toolchains vendored. They do not care about ABI whatsoever,
| to the point that a lot of Google libraries change
| their public ABI based on whether address sanitizer is
| enabled or not, meaning you can't enable ASAN for your code
| if you use pre-built (e.g package manager provided)
| versions of their code. Their libraries also tend to break
| if you link against them from a project with RTTI enabled,
| a compiler set to a slightly different compiler version, or
| any number of other minute differences that most other
| developers don't let affect their ABI.
|
| And if you try to build their libraries from source, that
| involves downloading tens of gigabytes of sysroots and
| toolchains and vendored dependencies.
|
| Oh and you probably don't want multiple versions of a
| library in your binary, so be prepared to use Google's
| (probably outdated) version of whatever libraries they
| vendor.
|
| And they make no effort whatsoever to distinguish between
| public header files and their source code, so if you wanna
| package up their libraries, be prepared to make scripts to
| extract the headers you need (including headers from
| vendored dependencies), you can't just copy all of some
| 'include/' folder.
|
| And their public headers tend to do idiotic stuff like
| `#include "base/pc.h"`, where that `"base/pc.h"` path is
| _not_ relative to the file doing the include. So you're
| gonna have to pollute the include namespace. Make sure not
| to step on their toes! There's a lot of them.
|
| I have had the misfortune of working with Abseill, their
| WebRTC library, their gRPC library and their protobuf
| library, and it's all terrible. For personal projects where
| I don't have a very, _very_ good reason to use Google code,
| I try to avoid it like the plague. For professional
| projects where I've had to use libwebrtc, the only
| reasonable approach is to silo off libwebrtc into its own
| binary which _only_ deals with WebRTC, typically with a
| line-delimited JSON protocol on stdin /stdout. For things
| like protobuf/gRPC where that hasn't been possible, you
| just have to live with the suffering.
|
| ..This comment should probably have been a blog post.
| pavlov wrote:
| This matches my own experience trying to use Google's C++
| open source. You should write the blog post!
| ahartmetz wrote:
| I think your rant isn't long enough to include everything
| relevant ;) The Blink web engine (which I sometimes
| compile for qtwebengine) takes a really long time to
| compile, several times longer than Gecko according to
| some info I found online. Google has a policy of not
| using forward declarations, including everything instead.
| That's a pretty big WTF for anyone who has ever optimized
| build time. Google probably just throws hardware and
| (distributed) caching at the problem, not giving a shit
| about anyone else building it. Oh, it also needs about 2
| GB of RAM per build thread - basically nothing else does.
| LtdJorge wrote:
| Even with Firefox using Rust and requiring a build of
| many crates, qtwebengine takes more time. It was so bad
| that I had to remove packages from my system (Gentoo)
| that were pulling qtwebengine.
|
| And I build all Rust crates (including rustc) with -O3,
| same as C/C++.
| bialpio wrote:
| Chromium deviates from Google-wide policy and allows
| forward-declarations: https://chromium.googlesource.com/c
| hromium/src/+/main/styleg..., "Forward declarations vs.
| #includes".
| ahartmetz wrote:
| That is really nice to hear, but AFAICS it only means
| that it may change in the future. Because in current
| code, it was ~all includes last time I checked.
|
| Well, I remember one - very biased - example where I had
| a look at a class that was especially expensive to
| compile, like 40 seconds (on a Ryzen 7950X) and maybe 2
| GB of RAM. It had under 200 LOC and didn't seem to do
| anything that's typically expensive to compile... except
| for the stuff it included. Which also didn't seem to do
| anything fancy. But transitive includes can snowball if
| you don't add any "compile firewalls".
| stick_figure wrote:
| This is actually tracked at a publicly visible URL:
| https://commondatastorage.googleapis.com/chromium-
| browser-cl...
|
| And the include graph analysis:
| https://commondatastorage.googleapis.com/chromium-
| browser-cl...
|
| The annotated red dots correspond to the last time Chrome
| developers did a big push to prune the include graph to
| optimize build time. It was effective, but there was push
| back. C++ developers just want magic, they don't want to
| think about dependency management, and it's hard to blame
| them. But, at the end of the day, builds scale with
| sources times dependencies, and if you aren't
| disciplined, you can expect superlinear build times.
| ahartmetz wrote:
| Good that it's being tracked, but Jesus, these numbers!
|
| 110 CPU hours for a build. (Fortunately, it seems to be a
| little over half that for my CPU. "Cloud CPUs" are kinda
| slow.)
|
| I picked the 5001st largest file with includes. It's
| zoom_view_controller.cc, 140 lines in the .cc file, size
| with includes: 19.5 MB.
|
| Initially I picked the 5000th largest file with includes,
| but for devtools_target_ui.cc, I see a bit more
| legitimacy for having lots of includes. It has 384 "own"
| lines in the .cc file and, of course, also about 19.5 MB
| size with includes.
|
| A C++20 source file including some standard library
| headers easily bloats to a little under 1 MB IIRC, and
| that's already kind of unreasonable. 20x of that is very
| unreasonable.
|
| I don't think that I need to tell anyone on the Chrome
| team how to improve performance in software: you measure
| and then you grab the dumb low-hanging fruit first. From
| these results, it doesn't seem like anyone is working
| with the actual goal to improve the situation as long as
| the guidelines are followed on paper.
| bialpio wrote:
| > I picked the 5001st largest file with includes. It's
| zoom_view_controller.cc, 140 lines in the .cc file, size
| with includes: 19.5 MB.
|
| > Initially I picked the 5000th largest file with
| includes, but for devtools_target_ui.cc, I see a bit more
| legitimacy for having lots of includes. It has 384 "own"
| lines in the .cc file and, of course, also about 19.5 MB
| size with includes.
|
| > A C++20 source file including some standard library
| headers easily bloats to a little under 1 MB IIRC, and
| that's already kind of unreasonable. 20x of that is very
| unreasonable.
|
| I think you're not arguing pro-forward-declarations vs
| anti-forward-declarations here though - it sounds more
| like an argument for more granular header/source files?
| In .cc file, each and every include should be necessary
| for the file to compile (although looking at your
| example, bind.h seems to be unused and could be removed -
| looks like the file was refactored and the includes
| weren't cleaned up).
|
| With that said, in the corresponding
| zoom_view_controller.h, the tab_interface.h include looks
| to be unnecessary so you did find one good example. :)
| ahartmetz wrote:
| Yes, sure! I am arguing for whatever is necessary to
| reduce the total compilation cost. Pruning headers,
| rearranging source code to have fewer trivial modules and
| to reduce the size of very often included headers, even
| _gasp_ sometimes using pointers just to reduce compile
| time! I understand that runtime performance is a very
| high priority for Blink, but it really doesn't matter
| sometimes if certain things are heap-allocated. Like
| things that are very expensive to instantiate anyway and
| that don't occur often. These will incidentally tend to
| have "heavy" headers, too.
| bialpio wrote:
| > Because in current code, it was ~all includes last time
| I checked.
|
| That's another matter - just because forward-declares are
| allowed, doesn't mean they are mandated, but in my
| experience the reviewers were paying attention to that
| pretty well.
|
| Counter-examples to "~all includes": https://source.chro
| mium.org/chromium/chromium/src/+/main:thi..., https://sou
| rce.chromium.org/chromium/chromium/src/+/main:thi..., htt
| ps://source.chromium.org/chromium/chromium/src/+/main:thi
| ....
|
| I picked couple random headers from the directory where
| I've contributed the most to blink, and from what I'm
| seeing, most of the classes that could be forward-
| declared, were. I have not looked at .cc files given that
| those tend to need to see the declaration (except when
| it's unused, but then why have a forward-decl at all?) or
| the compiler would complain about access into incomplete
| type.
|
| > Well, I remember one - very biased - example where I
| had a look at a class that was especially expensive to
| compile, like 40 seconds (on a Ryzen 7950X) and maybe 2
| GB of RAM. It had under 200 LOC and didn't seem to do
| anything that's typically expensive to compile... except
| for the stuff it included.
|
| Maybe the stuff was actually being compiled because of
| some member in a class (so it was actually expensive to
| compile). Or maybe you stumbled upon a place where folks
| weren't paying attention. Hard to say without a concrete
| example. The "compile firewall" was added pretty recently
| I think, but I don't know if it's going to block anything
| from landing.
|
| Edit: formatting (switched bulleted list into comma-
| separated because clearly I don't know how to format it).
| rfoo wrote:
| > they make no effort what so ever to distinguish between
| public header files and their source code
|
| They did, in a different way. The rest of the world
| distinguishes them by convention, putting them in
| different directory hierarchies (src/, include/).
| google3 depends on
| the build system to do so, "which header file is public"
| is documented in BUILD files. You are then required to
| use their build system to grasp the difference :(
|
| > And their public headers tend to do idiotic stuff like
| `#include "base/pc.h"`, where that `"base/pc.h"` path is
| not relative to the file doing the include.
|
| I have to disagree on this one. Relying on relative
| include paths suck. Just having one `-I/project/root` is
| the way to go.
| mort96 wrote:
| > I have to disagree on this one. Relying on relative
| include paths suck. Just having one `-I/project/root` is
| the way to go.
|
| Oh to be clear, I'm not saying that they should've used
| relative includes. I'm complaining that they don't put
| their includes in their own namespace. If public headers
| were in a folder called `include/webrtc` as is the
| typical convention, and they all contained `#include
| <webrtc/base/pc.h>` or `#include "webrtc/base/pc.h"` I
| would've had no problem. But as it is, WebRTC's headers
| are in include paths which it's really difficult to avoid
| colliding with. You'll cause collisions if your project
| has a source directory called `api`, or `pc`, or `net`,
| or `media`, or a whole host of other common names.
| rfoo wrote:
| Thanks for the clarification. Yeah, that's pretty
| frustrating.
|
| Now I'm curious why grpc, webrtc and some other Chromium
| repos were set up like this. Google projects which
| started in google3 and later exported as an open source
| project don't have this defect, for example tensorflow,
| abseil etc. They all had a top-level directory containing
| all their codes so it becomes `#include "tensorflow/...`.
|
| Feels like a weird collision of coding style and starting
| a project outside of their monorepo
| alextingle wrote:
| >> `#include "base/pc.h"`, where that `"base/pc.h"` path
| is not relative to the file doing the include.
|
| > I have to disagree on this one.
|
| The double-quotes literally mean "this dependency is
| relative to the current file". If you want to depend on a
| -I, then signal that by using angle brackets.
| mort96 wrote:
| Eh, no. The quotes mean "this is not a dependency on a
| system library". Quotes can include relative to the
| files, or they can include things relative to directories
| specified with -I. The only thing they can't do is include
| things relative to directories specified with -isystem
| and system include directories.
|
| I would be surprised if I read some project's code where
| angle brackets are used to include headers from within
| the same project. I'm not surprised when quotes are used
| to include code from within the project but relative to
| the project's root.
| fc417fc802 wrote:
| Reading this perspective was interesting. I can
| appreciate that things didn't fit into your workflow very
| well, but my experience has been the opposite. Their
| projects seem to be structured from the perspective of
| building literally everything from source on the spot.
| That matches my mindset - I choose to build from scratch
| in a network isolated environment. As a result google
| repos are some of the few that I can count on to be
| fairly easy to get up and running. An alarming number of
| projects apparently haven't been tested under such
| conditions and I'm forced to spend hours patching up
| cmake scripts. (Even worse are the projects that require
| 'npm install' as part of the build process. Absurdity.)
|
| > Oh and you probably don't want multiple versions of a
| library in your binary, so be prepared to use Google's
| (probably outdated) version of whatever libraries they
| vendor.
|
| This is the only complaint I can relate to. Sometimes
| they lag on rolling dependencies forward. Not so
| infrequently there are minor (or not so minor) issues
| when I try to do so myself and I don't want to waste time
| patching my dependencies up so I get stuck for a while
| until they get around to it. That said, usually rolling
| forward works without issue.
|
| > if you try to build their libraries from source, that
| involves downloading tens of gigabytes of sysroots and
| toolchains and vendored dependencies.
|
| Out of curiosity which project did you run into this
| with? That said, isn't the only alternative for them
| moving to something like nix? Otherwise how do you
| tightly specify the build environment?
| bluGill wrote:
| > I choose to build from scratch in a network isolated
| environment. As a result google repos are some of the few
| that I can count on to be fairly easy to get up and
| running.
|
| If you are building a single google project they are easy
| to get up and running. If you are building your own
| project on top of theirs, things get difficult. Those
| library issues will get you.
|
| I don't know about OP, but we have our own in house
| package manager. If Conan was ready a couple years sooner
| we would have used that instead.
| mort96 wrote:
| I don't really have the care nor time to respond as
| thoroughly as you deserve, but here are some thoughts:
|
| > Out of curiosity which project did you run into this
| with?
|
| Their WebRTC library for the most part, but also the gRPC
| C++ library. Unlike WebRTC, grpc++ is in most package
| managers so the need to build it myself is less, but
| WebRTC is a behemoth and not in any package manager.
|
| > That said, isn't the only alternative for them moving
| to something like nix? Otherwise how do you tightly
| specify the build environment?
|
| I don't expect my libraries to tightly specify the build
| environment. I expect my libraries to conform to my
| software's build environment, to use versions of other
| libraries that I provide to it, etc etc. I don't mind
| that Google builds their application software the way
| they do, Google Chrome should tightly constrain its build
| environment if Google wants; but their libraries should
| fit in to _my_ environment.
|
| I'm wondering, what is your relationship with Google
| software that you build from source? Are you building
| their libraries to integrate with your own applications,
| or do you just build Google's applications from source
| and use them as-is?
| ewalk153 wrote:
| I've hit similar problems with their Ruby gRPC library.
|
| The counter example is the language Go. The team running
| Go has put considerable care and attention into making
| this project welcoming for developers to contribute,
| while still adhering to Google code contribution
| requirements. Building from source is straightforward and
| IIRC it's one of the easier cross-compilers to set up.
|
| Install docs: https://go.dev/doc/install/source#bootstrap
| FromBinaryRelease
| rstat1 wrote:
| Go is kind of a pain to build from source. Build one
| version to build another, and another..
|
| Or rather it was the last time I tried.
| rstat1 wrote:
| I agree to a point. grpc++ (and protobuf and boringssl
| and abseil and....) was the biggest pain in the ass to
| integrate in to a personal project I've ever seen. I
| ended up having to write a custom tool to convert their
| Bazel files to the format my projects tend to use (GN and
| Ninja). Many hours wasted. There were no library-specific
| "sysroots" or "toolchains" involved, though, thankfully,
| because I'm sure that would have made things even worse.
|
| Upside is (I guess) if I ever want to use grpc in another
| project the work's already done and it'll just be a
| matter of copy/paste.
| matoro wrote:
| That was me that filed the Itanium test suite failure. :)
| boulos wrote:
| The Itanic was kind of great :). I'm convinced it helped sink
| SGI.
| froh wrote:
| Sunk by the Great Itanic ?
| sitkack wrote:
| Why was the sinking of SGI great?
| boulos wrote:
| Oh, that wasn't the intent. I meant two separate things.
| The Itanic itself was kind of fascinating, but mostly
| panned (hence the nickname).
|
| SGI's decision to build out Itanium systems may have
| helped precipitate their own downfall. That was sad.
| cogman10 wrote:
| Still makes me sad. I partially think a major reason for
| the demise was that it was simply constructed too soon.
| Compiler tech wasn't nearly good enough to handle the
| ISA.
|
| Nowadays because of the efforts that have gone in to
| making SIMD effective, I'd think modern compilers would
| have an easier time taking advantage of that unique and
| strange uarch.
| acdha wrote:
| SGI and HP! Intel should have a statue of Rick Belluzzo on
| their campus.
| crest wrote:
| Itanium did its most important job: it killed
| everything but ARM and POWER.
| apaprocki wrote:
| Ah, porting to HP Superdome servers. It's like being handed a
| brochure describing the intricate details of the iceberg the
| ship you just boarded is about to hit in a few days.
|
| A fellow traveler, ahoy!
| cogman10 wrote:
| I worked on the Superdome servers back in the day. What a
| weird product. I still can't believe it was a profitable
| division (at my time circa 2011).
|
| HP was going through some turbulent waters in those days.
| kabdib wrote:
| one of the best books on Linux architecture i've read was the
| one on the Itanium port
|
| i think, because Itanic broke a _ton_ of assumptions
| EnPissant wrote:
| Do you have any opinions on mimalloc?
| gazpacho wrote:
| I would love to see these changes - or even some sort of blog
| post or extended documentation explaining the rationale. As
| is, the docs are somewhat barren. I feel that there's a lot of
| knowledge that folks like you have right now from all of the
| work that was done internally at Meta that would be best shared
| now before it is lost.
| einpoklum wrote:
| > TCMalloc is great, but is an absolute nightmare to use if
| you're not using bazel
|
| custom-malloc-newbie question: Why is the choice of build
| system (generator) significant when evaluating the usability of
| a library?
| fc417fc802 wrote:
| Because you need to build it to use it, and you likely
| already have significant build related infrastructure, and
| you are going to need to integrate any new dependencies into
| that. I'm increasingly convinced that the various build
| systems are elaborate and wildly successful ploys intended
| only to sap developer time and energy.
| CamouflagedKiwi wrote:
| Because you have to build it. If they don't use the same
| build system as you, you either want to invoke their system,
| or import it into yours. The former is unappealing if it's
| 'heavy' or doesn't play well as a subprocess; the latter can
| take a lot of time if the build process you're replicating is
| complex.
|
| I've done both before, and seen libraries at various levels
| of complexity; there is definitely a point where you just
| want to give up and not use the thing when it's very complex.
| username223 wrote:
| This. When step one is "install our weird build system,"
| I'll immediately look for something else that meets my
| needs. All build systems suck, so everyone thinks they can
| write a better one, and too many people try. Pretty soon
| you end up having to learn a majority of this (https://en.w
| ikipedia.org/wiki/List_of_build_automation_softw...) to get
| your code to compile.
| einpoklum wrote:
| If TCMalloc uses bazel, then you build it with Bazel. It
| just needs to install itself where you tell it to, and
| then either it has given you a pkg-config file, or
| otherwise, your own build system needs some library-
| finding logic for it ("find module" in CMake terms). Or -
| are you saying the problem is that you need to install
| Bazel?
| klabb3 wrote:
| > we (i.e. the Jemalloc team) weren't really in a great place
| to respond to all the random GitHub issues people would file
|
| Why not? I mean this is complete drive-by comment, so please
| correct me, but there was a fully staffed team at Meta that
| maintained it, but was not in the best place to manage the
| issues?
| xcrjm wrote:
| They said the team was not _in_ a great place to do it, e.g.
| they probably had competing priorities that overshadowed
| triaging issues.
| anonymoushn wrote:
| Well, to be blunt, the company does not care about this, so
| it does not get done.
| Thaxll wrote:
| It's kind of wild that great software is hindered by a
| complicated build and integration process.
| mavis wrote:
| Switching to jemalloc instantly fixed an irksome memory leak in
| an embedded Linux appliance I inherited many moons ago. Thank you
| je, we salute you!
| vlovich123 wrote:
| That's because sane allocators that aren't glibc will return
| unused memory periodically to the OS while glibc prefers to
| permanently retain said memory.
| masklinn wrote:
| glibc will return memory to the OS just fine, the problem is
| that its arena design is _extremely_ prone to fragmentation,
| so you end up with a bunch of arenas which are almost but not
| quite empty and can't be released, but can't really be used
| either.
|
| In fact, Jason himself (the author of jemalloc and TFA)
| posted an article on glibc malloc fragmentation 15 years ago:
| https://web.archive.org/web/20160417080412/http://www.canonw.
| ..
|
| And it's an issue to this day:
| https://blog.arkey.fr/drafts/2021/01/22/native-memory-
| fragme...
| nh2 wrote:
| glibc does NOT return memory to the OS just fine.
|
| In my experience it delays it way too much, causing memory
| overuse and OOMs.
|
| I have a Python program that allocates 100 GB for some
| work, free()s it, and then calls a subprocess that takes
| 100 GB as well. Because the memory use is serial, it should
| fit in 128 GB just fine. But it gets OOM-killed, because
| glibc does not turn the free() into an munmap() before the
| subprocess is launched, so it needs 200 GB total, with 100
| GB sitting around pointlessly unused in the Python process.
|
| This means if you use glibc, you have no idea how much
| memory your system will use and whether they will OOM-
| crash, even if your applications are carefully designed to
| avoid it.
|
| Similar experience:
| https://news.ycombinator.com/item?id=24242571
|
| I commented there 4 years ago the glibc settings
| MALLOC_MMAP_THRESHOLD_ and MALLOC_TRIM_THRESHOLD_ should
| fix that, but I was wrong: MALLOC_TRIM_THRESHOLD_ is
| apparently bugged and has no effect in some situations.
|
| A bug I think might be involved: "free() doesn't honor
| M_TRIM_THRESHOLD"
| https://sourceware.org/bugzilla/show_bug.cgi?id=14827
|
| Open since 13 years ago. This stuff doesn't seem to get
| fixed.
|
| The fix in general is to use jemalloc with
| MALLOC_CONF="retain:false,muzzy_decay_ms:0,dirty_decay_ms:0"
|
| which tells it to immediately munmap() at free().
|
| So in jemalloc, the settings to control this behaviour seem
| to actually work, in contrast to glibc malloc.
|
| (I'm happy to be proven wrong here, but so far no
| combination of settings seem to actually make glibc return
| memory as written in their docs.)
|
| From this perspective, it is frightening to see the
| jemalloc repo being archived, because that was my way to
| make sure stuff doesn't OOM in production all the time.
| Crespyl wrote:
| Can you elaborate on this? I don't know much about
| allocators.
|
| How would the allocator know that some block is unused, short
| of `free` being called? Does glibc not return all memory
| after a `free`? Do other allocators do something clever to
| automatically release things? Is there just a lot of
| bookkeeping overhead that some allocators are better at
| handling?
| adwn wrote:
| When `free()` is called, the allocator internally marks
| that specific memory area as _unused_ , but it doesn't
| necessarily return that area back to the OS, for two main
| reasons:
|
| 1. `malloc()` is usually called with sizes smaller than the
| sizes by which the allocator requests memory from the OS,
| which are at least page-sized (4096 bytes on x86/x86-64)
| and often much larger. After a `free()`, the freed memory
| can't be returned to the OS because it's only a small chunk
| in a larger OS allocation. Only after all memory within a
| page has been `free()`d, the allocator may, but doesn't
| have to, return that page back to the OS.
|
| 2. After a `free()`, the allocator wants to hang on to that
| memory area because the next `malloc()` is sure to follow
| soon.
|
| This is a very simplified overview, and different
| allocators have different strategies for gathering new
| `malloc()`s in various areas and for returning areas back
| to the OS (or not).
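The bookkeeping described above can be poked at from Python on Linux/glibc. This is a hedged sketch: `malloc_trim(3)` is a real glibc call that asks the allocator to give free heap memory back to the OS, but how much is actually released depends on fragmentation and on glibc's trim thresholds:

```python
import ctypes
import resource

def rss_bytes():
    # Resident set size from /proc/self/statm (field 2 = resident pages).
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1]) * resource.getpagesize()

# Allocate many ~4 KiB blocks so they come out of glibc's heap arenas
# (a single huge allocation would be mmap()ed and returned immediately
# on free, which would hide the effect).
blobs = [bytes(4096) for _ in range(20000)]  # roughly 80 MiB total
before_free = rss_bytes()
del blobs  # every block is free()d, but the pages may stay mapped

# malloc_trim(0) returns 1 if glibc released any memory, 0 otherwise.
libc = ctypes.CDLL("libc.so.6")
released = libc.malloc_trim(0)
after_trim = rss_bytes()
print(f"RSS before free: {before_free}, after trim: {after_trim}")
```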
| mort96 wrote:
| They're not really correct, glibc will return stuff back to
| the OS. It just has some quirks about how and when it does
| it.
|
| First, some background: no allocator will return memory
| back to the kernel for every `free`. That's for performance
| and memory consumption reasons: the smallest unit of memory
| you can request from and return to the kernel is a _page_
| (typically 4kiB or 16kiB), and requesting and returning
| memory (typically called "mapping" and "unmapping" memory
| in the UNIX world) has some performance overhead.
|
| So if you allocate space for one 32-byte object for
| example, your `malloc` implementation won't map a whole new
| 4k or 16k page to store 32 bytes. The allocator probably
| has some pages from earlier allocations, and it will make
| space for your 32-byte allocation in pages it has already
| mapped. Or it can't fit your allocation, so it will map
| more pages, and then set aside 32 bytes for your
| allocation.
|
| This all means that when you call `free()` on a pointer,
| the allocator can't just unmap a page immediately, because
| there may be other allocations on the same page which
| haven't been freed yet. Only when all of the allocations
| which happen to be on a specific page are freed, can the
| page be unmapped. In a worst-case situation, you could in
| theory allocate and free memory in such a way that you end
| up with 100 1-byte allocations allocated across 100 pages,
| none of which can be unmapped; you'd be using 400kiB or
| 1600kiB of memory to store 100 bytes. (But that's not
| necessarily a huge problem, because it just means that
| future allocations would probably end up in the existing
| pages and not increase your memory consumption.)
|
| Now, the glibc-specific quirk: glibc will only ever unmap
| _the last page_, from what I understand. So you can
| allocate megabytes upon megabytes of data, which causes
| glibc to map a bunch of pages, then free() every allocation
| except for the last one, and you'd end up still consuming
| many megabytes of memory. Glibc won't unmap those megabytes
| of unused pages until you free the allocation that sits in
| the last page that glibc mapped.
|
| This typically isn't a huge deal; yes, you're keeping more
| memory mapped than you strictly need, but if the
| application needs more memory in the future, it'll just re-
| use the free space in all the pages it has already mapped.
| So it's not like those pages are "leaked", they're just
| kept around for future use.
|
| It can sometimes be a real problem though. For example, a
| program could do a bunch of memory-intensive computation on
| launch requiring gigabytes of memory at once, then all that
| computation culminates in one relatively small allocated
| object, then the program calls free() on all the
| allocations it did as part of that computation. The
| application could potentially keep around gigabytes worth
| of pages which serve no purpose but can't be unmapped due
| to that last small allocation.
|
| If any of this is wrong, I would love to be corrected. This
| is my current impression of the issue but I'm not an
| authoritative source.
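A short experiment can make the described behavior concrete. This is a sketch, not an authoritative test: it assumes glibc 2.33+ on Linux (mallinfo2 is a glibc extension), and the exact numbers depend on malloc tunables such as the mmap threshold.

```c
#include <malloc.h>  /* mallinfo2(), a glibc >= 2.33 extension */
#include <stdlib.h>

enum { N = 10000, SZ = 4096 };

/* Allocate N small heap blocks, free all but the most recent one,
 * and report how many bytes of freed space glibc still holds.
 * The last block sits near the top of the heap and pins the freed
 * space below it, so the heap cannot shrink. */
size_t freed_but_retained_bytes(void) {
    static void *blocks[N];
    for (int i = 0; i < N; i++)
        blocks[i] = malloc(SZ);     /* below the mmap threshold */
    for (int i = 0; i < N - 1; i++)
        free(blocks[i]);            /* ~40 MiB freed, none unmapped */
    struct mallinfo2 mi = mallinfo2();
    return mi.fordblks;             /* total free bytes in the arena */
}
```

Freeing blocks[N - 1] afterwards would let glibc trim the heap top, and calling malloc_trim(0) can also release some of the interior free space back to the kernel.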
| p0w3n3d wrote:
| Thank you. Jemalloc was recently recommended to me in a
| presentation about Java optimization.
|
| I wonder whether you got everything you should have from the
| companies that use it. I mean, sometimes I feel that big tech
| firms only use free software without ever giving anything back,
| so I hope you were the exception here.
| jeffbee wrote:
| Imagine being a Java developer and thinking "what have big tech
| corporations ever done for me?"
| keybored wrote:
| Things that are good for me, the developer.
| masklinn wrote:
| > jemalloc was probably booted from Rust binaries sooner than the
| natural course of development might have otherwise dictated.
|
| FWIW while it was _a_ factor it was just one of a number:
| https://github.com/rust-lang/rust/issues/36963#issuecomment-...
|
| And jemalloc was only removed two years after that issue was
| opened: https://github.com/rust-lang/rust/pull/55238
| Aissen wrote:
| Interesting that one of the factors listed there, the
| hardcoded page size on arm64, is still an unsolved issue
| upstream, which forces app developers to either ship
| multiple arm64 Linux binaries or drop support for some
| platforms.
|
| I wonder if some kind of dynamic page-size (with dynamic
| ftrace-style binary patching for performance?) would have been
| that much slower.
| pkhuong wrote:
| You can run jemalloc configured with 16KB pages on a 4KB page
| system.
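Per jemalloc's INSTALL documentation, the page size the allocator assumes is fixed at build time via --with-lg-page (the log2 of the page size), so a build sketched like the following assumes 16 KiB pages while still running on 4 KiB-page hardware:

```shell
# Sketch: build jemalloc assuming 16 KiB pages (2^14 bytes).
./configure --with-lg-page=14
make
```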
| schrep wrote:
| Your work was so impactful over a long period from Firefox to
| Facebook. Honored to have been a small part of it.
| lbrandy wrote:
| Suppose this is as good a place to pile-on as any.
|
| Though this was not the post I was expecting to show up today,
| it was super awesome for me to get to have played my tiny part
| in this big journey. Thanks for everything @je (and qi + david
| -- and all the contributors before and after my time!).
| liuliu wrote:
| Your leadership in continuing to invest in core technologies at
| Facebook was as fruitful as it could ever be. GraphQL, PyTorch,
| and React, to name a few, could not have happened without it.
| dao- wrote:
| Hmm, if I had to choose between not having Facebook and
| having React, I'd pick the former in a heartbeat. Not that
| this was a real choice, but it was nonetheless bitter to see
| colleagues join the behemoth that was Facebook.
| Omarbev wrote:
| This is a good thing
| adityapatadia wrote:
| Jason, here is a story about how much your work impacts us. We
| run a decently sized company that processes hundreds of millions
| of images/videos per day. When we first started about 5 years
| ago, we spent countless hours debugging issues related to memory
| fragmentation.
|
| One fine day, we discovered Jemalloc and put it into the service
| that was causing a lot of memory fragmentation. We did not think
| that those two lines of changes in a Dockerfile were going to fix
| all of our woes, but we were pleasantly surprised. Every single
| issue went away.
|
| Today, our multi-million dollar revenue company is using your
| memory allocator on every single service and on every single
| Dockerfile.
|
| Thank you! From the bottom of our hearts!
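For reference, the kind of two-line Dockerfile change described above usually looks something like this; the package name and library path are Debian/Ubuntu assumptions and vary by distribution and architecture:

```dockerfile
RUN apt-get update && apt-get install -y libjemalloc2
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
```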
| laszlojamf wrote:
| I really don't mean to be snarky, but honest question: Did you
| donate? Nothing says thank you like some $$$...
| onli wrote:
| It was a Meta project and development ceased. For a regular
| project that expectation is fine, but here it does not apply
| IMHO.
| adityapatadia wrote:
| We regularly donate to projects via Open Collective. We
| frankly did not consider it here due to the FB involvement,
| I think.
| thewisenerd wrote:
| indeed! most image processing golang services suggest/use
| jemalloc
|
| the top 3 from https://github.com/topics/resize-images (as of
| 2025-06-13)
|
| imaginary:
| https://github.com/h2non/imaginary/blob/1d4e251cfcd58ea66f83...
|
| imgproxy:
| https://web.archive.org/web/20210412004544/https://docs.imgp...
| (linked from a discussion in the imaginary repo)
|
| imagor:
| https://github.com/cshum/imagor/blob/f6673fa6656ee8ef17728f2...
| tecleandor wrote:
| Yep, imgproxy seems to use libvips, that recommends jemalloc.
| I was checking and this is a funny (not) bug report:
|
| https://github.com/libvips/libvips/discussions/3019
| b0a04gl wrote:
| been using jemalloc unknowingly for a long time. only after
| reading this post it hit how much of it was under the hood in
| things I've built. didn't know the gc-style decay mechanism was
| that involved, or that it handled fragmentation with time-based
| heuristics. surprising how much tuning was exposed through env
| vars. solid closure
| brcmthrowaway wrote:
| What allocator does Apple use?
| half-kh-hacker wrote:
| you probably want to look at their 'libmalloc'
| forty wrote:
| Probably iMalloc ;)
| wiz21c wrote:
| FTA:
|
| > And people find themselves in impossible situations where the
| main choices are 1) make poor decisions under extreme pressure,
| 2) comply under extreme pressure, or 3) get routed around.
|
| It doesn't sound like a workplace :-(
| bravetraveler wrote:
| Sounds like every workplace I've 'enjoyed' since ~2008
| throwaway314155 wrote:
| nice username
|
| - fsociety
| mrweasel wrote:
| Now I'm not one for victim blaming, but if that's more than
| three places of employment, maybe you need to rethink the
| positions you apply for.
| acdha wrote:
| There's something to that but it is victim blaming if
| you're not acknowledging the larger trends. There are a lot
| of places whose MBAs are attending the same conferences,
| getting the same recommendations from consultants, and
| hearing the same demands from investors. The push against
| remote work, for example, was all driven by ideology
| against most of the available data but it affected a huge
| number of jobs.
| throw0101d wrote:
| > _The push against remote work, for example, was all
| driven by ideology against most of the available data but
| it affected a huge number of jobs._
|
| And before that, open office plans.
|
| You're saving on rent: great. But what is it doing to
| productivity?
|
| * https://business.adobe.com/blog/perspectives/what-
| science-sa...
|
| Of course productivity doesn't show up on a spreadsheet,
| but rent does, so it's all about what "the numbers" say.
| the_mitsuhiko wrote:
| All the allocators have the same issue. They largely work against
| a shared set of allocation APIs. Many of their users mostly
| engage via malloc and free.
|
| So the flow is like this: user has an allocation looking issue.
| Picks up $allocator. If they have an $allocator type problem then
| they keep using it, otherwise they use something else.
|
| There are tons of users of these allocators, but many rarely
| engage with the developers. Many wouldn't even notice
| improvements or regressions on upgrades because after the initial
| choice they stop looking.
|
| I'm not sure how to fix that, but this is not healthy for such
| projects.
| Cloudef wrote:
| malloc is a bad API in general; if you want to go fast, you
| don't rely on a general-purpose allocator
| const_cast wrote:
| This is true, but the unfortunate thing with how C and C++
| were developed is that pretty much everything just assumes
| the existence of malloc/free. So if you're using third-party
| libraries, then it's mostly out of your control. Linking a new
| allocator is a very easy and pretty much free way to improve
| performance.
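Concretely, swapping in an allocator requires no code changes, because it only has to export the standard malloc/free symbols. Both of these are common ways to do it (the preload path is a Debian-style assumption):

```shell
# Link jemalloc at build time so its malloc/free override libc's:
cc -o my_program my_program.c -ljemalloc

# Or preload it into an unmodified binary at run time:
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./my_program
```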
| dazzawazza wrote:
| I've used jemalloc in every game engine I've written for years.
| It's just the thing to do. WAY faster on win32 than the default
| allocator. It's also nice to have the same allocator across all
| platforms.
|
| I learned of it from its integration in FreeBSD and never looked
| back.
|
| jemalloc has helped entertain a lot of people :)
| Iwan-Zotow wrote:
| +1
|
| windows def allocator is pos. Jemalloc rules
| ahartmetz wrote:
| >windows def allocator is pos
|
| Wow, still? I remember allocator benchmarks from 10-15 years
| ago where there were some notable differences between
| allocators... and then Windows with like 20% the performance
| of everything else!
| int_19h wrote:
| > windows def allocator
|
| Which one of them? These days it could mean HeapAlloc, or it
| could mean malloc from uCRT.
| carey wrote:
| malloc in uCRT just calls HeapAlloc, though? You can see
| the code in ucrt\heap\malloc_base.cpp if you have the
| Windows SDK installed.
|
| Programs can opt in to the _segment_ heap in their
| manifest, but it's not necessarily any faster.
| mrweasel wrote:
| Looking at all the comments and lightly browsing the source code,
| I'm amazed. Both at how much impact a memory allocator can make,
| but also how much code is involved.
|
| I'm not really sure what I expected, but somehow I expected a
| memory allocator to be ... smaller, simpler perhaps?
| ratorx wrote:
| Memory allocators can be simple. In fact it was an assignment
| for a course in the 2nd year of my CS degree to make an
| (almost) complete allocator.
|
| However it is typically always more complex to make production
| quality software, especially in a performance sensitive domain.
| burnt-resistor wrote:
| Naive allocators are very easy: just subdivide RAM and
| defragment only when absolutely necessary (if virtual memory
| is unavailable). Performant allocators are _hard._
|
| I think we lost a great deal of potential when ORCA was too
| tied to Pony and never extracted into a framework, tool, or
| library useful outside of it, such as one integrated with
| LLVM.
| const_cast wrote:
| It's the same way with garbage collectors.
|
| You can write a naive mark-and-sweep in an afternoon. You can
| write a reference counter in even less time. And for some
| runtimes this is fine.
|
| But writing a generational, concurrent, moving GC takes a lot
| of time. But if you can achieve it, you can get amazing
| performance gains. Just look at recent versions of Java.
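To give a sense of scale for the "even less time" claim, here is a hypothetical minimal intrusive reference counter in C: no thread safety, no cycle detection, just the core mechanism.

```c
#include <stddef.h>

/* Hypothetical header embedded at the start of every counted object. */
typedef struct {
    long refs;                     /* current reference count */
    void (*destroy)(void *self);   /* optional finalizer, may be NULL */
} RefHeader;

void ref_retain(RefHeader *h) {
    h->refs++;
}

/* Drop one reference; returns 1 if the object was destroyed. */
int ref_release(RefHeader *h) {
    if (--h->refs == 0) {
        if (h->destroy)
            h->destroy(h);
        return 1;
    }
    return 0;
}
```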
| swinglock wrote:
| mimalloc is cleaner but lacks the very useful profiling
| features. To be fair it also has not gone through decades of
| changes as described in the postmortem either.
| senderista wrote:
| You can write a simple size-class allocator (even lock-free) in
| just a couple dozen lines of code. (I've done it both for
| interviews and for a work presentation.) But an allocator that
| is fast, scalable, and performs well over diverse workloads--
| that is HARD.
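In that spirit, here is a hypothetical sketch of such a size-class allocator: a static arena, a bump pointer, and one LIFO free list per size class. It is not thread-safe and never returns memory to the OS; the hard parts the comment alludes to are exactly what is missing here.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE (1 << 20)
#define NCLASSES 4

static uint8_t arena[ARENA_SIZE];            /* backing memory */
static size_t bump;                          /* high-water mark */
static const size_t class_size[NCLASSES] = { 16, 32, 64, 128 };
static void *free_list[NCLASSES];            /* one LIFO list per class */

static int size_to_class(size_t n) {
    for (int c = 0; c < NCLASSES; c++)
        if (n <= class_size[c])
            return c;
    return -1;                               /* too large for this toy */
}

void *toy_malloc(size_t n) {
    int c = size_to_class(n);
    if (c < 0)
        return NULL;
    if (free_list[c]) {                      /* reuse a freed block */
        void *p = free_list[c];
        free_list[c] = *(void **)p;          /* pop: first word is "next" */
        return p;
    }
    if (bump + class_size[c] > ARENA_SIZE)
        return NULL;                         /* arena exhausted */
    void *p = &arena[bump];                  /* carve a fresh block */
    bump += class_size[c];
    return p;
}

void toy_free(void *p, size_t n) {           /* sized free, like C++ sized delete */
    int c = size_to_class(n);
    *(void **)p = free_list[c];              /* push onto the class list */
    free_list[c] = p;
}
```

Rounding every request up to a class size wastes some space (internal fragmentation) but makes the free lists trivial; production allocators like jemalloc layer arenas, thread caches, and page-level management on top of this basic idea.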
| burnt-resistor wrote:
| Lesson: Don't let one megacorp dominate or take over your FOSS
| project. Push back somewhat and say "no" to too much help from
| one source.
| igrunert wrote:
| I think the author was happy to be employed by a megacorp,
| along with a team to push jemalloc forward.
|
| He and the other previous contributors are free to find new
| employers to continue such an arrangement, if any are willing
| to make that investment. Alternatively they could cobble
| together funding from a variety of smaller vendors. I think the
| author is happy to move on to other projects, after spending a
| long time in this problem space.
|
| I don't think that "don't let one megacorp hire a team of
| contributors for your FOSS project" is the lesson here. I'd say
| it's a lesson in working upstream - the contributions made
| during their Facebook / Meta investment are available for the
| community to build upon. They could've just as easily been made
| in a closed source fork inside Facebook, without violating the
| terms of the license.
|
| Also Mozilla were unable to switch from their fork to the
| upstream version, and didn't easily benefit from the Facebook /
| Meta investment as a result.
| ecshafer wrote:
| He worked for like a decade at Facebook, it looks like. I would
| guess at least at a Staff level. How many millions of dollars
| do you think he got from that? It doesn't sound like the worst
| trade in the world.
| didip wrote:
| Thanks for everything, JE!
|
| jemalloc is always the first thing I installed whenever I had to
| provision bare servers.
|
| If jemalloc were somehow the default allocator on Linux, I
| think it would not have a hard time retaining contributors.
| soulbadguy wrote:
| Maybe add a link to the post on the github repo. I feel like this
| is important context for people visiting the repo in the future.
___________________________________________________________________
(page generated 2025-06-13 23:00 UTC)