[HN Gopher] A fast, growable array with stable pointers in C
___________________________________________________________________
A fast, growable array with stable pointers in C
Author : ibobev
Score : 94 points
Date : 2025-08-06 18:21 UTC (4 hours ago)
(HTM) web link (danielchasehooper.com)
(TXT) w3m dump (danielchasehooper.com)
| zokier wrote:
| > Today's computers use only 48 bits of the 64 bits in a pointer
|
| https://en.wikipedia.org/wiki/Intel_5-level_paging introduced in
| Ice Lake 6 years ago.
|
| But anyways, isn't this just variant of std::deque?
| https://en.cppreference.com/w/cpp/container/deque.html
| cornstalks wrote:
| What kind of setups use over 256 TiB of RAM?
| reorder9695 wrote:
| "640k ought to be enough for anybody"
| bonzini wrote:
| In practice it's over 64 TiB because kernels often use a
| quarter of the available addressing space (half of the kernel
| addressing space) to map the physical addresses (e.g.
| FFFFC000_12345678 maps physical address 0x12345678). So 48
| virtual address bits can be used with up to 2^46 bytes of
| RAM.
| hinkley wrote:
| And how long has 48 bit addressing been de rigueur? Not so
| long ago we had processors that could address 40 bits of
| address space. Or was that 38?
| winocm wrote:
| At least since maybe the DEC Alpha 21264. It could
| address 48-bits of VA space, but that comes with caveats
| due to PALcode specific intricacies.
|
| I think VMS (or was it Tru64?) uses this mode, but many
| other OSes just use 43-bit or 40-bit addressing.
| Realistically though, I don't think many users would be
| using workloads that addressed more than 38-bits worth of
| contiguous VA space in 1998-1999.
| TapamN wrote:
| It's not necessarily physical RAM. If you memory-map large
| files, like a large file on a RAID array or network share, you
| could still need that much virtual address space.
| sestep wrote:
| Does std::deque support random access?
| mwkaufma wrote:
| Yes, you can see operator[] in the linked reference docs.
| ksherlock wrote:
| The cppreference documentation specifies "Random access -
| constant _O(1)_ " (same as std::vector). There's a slight
| overhead from going through 2 pointers instead of 1 and
| keeping track of how many items are in the first bucket (so
| you know which bucket to check), but that's a constant and
| the big _O_ don't care about constants.
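For reference, a deque's O(1) random access boils down to one divide/modulo plus two pointer hops. A toy sketch of that two-level lookup (not any real standard-library implementation; `BLOCK`, `toy_deque`, and the struct layout are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 16  /* elements per fixed-size block (size chosen for this sketch) */

typedef struct {
    int **map;      /* top-level array of pointers to fixed-size blocks */
    size_t first;   /* index of the first element within block 0 */
    size_t len;     /* number of elements stored */
} toy_deque;

/* O(1) random access: one division/modulo plus two pointer hops. */
static int *toy_deque_at(toy_deque *d, size_t i) {
    size_t j = d->first + i;              /* shift by the front offset */
    return &d->map[j / BLOCK][j % BLOCK];
}
```

The `first` offset is what lets prepending work: pushing to the front just decrements it (allocating a new block when it underflows).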
| mwkaufma wrote:
| In principle it's not that different from deque, though:
|
| (1) deque uses fixed-sized blocks, not increasing-size blocks.
| (2) deque supports prepending, which adds another level of
| indirection internally.
| sigbottle wrote:
| You could probably support prepending by mirroring the
| allocations? E.g. for the "negative index" case, do the same
| exponential growth in the other direction.
|
| Your indexing now involves some legitimate math, which can be
| annoying from an efficiency perspective, but I think you can
| still get O(1) with careful allocation of powers of 2.
| o11c wrote:
| That's fine if you only ever add, but is likely to fail if
| you pop FIFO-style. This too is ultimately fixable but
| means we can no longer assume "every bucket size doubles".
|
| That said, IMO "stable pointers" is overrated; "minimize
| copying" is all that's useful.
| dwattttt wrote:
| Stable pointers is a limitation for simplicity's sake. If
| C were better at tracking pointer invalidation, we'd be
| better at propagating updated pointers.
| sigbottle wrote:
| I don't know the precise details of how deques are implemented
| in C++, but given the most popular stack overflow explanation
| of them, some immediate pitfalls are that the T* map itself
| sounds unbounded and if each chunk allocates only a fixed
| constant size it's probably horrible for fragmentation or
| overallocation. The indexing also seems dependent on division.
|
| With this power of twos approach you can't really truly delete
| from the front of the array but the amount of pointers you
| store is constant and the memory fragmentation is better.
| (Though OP never claimed to want to support deque behavior, it
| shouldn't be that hard to modify; the indexing just has to go
| through more arithmetic again.)
|
| I haven't used OP's array, but I have been bitten plenty of
| times by std::deque's memory allocation patterns and had to
| rewrite with raw arrays and pointer tracking.
| forrestthewoods wrote:
| std::deque details vary by implementation, and it is largely
| considered unusable on MSVC.
|
| MSVC uses too small a block size, making it worthless. libc++'s
| block size is 16 elements or 4096 bytes.
|
| It is generally better to use a container whose implementation
| details you can actually understand and control.
|
| I would not call it a variant of std::deque myself. Not wrong.
| But not a helpful observation imho.
| 01HNNWZ0MV43FF wrote:
| Readers might also find `plf::colony` interesting:
| https://www.plflib.org/colony.htm
| variadix wrote:
| You can also use virtual memory for a stable resizable vector
| implementation, up to some max length based on how much virtual
| memory you reserve initially, then commit as required to grow
| the physical capacity.
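A minimal POSIX sketch of this reserve-then-commit idea (names like `stable_vec` and the 1 GiB reservation are invented for illustration; Windows would use VirtualAlloc with MEM_RESERVE/MEM_COMMIT instead):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define RESERVE_BYTES (1ull << 30)  /* 1 GiB of address space, no RAM yet */
#define PAGE 4096

typedef struct {
    unsigned char *base;  /* never moves, so pointers into it stay stable */
    size_t committed;     /* bytes with read/write access */
    size_t len;           /* bytes in use */
} stable_vec;

static int sv_init(stable_vec *v) {
    /* Reserve address space only: PROT_NONE commits no physical memory. */
    v->base = mmap(NULL, RESERVE_BYTES, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    v->committed = v->len = 0;
    return v->base == MAP_FAILED ? -1 : 0;
}

static int sv_grow(stable_vec *v, size_t need) {
    if (need > v->committed) {
        /* Commit whole pages up to the new length. */
        size_t newc = (need + PAGE - 1) & ~(size_t)(PAGE - 1);
        if (mprotect(v->base + v->committed, newc - v->committed,
                     PROT_READ | PROT_WRITE) != 0)
            return -1;
        v->committed = newc;
    }
    v->len = need;
    return 0;  /* base never changed: growth without copying */
}
```

Because growth only flips page protections inside the reserved range, the base address never changes, which is exactly the stable-pointer property, at the cost of the reservation (and platform) issues discussed below.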
| loeg wrote:
| Yeah, with less runtime overhead, so long as you're ok with the
| ~4kB minimum allocation size.
| IshKebab wrote:
| I think optimal would be just normal vector under 4 kB and
| then switch to whole pages after that.
|
| Do any vector implementations & allocators actually do this
| though?
| fyrn_ wrote:
| They mention this alternative in the article, and also point
| out how it does not work in embedded contexts or with WASM
| ncruces wrote:
| I've actually used this to implement the linear memory of a
| Wasm runtime (reserve 4GB and commit as needed) and have
| faced user complaints (it's in a library that uses a Wasm
| runtime internally) due to it unexpectedly running into
| issues, particularly under nested virtualization scenarios.
|
| I've needed to add knobs to configure it, because even a
| handful of 4GB instances causes issues. I've defaulted to
| 256MB/instance, and for my own GitHub CI use 32MB/instance to
| reduce test flakiness.
|
| This is to say: I found the idea that "just reserve address
| space, it's not an issue if you don't commit it" very flaky
| in practice, _unless_ you're running on bare metal Linux.
| tovej wrote:
| Very nice! I do wonder if it would be useful to be able to skip
| even more smaller segments, maybe a ctor argument for the minimum
| segment size. Or maybe some housekeeping functions to collapse
| the smallest segments into one.
|
| Mostly, the thing that feels strange is that with, say, n > 10
| segments, the smallest segment will be less than a thousandth
| of the largest: iterating over the first half will touch n-1 or
| n-2 segments (worse cache behaviour), while iterating over the
| second half will touch only one or two segments.
|
| Seems like, in most cases, you would want to be able to collapse
| those earlier segments together.
| o11c wrote:
| Can we really call it an array if it's not contiguous (or at
| least strided)? Only a small fraction of APIs take an `iovec,
| iovcnt`-equivalent ...
| dhooper wrote:
| feel free to call it a "levelwise-allocated pile"
| jandrese wrote:
| Yeah, the limitation that it can't be just dumped into anything
| that expects a C array is a large one. You need to structure
| your code around the access primitives this project implements.
| unwind wrote:
| Very nice, although I think the level of "trickery" with the
| macros becomes a bit much. I do understand that is The Way in C
| (I've written C for 30 years), it's just not something I'd do
| very often.
|
| Also, from a strictly prose point of view, isn't it strange that
| the `clz` instruction doesn't actually appear in the
| 10-instruction disassembly of the indexing function? It feels
| like it was optimized out by the compiler perhaps due to the
| index being compile-time known or something, but after the setup
| and explanation that was a bit jarring to me.
| mananaysiempre wrote:
| The POSIX name for the function is clz() [the C23 name is
| stdc_leading_zeros(), because that's how the committee names
| things now, while the GCC intrinsic is __builtin_clz()]. The
| name of the x86 instruction, on the other hand, is BSR (80386+)
| or LZCNT (Nehalem+, K10+) depending on what semantics you want
| for zero inputs (keep in mind that early implementations of
| BSF/BSR are _very slow_ and take time proportional to the
| output value). The compiled code uses BSR. (All of these are
| specified slightly differently, take care if you plan to
| actually use them.)
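For the curious, the clz-based indexing being discussed can be sketched like this, for a simplified scheme where segment k holds 2^k elements (sizes 1, 2, 4, 8, ...); the article's actual macros differ, and `__builtin_clzll` is the GCC/Clang intrinsic mentioned above:

```c
#include <assert.h>

/* Map a flat index i into (segment, offset). With segment k holding
 * 2^k elements, the segment is the bit width of (i+1) minus one, and
 * the offset is whatever remains after that segment's base. One clz,
 * one shift, one subtract -- this is the arithmetic the compiler
 * lowers to BSR/LZCNT on x86 or CLZ on Arm. */
static void seg_index(unsigned long long i,
                      unsigned *segment, unsigned long long *offset) {
    unsigned long long n = i + 1;                     /* 1-based rank */
    unsigned s = 63u - (unsigned)__builtin_clzll(n);  /* floor(log2(n)) */
    *segment = s;
    *offset = n - (1ull << s);                        /* within segment s */
}
```

Note that `__builtin_clzll(0)` is undefined, which is why the index is biased by one before counting leading zeros.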
| unwind wrote:
| Got it, thanks. I suck at x86, even more than I thought. :/
|
| Edit: it's CLZ on Arm [1], probably what I was looking for.
|
| [1]: https://developer.arm.com/documentation/100069/0610/A32-
| and-...
| jovial_cavalier wrote:
| The example code doesn't seem to compile.
| seanwessmith wrote:
| Make sure to create the segment_array.h file. Mine output just
| fine on a Mac M4:
|
| Desktop gcc main.c
| Desktop ./a.out
| entities[0].a = 1
| entities[0].a = 1
| entities[1].a = 2
| pfg_ wrote:
| Zig has this as std.SegmentedList, but it can resize the segment
| array dynamically
| andrewla wrote:
| This is really clever but better to call this a list rather than
| an array; functions which expect array semantics will simply not
| work, and there's no way to transparently pass slices of this
| data structure around.
|
| In the past I've abused virtual memory systems to block off a
| bunch of pages after my array. This lets you use an array data
| structure, have guard pages to prevent out of bounds access, and
| to have stable pointers in the data structure.
| hinkley wrote:
| I believe I've seen problems like this also solved with arena
| allocators. You have certain very special allocations have an
| arena unto themselves.
| zoogeny wrote:
| I think the article buries a significant drawback: the lack of
| contiguity. It is obviously implied by the design, but I think
| this approach would have hard-to-define characteristics for
| things like cache prefetching. The next address is a function,
| not an easily predictable change.
|
| One frequent reason to use an array is to iterate the items. In
| those cases, non-contiguous memory layout is not ideal.
| forrestthewoods wrote:
| I would not call it "non-contiguous". It's more like "mostly
| contiguous". Which for large amounts data is "amortized
| contiguous" just like a regular vector has "amortized constant"
| time to add an element.
| dhooper wrote:
| 1. The image at the top of the article makes it clear the
| segments aren't contiguous
|
| 2. iterating a 4 billion item segmented array would have 26
| cache misses. Not a big deal.
| taminka wrote:
| i love a nice single header project, it's c23 only tho (bc of
| typeof)?
| gotoeleven wrote:
| Under what conditions is exponential segment sizing preferable to
| fixed size segments? Are there any specific algorithms or
| situations where this is especially good? It seems like the
| likelihood of large amounts of wasted space is a major downside.
| o11c wrote:
| It's always better - the increase in indexing complexity is
| negligible, and it completely eliminates resizing of the top-
| level array. It also reduces the number of calls to `malloc`
| while keeping capacity proportional to active size.
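To make the bookkeeping concrete: with doubling segments, n elements need only about log2(n) segments, so the top-level pointer table has a small fixed bound (64 entries covers any 64-bit size) and never needs resizing, versus n/B block pointers for fixed-size blocks. A hypothetical helper, assuming segment sizes 1, 2, 4, 8, ... and the GCC/Clang `__builtin_clzll` intrinsic:

```c
#include <assert.h>

/* With segments of size 1, 2, 4, ..., the first k segments hold
 * 2^k - 1 elements, so n elements need exactly bit_width(n)
 * segments. Wasted space is bounded by the last segment, i.e.
 * less than half of capacity -- the same bound as a doubling
 * std::vector, but without ever copying elements. */
static unsigned segments_needed(unsigned long long n) {
    return n ? 64u - (unsigned)__builtin_clzll(n) : 0;
}
```

For example, four billion elements need only 32 segment pointers, where fixed 16-element blocks would need a 250-million-entry (and growing) top-level table.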
| listeria wrote:
| They mention using this as the backing array for a power-of-two-
| sized hash table, but I don't think it would be very useful
| considering that the hash table won't provide stable pointers,
| given that you would need to rehash every element as the table
| grows. Even if you just wanted to reuse the memory, rehashing in-
| place would be a PITA.
| Lichtso wrote:
| Another similar data structure which has a balanced tree (instead
| of a list) that references array segments is the
| https://en.wikipedia.org/wiki/Rope_(data_structure)
|
| Its main advantages are the O(log n) time complexity for all size
| changes at any index, meaning you can efficiently insert and
| delete anywhere, and it is easy to implement copy-on-write
| version control on top of it.
___________________________________________________________________
(page generated 2025-08-06 23:00 UTC)