[HN Gopher] A fast, growable array with stable pointers in C
___________________________________________________________________
A fast, growable array with stable pointers in C
Author : ibobev
Score : 94 points
Date : 2025-08-06 18:21 UTC (4 hours ago)
(HTM) web link (danielchasehooper.com)
(TXT) w3m dump (danielchasehooper.com)
| zokier wrote:
| > Today's computers use only 48 bits of the 64 bits in a pointer
|
| https://en.wikipedia.org/wiki/Intel_5-level_paging introduced in
| Ice Lake 6 years ago.
|
| But anyways, isn't this just variant of std::deque?
| https://en.cppreference.com/w/cpp/container/deque.html
| cornstalks wrote:
| What kind of setups use over 256 TiB of RAM?
| reorder9695 wrote:
| "640k ought to be enough for anybody"
| bonzini wrote:
| In practice it's over 64 TiB because kernels often use a
| quarter of the available addressing space (half of the kernel
| addressing space) to map the physical addresses (e.g.
| FFFFC000_12345678 maps physical address 0x12345678). So 48
| virtual address bits can be used with up to 2^46 bytes of
| RAM.
| hinkley wrote:
| And how long has 48 bit addressing been de rigueur? Not so
| long ago we had processors that could address 40 bits of
| address space. Or was that 38?
| winocm wrote:
| At least since maybe the DEC Alpha 21264. It could
| address 48-bits of VA space, but that comes with caveats
| due to PALcode specific intricacies.
|
| I think VMS (or was it Tru64?) uses this mode, but many
| other OSes just use 43-bit or 40-bit addressing.
| Realistically though, I don't think many users would be
| using workloads that addressed more than 38-bits worth of
| contiguous VA space in 1998-1999.
| TapamN wrote:
| It's not necessarily physical RAM. If you memory-map large
| files, like a large file on a RAID array or network share, you
| could still need that much virtual address space.
| sestep wrote:
| Does std::deque support random access?
| mwkaufma wrote:
| Yes, you can see operator[] in the linked reference docs.
| ksherlock wrote:
| The cppreference documentation specifies "Random access -
| constant _O(1)_ " (same as std::vector). There's a slight
| overhead from going through 2 pointers instead of 1 and
| keeping track of how many items are in the first bucket (so
| you know which bucket to check), but that's a constant and
| the big _O_ don't care about constants.
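For reference, a deque's O(1) random access boils down to one divide/modulo plus two pointer hops. A toy sketch of that two-level lookup (not any real standard-library implementation; `BLOCK`, `toy_deque`, and the struct layout are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 16  /* elements per fixed-size block (size chosen for this sketch) */

typedef struct {
    int **map;      /* top-level array of pointers to fixed-size blocks */
    size_t first;   /* index of the first element within block 0 */
    size_t len;     /* number of elements stored */
} toy_deque;

/* O(1) random access: one division/modulo plus two pointer hops. */
static int *toy_deque_at(toy_deque *d, size_t i) {
    size_t j = d->first + i;              /* shift by the front offset */
    return &d->map[j / BLOCK][j % BLOCK];
}
```

The `first` offset is what lets prepending work: pushing to the front just decrements it (allocating a new block when it underflows).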
| mwkaufma wrote:
| In principle it's not that different from deque, though:
|
| (1) deque uses fixed-sized blocks, not increasing-size blocks.
| (2) deque supports prepending, which adds another level of
| indirection internally.
| sigbottle wrote:
| You could probably support prepending by mirroring the
| allocations? E.g. for the "negative index" case, do the same
| exponential growth in the other direction.
|
| Your indexing now involves some legitimate math, which can be
| annoying from an efficiency perspective, but I think you can
| still get O(1) with careful allocation of powers of 2.
| o11c wrote:
| That's fine if you only ever add, but is likely to fail if
| you pop FIFO-style. This too is ultimately fixable but
| means we can no longer assume "every bucket size doubles".
|
| That said, IMO "stable pointers" is overrated; "minimize
| copying" is all that's useful.
| dwattttt wrote:
| Stable pointers is a limitation for simplicity's sake. If
| C were better at tracking pointer invalidation, we'd be
| better at propagating updated pointers.
| sigbottle wrote:
| I don't know the precise details of how deques are implemented
| in C++, but given the most popular stack overflow explanation
| of them, some immediate pitfalls are that the T* map itself
| sounds unbounded and if each chunk allocates only a fixed
| constant size it's probably horrible for fragmentation or
| overallocation. The indexing also seems dependent on division.
|
| With this power of twos approach you can't really truly delete
| from the front of the array but the amount of pointers you
| store is constant and the memory fragmentation is better.
| (Though OP never claimed to want to support deque behavior, it
| shouldn't be that hard to modify; the indexing just has to go
| through more arithmetic again.)
|
| I haven't used OP's array, but I have been bitten plenty of
| times by std::deque's memory allocation patterns and had to
| rewrite with raw arrays and pointer tracking.
| forrestthewoods wrote:
| std::deque details vary by implementation, and it is largely
| considered unusable on MSVC.
|
| MSVC uses too small a block size, making it worthless. libc++'s
| block size is 16 elements or 4096 bytes.
|
| It is generally better to use a container whose implementation
| details you can actually understand and control.
|
| I would not call it a variant of std::deque myself. Not wrong.
| But not a helpful observation imho.
| 01HNNWZ0MV43FF wrote:
| Readers might also find `plf::colony` interesting:
| https://www.plflib.org/colony.htm
| variadix wrote:
| You can also use virtual memory for a stable resizable vector
| implementation, up to some max length based on how much virtual
| memory you reserve initially, then commit as required to grow
| the physical capacity.
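A minimal POSIX sketch of this reserve-then-commit idea (names like `stable_vec` and the 1 GiB reservation are invented for illustration; Windows would use VirtualAlloc with MEM_RESERVE/MEM_COMMIT instead):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define RESERVE_BYTES (1ull << 30)  /* 1 GiB of address space, no RAM yet */
#define PAGE 4096

typedef struct {
    unsigned char *base;  /* never moves, so pointers into it stay stable */
    size_t committed;     /* bytes with read/write access */
    size_t len;           /* bytes in use */
} stable_vec;

static int sv_init(stable_vec *v) {
    /* Reserve address space only: PROT_NONE commits no physical memory. */
    v->base = mmap(NULL, RESERVE_BYTES, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    v->committed = v->len = 0;
    return v->base == MAP_FAILED ? -1 : 0;
}

static int sv_grow(stable_vec *v, size_t need) {
    if (need > v->committed) {
        /* Commit whole pages up to the new length. */
        size_t newc = (need + PAGE - 1) & ~(size_t)(PAGE - 1);
        if (mprotect(v->base + v->committed, newc - v->committed,
                     PROT_READ | PROT_WRITE) != 0)
            return -1;
        v->committed = newc;
    }
    v->len = need;
    return 0;  /* base never changed: growth without copying */
}
```

Because growth only flips page protections inside the reserved range, the base address never changes, which is exactly the stable-pointer property, at the cost of the reservation (and platform) issues discussed below.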
| loeg wrote:
| Yeah, with less runtime overhead, so long as you're ok with the
| ~4kB minimum allocation size.
| IshKebab wrote:
| I think optimal would be just normal vector under 4 kB and
| then switch to whole pages after that.
|
| Do any vector implementations & allocators actually do this
| though?
| fyrn_ wrote:
| They mention this alternative in the article, and also point
| out how it does not work in embedded contexts or with WASM
| ncruces wrote:
| I've actually used this to implement the linear memory of a
| Wasm runtime (reserve 4GB and commit as needed) and have
| faced user complaints (it's in a library that uses a Wasm
| runtime internally) due to it unexpectedly running into
| issues, particularly under nested virtualization scenarios.
|
| I've needed to add knobs to configure it, because even a
| handful of 4GB instances causes issues. I've defaulted to
| 256MB/instance, and for my own GitHub CI use 32MB/instance to
| reduce test flakiness.
|
| This is to say: I found the idea that "just reserve address
| space, it's not an issue if you don't commit it" very flaky
| in practice, _unless_ you're running on bare metal Linux.
| tovej wrote:
| Very nice! I do wonder if it would be useful to be able to skip
| even more smaller segments, maybe a ctor argument for the minimum
| segment size. Or maybe some housekeeping functions to collapse
| the smallest segments into one.
|
| Mostly, the thing that feels strange is that with, say, n > 10
| segments, the smallest segment will be less than a thousandth
| of the largest: iterating over the first half will touch n-1 or
| n-2 segments (worse cache behaviour), while iterating over the
| second half will touch only one or two segments.
|
| Seems like, in most cases, you would want to be able to collapse
| those earlier segments together.
| o11c wrote:
| Can we really call it an array if it's not contiguous (or at
| least strided)? Only a small fraction of APIs take an `iovec,
| iovcnt`-equivalent ...
| dhooper wrote:
| feel free to call it a "levelwise-allocated pile"
| jandrese wrote:
| Yeah, the limitation that it can't be just dumped into anything
| that expects a C array is a large one. You need to structure
| your code around the access primitives this project implements.
| unwind wrote:
| Very nice, although I think the level of "trickery" with the
| macros becomes a bit much. I do understand that is The Way in C
| (I've written C for 30 years), it's just not something I'd do
| very often.
|
| Also, from a strictly prose point of view, isn't it strange that
| the `clz` instruction doesn't actually appear in the
| 10-instruction disassembly of the indexing function? It feels
| like it was optimized out by the compiler perhaps due to the
| index being compile-time known or something, but after the setup
| and explanation that was a bit jarring to me.
| mananaysiempre wrote:
| The POSIX name for the function is clz() [the C23 name is
| stdc_leading_zeros(), because that's how the committee names
| things now, while the GCC intrinsic is __builtin_clz()]. The
| name of the x86 instruction, on the other hand, is BSR (80386+)
| or LZCNT (Nehalem+, K10+) depending on what semantics you want
| for zero inputs (keep in mind that early implementations of
| BSF/BSR are _very slow_ and take time proportional to the
| output value). The compiled code uses BSR. (All of these are
| specified slightly differently, take care if you plan to
| actually use them.)
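For the curious, the clz-based indexing being discussed can be sketched like this, for a simplified scheme where segment k holds 2^k elements (sizes 1, 2, 4, 8, ...); the article's actual macros differ, and `__builtin_clzll` is the GCC/Clang intrinsic mentioned above:

```c
#include <assert.h>

/* Map a flat index i into (segment, offset). With segment k holding
 * 2^k elements, the segment is the bit width of (i+1) minus one, and
 * the offset is whatever remains after that segment's base. One clz,
 * one shift, one subtract -- this is the arithmetic the compiler
 * lowers to BSR/LZCNT on x86 or CLZ on Arm. */
static void seg_index(unsigned long long i,
                      unsigned *segment, unsigned long long *offset) {
    unsigned long long n = i + 1;                     /* 1-based rank */
    unsigned s = 63u - (unsigned)__builtin_clzll(n);  /* floor(log2(n)) */
    *segment = s;
    *offset = n - (1ull << s);                        /* within segment s */
}
```

Note that `__builtin_clzll(0)` is undefined, which is why the index is biased by one before counting leading zeros.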
| unwind wrote:
| Got it, thanks. I suck at x86, even more than I thought. :/
|
| Edit: it's CLZ on Arm [1], probably what I was looking for.
|
| [1]: https://developer.arm.com/documentation/100069/0610/A32-
| and-...
| jovial_cavalier wrote:
| The example code doesn't seem to compile.
| seanwessmith wrote:
| Make sure to create the segment_array.h file. Mine output just
| fine on a Mac M4:
|
| Desktop gcc main.c
| Desktop ./a.out
| entities[0].a = 1
| entities[0].a = 1
| entities[1].a = 2
| pfg_ wrote:
| Zig has this as std.SegmentedList, but it can resize the segment
| array dynamically
| andrewla wrote:
| This is really clever but better to call this a list rather than
| an array; functions which expect array semantics will simply not
| work, and there's no way to transparently pass slices of this
| data structure around.
|
| In the past I've abused virtual memory systems to block off a
| bunch of pages after my array. This lets you use an array data
| structure, have guard pages to prevent out of bounds access, and
| to have stable pointers in the data structure.
| hinkley wrote:
| I believe I've seen problems like this also solved with arena
| allocators. You have certain very special allocations have an
| arena unto themselves.
| zoogeny wrote:
| I think the article buries a significant drawback: the lack of
| contiguity. It is obviously implied by the design, but I think
| this approach would have hard-to-define characteristics for
| things like cache prefetching. The next address is a function,
| not an easily predictable change.
|
| One frequent reason to use an array is to iterate the items. In
| those cases, non-contiguous memory layout is not ideal.
| forrestthewoods wrote:
| I would not call it "non-contiguous". It's more like "mostly
| contiguous". Which for large amounts data is "amortized
| contiguous" just like a regular vector has "amortized constant"
| time to add an element.
| dhooper wrote:
| 1. The image at the top of the article makes it clear the
| segments aren't contiguous
|
| 2. iterating a 4 billion item segmented array would have 26
| cache misses. Not a big deal.
| taminka wrote:
| i love a nice single header project, it's c23 only tho (bc of
| typeof)?
| gotoeleven wrote:
| Under what conditions is exponential segment sizing preferable to
| fixed size segments? Are there any specific algorithms or
| situations where this is especially good? It seems like the
| likelihood of large amounts of wasted space is a major downside.
| o11c wrote:
| It's always better - the increase in indexing complexity is
| negligible, and it completely eliminates resizing of the top-
| level array. It also reduces the number of calls to `malloc`
| while keeping capacity proportional to active size.
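To make the bookkeeping concrete: with doubling segments, n elements need only about log2(n) segments, so the top-level pointer table has a small fixed bound (64 entries covers any 64-bit size) and never needs resizing, versus n/B block pointers for fixed-size blocks. A hypothetical helper, assuming segment sizes 1, 2, 4, 8, ... and the GCC/Clang `__builtin_clzll` intrinsic:

```c
#include <assert.h>

/* With segments of size 1, 2, 4, ..., the first k segments hold
 * 2^k - 1 elements, so n elements need exactly bit_width(n)
 * segments. Wasted space is bounded by the last segment, i.e.
 * less than half of capacity -- the same bound as a doubling
 * std::vector, but without ever copying elements. */
static unsigned segments_needed(unsigned long long n) {
    return n ? 64u - (unsigned)__builtin_clzll(n) : 0;
}
```

For example, four billion elements need only 32 segment pointers, where fixed 16-element blocks would need a 250-million-entry (and growing) top-level table.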
| listeria wrote:
| They mention using this as the backing array for a power-of-two-
| sized hash table, but I don't think it would be very useful
| considering that the hash table won't provide stable pointers,
| given that you would need to rehash every element as the table
| grows. Even if you just wanted to reuse the memory, rehashing in-
| place would be a PITA.
| Lichtso wrote:
| Another similar data structure which has a balanced tree (instead
| of a list) that references array segments is the
| https://en.wikipedia.org/wiki/Rope_(data_structure)
|
| Its main advantages are the O(log n) time complexity for all size
| changes at any index, meaning you can efficiently insert and
| delete anywhere, and it is easy to implement copy-on-write
| version control on top of it.
___________________________________________________________________
(page generated 2025-08-06 23:00 UTC)