[HN Gopher] Generic dynamic array in 60 lines of C
       ___________________________________________________________________
        
       Generic dynamic array in 60 lines of C
        
       Author : nice_byte
       Score  : 65 points
       Date   : 2023-02-28 18:50 UTC (4 hours ago)
        
 (HTM) web link (gist.github.com)
 (TXT) w3m dump (gist.github.com)
        
       | nice_byte wrote:
       | yes, i know growth by a factor of 2 has issues under certain
       | usage patterns. those are less likely to be problematic when
       | there is more stuff going on than just growing the buffer - you
       | can use the "hole" for other allocations.
        
         | rkeene2 wrote:
         | Hard to tease out what you're saying here, but maybe a Hashed
         | Array Table (HAT) is what you're referring to ? I have a
         | slightly extended version here [0][1]
         | 
         | [0] https://rkeene.org/viewer/tmp/hat.c.htm [1]
         | https://rkeene.org/viewer/tmp/hat.png.htm
        
           | quibono wrote:
           | I think GP meant that his implementation doubles the capacity
           | of the array on resize as opposed to increasing by 1.5 or
           | some other factor.
        
       | WalterBright wrote:
       | Heavy use of macros to do metaprogramming is a strong sign it's
       | time to move to a more powerful language.
        
         | cperciva wrote:
         | C + a set of macros _is_ a more powerful programming language.
        
           | JUNGLEISMASSIVE wrote:
           | [dead]
        
           | Night_Thastus wrote:
           | Also a far _worse_ language though. Macros have their place,
           | but trying to do anything complex with them just turns into a
           | total nightmare. They 're hard to reason about, walk through,
           | or modify.
           | 
           | If heavy macro usage is found, it's definitely time to
           | reconsider the approach.
        
             | cperciva wrote:
             | It really depends on how macros are used. If you're just
             | using them to implement "high level language features",
             | it's not a problem; sure, you might have trouble figuring
             | out what STAILQ_INSERT_TAIL does internally, but you're
             | going to have just as much trouble figuring out what the
             | Lisp or Perl or Python "add this item to the end of that
             | list" operations do internally.
             | 
             | Macros _can_ be a nightmare, but when they 're used
             | properly they're not.
        
               | WalterBright wrote:
               | > when they're used properly they're not
               | 
               | Yeah, they are.
               | 
               | Source: Decades of C programming
        
               | cperciva wrote:
               | You're not the only person with decades of experience.
        
               | WalterBright wrote:
               | We all suffer from this, so it's not personal, but one is
               | too often blind to the shortcomings of one's own code.
        
         | flohofwoe wrote:
         | On one hand that's obviously true, on the other hand it's just
         | 60 lines of code, which is easier to check and audit than (for
         | instance) a typical std::vector implementation - I don't know
         | what the equivalent is in D, please forgive my ignorance :)
        
           | tines wrote:
           | A typical std::vector implementation supports a lot more
           | things, so it's not an apples-to-apples comparison. It's like
           | saying that a horse-drawn carriage is a lot easier to repair
           | than a motor vehicle. It's true, but doesn't really say much.
        
         | wheelerof4te wrote:
         | C is powerful enough.
        
         | alar44 wrote:
         | Yeah that's a great opinion and all but generally C is used
         | because it HAS to be used these days. No one is going to use
         | this code for building a web application or will be tossing it
         | into legacy code.
        
         | jalino23 wrote:
         | I don't have this luxury if I want to learn about ffmpeg
         | libraries, its all written in c, almost all media and hardware
         | accelerator libraries are written in c, you go to another
         | language they just bind to c. its soo frustrating but I've no
         | choice but to stick with c.
        
           | ammar2 wrote:
           | huh? What's wrong with using the higher level languages if
           | they take care of binding to the C parts for you.
           | 
           | (I know these are far an in-between for ffmpeg, too many
           | leaky abstractions.)
        
             | flohofwoe wrote:
             | Usually increased build system complexity, outdated
             | manually maintained bindings, etc etc... - in some
             | languages (like Zig) this is really trivial, in other
             | languages it can be much harder.
        
             | jalino23 wrote:
             | try to find ffmpeg in go or rust that is < 5 years old and
             | complete. how about intel libva,
        
           | jpollock wrote:
           | Most languages have the ability to call C functions. Use the
           | language that helps you write your code and convert to call
           | the third party API.
           | 
           | For example, a C++ Vector provides a generic container and
           | the code can still call C functions with the underlying
           | array.
           | 
           | This is in the "Doctor it hurts if I do X" bucket.
           | 
           | https://stackoverflow.com/questions/2923272/how-to-
           | convert-v...
        
             | flohofwoe wrote:
             | Mixing high level C++ stdlib classes with low level C APIs
             | is often not exactly trivial, for instance most C++
             | containers expect that their items are RAII compatibel, and
             | the resulting memory corruption errors will be harder to
             | debug than plain old C code because the shit hits the fan
             | deep inside some completely unreadable C++ stdlib
             | implementation code.
        
             | cozzyd wrote:
             | Yes, but if you're writing code that you want to be
             | bindable to other languages, you essentially have to follow
             | the C ABI, in which case, writing in C is a natural choice
             | (yes, you can expose a C ABI from other languages, but
             | that's usually not natural and you sort of have to
             | understand C anyway to do that).
        
               | int_19h wrote:
               | Unless your API surface constitutes the majority of your
               | code, writing in C still doesn't make sense, because
               | you're paying the tax every time you have to manually
               | juggle strings and whatnot. C++ is a more sensible choice
               | for this - you can still write C-compatible headers (and
               | it's trivial to verify in builds - just include it in a
               | .c file and see if that compiles!) while using STL etc in
               | the implementation.
        
               | cozzyd wrote:
               | Yeah if you're using opaque handles (where what you have
               | is really a pointer to some C++ thing), this can work
               | well.
               | 
               | If you're exposing structs that are passed back and forth
               | for state, it's a bit more cumbersome.
        
           | WalterBright wrote:
           | D code can call C functions directly.
        
         | eimrine wrote:
         | Lisp?
        
           | lisper wrote:
           | Yes! :-)
        
           | tmtvl wrote:
           | When it comes to macros, accept no substitute.
           | 
           | Granted, Lisp already has arrays, so not much reason to
           | reinvent the wheel.                 (make-array '(2 6)
           | :initial-element 0 :element-type '(unsigned-byte 32))
           | ;; => #2A((0 0 0 0 0 0) (0 0 0 0 0 0))
        
         | Gibbon1 wrote:
         | If were me I'd use inline functions instead of macro's
        
       | posharma wrote:
       | Seriously, in this day and age, why are we still stuck with C?
       | There are battle tested container libraries in C++ for e.g.
        
       | nathants wrote:
       | awesome! i do the same thing for arrays[1] and maps[2].
       | 
       | stuff like this is great when you are trying to find the
       | performance ceiling of some workload in c/cpp. literally nothing
       | to hide.
       | 
       | 1. https://github.com/nathants/bsv/blob/master/util/array.h
       | 
       | 2. https://github.com/nathants/bsv/blob/master/util/map.h
        
       | marcodiego wrote:
       | I think this author should learn about the "do {} while(0)"
       | trick.
        
         | HarHarVeryFunny wrote:
         | What's the benefit of do {...} while (0) vs just {...} ?
        
           | tines wrote:
           | The latter makes the macro usable without requiring a
           | semicolon, which some people don't like; the former will
           | cause a syntax error if the semicolon is missing, so they
           | feel that it regulates syntax a bit better.
        
           | nice_byte wrote:
           | Works if you want to e.g. skip braces if if..else...
           | statements. Some would say putting braces in always is good
           | style, but I like skipping them myself so won't defend it :)
        
             | yakubin wrote:
             | I agree it's good style, but surrounding code using bad
             | style is still poor excuse for writing macros which break.
        
           | yakubin wrote:
           | do {...} while (0) is a statement, so it can be put between
           | "if (...)" and "; else", while just {...} would result in a
           | syntax error due to a hanging else.
        
             | HarHarVeryFunny wrote:
             | OK, but I guess what's really being relied on there is that
             | it's syntactically still a single statement when you follow
             | it with ';', whereas {...}; is two statements (which is
             | what breaks the if-else).
             | 
             | I just discovered that gcc also supports non-standard
             | "statement expressions" of the form ({...}) which would
             | serve the same purpose at the cost of portability.
        
           | LegionMammal978 wrote:
           | You can put it inside a braceless if/else statement and
           | follow it with a semicolon.
           | 
           | Compare:                 #define FOO() { ... }       #define
           | BAR() do { ... } while (0)
           | 
           | In the former case:                 if (x)         FOO();
           | else         ...;              // becomes       if (x) {
           | ...       }       ;       else         ...;
           | 
           | In the latter case:                 if (x)         BAR();
           | else         ...;              // becomes       if (x)
           | do {           ...         } while (0);       else
           | ...;
        
       | halayli wrote:
       | a.capacity <<= 1u;       decltype(a.data) tmp =
       | (decltype(a.data)) realloc(a.data, sizeof(a.data[0]) *
       | a.capacity);       assert(tmp != NULL);
       | 
       | That's not ideal. Imagine you are at 32GB capacity, the next
       | realloc will ask for 64GB which is pretty excessive.
        
       | fpoling wrote:
       | The code has integer overflow even on 64 bit CPU due to using
       | uint32 for capacity.
        
       | andrewmcwatters wrote:
       | I don't like the use of macros for things like this. Macros in
       | general should rarely or sparingly be used.
       | 
       | I'm also not certain one should use "end pointers."
       | Conventionally, it seems more advisable to use `size_t capacity`,
       | `size_t length`, and `void *data`.
       | 
       | Great use of Cunningham's Law, though! I appreciate C posts on
       | Hacker News.
        
         | User23 wrote:
         | Amusingly, Ward Cunningham denies inventing Cunningham's law
         | and I feel compelled to correct that potential
         | misunderstanding.
        
         | nice_byte wrote:
         | the alternative in c is something like glib's array container
         | which hides types. I dislike that more.
        
           | andrewmcwatters wrote:
           | Yes, but GLib arrays are also lossy. They require you to
           | externally manage capacity or to know with perfect foresight
           | exactly what your capacity will be for the lifetime of the
           | memory allocation.
           | 
           | I don't think those are impossible scenarios, but the cost of
           | one additional pointer in terms of size leaves you with a lot
           | more functionality. Saving one pointer in size and not having
           | capacity makes GLib arrays nearly useless, which I find
           | confusing.
           | 
           | You could simply pass a pointer and a size around instead.
           | Which is what most people actually do when they don't need
           | resizable data layouts.
           | 
           | If you're working with C, I think you have to just accept
           | that void pointers happen. Working around losing compile-time
           | type data requires you to create runtime structures, which I
           | don't find acceptable.
        
         | matheusmoreira wrote:
         | One should always use size_t to keep lengths. Reallocating the
         | array invalidates all pointers but not sizes.
        
       | latenightcoding wrote:
       | this one does RAII in C and it's most likely faster:
       | https://bit.ly/3ICOplU
        
       | orbital223 wrote:
       | DYN_ARR_RESET should probably be called DYN_ARR_INIT instead, as
       | calling it more than once will leak memory.
       | 
       | The handling of endptr in DYN_ARR_RESIZE seems to be incorrect.
       | If I have an array with 2 elements and capacity of 3 and I
       | DYN_ARR_RESIZE it to 5, I now have an array with 5 elements, 3 of
       | which are garbage values.
        
         | aslilac wrote:
         | ~~Isn't the handling of `endptr` incorrect in `DYN_ARR_RESET`
         | as well? `a.endptr = a.data;` feels very incorrect to me.~~
         | 
         | Nevermind, I see how it's supposed to work now.
         | 
         | To your point, `RESIZE` is definitely incorrect. It should be
         | checking the length before calling `realloc`.
        
           | parasti wrote:
           | Thats just idiomatic C. Set the "pointer to end" to be equal
           | to the "pointer to start".
        
         | matheusmoreira wrote:
         | The values are garbage because they haven't been initialized
         | yet. Other code may write to those locations.
        
         | nice_byte wrote:
         | that's the intended behavior.
         | 
         | this somewhat mirrors what happens with a std::vector when you
         | call resize() except of course c doesn't have default ctors.
         | the point of resize is to make exactly "size" elements of the
         | dynamic array addressable.
        
       | olliej wrote:
       | I dislike this - having the "array" be a struct containing the
       | pointer and size fields makes it easy to "copy" the array such
       | that you get dangling pointers. Similarly there's no way to track
       | lifetime or ownership of the array.
       | 
       | There are long term ABI benefits to these data structures just
       | being opaque pointers outside of the implementing libraries
        
         | nice_byte wrote:
         | no reason for it to ever cross abi boundary.
        
       | wzdd wrote:
       | This seems pretty sensical to me except that it achieves its
       | small size by sacrificing error checking and handling, even in
       | the (aiui) non-exceptional case of realloc() returning NULL. This
       | is of course classic C program behaviour so perhaps that's fine
       | ;-)
        
       | downvotetruth wrote:
       | data (startptr), endptr & capacity?! Memory might be cheap enough
       | to waste, but more derefs do not help timings.
        
         | exitb wrote:
         | Capacity may not be equal to size.
        
         | JonChesterfield wrote:
         | Three pointers (or two and a size, or one and two sizes) is
         | pretty standard for dynamic arrays. It lets you allocate more
         | than one element a time when repeatedly appending one element.
         | 
         | No checking malloc or realloc though. The anonymous struct is
         | dubious too, though maybe being unable to pass these things to
         | functions is a feature.
        
           | nice_byte wrote:
           | functions can work with just pointer / size / stride. think
           | of it as stl iterators, there's no real point to be passing
           | the container itself around.
           | 
           | however, you can still do that, if you really want.
           | 
           | just `typedef DYN_ARR_OF(widget) widget_array;` and now you
           | have a name-able type, and can even have dynamic-arrays-of-
           | dynamic-arrays (`DYN_ARR_OF(widget_array)`).
        
       | kiryk wrote:
       | I like that the comments mention another two implementations of a
       | similar idea and all are a bit different.
       | 
       | Here's one I've written a few years ago:
       | https://github.com/kiryk/mlisp/blob/master/vector.c
       | 
       | You may find it dirty, but it can be wrapped using macros, and
       | used both on LHS and RHS like in "string(v, i) = ch"
       | 
       | (the macros BTW)
       | https://github.com/kiryk/mlisp/blob/71d028738d2f9607fa7ce8c1...
       | 
       | (and a usecase)
       | https://github.com/kiryk/mlisp/blob/71d028738d2f9607fa7ce8c1...
        
       | saagarjha wrote:
       | Why use macros instead of inline functions?
        
         | vore wrote:
         | Inline functions can't be made type safe (or at least, as type
         | safe as C gets) over generic arrays.
        
         | flohofwoe wrote:
         | Because the API needs to work for random item types. C doesn't
         | have templates (and for stuff like this C11's _Generic isn't
         | helpful because that just maps specialized function names to
         | generic function names, but doesn't help with the generic
         | implementation code).
        
       | kzrdude wrote:
       | Not an entirely uncommon idea. I've written one.
       | 
       | There's also a well-known one here, in klib:
       | https://github.com/attractivechaos/klib/blob/master/kvec.h
        
         | globalreset wrote:
         | Who didn't. Almost any C program dealing with strings and
         | collections has to have their own implementation or import one.
         | 
         | Part of the reason why C developers "feel" productive, but
         | can't produce anything of meaningful complexity.
        
           | [deleted]
        
           | matheusmoreira wrote:
           | Have you ever heard of Linux?
        
           | kzrdude wrote:
           | You hurt me with the truth.
           | 
           | Anyway, I think this HN posting has this "crazy" flavor of
           | using macros to simulate generics, and that's the specific
           | kind of implementation that I meant, which klib also does.
        
           | nice_byte wrote:
           | til the Linux kernel has no meaningful complexity
        
             | kevin_thibedeau wrote:
             | Still waiting for a revolutionary OS written in
             | $virtuous_lang to save us from the scourge of unsafe Linux.
             | They've had since Ada-83 to get something off the ground.
             | Guess they aren't so productive either.
        
           | rootw0rm wrote:
           | I think sqlite is fairly complex
        
       | sitkack wrote:
       | This isn't C, this is preprocessor. Tomato tomato, who cares if
       | it is 60 lines or 600.
       | 
       | https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#400
       | 
       | https://docs.rs/containers/latest/src/containers/collections...
        
       | klyrs wrote:
       | DON'T USE THIS
       | 
       | As fpoling points out, capacity is a 32-bit unsigned. That can
       | overflow. There is no safety in append: if capacity is zero no
       | new space will be made, if capacity overflows the realloc will
       | not have enough space -- in either case, you end up writing past
       | the end.
       | 
       | Allocating a [capacity] zero array and appending to it is
       | extremely common. That you'll write past the end in that case
       | shows that the author has barely even used this.
       | 
       | The code is unacceptably bad and unsafe. I don't usually do this,
       | but I'm going to flag this post. I encourage everybody to do the
       | same.
       | 
       | Apologies to the author. Golf is fun. This is uncool.
       | 
       | edit: changed size to [capacity]
        
         | marginalia_nu wrote:
         | Is this overflow really a realistic scenario? I imagine most
         | machines won't even permit you to allocate the sorts of sizes
         | that would cause this bug.
         | 
         | In that scenario (where you're allocating a multi-gigabytes
         | array), you typically know the size ahead of time, instead of
         | growing it dynamically, as the latter doesn't really perform
         | too well.
        
           | klyrs wrote:
           | 2Gb (1<<31 + 1 = boom) just isn't that much memory these
           | days. If you're parsing large text files, for example, it's
           | _really_ easy to blow that limit. And what 's the point of
           | this dynamically-growing array if not to forget about
           | managing capacity?
        
         | nice_byte wrote:
         | I have been using this for years. Capacity is not size. A zero
         | size array will have non zero capacity. All c code is unsafe.
        
           | klyrs wrote:
           | Where do you check that capacity is nonzero? When capacity is
           | zero, what happens on this line?
           | a.capacity <<= 1u;
           | 
           | "all c code is unsafe" is not an excuse to permit bloody
           | obvious, undocumented memory overruns.
           | 
           | I write a lot of c. Avoiding the unsafe bits, avoiding UB, is
           | _the skill_ required to write good c.  "C code is unsafe" is
           | a Rustacean marketing slogan. Don't believe it, but
           | _definitely_ don 't practice it.
        
             | nice_byte wrote:
             | just don't initialize it with 0 capacity :-)
        
               | skullt wrote:
               | Why not just fix it? It's a trivial fix and has the added
               | benefit that a zero-initialized DYN_ARR_OF(x) struct
               | would be in a valid state, which is always nice. A struct
               | with several dynamic arrays can then much simpler to
               | initialize, for example.
        
         | [deleted]
        
       | jxy wrote:
       | What does it actually gain by using macros instead of proper
       | functions? The only generic macro that can't be written in
       | function is the one use `type`, but saving the type size in the
       | struct is enough for this kind of code.
        
       ___________________________________________________________________
       (page generated 2023-02-28 23:01 UTC)