[HN Gopher] Generic dynamic array in 60 lines of C
___________________________________________________________________
Generic dynamic array in 60 lines of C
Author : nice_byte
Score : 65 points
Date : 2023-02-28 18:50 UTC (4 hours ago)
(HTM) web link (gist.github.com)
(TXT) w3m dump (gist.github.com)
| nice_byte wrote:
| yes, i know growth by a factor of 2 has issues under certain
| usage patterns. those are less likely to be problematic when
| there is more stuff going on than just growing the buffer - you
| can use the "hole" for other allocations.
| rkeene2 wrote:
| Hard to tease out what you're saying here, but maybe a Hashed
| Array Table (HAT) is what you're referring to ? I have a
| slightly extended version here [0][1]
|
| [0] https://rkeene.org/viewer/tmp/hat.c.htm [1]
| https://rkeene.org/viewer/tmp/hat.png.htm
| quibono wrote:
| I think GP meant that his implementation doubles the capacity
| of the array on resize as opposed to increasing by 1.5 or
| some other factor.
| WalterBright wrote:
| Heavy use of macros to do metaprogramming is a strong sign it's
| time to move to a more powerful language.
| cperciva wrote:
| C + a set of macros _is_ a more powerful programming language.
| JUNGLEISMASSIVE wrote:
| [dead]
| Night_Thastus wrote:
| Also a far _worse_ language though. Macros have their place,
| but trying to do anything complex with them just turns into a
| total nightmare. They 're hard to reason about, walk through,
| or modify.
|
| If heavy macro usage is found, it's definitely time to
| reconsider the approach.
| cperciva wrote:
| It really depends on how macros are used. If you're just
| using them to implement "high level language features",
| it's not a problem; sure, you might have trouble figuring
| out what STAILQ_INSERT_TAIL does internally, but you're
| going to have just as much trouble figuring out what the
| Lisp or Perl or Python "add this item to the end of that
| list" operations do internally.
|
| Macros _can_ be a nightmare, but when they 're used
| properly they're not.
| WalterBright wrote:
| > when they're used properly they're not
|
| Yeah, they are.
|
| Source: Decades of C programming
| cperciva wrote:
| You're not the only person with decades of experience.
| WalterBright wrote:
| We all suffer from this, so it's not personal, but one is
| too often blind to the shortcomings of one's own code.
| flohofwoe wrote:
| On one hand that's obviously true, on the other hand it's just
| 60 lines of code, which is easier to check and audit than (for
| instance) a typical std::vector implementation - I don't know
| what the equivalent is in D, please forgive my ignorance :)
| tines wrote:
| A typical std::vector implementation supports a lot more
| things, so it's not an apples-to-apples comparison. It's like
| saying that a horse-drawn carriage is a lot easier to repair
| than a motor vehicle. It's true, but doesn't really say much.
| wheelerof4te wrote:
| C is powerful enough.
| alar44 wrote:
| Yeah that's a great opinion and all but generally C is used
| because it HAS to be used these days. No one is going to use
| this code for building a web application or will be tossing it
| into legacy code.
| jalino23 wrote:
| I don't have this luxury if I want to learn about ffmpeg
| libraries, its all written in c, almost all media and hardware
| accelerator libraries are written in c, you go to another
| language they just bind to c. its soo frustrating but I've no
| choice but to stick with c.
| ammar2 wrote:
| huh? What's wrong with using the higher level languages if
| they take care of binding to the C parts for you.
|
| (I know these are far an in-between for ffmpeg, too many
| leaky abstractions.)
| flohofwoe wrote:
| Usually increased build system complexity, outdated
| manually maintained bindings, etc etc... - in some
| languages (like Zig) this is really trivial, in other
| languages it can be much harder.
| jalino23 wrote:
| try to find ffmpeg in go or rust that is < 5 years old and
| complete. how about intel libva,
| jpollock wrote:
| Most languages have the ability to call C functions. Use the
| language that helps you write your code and convert to call
| the third party API.
|
| For example, a C++ Vector provides a generic container and
| the code can still call C functions with the underlying
| array.
|
| This is in the "Doctor it hurts if I do X" bucket.
|
| https://stackoverflow.com/questions/2923272/how-to-
| convert-v...
| flohofwoe wrote:
| Mixing high level C++ stdlib classes with low level C APIs
| is often not exactly trivial, for instance most C++
| containers expect that their items are RAII compatibel, and
| the resulting memory corruption errors will be harder to
| debug than plain old C code because the shit hits the fan
| deep inside some completely unreadable C++ stdlib
| implementation code.
| cozzyd wrote:
| Yes, but if you're writing code that you want to be
| bindable to other languages, you essentially have to follow
| the C ABI, in which case, writing in C is a natural choice
| (yes, you can expose a C ABI from other languages, but
| that's usually not natural and you sort of have to
| understand C anyway to do that).
| int_19h wrote:
| Unless your API surface constitutes the majority of your
| code, writing in C still doesn't make sense, because
| you're paying the tax every time you have to manually
| juggle strings and whatnot. C++ is a more sensible choice
| for this - you can still write C-compatible headers (and
| it's trivial to verify in builds - just include it in a
| .c file and see if that compiles!) while using STL etc in
| the implementation.
| cozzyd wrote:
| Yeah if you're using opaque handles (where what you have
| is really a pointer to some C++ thing), this can work
| well.
|
| If you're exposing structs that are passed back and forth
| for state, it's a bit more cumbersome.
| WalterBright wrote:
| D code can call C functions directly.
| eimrine wrote:
| Lisp?
| lisper wrote:
| Yes! :-)
| tmtvl wrote:
| When it comes to macros, accept no substitute.
|
| Granted, Lisp already has arrays, so not much reason to
| reinvent the wheel. (make-array '(2 6)
| :initial-element 0 :element-type '(unsigned-byte 32))
| ;; => #2A((0 0 0 0 0 0) (0 0 0 0 0 0))
| Gibbon1 wrote:
| If were me I'd use inline functions instead of macro's
| posharma wrote:
| Seriously, in this day and age, why are we still stuck with C?
| There are battle tested container libraries in C++ for e.g.
| nathants wrote:
| awesome! i do the same thing for arrays[1] and maps[2].
|
| stuff like this is great when you are trying to find the
| performance ceiling of some workload in c/cpp. literally nothing
| to hide.
|
| 1. https://github.com/nathants/bsv/blob/master/util/array.h
|
| 2. https://github.com/nathants/bsv/blob/master/util/map.h
| marcodiego wrote:
| I think this author should learn about the "do {} while(0)"
| trick.
| HarHarVeryFunny wrote:
| What's the benefit of do {...} while (0) vs just {...} ?
| tines wrote:
| The latter makes the macro usable without requiring a
| semicolon, which some people don't like; the former will
| cause a syntax error if the semicolon is missing, so they
| feel that it regulates syntax a bit better.
| nice_byte wrote:
| Works if you want to e.g. skip braces if if..else...
| statements. Some would say putting braces in always is good
| style, but I like skipping them myself so won't defend it :)
| yakubin wrote:
| I agree it's good style, but surrounding code using bad
| style is still poor excuse for writing macros which break.
| yakubin wrote:
| do {...} while (0) is a statement, so it can be put between
| "if (...)" and "; else", while just {...} would result in a
| syntax error due to a hanging else.
| HarHarVeryFunny wrote:
| OK, but I guess what's really being relied on there is that
| it's syntactically still a single statement when you follow
| it with ';', whereas {...}; is two statements (which is
| what breaks the if-else).
|
| I just discovered that gcc also supports non-standard
| "statement expressions" of the form ({...}) which would
| serve the same purpose at the cost of portability.
| LegionMammal978 wrote:
| You can put it inside a braceless if/else statement and
| follow it with a semicolon.
|
| Compare: #define FOO() { ... } #define
| BAR() do { ... } while (0)
|
| In the former case: if (x) FOO();
| else ...; // becomes if (x) {
| ... } ; else ...;
|
| In the latter case: if (x) BAR();
| else ...; // becomes if (x)
| do { ... } while (0); else
| ...;
| halayli wrote:
| a.capacity <<= 1u; decltype(a.data) tmp =
| (decltype(a.data)) realloc(a.data, sizeof(a.data[0]) *
| a.capacity); assert(tmp != NULL);
|
| That's not ideal. Imagine you are at 32GB capacity, the next
| realloc will ask for 64GB which is pretty excessive.
| fpoling wrote:
| The code has integer overflow even on 64 bit CPU due to using
| uint32 for capacity.
| andrewmcwatters wrote:
| I don't like the use of macros for things like this. Macros in
| general should rarely or sparingly be used.
|
| I'm also not certain one should use "end pointers."
| Conventionally, it seems more advisable to use `size_t capacity`,
| `size_t length`, and `void *data`.
|
| Great use of Cunningham's Law, though! I appreciate C posts on
| Hacker News.
| User23 wrote:
| Amusingly, Ward Cunningham denies inventing Cunningham's law
| and I feel compelled to correct that potential
| misunderstanding.
| nice_byte wrote:
| the alternative in c is something like glib's array container
| which hides types. I dislike that more.
| andrewmcwatters wrote:
| Yes, but GLib arrays are also lossy. They require you to
| externally manage capacity or to know with perfect foresight
| exactly what your capacity will be for the lifetime of the
| memory allocation.
|
| I don't think those are impossible scenarios, but the cost of
| one additional pointer in terms of size leaves you with a lot
| more functionality. Saving one pointer in size and not having
| capacity makes GLib arrays nearly useless, which I find
| confusing.
|
| You could simply pass a pointer and a size around instead.
| Which is what most people actually do when they don't need
| resizable data layouts.
|
| If you're working with C, I think you have to just accept
| that void pointers happen. Working around losing compile-time
| type data requires you to create runtime structures, which I
| don't find acceptable.
| matheusmoreira wrote:
| One should always use size_t to keep lengths. Reallocating the
| array invalidates all pointers but not sizes.
| latenightcoding wrote:
| this one does RAII in C and it's most likely faster:
| https://bit.ly/3ICOplU
| orbital223 wrote:
| DYN_ARR_RESET should probably be called DYN_ARR_INIT instead, as
| calling it more than once will leak memory.
|
| The handling of endptr in DYN_ARR_RESIZE seems to be incorrect.
| If I have an array with 2 elements and capacity of 3 and I
| DYN_ARR_RESIZE it to 5, I now have an array with 5 elements, 3 of
| which are garbage values.
| aslilac wrote:
| ~~Isn't the handling of `endptr` incorrect in `DYN_ARR_RESET`
| as well? `a.endptr = a.data;` feels very incorrect to me.~~
|
| Nevermind, I see how it's supposed to work now.
|
| To your point, `RESIZE` is definitely incorrect. It should be
| checking the length before calling `realloc`.
| parasti wrote:
| Thats just idiomatic C. Set the "pointer to end" to be equal
| to the "pointer to start".
| matheusmoreira wrote:
| The values are garbage because they haven't been initialized
| yet. Other code may write to those locations.
| nice_byte wrote:
| that's the intended behavior.
|
| this somewhat mirrors what happens with a std::vector when you
| call resize() except of course c doesn't have default ctors.
| the point of resize is to make exactly "size" elements of the
| dynamic array addressable.
| olliej wrote:
| I dislike this - having the "array" be a struct containing the
| pointer and size fields makes it easy to "copy" the array such
| that you get dangling pointers. Similarly there's no way to track
| lifetime or ownership of the array.
|
| There are long term ABI benefits to these data structures just
| being opaque pointers outside of the implementing libraries
| nice_byte wrote:
| no reason for it to ever cross abi boundary.
| wzdd wrote:
| This seems pretty sensical to me except that it achieves its
| small size by sacrificing error checking and handling, even in
| the (aiui) non-exceptional case of realloc() returning NULL. This
| is of course classic C program behaviour so perhaps that's fine
| ;-)
| downvotetruth wrote:
| data (startptr), endptr & capacity?! Memory might be cheap enough
| to waste, but more derefs do not help timings.
| exitb wrote:
| Capacity may not be equal to size.
| JonChesterfield wrote:
| Three pointers (or two and a size, or one and two sizes) is
| pretty standard for dynamic arrays. It lets you allocate more
| than one element a time when repeatedly appending one element.
|
| No checking malloc or realloc though. The anonymous struct is
| dubious too, though maybe being unable to pass these things to
| functions is a feature.
| nice_byte wrote:
| functions can work with just pointer / size / stride. think
| of it as stl iterators, there's no real point to be passing
| the container itself around.
|
| however, you can still do that, if you really want.
|
| just `typedef DYN_ARR_OF(widget) widget_array;` and now you
| have a name-able type, and can even have dynamic-arrays-of-
| dynamic-arrays (`DYN_ARR_OF(widget_array)`).
| kiryk wrote:
| I like that the comments mention another two implementations of a
| similar idea and all are a bit different.
|
| Here's one I've written a few years ago:
| https://github.com/kiryk/mlisp/blob/master/vector.c
|
| You may find it dirty, but it can be wrapped using macros, and
| used both on LHS and RHS like in "string(v, i) = ch"
|
| (the macros BTW)
| https://github.com/kiryk/mlisp/blob/71d028738d2f9607fa7ce8c1...
|
| (and a usecase)
| https://github.com/kiryk/mlisp/blob/71d028738d2f9607fa7ce8c1...
| saagarjha wrote:
| Why use macros instead of inline functions?
| vore wrote:
| Inline functions can't be made type safe (or at least, as type
| safe as C gets) over generic arrays.
| flohofwoe wrote:
| Because the API needs to work for random item types. C doesn't
| have templates (and for stuff like this C11's _Generic isn't
| helpful because that just maps specialized function names to
| generic function names, but doesn't help with the generic
| implementation code).
| kzrdude wrote:
| Not an entirely uncommon idea. I've written one.
|
| There's also a well-known one here, in klib:
| https://github.com/attractivechaos/klib/blob/master/kvec.h
| globalreset wrote:
| Who didn't. Almost any C program dealing with strings and
| collections has to have their own implementation or import one.
|
| Part of the reason why C developers "feel" productive, but
| can't produce anything of meaningful complexity.
| [deleted]
| matheusmoreira wrote:
| Have you ever heard of Linux?
| kzrdude wrote:
| You hurt me with the truth.
|
| Anyway, I think this HN posting has this "crazy" flavor of
| using macros to simulate generics, and that's the specific
| kind of implementation that I meant, which klib also does.
| nice_byte wrote:
| til the Linux kernel has no meaningful complexity
| kevin_thibedeau wrote:
| Still waiting for a revolutionary OS written in
| $virtuous_lang to save us from the scourge of unsafe Linux.
| They've had since Ada-83 to get something off the ground.
| Guess they aren't so productive either.
| rootw0rm wrote:
| I think sqlite is fairly complex
| sitkack wrote:
| This isn't C, this is preprocessor. Tomato tomato, who cares if
| it is 60 lines or 600.
|
| https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#400
|
| https://docs.rs/containers/latest/src/containers/collections...
| klyrs wrote:
| DON'T USE THIS
|
| As fpoling points out, capacity is a 32-bit unsigned. That can
| overflow. There is no safety in append: if capacity is zero no
| new space will be made, if capacity overflows the realloc will
| not have enough space -- in either case, you end up writing past
| the end.
|
| Allocating a [capacity] zero array and appending to it is
| extremely common. That you'll write past the end in that case
| shows that the author has barely even used this.
|
| The code is unacceptably bad and unsafe. I don't usually do this,
| but I'm going to flag this post. I encourage everybody to do the
| same.
|
| Apologies to the author. Golf is fun. This is uncool.
|
| edit: changed size to [capacity]
| marginalia_nu wrote:
| Is this overflow really a realistic scenario? I imagine most
| machines won't even permit you to allocate the sorts of sizes
| that would cause this bug.
|
| In that scenario (where you're allocating a multi-gigabytes
| array), you typically know the size ahead of time, instead of
| growing it dynamically, as the latter doesn't really perform
| too well.
| klyrs wrote:
| 2Gb (1<<31 + 1 = boom) just isn't that much memory these
| days. If you're parsing large text files, for example, it's
| _really_ easy to blow that limit. And what 's the point of
| this dynamically-growing array if not to forget about
| managing capacity?
| nice_byte wrote:
| I have been using this for years. Capacity is not size. A zero
| size array will have non zero capacity. All c code is unsafe.
| klyrs wrote:
| Where do you check that capacity is nonzero? When capacity is
| zero, what happens on this line?
| a.capacity <<= 1u;
|
| "all c code is unsafe" is not an excuse to permit bloody
| obvious, undocumented memory overruns.
|
| I write a lot of c. Avoiding the unsafe bits, avoiding UB, is
| _the skill_ required to write good c. "C code is unsafe" is
| a Rustacean marketing slogan. Don't believe it, but
| _definitely_ don 't practice it.
| nice_byte wrote:
| just don't initialize it with 0 capacity :-)
| skullt wrote:
| Why not just fix it? It's a trivial fix and has the added
| benefit that a zero-initialized DYN_ARR_OF(x) struct
| would be in a valid state, which is always nice. A struct
| with several dynamic arrays can then much simpler to
| initialize, for example.
| [deleted]
| jxy wrote:
| What does it actually gain by using macros instead of proper
| functions? The only generic macro that can't be written in
| function is the one use `type`, but saving the type size in the
| struct is enough for this kind of code.
___________________________________________________________________
(page generated 2023-02-28 23:01 UTC)