[HN Gopher] Datatype99: C99 with Sum Types, v0.1.0
___________________________________________________________________
Datatype99: C99 with Sum Types, v0.1.0
Author : Hirrolot
Score : 147 points
Date : 2021-02-04 11:27 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| marcodiego wrote:
| Can it be used to implement something similar to typen_ame() or
| type_id()?
| Hirrolot wrote:
| Something similar, yes. Behind the scenes, datatype99 deduces a
| type of a matched expression using the typedefs generated
| earlier.
| marcodiego wrote:
| You mean, type_id and type_name would have to be handwritten
| functions, right?
|
| If it is possible to automatically generate these, could you
| provide an example?
| Hirrolot wrote:
| We can generate a type ID by defining an extern variable of
| a sum type inside the `datatype` macro. Its address then
| will be a type ID (though I am not sure that such two
| variables will always have different addresses, I need to
| check the standard).
|
| About type_name: you can deduce a type name if you know at
| least one tag of a sum type. For example, if a tag is
| `Foo`, then `FooSumT` is a typedef to an outer sum type
| (datatype99 generates this typedef).
| eqvinox wrote:
| Two non-const variables will always have different
| addresses if they are of the same type. Different types
| can [AFAIR] per the standard theoretically be in
| different address spaces, though this is rare in
| practice.
| a1369209993 wrote:
| > Different types can [AFAIR] per the standard
| theoretically be in different address spaces
|
| If you can force the types to contain a member of the
| same type (say, a int or enum foobar_discriminant), the
| pointers &const_var_foo.disc and &const_var_bar.disc
| would have to differ for obvious reasons.
| enriquto wrote:
| What is this sorcery? #include <datatype99.h>
| datatype( BinaryTree, (Leaf,
| int), (Node, struct BinaryTree *, int, struct
| BinaryTree *) );
|
| What kind of preprocessor black magic can you do so that this
| code compiles? I don't dare to open the file datatype99.h lest it
| casts a dark spell on me.
| willxinc wrote:
| https://github.com/Hirrolot/epilepsy
|
| According to the GitHub, which appears to be a metaprogramming
| library by the same author.
| flohofwoe wrote:
| The whole system seems to be built on top of variadic macros
| (which had been standardized in C99).
| Hirrolot wrote:
| Yes, as others have already noticed, it is implemented on top
| of https://github.com/Hirrolot/epilepsy, a metalanguage for
| preprocessor metaprogramming.
| orra wrote:
| In case anybody mistakenly thinks doing so is 'cheating',
| you're the author of both libraries!
|
| This is absurd, but in a good way.
| jrimbault wrote:
| > a metalanguage for preprocessor metaprogramming
|
| That sentence is everything I was taught to avoid. Props to
| you.
|
| I like the drug/alcohol metaphor for these things, as a child
| avoid them, as an adult do your thing as long as you remain
| sensible.
| globuous wrote:
| Funny you use this example, the author's GH page says he's
| 16; in many countries he wouldn't be able to drink, event
| responsibly ^^
| masklinn wrote:
| > in many countries he wouldn't be able to drink
|
| But in most countries they would be able to drink
| (especially in a private setting), just not purchase. In
| Europe for instance less than half a dozen countries have
| a minimum drinking age for private settings, and a dozen
| more have an MDA in public (but not private), though for
| some 16 is enough to drink fermented alcohols (alongside
| a meal with adults for the UK, regardless for Germany and
| Austria).
| enriquto wrote:
| LOL, I cannot get enough of this shit! Can you point to some
| cool implementation tricks that you used to achieve that? I
| guess there's quite a bit of ## concatenation to create
| temporary names and stuff?
| Hirrolot wrote:
| Well, mostly I use idioms from the Cloak Wiki
| (https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-
| tricks,...). It takes some time to realise them :)
| kzrdude wrote:
| It's syntax sugar for tagged unions, of course. It's not entirely
| out there that C could include such support natively in a future
| standard.
| rbonvall wrote:
| I predict all languages will adopt some form of sum types and
| pattern matching in the next 10 years. They will go from
| esoteric to mainstream in the same way lambdas did.
| mhh__ wrote:
| Currently with D the two people in charge both favour library
| solutions, but so far I'm seriously leaning towards a
| language one - it seems to hard to do type erasure
| efficiently otherwise, for example.
| mtzet wrote:
| I find it really interresting that such a simple concept as a
| tagged union, which appeared at least as far back as ALGOL
| 68, could be completely abandoned during the 90's and 00's
| where OOP reigned surpreme, and only seems to be gaining
| popularity again now that pattern matching is all the rage.
|
| It's just baffling to me that tagged unions could be shunned
| in favour of the visitor pattern. How could that ever look
| like progress?
|
| Being a millenial, I don't know what programming was like the
| 80's, but surely tagged unions must have been bread and
| butter for Pascal programmers?
| flohofwoe wrote:
| Tagged unions have always been around as runtime
| constructs, they just had different names (e.g. "variant").
| In C it's understandable why they aren't part of the
| language: they would be the only data type that requires a
| "builtin" opaque struct type to hold the tag and payload
| union (e.g. something like this, but hidden from the
| programmer): struct my_tagged_union {
| enum my_tag_enum tag; union { ....
| } payload; };
|
| That would be a pretty big break with "tradition" for the C
| language ;) (e.g. it would be similar to a "builtin string
| type")
|
| And of course tagged unions as "generic user-defined types"
| are much less convenient/useful than being directly
| integrated into the language syntax and type system.
| nicoburns wrote:
| Structs aren't really any more fundamental than tagged
| unions though. They also contain bytes which are hidden
| from the language (padding). If a struct can have
| automatically generated padding bytes, why shouldn't a
| tagged union have an automatically generated tag?
| eqvinox wrote:
| Mostly because C structs have a very exactly bound memory
| representation, and padding bytes are just about the only
| thing you can - mostly - get away with not caring about.
|
| Coincidentally, even padding bytes break things
| _technically_. Since they 're undefined, you - in theory
| - can't ever calculate a bytewise hash value over a
| struct that has any padding bytes. It's still done
| frequently, it's just nulled out beforehand and copying
| is bytewise anyway. Doesn't change the fact that to the
| spec letter it's undefined behaviour. GCC also added
| __builtin_clear_padding() recently.
|
| Hidden tags ... no. Just no.
| pornel wrote:
| Tagged unions built-in into the language can have layout
| optimizations, e.g. in Rust sizeof(Option<&i32>) ==
| sizeof(&i32), because Rust knows it can repurpose NULL as
| the None value.
| dboreham wrote:
| There weren't many Pascal programmers, but for C
| programmers, sure.
| pjmlp wrote:
| Sure there were, in Europe during the heyday of CP/M, MS-
| DOS, Mac OS (until MPW and PowerPlant).
| marcosdumay wrote:
| To be fair, tagged unions are too error prone to be useful
| without strong types. That never stopped people from using
| them, but when a statically enforced alternative came
| along, everybody jumped ship.
| simias wrote:
| Tagged unions never went away, it's just that, as you
| mention, C++ style OOP became very popular and they worked
| the other way around: instead of having a single function
| that matches the tag to figure out what to do with the
| data, you ship data-specific functions straight inside the
| object (virtual methods).
|
| In theory you can have both of course, and C++ now has
| support for variants (because we know C++ tries to be come
| a superset of all programming languages in existence). It's
| still a bit clunky to use though, especially because of the
| lack of "match" support in the language.
|
| But I think a problem is that it's easy to mess up tagged
| union in low level, non-managed languages. Rust's enum work
| great, but that's thanks to the very strong type system and
| borrow checker which makes them effectively impossible to
| mess up in safe code.
|
| In C or C++ it can become messy and hard to debug. In C you
| don't really have a choice, so you do it anyway but in C++
| virtual methods and inheritance are usually a bit easier to
| manage.
| jerf wrote:
| "I find it really interresting that such a simple concept
| as a tagged union, which appeared at least as far back as
| ALGOL 68, could be completely abandoned during the 90's and
| 00's where OOP reigned surpreme"
|
| From a modern perspective, it may be difficult to
| understand just how _crushing_ the OO consensus was in the
| 1980s and 1990s. It still reverberates in our language
| design today. OO was good design was OO. If it didn 't fit
| into OO, that was _ipso facto_ proof that it was bad. Close
| the book. No further evidence required. Sum types? Kinda
| overlaps inheritance. Bad design. Not OO, therefore bad
| design. Why is Java a better language than C++? It 's more
| OO, so QED.
|
| OO here, by the way, is specifically that variant that
| emerged in the form of C++ and Java. Very important to have
| defined, enforced "private" keywords and deep hierarchies.
| Smalltalk by this definition is _not_ an OO language. (And
| also, therefore, bad. Also possibly heretical for trying to
| steal the term OO... and I mean all the implications of the
| word "heretical".)
|
| My "Software Engineering" course circa 1999 was basically
| "How to Design Software via OO". (It has the distinction of
| being the one course from my entire college career that I'm
| not sure I agree with a _single thing_ I "learned" in that
| course. Even at the time I had done enough real work to be
| suspicious, and from here in 2021 I find the whole thing
| risible. But I got a 4.0, so....) Formal diagrams with
| strictly enforced diagram languages like UML, because OO
| was the future of Good Design and UML was how OO was going
| to help bring programming to the masses who would only have
| to draw Object Hierarchies and then {magic goes here that
| never was worked out} and perfect programs!
|
| The internet hit programmers and programming earliest, and
| now there's only a remnant of the OO dogma just hanging on
| by its teeth in school curricula that haven't hardly been
| changed in 25 years, and it lacks the power to envelop the
| students the way it used to.
|
| There was of course always a counterculture that didn't
| listen, which is where you get Smalltalk, Perl, Haskell and
| its ancestry, Python, etc. But you have to bear in mind
| that that was a _counterculture_...
|
| ... and you know, even today, Java and C++ are still
| _pretty darned popular_ and it 's not _that_ hard to find
| people who still only know one of those (or maybe both) and
| still think OO [?] good design [?] OO, and the vast chaotic
| landscape of languages that don 't even have "private" is
| just kid stuff for people who can't handle Real Design.
| Which is OO. Because OO is good, and good is OO.
| steerablesafe wrote:
| If `match` compiles down to a switch case on the type
| discriminator then it can easily beat current implentations of
| C++'s std::visit on std::variant.
|
| I assume pattern matching must be exhaustive. Can there be a
| default case?
| sesuximo wrote:
| What else can std visit generate? I'd imagine it does exactly
| that
| ynik wrote:
| std::variant is implemented with variadic templates; but C++
| doesn't have a "variadic switch" statement. So std::visit
| cannot directly use a switch statement. Instead it's usually
| implemented as a recursive function with the hope that the
| compiler will unroll the recursion and optimize it into a
| switch. This hope usually doesn't work out in practice.
| steerablesafe wrote:
| AFAIK the usual implementation is statically generating an
| array of function pointers and dispatching based on the
| type discriminator. This works even less for inlining.
| Hirrolot wrote:
| Yes, just write otherwise { ... } to handle a default case.
| geon wrote:
| How does it compare to the tagged unions in Zig, which aims to be
| "C with the bugs fixed"?
| flohofwoe wrote:
| Zig has the advantage that it can add the syntax sugar right
| into the language instead of using macro "workarounds", but I'd
| guess in the end both have similar lowlevel memory layout (a
| struct with an enum tag, followed by a payload "raw union") and
| generated code (e.g. it would be interesting to look at the
| compiler outputs, they _should_ be very similar).
| eqvinox wrote:
| This "advantage" is also its greatest disadvantage. I can
| stick these macro definitions in any existing project and
| start incrementally rolling them out. Migrating something --
| even partially -- to Zig is a much bigger task.
| Hirrolot wrote:
| Hmm, I don't use Zig so can't say much about it.
| flohofwoe wrote:
| This is what it looks like in Zig:
|
| https://ziglang.org/documentation/master/#Tagged-union
|
| You create two types, a tag enum, and the tagged union
| itself, which has a typed "payload" for each tag.
|
| Plus the necessary "syntax sugar" for initialization and
| extracting the payload by tag.
| dleslie wrote:
| Oh, this is from the poica author! Neat!
|
| That Poica can't work on C99 made it a non-starter for me; but
| all I really wanted was the sum types.
| Hirrolot wrote:
| Yeah, actually the most missing thing in C is sum types, as for
| me. Poica was rather an experiment; datatype99 is rather a
| practical library. I'm planning to use it on my work.
| cies wrote:
| > Pattern matching is exhaustive too.
|
| Sweet. C seems to be getting there :)
| orra wrote:
| Indeed, exhaustive pattern matching is sweet. What is the noddy
| explanation of how it is achieved, I wonder?
| Hirrolot wrote:
| datatype99 generates a switch expression under the hood, and
| modern compilers check this switch expression for
| exhaustiveness.
| DaiPlusPlus wrote:
| How do compilers check for exhaustiveness when the domain
| is unbound? Or rather: how does one constrain an integer in
| C? C enums are glorified consts, it's still legal to do
| `myenum_t x = 0xFFFF` where 0xFFFF is not a defined value.
| This is necessary to allow for flags/bitfield enums.
| setpatchaddress wrote:
| > Pure C99. No external tools are required; datatype99 is
| implemented using only preprocessor macros.
|
| And sure enough, the code is unreadable and unmaintainable as
| you'd expect.
|
| You can build Boost out of C99 macros. There's a really good
| reason why you shouldn't.
| fallat wrote:
| Combine it with Checked C...Why do we need Rust? :)
| pjmlp wrote:
| - proper strings and arrays, bounded checked by default.
|
| By the way, it doesn't need to be Rust, there are plenty of
| alternatives, some of them like NEWP and PL/I variants are
| about 10 years older than C.
| masklinn wrote:
| - compiler prevents use of uninitialised or freed memory
|
| - langage actually uses safety features in api design (good
| luck updating libc to use this)
| skocznymroczny wrote:
| working and sane package manager and build tool
| syockit wrote:
| Too often that I find the C extension projects introduced here on
| HN would omit examples for dynamically allocated objects. In the
| binary tree example, apparently I can't just memcpy an existing
| tree to a malloc'd tree because the nodes would point to the ones
| on the old tree. I was halfway writing a copy function that
| recursively traverses the source tree, void
| copy(BinaryTree *dest, const BinaryTree *src) {
| match(*src) { of(Leaf, x) { *dest =
| TREE(Leaf(x)); } } }
|
| then realised that this still points to a leaf on the stack. Need
| a way to replace all the pointers to be relative to the
| destination tree...
| lasagnaphil wrote:
| Isn't that a problem even in managed, garbaged-collected
| languages? For most (procedural) languages it's a bit involved
| if you want to deep-copy heap-allocated objects. If you don't
| want the overhead of re-creating the tree from scratch, you can
| use indices instead of pointers and store the nodes in a heap-
| allocated contiguous array, and then when you want to create a
| deep copy of the tree you can just memcpy the whole array in
| one go.
| dataflow wrote:
| I'm guessing they're coming from C++ or similar where the
| copy constructors already take care of making a deep copy.
| lasagnaphil wrote:
| Still, inside that copy constructor you're going to write
| the exact boilerplate code for recursively copying each
| node, so the complexity of code won't change that much.
| Also, you're going to call quite a lot of malloc/free,
| which could be a potential performance hit if you're
| creating/destroying large trees a lot, as well as
| fragmenting your memory if you're not careful.
|
| Personally I have a much easier time dealing with
| tree/graph data structures using indices, regardless of the
| language I'm using. Using indices as handles in general are
| just so much easier than using pointers in terms of memory
| management, and generally performs much better in terms of
| cache coherency. Indices are also easier to
| serialize/deserialize, and sometimes it just "makes sense"
| (for example, if you're developing a physics simulation,
| representing objects as indices are natural because you're
| going to use those index values inside various math
| equations.)
|
| The downside might be that if you're creating and
| destroying nodes in a tree continuously. then you have to
| be careful about reusing indices. There's a pattern in
| gamedev (generational indices) that solve the issue
| (basically, in addition to an index you also store a
| counter value with it, and you increment the counter
| everytime you allocate a new node in the same spot of the
| array. It's a bit more bookkeeping, but there's lots of
| code examples for it. It's kind of a well-known trick
| inside gamedev (and recently in the Rust community, in
| order to bypass borrow checker issues).
|
| A few pointers:
|
| - https://floooh.github.io/2018/06/17/handles-vs-
| pointers.html
|
| - https://kyren.github.io/2018/09/14/rustconf-talk.html
|
| - https://news.ycombinator.com/item?id=17995634
| (steveklabnik explaining how generational indices work with
| code).
| boomlinde wrote:
| The allocation strategy used is only a tangential problem. The
| purpose of the example is to demonstrate the functionality of
| the library in brief, not to walk the reader through problems
| that C programmers face in general.
|
| I would suggest recursively walking through the original tree,
| creating an equivalent tree in the process instead of copying
| the original elements.
| syockit wrote:
| Yeah, I suppose. /* get size required for
| malloc */ size_t getsize(const BinaryTree *tree) {
| size_t acc = sizeof(*tree); match(*tree) {
| of(Leaf, x) { return acc; }
| of(Node, lhs, x, rhs) { return
| getsize(*lhs) + acc + getsize(*rhs); }
| } } BinaryTree* copy(BinaryTree *dest,
| const BinaryTree *src) { dest->tag = src->tag;
| match(*src) { of(Leaf, x) {
| *dest = Leaf(*x); } of(Node, lx, x,
| rx) { match(*dest) {
| of(Node, ly, y, ry) { *ly = dest + 1;
| *ry = copy(*ly, *lx); *y = *x;
| copy(*ry, *rx); }
| otherwise {} } } }
| return dest + 1; }
|
| Kind of hackish, but I guess it works.
| klodolph wrote:
| Hackish????
| naasking wrote:
| I created something similar [1] awhile back:
| SUM(foo) { foo_one, foo_two, };
| /* declare each sum case */ CASE(foo, foo_one) { int i;
| char c; }; CASE(foo, foo_two) { double d; };
| void do_bar(foo f) { MATCH(f) {
| AS(foo_one, y) printf("foo_one: %d, %c\n", y->i, y->c);
| AS(foo_two, y) printf("foo_two: %d\n", y->d);
| MATCHANY fprintf(stderr, "No such case!");
| exit(1); } }
|
| [1] https://github.com/naasking/libsum
___________________________________________________________________
(page generated 2021-02-04 23:01 UTC)