[HN Gopher] Datatype99: C99 with Sum Types, v0.1.0
       ___________________________________________________________________
        
       Datatype99: C99 with Sum Types, v0.1.0
        
       Author : Hirrolot
       Score  : 147 points
       Date   : 2021-02-04 11:27 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | marcodiego wrote:
       | Can it be used to implement something similar to typen_ame() or
       | type_id()?
        
         | Hirrolot wrote:
         | Something similar, yes. Behind the scenes, datatype99 deduces a
         | type of a matched expression using the typedefs generated
         | earlier.
        
           | marcodiego wrote:
           | You mean, type_id and type_name would have to be handwritten
           | functions, right?
           | 
           | If it is possible to automatically generate these, could you
           | provide an example?
        
             | Hirrolot wrote:
             | We can generate a type ID by defining an extern variable of
             | a sum type inside the `datatype` macro. Its address then
             | will be a type ID (though I am not sure that such two
             | variables will always have different addresses, I need to
             | check the standard).
             | 
             | About type_name: you can deduce a type name if you know at
             | least one tag of a sum type. For example, if a tag is
             | `Foo`, then `FooSumT` is a typedef to an outer sum type
             | (datatype99 generates this typedef).
        
               | eqvinox wrote:
               | Two non-const variables will always have different
               | addresses if they are of the same type. Different types
               | can [AFAIR] per the standard theoretically be in
               | different address spaces, though this is rare in
               | practice.
        
               | a1369209993 wrote:
               | > Different types can [AFAIR] per the standard
               | theoretically be in different address spaces
               | 
               | If you can force the types to contain a member of the
               | same type (say, a int or enum foobar_discriminant), the
               | pointers &const_var_foo.disc and &const_var_bar.disc
               | would have to differ for obvious reasons.
        
       | enriquto wrote:
       | What is this sorcery?                   #include <datatype99.h>
       | datatype(                 BinaryTree,                 (Leaf,
       | int),                 (Node, struct BinaryTree *, int, struct
       | BinaryTree *)         );
       | 
       | What kind of preprocessor black magic can you do so that this
       | code compiles? I don't dare to open the file datatype99.h lest it
       | casts a dark spell on me.
        
         | willxinc wrote:
         | https://github.com/Hirrolot/epilepsy
         | 
         | According to the GitHub, which appears to be a metaprogramming
         | library by the same author.
        
         | flohofwoe wrote:
         | The whole system seems to be built on top of variadic macros
         | (which had been standardized in C99).
        
         | Hirrolot wrote:
         | Yes, as others have already noticed, it is implemented on top
         | of https://github.com/Hirrolot/epilepsy, a metalanguage for
         | preprocessor metaprogramming.
        
           | orra wrote:
           | In case anybody mistakenly thinks doing so is 'cheating',
           | you're the author of both libraries!
           | 
           | This is absurd, but in a good way.
        
           | jrimbault wrote:
           | > a metalanguage for preprocessor metaprogramming
           | 
           | That sentence is everything I was taught to avoid. Props to
           | you.
           | 
           | I like the drug/alcohol metaphor for these things, as a child
           | avoid them, as an adult do your thing as long as you remain
           | sensible.
        
             | globuous wrote:
             | Funny you use this example, the author's GH page says he's
             | 16; in many countries he wouldn't be able to drink, event
             | responsibly ^^
        
               | masklinn wrote:
               | > in many countries he wouldn't be able to drink
               | 
               | But in most countries they would be able to drink
               | (especially in a private setting), just not purchase. In
               | Europe for instance less than half a dozen countries have
               | a minimum drinking age for private settings, and a dozen
               | more have an MDA in public (but not private), though for
               | some 16 is enough to drink fermented alcohols (alongside
               | a meal with adults for the UK, regardless for Germany and
               | Austria).
        
           | enriquto wrote:
           | LOL, I cannot get enough of this shit! Can you point to some
           | cool implementation tricks that you used to achieve that? I
           | guess there's quite a bit of ## concatenation to create
           | temporary names and stuff?
        
             | Hirrolot wrote:
             | Well, mostly I use idioms from the Cloak Wiki
             | (https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-
             | tricks,...). It takes some time to realise them :)
        
       | kzrdude wrote:
       | It's syntax sugar for tagged unions, of course. It's not entirely
       | out there that C could include such support natively in a future
       | standard.
        
         | rbonvall wrote:
         | I predict all languages will adopt some form of sum types and
         | pattern matching in the next 10 years. They will go from
         | esoteric to mainstream in the same way lambdas did.
        
           | mhh__ wrote:
           | Currently with D the two people in charge both favour library
           | solutions, but so far I'm seriously leaning towards a
           | language one - it seems to hard to do type erasure
           | efficiently otherwise, for example.
        
           | mtzet wrote:
           | I find it really interresting that such a simple concept as a
           | tagged union, which appeared at least as far back as ALGOL
           | 68, could be completely abandoned during the 90's and 00's
           | where OOP reigned surpreme, and only seems to be gaining
           | popularity again now that pattern matching is all the rage.
           | 
           | It's just baffling to me that tagged unions could be shunned
           | in favour of the visitor pattern. How could that ever look
           | like progress?
           | 
           | Being a millenial, I don't know what programming was like the
           | 80's, but surely tagged unions must have been bread and
           | butter for Pascal programmers?
        
             | flohofwoe wrote:
             | Tagged unions have always been around as runtime
             | constructs, they just had different names (e.g. "variant").
             | In C it's understandable why they aren't part of the
             | language: they would be the only data type that requires a
             | "builtin" opaque struct type to hold the tag and payload
             | union (e.g. something like this, but hidden from the
             | programmer):                 struct my_tagged_union {
             | enum my_tag_enum tag;           union {             ....
             | } payload;       };
             | 
             | That would be a pretty big break with "tradition" for the C
             | language ;) (e.g. it would be similar to a "builtin string
             | type")
             | 
             | And of course tagged unions as "generic user-defined types"
             | are much less convenient/useful than being directly
             | integrated into the language syntax and type system.
        
               | nicoburns wrote:
               | Structs aren't really any more fundamental than tagged
               | unions though. They also contain bytes which are hidden
               | from the language (padding). If a struct can have
               | automatically generated padding bytes, why shouldn't a
               | tagged union have an automatically generated tag?
        
               | eqvinox wrote:
               | Mostly because C structs have a very exactly bound memory
               | representation, and padding bytes are just about the only
               | thing you can - mostly - get away with not caring about.
               | 
               | Coincidentally, even padding bytes break things
               | _technically_. Since they 're undefined, you - in theory
               | - can't ever calculate a bytewise hash value over a
               | struct that has any padding bytes. It's still done
               | frequently, it's just nulled out beforehand and copying
               | is bytewise anyway. Doesn't change the fact that to the
               | spec letter it's undefined behaviour. GCC also added
               | __builtin_clear_padding() recently.
               | 
               | Hidden tags ... no. Just no.
        
               | pornel wrote:
               | Tagged unions built-in into the language can have layout
               | optimizations, e.g. in Rust sizeof(Option<&i32>) ==
               | sizeof(&i32), because Rust knows it can repurpose NULL as
               | the None value.
        
             | dboreham wrote:
             | There weren't many Pascal programmers, but for C
             | programmers, sure.
        
               | pjmlp wrote:
               | Sure there were, in Europe during the heyday of CP/M, MS-
               | DOS, Mac OS (until MPW and PowerPlant).
        
             | marcosdumay wrote:
             | To be fair, tagged unions are too error prone to be useful
             | without strong types. That never stopped people from using
             | them, but when a statically enforced alternative came
             | along, everybody jumped ship.
        
             | simias wrote:
             | Tagged unions never went away, it's just that, as you
             | mention, C++ style OOP became very popular and they worked
             | the other way around: instead of having a single function
             | that matches the tag to figure out what to do with the
             | data, you ship data-specific functions straight inside the
             | object (virtual methods).
             | 
             | In theory you can have both of course, and C++ now has
             | support for variants (because we know C++ tries to be come
             | a superset of all programming languages in existence). It's
             | still a bit clunky to use though, especially because of the
             | lack of "match" support in the language.
             | 
             | But I think a problem is that it's easy to mess up tagged
             | union in low level, non-managed languages. Rust's enum work
             | great, but that's thanks to the very strong type system and
             | borrow checker which makes them effectively impossible to
             | mess up in safe code.
             | 
             | In C or C++ it can become messy and hard to debug. In C you
             | don't really have a choice, so you do it anyway but in C++
             | virtual methods and inheritance are usually a bit easier to
             | manage.
        
             | jerf wrote:
             | "I find it really interresting that such a simple concept
             | as a tagged union, which appeared at least as far back as
             | ALGOL 68, could be completely abandoned during the 90's and
             | 00's where OOP reigned surpreme"
             | 
             | From a modern perspective, it may be difficult to
             | understand just how _crushing_ the OO consensus was in the
             | 1980s and 1990s. It still reverberates in our language
             | design today. OO was good design was OO. If it didn 't fit
             | into OO, that was _ipso facto_ proof that it was bad. Close
             | the book. No further evidence required. Sum types? Kinda
             | overlaps inheritance. Bad design. Not OO, therefore bad
             | design. Why is Java a better language than C++? It 's more
             | OO, so QED.
             | 
             | OO here, by the way, is specifically that variant that
             | emerged in the form of C++ and Java. Very important to have
             | defined, enforced "private" keywords and deep hierarchies.
             | Smalltalk by this definition is _not_ an OO language. (And
             | also, therefore, bad. Also possibly heretical for trying to
             | steal the term OO... and I mean all the implications of the
             | word  "heretical".)
             | 
             | My "Software Engineering" course circa 1999 was basically
             | "How to Design Software via OO". (It has the distinction of
             | being the one course from my entire college career that I'm
             | not sure I agree with a _single thing_ I  "learned" in that
             | course. Even at the time I had done enough real work to be
             | suspicious, and from here in 2021 I find the whole thing
             | risible. But I got a 4.0, so....) Formal diagrams with
             | strictly enforced diagram languages like UML, because OO
             | was the future of Good Design and UML was how OO was going
             | to help bring programming to the masses who would only have
             | to draw Object Hierarchies and then {magic goes here that
             | never was worked out} and perfect programs!
             | 
             | The internet hit programmers and programming earliest, and
             | now there's only a remnant of the OO dogma just hanging on
             | by its teeth in school curricula that haven't hardly been
             | changed in 25 years, and it lacks the power to envelop the
             | students the way it used to.
             | 
             | There was of course always a counterculture that didn't
             | listen, which is where you get Smalltalk, Perl, Haskell and
             | its ancestry, Python, etc. But you have to bear in mind
             | that that was a _counterculture_...
             | 
             | ... and you know, even today, Java and C++ are still
             | _pretty darned popular_ and it 's not _that_ hard to find
             | people who still only know one of those (or maybe both) and
             | still think OO [?] good design [?] OO, and the vast chaotic
             | landscape of languages that don 't even have "private" is
             | just kid stuff for people who can't handle Real Design.
             | Which is OO. Because OO is good, and good is OO.
        
       | steerablesafe wrote:
       | If `match` compiles down to a switch case on the type
       | discriminator then it can easily beat current implentations of
       | C++'s std::visit on std::variant.
       | 
       | I assume pattern matching must be exhaustive. Can there be a
       | default case?
        
         | sesuximo wrote:
         | What else can std visit generate? I'd imagine it does exactly
         | that
        
           | ynik wrote:
           | std::variant is implemented with variadic templates; but C++
           | doesn't have a "variadic switch" statement. So std::visit
           | cannot directly use a switch statement. Instead it's usually
           | implemented as a recursive function with the hope that the
           | compiler will unroll the recursion and optimize it into a
           | switch. This hope usually doesn't work out in practice.
        
             | steerablesafe wrote:
             | AFAIK the usual implementation is statically generating an
             | array of function pointers and dispatching based on the
             | type discriminator. This works even less for inlining.
        
         | Hirrolot wrote:
         | Yes, just write otherwise { ... } to handle a default case.
        
       | geon wrote:
       | How does it compare to the tagged unions in Zig, which aims to be
       | "C with the bugs fixed"?
        
         | flohofwoe wrote:
         | Zig has the advantage that it can add the syntax sugar right
         | into the language instead of using macro "workarounds", but I'd
         | guess in the end both have similar lowlevel memory layout (a
         | struct with an enum tag, followed by a payload "raw union") and
         | generated code (e.g. it would be interesting to look at the
         | compiler outputs, they _should_ be very similar).
        
           | eqvinox wrote:
           | This "advantage" is also its greatest disadvantage. I can
           | stick these macro definitions in any existing project and
           | start incrementally rolling them out. Migrating something --
           | even partially -- to Zig is a much bigger task.
        
         | Hirrolot wrote:
         | Hmm, I don't use Zig so can't say much about it.
        
           | flohofwoe wrote:
           | This is what it looks like in Zig:
           | 
           | https://ziglang.org/documentation/master/#Tagged-union
           | 
           | You create two types, a tag enum, and the tagged union
           | itself, which has a typed "payload" for each tag.
           | 
           | Plus the necessary "syntax sugar" for initialization and
           | extracting the payload by tag.
        
       | dleslie wrote:
       | Oh, this is from the poica author! Neat!
       | 
       | That Poica can't work on C99 made it a non-starter for me; but
       | all I really wanted was the sum types.
        
         | Hirrolot wrote:
         | Yeah, actually the most missing thing in C is sum types, as for
         | me. Poica was rather an experiment; datatype99 is rather a
         | practical library. I'm planning to use it on my work.
        
       | cies wrote:
       | > Pattern matching is exhaustive too.
       | 
       | Sweet. C seems to be getting there :)
        
         | orra wrote:
         | Indeed, exhaustive pattern matching is sweet. What is the noddy
         | explanation of how it is achieved, I wonder?
        
           | Hirrolot wrote:
           | datatype99 generates a switch expression under the hood, and
           | modern compilers check this switch expression for
           | exhaustiveness.
        
             | DaiPlusPlus wrote:
             | How do compilers check for exhaustiveness when the domain
             | is unbound? Or rather: how does one constrain an integer in
             | C? C enums are glorified consts, it's still legal to do
             | `myenum_t x = 0xFFFF` where 0xFFFF is not a defined value.
             | This is necessary to allow for flags/bitfield enums.
        
       | setpatchaddress wrote:
       | > Pure C99. No external tools are required; datatype99 is
       | implemented using only preprocessor macros.
       | 
       | And sure enough, the code is unreadable and unmaintainable as
       | you'd expect.
       | 
       | You can build Boost out of C99 macros. There's a really good
       | reason why you shouldn't.
        
       | fallat wrote:
       | Combine it with Checked C...Why do we need Rust? :)
        
         | pjmlp wrote:
         | - proper strings and arrays, bounded checked by default.
         | 
         | By the way, it doesn't need to be Rust, there are plenty of
         | alternatives, some of them like NEWP and PL/I variants are
         | about 10 years older than C.
        
           | masklinn wrote:
           | - compiler prevents use of uninitialised or freed memory
           | 
           | - langage actually uses safety features in api design (good
           | luck updating libc to use this)
        
         | skocznymroczny wrote:
         | working and sane package manager and build tool
        
       | syockit wrote:
       | Too often that I find the C extension projects introduced here on
       | HN would omit examples for dynamically allocated objects. In the
       | binary tree example, apparently I can't just memcpy an existing
       | tree to a malloc'd tree because the nodes would point to the ones
       | on the old tree. I was halfway writing a copy function that
       | recursively traverses the source tree,                   void
       | copy(BinaryTree *dest, const BinaryTree *src) {
       | match(*src) {             of(Leaf, x) {               *dest =
       | TREE(Leaf(x));              }           }         }
       | 
       | then realised that this still points to a leaf on the stack. Need
       | a way to replace all the pointers to be relative to the
       | destination tree...
        
         | lasagnaphil wrote:
         | Isn't that a problem even in managed, garbaged-collected
         | languages? For most (procedural) languages it's a bit involved
         | if you want to deep-copy heap-allocated objects. If you don't
         | want the overhead of re-creating the tree from scratch, you can
         | use indices instead of pointers and store the nodes in a heap-
         | allocated contiguous array, and then when you want to create a
         | deep copy of the tree you can just memcpy the whole array in
         | one go.
        
           | dataflow wrote:
           | I'm guessing they're coming from C++ or similar where the
           | copy constructors already take care of making a deep copy.
        
             | lasagnaphil wrote:
             | Still, inside that copy constructor you're going to write
             | the exact boilerplate code for recursively copying each
             | node, so the complexity of code won't change that much.
             | Also, you're going to call quite a lot of malloc/free,
             | which could be a potential performance hit if you're
             | creating/destroying large trees a lot, as well as
             | fragmenting your memory if you're not careful.
             | 
             | Personally I have a much easier time dealing with
             | tree/graph data structures using indices, regardless of the
             | language I'm using. Using indices as handles in general are
             | just so much easier than using pointers in terms of memory
             | management, and generally performs much better in terms of
             | cache coherency. Indices are also easier to
             | serialize/deserialize, and sometimes it just "makes sense"
             | (for example, if you're developing a physics simulation,
             | representing objects as indices are natural because you're
             | going to use those index values inside various math
             | equations.)
             | 
             | The downside might be that if you're creating and
             | destroying nodes in a tree continuously. then you have to
             | be careful about reusing indices. There's a pattern in
             | gamedev (generational indices) that solve the issue
             | (basically, in addition to an index you also store a
             | counter value with it, and you increment the counter
             | everytime you allocate a new node in the same spot of the
             | array. It's a bit more bookkeeping, but there's lots of
             | code examples for it. It's kind of a well-known trick
             | inside gamedev (and recently in the Rust community, in
             | order to bypass borrow checker issues).
             | 
             | A few pointers:
             | 
             | - https://floooh.github.io/2018/06/17/handles-vs-
             | pointers.html
             | 
             | - https://kyren.github.io/2018/09/14/rustconf-talk.html
             | 
             | - https://news.ycombinator.com/item?id=17995634
             | (steveklabnik explaining how generational indices work with
             | code).
        
         | boomlinde wrote:
         | The allocation strategy used is only a tangential problem. The
         | purpose of the example is to demonstrate the functionality of
         | the library in brief, not to walk the reader through problems
         | that C programmers face in general.
         | 
         | I would suggest recursively walking through the original tree,
         | creating an equivalent tree in the process instead of copying
         | the original elements.
        
           | syockit wrote:
           | Yeah, I suppose.                   /* get size required for
           | malloc */         size_t getsize(const BinaryTree *tree) {
           | size_t acc = sizeof(*tree);           match(*tree) {
           | of(Leaf, x) {                   return acc;               }
           | of(Node, lhs, x, rhs) {                   return
           | getsize(*lhs) + acc + getsize(*rhs);               }
           | }         }              BinaryTree* copy(BinaryTree *dest,
           | const BinaryTree *src) {           dest->tag = src->tag;
           | match(*src) {              of(Leaf, x) {
           | *dest = Leaf(*x);              }              of(Node, lx, x,
           | rx) {                 match(*dest) {
           | of(Node, ly, y, ry) {                     *ly = dest + 1;
           | *ry = copy(*ly, *lx);                     *y = *x;
           | copy(*ry, *rx);                   }
           | otherwise {}                 }              }           }
           | return dest + 1;         }
           | 
           | Kind of hackish, but I guess it works.
        
             | klodolph wrote:
             | Hackish????
        
       | naasking wrote:
       | I created something similar [1] awhile back:
       | SUM(foo) {             foo_one,             foo_two,         };
       | /* declare each sum case */         CASE(foo, foo_one) { int i;
       | char c; };         CASE(foo, foo_two) { double d; };
       | void do_bar(foo f) {             MATCH(f) {
       | AS(foo_one, y) printf("foo_one: %d, %c\n", y->i, y->c);
       | AS(foo_two, y) printf("foo_two: %d\n", y->d);
       | MATCHANY                  fprintf(stderr, "No such case!");
       | exit(1);             }         }
       | 
       | [1] https://github.com/naasking/libsum
        
       ___________________________________________________________________
       (page generated 2021-02-04 23:01 UTC)