[HN Gopher] Plain C API design, the real world Kobayashi Maru test
___________________________________________________________________
Plain C API design, the real world Kobayashi Maru test
Author : jmillikin
Score : 96 points
Date : 2023-04-16 15:17 UTC (7 hours ago)
(HTM) web link (nibblestew.blogspot.com)
(TXT) w3m dump (nibblestew.blogspot.com)
| nanofortnight wrote:
| This seems perfect for _Generic.
|
| https://en.cppreference.com/w/c/language/generic
| jesse__ wrote:
| I've done a few APIs similar in spirit to this one (and one
| similar in functionality, too), and I've found using a blend of a
| few of the mentioned methods to be pretty effective.
|
| I start with the basics and implement everything as explicitly
| (read: type safe) as possible:
|
|     struct bob_params {...}
|     struct line_params {...}
|     pdf_page_cmd_bob(bob_params *params) { ... }
|     pdf_page_cmd_line(line_params *params) { ... }
|
| Then, when that works and I want to add patterns, which are a
| superset, I'd add something that is literally a superset of those
| functions.
|
|     enum pattern_type {
|       patterntype_Bob,
|       patterntype_Line,
|     }
|
|     struct pattern {
|       pattern_type type;
|       union {
|         bob_params Bob;
|         line_params Line;
|       }
|     }
|
|     pdf_pattern_cmd(pattern *pattern) {
|       switch (pattern->type) {
|         case patterntype_Bob: /* do bob drawing a bunch of times, or whatever */
|       }
|     }
|
| The pattern struct has a lot of different names (sum type,
| discriminated union, algebraic datatype, tagged union .. probably
| more), but the idea is the 'type' tag tells you which one of the
| union values to use.
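|
| A minimal compilable sketch of the tagged-union pattern described
| above. The struct fields, the `u` union name and the text buffer
| standing in for real PDF output are assumptions made for
| illustration, not part of the original comment:

```c
#include <stdio.h>

/* Hypothetical parameter structs; the fields are placeholders. */
struct bob_params  { double radius; };
struct line_params { double x, y; };

enum pattern_type { patterntype_Bob, patterntype_Line };

/* Tagged union: `type` says which member of `u` is valid. */
struct pattern {
    enum pattern_type type;
    union {
        struct bob_params  Bob;
        struct line_params Line;
    } u;
};

/* Explicit, type-safe primitives. Each returns snprintf's result. */
static int pdf_page_cmd_bob(char *out, size_t n, const struct bob_params *p) {
    return snprintf(out, n, "bob %.1f", p->radius);
}
static int pdf_page_cmd_line(char *out, size_t n, const struct line_params *p) {
    return snprintf(out, n, "line %.1f %.1f", p->x, p->y);
}

/* The superset function switches on the tag and forwards. */
static int pdf_pattern_cmd(char *out, size_t n, const struct pattern *pat) {
    switch (pat->type) {
    case patterntype_Bob:  return pdf_page_cmd_bob(out, n, &pat->u.Bob);
    case patterntype_Line: return pdf_page_cmd_line(out, n, &pat->u.Line);
    }
    return -1; /* unknown tag */
}
```

| A caller fills in `type` and the matching union member, then calls
| `pdf_pattern_cmd`; most compilers can warn when a `switch` misses
| an enum value, which helps when new pattern kinds are added.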
|
| I've tried everything and, as far as I can tell, this gets you
| the best of all worlds. It's pretty much as type-safe as things
| get in C. It's extremely flexible; you just mix the fundamentals
| together as you like when adding higher-order functions. It's
| fast; the compiler can see exactly what's going on, so you pay
| minimal runtime cost.
|
| Yes, it's a bunch of typing to get the functions all spelled out,
| but it's really not that bad considering how easy/obvious the
| code actually is.
|
| I use this pattern so much I actually wrote a little
| metaprogramming language that is capable of generating a lot of
| the boilerplate for you. Link in my bio, if anyone's interested
| in looking at it.
| twisteriffic wrote:
| That style speaks to me. Thanks!
| cpeterso wrote:
| If you're designing a stable API for external users, you might
| want to lock down your API even more by forward declaring the
| struct types as opaque in the public header file and only
| defining the struct members in a private library header file.
| This prevents users from messing with your library's private
| state and allows you to change implementation details later
| without breaking binary compatibility. The disadvantage is that
| users can't control how the structs are allocated or embed them
| in their own structs.
|
|     /* foo.h */
|     struct foo_object;
|     struct foo_object* foo_create(int flags, ...);
|
|     /* foo.c */
|     #include "foo.h"
|     struct foo_object { int flags, ... };
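|
| A fuller sketch of the opaque-struct idea, with header and source
| shown in one file for brevity. `foo_flags` and `foo_destroy` are
| hypothetical additions, and `foo_create` is simplified to take
| only `flags`:

```c
#include <stdlib.h>

/* ---- foo.h (public): callers see only an incomplete type ---- */
struct foo_object;
struct foo_object *foo_create(int flags);
int  foo_flags(const struct foo_object *obj);
void foo_destroy(struct foo_object *obj);

/* ---- foo.c (private): the layout can change without breaking callers ---- */
struct foo_object {
    int flags;
};

struct foo_object *foo_create(int flags) {
    struct foo_object *obj = malloc(sizeof *obj);
    if (obj != NULL)
        obj->flags = flags;
    return obj; /* NULL on allocation failure */
}

int foo_flags(const struct foo_object *obj) { return obj->flags; }

void foo_destroy(struct foo_object *obj) { free(obj); }
```

| Because users only ever hold a `struct foo_object *`, every access
| goes through a function, which is the price of the stable ABI.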
| lelanthran wrote:
| Funnily enough that's the most common pattern I see in my
| personal C code (for example, my little lisp interpreter -
| https://github.com/lelanthran/csl/blob/master/src/parser/ato...)
| but I still recommend using the `_Generic` keyword in C.
|
| The next time I do a pattern like this, I'll be using the
| `_Generic` keyword to make the dispatch a compile-time match,
| not a runtime check.
| jesse__ wrote:
| Can you elaborate slightly on how you're planning on using
| _Generic to turn runtime dispatches into compile time ones?
| I'm not quite putting together how you can do that.
| lelanthran wrote:
| The example from the wikipedia page for C11
| (https://en.wikipedia.org/wiki/C11_(C_standard_revision)#Chan...)
| is compile time determination of the function to call:
|
|     #define cbrt(x) _Generic((x), long double: cbrtl, \
|                                   default: cbrt,      \
|                                   float: cbrtf)(x)
|
| In code you'll write `cbrt(foo)` and the correct function
| will be called for the type of foo. As I understand it, the
| `_Generic` selection is performed at compile time.
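|
| Applied to the article's page/pattern problem, the same trick
| might look like the sketch below. The struct layouts, the return
| values and the `pdf_cmd_l` macro name are assumptions made for
| illustration; it needs a C11 compiler:

```c
/* Hypothetical context types; only their distinct types matter here. */
struct pdf_page    { int ops; };
struct pdf_pattern { int ops; };

static int pdf_page_cmd_l(struct pdf_page *p, int x, int y) {
    (void)x; (void)y;
    p->ops += 1;
    return 1; /* 1 = handled as a page */
}
static int pdf_pattern_cmd_l(struct pdf_pattern *p, int x, int y) {
    (void)x; (void)y;
    p->ops += 1;
    return 2; /* 2 = handled as a pattern */
}

/* Picks the function from the static type of `ctx`. Any other type
 * fails to compile because there is no `default` association. */
#define pdf_cmd_l(ctx, x, y)                          \
    _Generic((ctx),                                   \
        struct pdf_page *:    pdf_page_cmd_l,         \
        struct pdf_pattern *: pdf_pattern_cmd_l)((ctx), (x), (y))
```

| The selection happens on the declared type of the argument, so
| this only replaces a runtime tag check when the caller already
| knows statically whether it holds a page or a pattern.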
| [deleted]
| JonChesterfield wrote:
| Recommend `__attribute__((overloadable))` instead of
| _Generic. The former opts into C++ style name mangling, which
| is a bit of a mess in C, but interacts well with `static` as
| forwarding wrappers in a header. The latter is a ridiculous
| mess invented by the C committee.
| KerrAvon wrote:
| I think you may be disappointed. Every single time I've tried
| to use `_Generic`, I've found that it's more trouble than
| it's worth. They seem to have made it be useful for a very
| narrow case -- tgmath.h -- and not bothered to make it
| general enough to be applicable to a wide variety of things
| that you might like to use it with.
| gabereiser wrote:
| can we just agree that PDF sucks? Sometimes the API must
| operate on a spec that is unruly to begin with. I like this
| approach. It's about the best you can get when dealing with
| this without completely reinventing the storage format to
| support the API.
| jesse__ wrote:
| yea PDF has got to be one of the worst specs ever
| created. It's amazing PDF viewers work at all.
| mintplant wrote:
| Composition?
|
|     typedef struct { /* ... */ } context;
|     typedef struct { context ctx; /* ... */ } foo_context;
|     typedef struct { context ctx; /* ... */ } bar_context;
|
|     void some_general_method(context* ctx, int a, int b);
|     void some_foo_specific_method(foo_context* foo_ctx, int c, int d);
|
|     foo_context* foo_ctx;
|     some_foo_specific_method(foo_ctx, 0, 0);
|     some_general_method(&foo_ctx->ctx, 1, 1);
|
| For checked downcasts, you could include an enum type tag inside
| `context` and have something like `foo_context*
| as_foo_context(context* ctx)` for downcasts, which, if `context`
| is at the beginning of `foo_context`, could just check the tag
| and cast the `ctx` pointer to `foo_context*` (or do a little
| pointer arithmetic if `context` is somewhere else in the
| containing struct). Return `NULL` (or assert) if the tag doesn't
| match.
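|
| A sketch of the checked downcast described above, covering both
| the "context first" case and the pointer-arithmetic case. The
| `context_kind` enum, the payload fields and `as_bar_context` are
| hypothetical names chosen for the example:

```c
#include <stddef.h>

enum context_kind { CONTEXT_FOO, CONTEXT_BAR };

typedef struct { enum context_kind kind; } context;

/* `ctx` is the first member here, so a plain cast is enough. */
typedef struct { context ctx; int foo_value; } foo_context;

/* `ctx` is NOT the first member here, so offsetof is needed. */
typedef struct { double bar_value; context ctx; } bar_context;

/* Returns NULL when the tag does not match. */
static foo_context *as_foo_context(context *ctx) {
    if (ctx == NULL || ctx->kind != CONTEXT_FOO)
        return NULL;
    return (foo_context *)ctx;
}

static bar_context *as_bar_context(context *ctx) {
    if (ctx == NULL || ctx->kind != CONTEXT_BAR)
        return NULL;
    /* Step back from the embedded member to the containing struct. */
    return (bar_context *)((char *)ctx - offsetof(bar_context, ctx));
}
```

| Both helpers rely on the tag having been set correctly when the
| object was created; a wrong tag turns the cast into a real bug.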
| uecker wrote:
| I use this technique a lot and it is very powerful.
| dromtrund wrote:
| +1, this approach will also highlight cases where you're trying
| to generalize something that isn't as generic as you thought.
|
| I'd argue that the need for downcasting in a method working on
| the inner context would also be a code smell, and that you
| might want to reconsider the context split. In some cases,
| there's no way around it (like async events), but it might be
| more appropriate to pass additional context or a callback
| instead, to avoid a circular dependency.
| joeatwork wrote:
| I know it isn't really appropriate to the spirit of the article,
| but it seems like in this case there is a right answer, and it's
| "Fully separate object types" - it's explicit, prevents errors,
| is complete, and while it requires a lot of typing to implement
| it doesn't require much complexity.
| ablob wrote:
| In which of the listed requirements of the article is this
| approach better, and why?
| cozzyd wrote:
| Yeah I agree. Macros can be used to avoid some of the typing in
| defining the interface when the implementation really is
| common, but unfortunately that makes it harder to document the
| generated interface. I wish doxygen had some way of supporting
| comments for macro-generated interfaces.
|
| In defining the implementation, it's easy enough to do
| something like this (if the implementation really is common):
|
|     static int pdf_ll_foo_impl(pdf_ll_ctx_t c, pdf_ll_type_t t, ...) {
|       // real implementation goes here, perhaps switching on t
|     }
|
|     int pdf_page_foo(pdf_page_ctx_t c, ..) {
|       return pdf_ll_foo_impl((pdf_ll_ctx_t) c, PDF_PAGE_TYPE, ..);
|     }
|
|     int pdf_pattern_foo(pdf_pattern_ctx_t c, ..) {
|       return pdf_ll_foo_impl((pdf_ll_ctx_t) c, PDF_PATTERN_TYPE, ..);
|     }
|
| which you can also use macros to help generate if you want to.
| CyberDildonics wrote:
| This is a few paragraphs about the C API of cairo, postscript and
| pdf and it doesn't seem all that insightful.
|
| I wish blog posts wouldn't use some random movie reference as
| clickbait when they could explain what their post is about
| instead.
| fpoling wrote:
| What is not considered in the article is to replace pointers to
| pages and patterns with handles that are tagged indexes into
| internal arrays.
|
| The big plus is that the index tag allows detecting use after
| free and other memory safety bugs in the vast majority of cases,
| greatly improving memory safety.
|
| If one then exposes this handle as a generic typedef over
| some integer type, then the API will not be type-safe, but the
| type mismatch will be detected very early.
|
| Another option is to wrap the handle in separate structs for
| type safety. Then the caller will need to convert from specific
| handle for page, pattern etc. to the base handle when calling
| common draw operations. But that will be a simple operation like
| page.base or pattern.base, not MYLIB_PATTERN_TO_BASE(pattern).
| The drawback is that the caller will be able to construct wrong
| handles via struct initialization, but that is a big abuse and
| the type mismatch will still be detected at runtime.
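|
| One way to sketch such tagged handles is an index plus a
| generation counter packed into a 32-bit integer. The 16/16 bit
| split, the fixed-size slot array and all names below are
| assumptions for illustration only:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_PAGES 16u

/* Low 16 bits: slot index. High 16 bits: generation tag. */
typedef uint32_t handle_t;

/* Optional typed wrapper, so callers pass `page.base` to common calls. */
typedef struct { handle_t base; } page_handle;

struct page_slot { uint16_t generation; int live; int op_count; };
static struct page_slot page_slots[MAX_PAGES];

static handle_t page_create(void) {
    for (uint32_t i = 0; i < MAX_PAGES; i++) {
        if (!page_slots[i].live) {
            page_slots[i].live = 1;
            page_slots[i].op_count = 0;
            return ((handle_t)page_slots[i].generation << 16) | i;
        }
    }
    return UINT32_MAX; /* no free slot; its index is out of range */
}

/* Returns the slot, or NULL if the handle is stale, freed or out of range. */
static struct page_slot *page_lookup(handle_t h) {
    uint32_t index = h & 0xFFFFu;
    uint16_t generation = (uint16_t)(h >> 16);
    if (index >= MAX_PAGES)
        return NULL;
    struct page_slot *s = &page_slots[index];
    if (!s->live || s->generation != generation)
        return NULL;
    return s;
}

static int page_destroy(handle_t h) {
    struct page_slot *s = page_lookup(h);
    if (s == NULL)
        return -1;
    s->live = 0;
    s->generation++; /* invalidates every outstanding copy of h */
    return 0;
}
```

| The generation wraps after 65536 reuses of a slot, which is why
| this catches stale handles in most cases rather than all of them.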
| Dwedit wrote:
| COM (Component Object Model) is compatible with C. It's still
| obviously C++ code being shoehorned into C by explicitly
| declaring the layout of the object (VTable member), but it does
| work. You get inheritance that way, and inter-module compatible
| cleanup. The downside is so much boilerplate code to declare the
| object, and the performance cost of virtual calls.
|
| You don't necessarily need the complete COM system. For example,
| you could remove use of the Windows Registry, use of IDL files,
| remove the `QueryInterface` method, remove the use of GUIDs,
| remove the class factories (when just a simple 'create' function
| would do), remove Windows API functions related to cross-module
| memory management (not using `CoTaskMemAlloc`). Then it would be
| portable to systems that aren't Windows. One thing you can't
| remove is specifying a calling convention for the class methods,
| because C++ `__thiscall` is not compatible with C code on
| Windows.
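|
| A stripped-down sketch of what such a "COM without the COM
| system" object could look like in plain C. The `shape`/`rect`
| names and the `SHAPE_CALL` macro are hypothetical; the calling
| convention is pinned to `__stdcall` on Windows only:

```c
#include <stdlib.h>

/* Pin the calling convention on Windows; elsewhere this expands to nothing. */
#ifdef _WIN32
#  define SHAPE_CALL __stdcall
#else
#  define SHAPE_CALL
#endif

typedef struct shape shape;

/* The "interface": a table of function pointers with a fixed layout. */
typedef struct shape_vtbl {
    unsigned (SHAPE_CALL *add_ref)(shape *self);
    unsigned (SHAPE_CALL *release)(shape *self);
    int      (SHAPE_CALL *area)(shape *self);
} shape_vtbl;

struct shape { const shape_vtbl *vtbl; }; /* vtable pointer comes first */

/* One concrete implementation. */
typedef struct { shape base; unsigned refs; int w, h; } rect;

static unsigned SHAPE_CALL rect_add_ref(shape *self) {
    return ++((rect *)self)->refs;
}
static unsigned SHAPE_CALL rect_release(shape *self) {
    rect *r = (rect *)self;
    unsigned left = --r->refs;
    if (left == 0)
        free(r); /* the creating module also frees the object */
    return left;
}
static int SHAPE_CALL rect_area(shape *self) {
    rect *r = (rect *)self;
    return r->w * r->h;
}

/* A single const vtable shared by every rect. */
static const shape_vtbl rect_vtbl = { rect_add_ref, rect_release, rect_area };

/* A plain create function instead of a class factory. */
shape *rect_create(int w, int h) {
    rect *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->base.vtbl = &rect_vtbl;
    r->refs = 1;
    r->w = w;
    r->h = h;
    return &r->base;
}
```

| Callers invoke methods as `s->vtbl->area(s)`, and the object is
| released by the module that allocated it, which is the
| inter-module cleanup property mentioned above.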
| kelnos wrote:
| This is more or less what GObject is, which the author mentions
| in passing. It's an OO system for C, but it does require quite
| a lot of boilerplate, and you need to manually initialize
| vtables when creating new subclasses. You also need to manually
| chain up to the superclass in virtual methods, in many places
| where it's easy to forget. It's a decent system, all things
| considered, but it's just a reminder that C's type system is
| very weak and implementing advanced features all but requires
| abuse of the preprocessor to avoid unreadable code.
| Dwedit wrote:
| While you do need to manually initialize a vtable pointer,
| that pointer can simply point to a const struct which lives
| in the read-only data section. You don't need to allocate a
| new vtable with each object or anything like that.
|
| And you can do COM objects in C without using the
| preprocessor at all (outside of the "are we C++ or not"
| condition, then you could use real classes instead)
| morelisp wrote:
| This is basically what the author is referring to with GObject.
| asveikau wrote:
| I think Mozilla has historically had a lot of COM outside of
| Windows. But there's been a goal to remove it:
| https://wiki.mozilla.org/Gecko:DeCOMtamination
|
| I believe VirtualBox is another project with COM on other
| platforms. I see lots of GUIDs and HRESULTs in their error
| messages.
|
| I actually really like the COM style when done well. HRESULT,
| the idea of somewhat standardized error handling that sub-
| divides the space of a 32-bit integer into various subsystem-
| specific error codes, is one of my favorite ideas from there.
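|
| A sketch of that idea, loosely modeled on the HRESULT layout
| (failure bit on top, a subsystem field, a 16-bit code). The
| `result_t` type, macro names and subsystem values are
| hypothetical and are not the Windows definitions:

```c
#include <stdint.h>

typedef int32_t result_t; /* negative means failure */

/* Bit 31 = failure flag, bits 16..26 = subsystem, bits 0..15 = code.
 * The conversion to int32_t assumes the usual two's complement wrap. */
#define RESULT_MAKE(fail, subsystem, code)                    \
    ((result_t)(((uint32_t)(fail) << 31) |                    \
                (((uint32_t)(subsystem) & 0x7FFu) << 16) |    \
                ((uint32_t)(code) & 0xFFFFu)))

#define RESULT_FAILED(r)    ((r) < 0)
#define RESULT_SUBSYSTEM(r) (((uint32_t)(r) >> 16) & 0x7FFu)
#define RESULT_CODE(r)      ((uint32_t)(r) & 0xFFFFu)

/* Hypothetical subsystems for a PDF library. */
enum { SUBSYS_NONE = 0, SUBSYS_PAGE = 1, SUBSYS_PATTERN = 2 };

#define RESULT_OK          RESULT_MAKE(0, SUBSYS_NONE, 0)
#define ERR_PAGE_BAD_COORD RESULT_MAKE(1, SUBSYS_PAGE, 7)
```

| Every function can then return one integer type, and callers
| that only care about success test `RESULT_FAILED(r)` without
| knowing which subsystem produced the code.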
|
| Some things are not so nice. For example, everything being a
| virtual call is not good for performance. Reference counting is
| also great but over-use of it is also not great for performance
| (for example, in C++ it's considered poor form to make
| everything std::shared_ptr<> when you can get away with less).
| Dwedit wrote:
| The goal to remove it applied to internal use, for things
| which aren't exported. Which makes perfect sense, COM-like
| interfaces are only needed when you cross module boundaries,
| and need a fixed ABI.
| pavlov wrote:
| > "Then it would be portable to systems that aren't Windows"
|
| I've seen this type of "COM Lite" used for cross-platform
| plugin and driver APIs. For example Blackmagic Design, a
| manufacturer of pro video capture hardware, provides an SDK
| that is essentially identical on Windows, Linux and Mac using
| this design.
| spacechild1 wrote:
| Another example would be the VST3 SDK.
| tedunangst wrote:
| I think the answer is function pointers.
| zabzonk wrote:
| > you can have functions like pdf_page_cmd_l(page, x, y)
|
| oh, no, please don't.
|
| just use c++ and you can have namespaces or classes to control
| visibility - there is no need to take on all the other c++ stuff
| if you don't want it.
| cozzyd wrote:
| Aside from the interlanguage interop issues, c++-like
| visibility control (via private) makes ABI compatibility
| essentially impossible.
| lelanthran wrote:
| But then it stops being a general tool that is used by python,
| PHP, java, lisp, C++, rust, nim, zig and becomes a
| specific tool for C++ programs.
| synergy20 wrote:
| C is truly the only universal common denominator for all
| other languages, C++ indeed is only for itself.
| wruza wrote:
| How would you use that in this case? These cmd_l ops are just
| similar ops to different objects, like pen on paper vs. brush
| on canvas. They don't operate on a single type of object.
|
| I'd say that C++ way is a bad idea here, because it usually
| begs for some compoinherimorphism with operator overloads that
| makes things 10x worse for the cost of one additional
| signature.
| zabzonk wrote:
| namespace pdf_page { dunno cmd_l (whatever); }
| wruza wrote:
| This is purely cosmetic and doesn't save you anything.
| Also, Cairo API (and most C APIs in general) already use
| <lib>_<class>_<method> naming scheme. E.g.
| cairo_svg_surface_create(), gtk_container_get_children().
| [deleted]
| zabzonk wrote:
| it is not cosmetic or a naming scheme - the language and
| compiler enforce it.
| morelisp wrote:
| None of the concerns are about the name.
| qsort wrote:
| Even if you're using C++ internally, you're likely exposing a C
| API behind extern "C", so you don't have access to those features
| at the API boundary.
| ar-nelson wrote:
| I've had a lot of ideas for cross-language libraries that would
| need a C API, and this issue always comes up. The idea I had
| several years ago---but never implemented, because most of the
| projects I'd use this for are in limbo because I never seem to
| finish anything---is an API with only one function, which takes
| JSON and returns JSON, possibly via JSON-RPC. Basically a library
| that pretends it's a remote service. Slow, yes, but not as slow
| as some alternatives, and it makes FFI setup with other languages
| easy.
| CyberDildonics wrote:
| Software that takes in text and outputs text is literally every
| command line program.
| ar-nelson wrote:
| Yes, but it's rare to see linked libraries that use this as
| their API, even though it would greatly simplify FFI.
| CyberDildonics wrote:
| That's the exact point. Why would someone use a linked
| library if speed doesn't matter and they are passing text
| back and forth to be parsed?
|
| That's a terrible way to use a FFI and you would still deal
| with all the tricky parts.
|
| If that's what you need, you would write a separate stand
| alone program and call that.
| lelanthran wrote:
| > is an API with only one function, which takes JSON and
| returns JSON, possibly via JSON-RPC.
|
| I've done this one, and once only. I wouldn't do it again
| because the pain point is the lack of typing.
|
| Yeah, yeah, I know, you've read all these blogs everywhere
| about how C is not type-safe, how C is weakly-typed, etc, but
| it's a damn sight better than runtime errors because something
| emitted valid JSON that missed a field, or has the field in the
| wrong child, or the field is of the incorrect type, etc.
|
| If you're sending and receiving messages to another part of the
| program, using an untyped interface with runtime type-checking
| is the worst way to do it; the errors will not stop coming.
|
| Every single time your FFI functions are entered, the function
| must religiously type-check every parameter, which means that
| every FFI call made has to now handle an extra possible error
| that may be returned - invalid params.
|
| Every single time your FFI functions return, the caller must
| religiously type-check the response, which means that _the
| caller itself_ may return an extra possible error - bad
| response.
|
| Having the compiler check the types is so much better. C
| enforces types on everything[1], almost everywhere. Take the
| type enforcement.
|
| [1] Unless the type check is explicitly and intentionally
| disabled by the programmer
| twic wrote:
| This is a nice concrete example of a situation where inheritance
| is useful for program design.
|
| I think i'd go for the "object oriented" approach, but with
| convenience functions to avoid explicit upcasts. Start with three
| types:
|
|     cairo_t /* a generic context, could be a page or a pattern */
|     cairo_page_t
|     cairo_pattern_t
|
| Functions only defined on pages take a page:
|
|     void pdf_page_cmd_bob(cairo_page_t* ctx);
|
| Functions defined on both take a generic context:
|
|     void pdf_ctx_cmd_l(cairo_t* ctx, int x, int y);
|
| Then you need some way to upcast from the child types to the
| parent (which would be implicit in C++ etc):
|
|     cairo_t* pdf_page_to_ctx(cairo_page_t* ctx);
|     cairo_t* pdf_pattern_to_ctx(cairo_pattern_t* ctx);
|
| So a call looks like:
|
|     pdf_ctx_cmd_l(pdf_page_to_ctx(page), 10, 20);
|
| But we can generate this:
|
|     void pdf_page_cmd_l(cairo_page_t* ctx, int x, int y) {
|       pdf_ctx_cmd_l(pdf_page_to_ctx(ctx), x, y);
|     }
|
| Which lets users write this:
|
|     pdf_page_cmd_l(page, 10, 20);
|
| The convenience functions could even be macros. There would be no
| loss of type safety from using macros that way. There would need
| to be a lot of convenience functions or macros, but they are
| trivial, and so could be generated by a script (or another
| macro!).
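|
| A sketch of the macro-generated wrappers mentioned above. The
| struct layouts and the `PDF_DEFINE_CMD_L` macro are assumptions;
| the type names follow the comment and are unrelated to the real
| Cairo headers:

```c
/* Hypothetical layout: both child types embed the generic context. */
typedef struct { int last_x, last_y; } cairo_t;
typedef struct { cairo_t base; int page_number; } cairo_page_t;
typedef struct { cairo_t base; int tile_size; } cairo_pattern_t;

/* Upcasts are plain member access, so they stay type-checked. */
static cairo_t *pdf_page_to_ctx(cairo_page_t *p)       { return &p->base; }
static cairo_t *pdf_pattern_to_ctx(cairo_pattern_t *p) { return &p->base; }

/* The one real implementation, defined on the generic context. */
static void pdf_ctx_cmd_l(cairo_t *ctx, int x, int y) {
    ctx->last_x = x;
    ctx->last_y = y;
}

/* Generates pdf_<kind>_cmd_l for one child type. */
#define PDF_DEFINE_CMD_L(kind, type)                             \
    static void pdf_##kind##_cmd_l(type *ctx, int x, int y) {    \
        pdf_ctx_cmd_l(pdf_##kind##_to_ctx(ctx), x, y);           \
    }

PDF_DEFINE_CMD_L(page, cairo_page_t)
PDF_DEFINE_CMD_L(pattern, cairo_pattern_t)
```

| Passing a `cairo_pattern_t*` to `pdf_page_cmd_l` is still a
| compile-time diagnostic, since each generated wrapper keeps its
| own parameter type.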
| fanf2 wrote:
| I have implemented this using the gcc/clang "transparent union"
| extension, which eliminates the need for explicit casting or
| helpers.
|
| https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.ht...
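|
| A rough sketch of how the transparent union extension might be
| applied here. It is GCC/Clang specific and works in C only; the
| struct layouts and the `pdf_any_ctx` and `pdf_any_cmd_l` names
| are assumptions made for illustration:

```c
/* Hypothetical types: both begin with the same `pdf_ctx` member. */
struct pdf_ctx     { int last_x, last_y; };
struct pdf_page    { struct pdf_ctx ctx; int page_number; };
struct pdf_pattern { struct pdf_ctx ctx; int tile_size; };

/* An argument of either pointer type is accepted where pdf_any_ctx
 * is expected, with no cast or helper at the call site. */
typedef union __attribute__((__transparent_union__)) {
    struct pdf_page    *page;
    struct pdf_pattern *pattern;
} pdf_any_ctx;

static void pdf_any_cmd_l(pdf_any_ctx target, int x, int y) {
    /* Both members point at structs whose first member is a pdf_ctx,
     * so either pointer can be viewed as a pdf_ctx pointer. */
    struct pdf_ctx *ctx = (struct pdf_ctx *)target.page;
    ctx->last_x = x;
    ctx->last_y = y;
}
```

| Calls then read `pdf_any_cmd_l(&page, 10, 20)` or
| `pdf_any_cmd_l(&pattern, 3, 4)`, while a pointer of any other
| type is still rejected by the compiler.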
___________________________________________________________________
(page generated 2023-04-16 23:00 UTC)