[HN Gopher] Type-Safe Printf for C
       ___________________________________________________________________
        
       Type-Safe Printf for C
        
       Author : tinkersleep
       Score  : 48 points
       Date   : 2021-12-12 10:47 UTC (2 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | Animats wrote:
       | GCC has had printf checking for, what, 20 years?
        
       | WalterBright wrote:
       | D supports calling C functions directly, including printf. When
       | we added printf format checking against the arguments, many bugs
       | were exposed and fixed. It was a big win.
        
         | kazinator wrote:
         | GCC has this also; this work is different because it removes
         | the type errors. Instead of having an error like "%d expects a
         | parameter of type int, not char *", it just lets %d print the
         | string anyway. It's more like format in Lisp, say:
         | 
         |     [1]> (format t "~05,'0d" 5)
         |     00005
         |     NIL
         |     [2]> (format t "~05,'0d" "abc")
         |     00abc
         |     NIL
        
       | AlexanderDhoore wrote:
       | How is this safer than enabling all warnings in GCC or clang?
       | 'Type-safe' in this context does not mean that you get more
       | compile errors, but that the format specifier does not need to
       | specify the argument type, but just defines the print format. In
       | fact, format strings with this library will have less compile-
       | time checking (namely none) than with modern compilers for
       | standard printf. This approach is still safer.
       | 
       | EDIT: The answer is apparently at the bottom. Maybe move it
       | higher up?
        
         | tinkersleep wrote:
         | Ok, thanks for the hint. I put the most important info into
         | the intro: you just don't need the 'll', 'l', 'z' modifiers
         | for specifying sizeof(operand), as the compiler does that via
         | _Generic.
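A minimal sketch of the idea described above, assuming C11 `_Generic`: the macro picks the conversion, length modifier included, from the argument's static type, so typedefs such as `size_t` or `uint32_t` resolve to whatever underlying type they have on the current target. `INT_FMT` and `FORMAT_INT` are hypothetical names for illustration, not the library's API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not the library's API): choose the printf conversion,
   including its length modifier, from the static type of the argument.
   Plain char is a distinct type for _Generic, so it gets its own entry. */
#define INT_FMT(x) _Generic((x),                      \
    char: "%d",                                       \
    signed char: "%hhd", unsigned char: "%hhu",       \
    short: "%hd", unsigned short: "%hu",              \
    int: "%d", unsigned int: "%u",                    \
    long: "%ld", unsigned long: "%lu",                \
    long long: "%lld", unsigned long long: "%llu")

/* Format any standard integer into buf without writing 'l', 'll' or 'z' by
   hand. The _Generic controlling expression is not evaluated, so x is
   evaluated only once. An unlisted type (e.g. double) fails to compile. */
#define FORMAT_INT(buf, size, x) snprintf((buf), (size), INT_FMT(x), (x))
```

On a target where `size_t` is `unsigned long`, `FORMAT_INT(buf, sizeof buf, n)` selects `"%lu"`; where it is `unsigned int` or `unsigned long long`, the same source line selects `"%u"` or `"%llu"` instead.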
        
           | eps wrote:
           | Also put an example in the first pageful. I almost lost hope
           | while scrolling through the wall of format spec when I
           | finally saw the first example.
        
         | edflsafoiewq wrote:
         | > There is absolutely no chance to give a wrong format
         | specifier and access the stack (like printf does via stdarg.h)
         | in undefined ways. This is particularly true for multi-arch
         | development where with printf you need to be careful about
         | length specifiers, and you might not get a warning on your
         | machine, but the next person will and it will crash there. I
         | usually need to compile a few times on multiple
         | architectures to get the integer length correct, e.g., %u vs
         | %lu vs. %llu vs. %zu.
        
           | kevin_thibedeau wrote:
           | This is really annoying on architectures like ARM32 where
           | size_t is unsigned int but uint32_t is long unsigned int and
           | gets flagged as a different type. It becomes a real problem
           | when using a stripped-down printf like the one in newlib
           | that doesn't support %zu.
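When the printf in use lacks `%zu`, one common fallback is an explicit cast to a type whose conversion the library does support. A minimal sketch, assuming `size_t` is no wider than `unsigned long`; that holds on ARM32, where both are 32 bits, but not on 64-bit Windows, where `long` stays 32 bits. `format_size` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print a size_t without %zu by widening it explicitly
   to unsigned long, whose %lu conversion predates C99.
   Assumes size_t is no wider than unsigned long (true on ARM32, false on
   64-bit Windows); otherwise large values would be truncated. */
int format_size(char *buf, size_t bufsize, size_t value)
{
    return snprintf(buf, bufsize, "%lu", (unsigned long)value);
}
```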
        
             | tinkersleep wrote:
             | Exactly! Or on Windows 64-bit, where 'long' is 32-bit and
             | 'size_t' is 'unsigned long long'.
        
               | GeorgeTirebiter wrote:
               | The Real Problem (tm) is: specifiers like 'char' and
               | 'int' etc. should not be allowed; they 'should be' things
               | like c8 or i16 or u64 --- that is, specify the number of
               | bits for that data type in the type specifier. This is
               | what <stdint.h> is trying to fix.
               | 
               | What maybe 'should' happen in C2x is: 'int' is defined as
               | i16, 'long' as i32, 'long long' as i64 etc and then see
               | which programs break. Because it's perfectly OK to have
               | 16-bit 'ints' on a 64-bit arch. (size_t is what you use
               | to deal with architecture-specific chunks). And then
               | _remove_ all this  'int' etc crap from C. (Obv, some
               | 'compat switch' would need to exist, but you get the
               | idea.)
        
               | kevin_thibedeau wrote:
               | No that should not happen. Integer types that adapt to
               | the platform word size enhance portability. Nobody wants
               | a 32-bit default int on an 8-bit platform and using
               | uint8_t or uint16_t can introduce performance regressions
               | on wider platforms. The traditional integer types are
               | perfectly suited for scenarios where the exact width
               | doesn't matter and you know the guaranteed minimum is
               | good enough.
        
               | arka2147483647 wrote:
               | I would argue that most code nowadays implicitly assumes
               | that int is 32 bits wide, and won't work correctly on an
               | 8-bit platform anyway. If 'platform size conforming' ints
               | are used, they probably should be opt-in, instead of
               | opt-out.
        
               | kevin_thibedeau wrote:
               | For Windows the reason they couldn't switch to LP64 is
               | because they screwed up the type system with LONG and
               | allowed it to be incorporated into OS structs. That
               | prevents long from being 64-bit for the sake of
               | rationality.
        
           | thebruce87m wrote:
           | Can't you just use <inttypes.h> along with the <stdint.h>
           | fixed-width types to avoid the multiple compiles?
           | 
           | This stackoverflow answer gives an example:
           | https://stackoverflow.com/questions/7597025/difference-betwe...
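A minimal sketch of that approach, using the standard `PRIu32`/`PRId64` macros from `<inttypes.h>`, which expand to the conversion suffix matching the target's definition of each fixed-width type. `format_sample` is a hypothetical name.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: PRIu32 and PRId64 are string-literal macros (for
   example "u" or "lu", "ld" or "lld") that join the surrounding format
   string through adjacent string-literal concatenation. */
int format_sample(char *buf, size_t size, uint32_t count, int64_t offset)
{
    return snprintf(buf, size, "count=%" PRIu32 " offset=%" PRId64,
                    count, offset);
}
```

`format_sample(buf, sizeof buf, 7, -3)` writes `count=7 offset=-3` on any conforming target. The macros only cover the `<stdint.h>` types, though; a plain `size_t` still needs `%zu`.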
        
       | jhallenworld wrote:
       | So I also have a custom printf, but there is a limitation: if you
       | ask gcc to check it with "__attribute__((__format__ (__printf__",
       | then you are forced into using gcc's idea of what the printf
       | format string syntax is.
       | 
       | How can I have strict type checking, but a user defined format
       | string?
        
         | kevin_thibedeau wrote:
         | Write your own linter.
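For reference, this is roughly what the GCC/Clang `format` attribute mentioned above looks like on a custom function. A minimal sketch; `log_to` is a hypothetical wrapper. The attribute takes the 1-based index of the format parameter and of the first variadic argument, and the checker then applies the standard printf rules, which is why a custom format syntax can't reuse it.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: parameter 3 is a printf-style format string and the
   arguments checked against it start at parameter 4. With -Wformat (part of
   -Wall), a call such as log_to(b, n, "%d", "text") is diagnosed. */
int log_to(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int log_to(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);  /* forward the variadic args */
    va_end(ap);
    return n;
}
```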
        
       | kazinator wrote:
       | It's a big mistake that the format language looks like that of
       | printf.
       | 
       | If you use this in a big code base, there will still be the old
       | printf all over the place.
       | 
       | Now you have to think: is this custom logging function here based
       | on the safe printf from github, or is it vsprintf under the hood?
        
       | 37ef_ced3 wrote:
       | If you want a modern C, use Go.
       | 
       | Unless you need maximum performance (SIMD, GPUs, etc.) you should
       | use a developer-efficient, productive language.
       | 
       | Well-written Go executes almost as fast as C, and you will be
       | more productive as a programmer.
        
         | baybal2 wrote:
         | Beware of Go. Google may use it to do the "Embrace Extend
         | Extinguish" move. It might be type safe, but not ideologically
         | safe.
         | 
         | This is on top of Go being an unstable, immature language.
        
           | 37ef_ced3 wrote:
           | Go is very stable, and 12 years old.
        
         | nikki93 wrote:
         | Have you compared the performance and generated binary size of
         | C vs. Go on WebAssembly?
        
           | 37ef_ced3 wrote:
           | For WebAssembly, use the TinyGo Go compiler:
           | 
           | https://tinygo.org/
        
         | vladharbuz wrote:
         | Has anyone benchmarked Go's garbage collector lately? I like a
         | lot of stuff about Go, but a lot of my work is in video games
         | and real time audio, and I am extremely hesitant to use a
         | garbage collected language for those things.
        
           | nikki93 wrote:
           | I've been working on a Go -> C++ compiler, mainly for this
           | use case, that skips the GC and concurrency stuff:
           | https://www.reddit.com/r/golang/comments/r2795t/i_wrote_a_si...
           | It includes a demo video of a game I'm making with it and a
           | built-in scene editor that uses reflection etc.
           | 
           | Repo for compiler itself: https://github.com/nikki93/gx (no
           | README.md etc. yet, will be getting to that when I next have
           | a chance (it's a side project)). It just takes around 1500
           | lines of Go thanks to the parser and typechecker in the
           | standard library.
           | 
           | Go's perf was definitely non-trivially bad for me on
           | WebAssembly.
        
             | remexre wrote:
             | WebAssembly is notably a pathological case for _any_ stack-
             | scanning GC, since the stack isn't addressable.
        
             | pphysch wrote:
             | > I know I can "do things to maybe cause the GC to run
             | less" or such, but then that immediately starts to detract
             | from the goal of having a language where I can focus on
             | just the gameplay code.
             | 
             | Did you try implementing pooling (e.g. sync.Pool) for game
             | objects/entities/components/etc? How did that go perf-wise?
        
               | nikki93 wrote:
               | I think the main thing is it starts to become a
               | distraction from just writing the gameplay code. I don't
               | have to implement the pooling stuff now that I have this
               | language. But yeah if I did go further with the game in
               | vanilla Go I might have to try the pool approach. Having
               | worked on game engines with GC language runtimes (using
               | Lua etc.) before, you always ultimately hit a perf
               | ceiling due to lack of memory control and wish you could
               | move out of it, but the runtimes don't give you a way to
               | do that incrementally.
        
         | einpoklum wrote:
         | Go is not a "modern C". It may or may not be a swell language,
         | but it differs fundamentally from C:
         | 
         | 1. Go is a garbage-collected language, C is not.
         | 
         | 2. Go is a single-company-managed language, while C is managed
         | by an international standards committee within ISO. You might
         | not care about this difference, but it's quite significant
         | w.r.t. how future language developments happen.
         | 
         | 3. C types are intensional, Go types are extensional
         | ("structural typing").
         | 
         | These fundamental differences are not cases of one language
         | being superior, or further advanced, than the other - they're
         | about going in different directions.
        
           | 37ef_ced3 wrote:
           | I have been writing C for decades, but now I almost
           | exclusively use Go.
           | 
           | What I mean is that if you like C99, you will probably like
           | Go. Go can be understood as a modernization of C that doesn't
           | abandon C's simplicity but adds many useful facilities that C
           | lacks.
           | 
           | Go obviously derives from C. It's a very C-like language. It
           | makes sense to view Go as an enhanced C that makes slightly
           | different trade-offs and that is applicable to a slightly
           | different set of purposes.
        
       | tinkersleep wrote:
       | Probably the 1e6th approach, but anyway, I also wanted to play
       | with this myself: here's a _Generic and macro based approach to
       | get printf type-safe in C. It needs C11, and uses some gcc
       | extensions.
        
         | kzrdude wrote:
         | Do you have a usage example? One early in the README, maybe.
         | Seeing is believing.
        
         | marcodiego wrote:
         | A few times, I got reasonably far implementing a generic,
         | type-safe, variadic, macro-based "print" for C using
         | _Generic.
         | 
         | I copied some examples of how to implement variadic macros, and
         | expanded on that for C basic types. It mostly worked; you'll
         | always have difficulty with corner cases like separating
         | pointers and arrays, but it worked well for the basic C types.
         | 
         | I gave up for a few reasons:
         | 
         | - I wanted a form to register new types, so it could work
         | for user-defined types;
         | 
         | - the C pre-processor knows nothing about lists that can be
         | expanded multiple times;
         | 
         | - variadic C macros are ugly hacks.
         | 
         | Maybe one day I'll get back to it and publish it.
         | 
         | The interesting part is that _Generic combined with macros
         | allows some very interesting tools for implementing primitive
         | forms of polymorphism. Actually, if the C pre-processor
         | supported lists, it would be possible to implement RTTI in C.
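A minimal sketch of that combination, assuming C11: `_Generic` selects a per-type append function, and an argument-counting macro dispatches to a fixed-arity expansion. It handles only one to three arguments and four types, and all names (`strbuf`, `PUT`, `SPRINT`, ...) are hypothetical. Extending it to user-defined types runs into exactly the registration problem listed in the comment.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed-size output buffer. */
typedef struct { char data[128]; size_t len; } strbuf;

/* Advance len by an snprintf result, clamping on truncation so that
   len never exceeds sizeof data - 1 (hence room is always at least 1). */
static inline void sb_advance(strbuf *sb, int n)
{
    size_t room = sizeof sb->data - sb->len;
    if (n > 0)
        sb->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* One append function per supported type. */
static inline void put_int(strbuf *sb, int v)
{ sb_advance(sb, snprintf(sb->data + sb->len, sizeof sb->data - sb->len, "%d", v)); }
static inline void put_long(strbuf *sb, long v)
{ sb_advance(sb, snprintf(sb->data + sb->len, sizeof sb->data - sb->len, "%ld", v)); }
static inline void put_dbl(strbuf *sb, double v)
{ sb_advance(sb, snprintf(sb->data + sb->len, sizeof sb->data - sb->len, "%g", v)); }
static inline void put_str(strbuf *sb, const char *v)
{ sb_advance(sb, snprintf(sb->data + sb->len, sizeof sb->data - sb->len, "%s", v)); }

/* _Generic picks the function from the argument's static type; with no
   default branch, an unsupported type is a compile-time error. */
#define PUT(sb, x) _Generic((x),            \
    int: put_int, long: put_long,           \
    double: put_dbl,                        \
    char *: put_str, const char *: put_str)((sb), (x))

/* Count 1..3 arguments, then paste the count onto SPRINT_. */
#define NARGS(...) NARGS_(__VA_ARGS__, 3, 2, 1, 0)
#define NARGS_(a1, a2, a3, n, ...) n
#define CAT(a, b) CAT_(a, b)
#define CAT_(a, b) a##b
#define SPRINT(sb, ...) CAT(SPRINT_, NARGS(__VA_ARGS__))((sb), __VA_ARGS__)
#define SPRINT_1(sb, a)       PUT(sb, a)
#define SPRINT_2(sb, a, b)    (PUT(sb, a), PUT(sb, b))
#define SPRINT_3(sb, a, b, c) (PUT(sb, a), PUT(sb, b), PUT(sb, c))
```

`SPRINT(&sb, "n=", 42, 2.5)` leaves `n=422.5` in `sb.data`, while passing a `float` or a struct would not compile, since `_Generic` does not promote its controlling expression.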
        
           | bumblebritches5 wrote:
           | > - I wanted a form to register new types, so it could work
           | for user-defined types;
           | 
           | > - the C pre-processor knows nothing about lists that can be
           | expanded multiple times;
           | 
           | I'm actually working on both features as Clang extensions.
           | 
           | #repeat, a preprocessor directive to loop, can be combined
           | with _Pragma(push_macro/pop_macro) to create lists by
           | redefining a macro.
           | 
           | and currently #increment, though I think I want to expand on
           | this so that other macros can be redefined more easily to
           | create lists via push/pop macro.
           | 
           | The reason the push_macro/pop_macro pragmas can't work is
           | that the macro has to be undefined and redefined, and the
           | value then pushed onto a stack in the compiler.
           | 
           | and you can't redefine a macro in the body of another macro
           | directly.
           | 
           | so I've been thinking about maybe a
           | _Pragma(redefine_macro(MacroToRedefine,
           | NewValueForRedefinedMacro))
           | 
           | but I don't want it to be limited to the _Pragma area of the
           | compiler, I want it to be eventually standardized.
           | 
           | I've been talking to a friend at WG14 who suggested making it
           | a "Preprocessor Expression", like `__has_c_attribute` and
           | `defined()`.
           | 
           | So that's the area I've been working on lately for the
           | Increment/Redefine PE.
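For readers unfamiliar with the pragmas discussed in this comment: `#pragma push_macro` and `#pragma pop_macro` (supported by GCC, Clang and MSVC) save and restore one macro definition. A minimal sketch; the macro `LEVEL` and both variables are hypothetical. The redefinition in between still needs real `#undef`/`#define` lines, and those directives cannot be produced by another macro's expansion, whereas the `_Pragma` operator can.

```c
/* Hypothetical demo macro. */
#define LEVEL 1

#pragma push_macro("LEVEL")   /* save the current definition (1) */
#undef LEVEL
#define LEVEL 2               /* temporary redefinition */
static const int inner_level = LEVEL;   /* expands to 2 */
#pragma pop_macro("LEVEL")    /* restore the saved definition */
static const int outer_level = LEVEL;   /* expands to 1 again */
```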
        
             | bumblebritches5 wrote:
             | The problem with _Pragma(redefine_macro()) is that I don't
             | want it to be a pragma.
             | 
             | I want it to be either a compile-time operator like sizeof,
             | or a Preprocessor Expression so it can be used correctly.
             | 
             | and it would eclipse #increment pretty easily:
             | __redefine_macro(MacroNameToRedefine,
             | ReplacementExpression)
             | 
             | If ReplacementExpression contains a macro identifier, it
             | would be expanded first, so something like
             | `MacroToRedefine + 1` should work; I see no reason it
             | shouldn't.
             | 
             | Maybe it would be ugly, but I think it would work.
             | 
             | ----
             | 
             | My motivation is compile time registration for codecs, test
             | suites, test cases being registered to suites, etc.
        
             | marcodiego wrote:
             | I spent a long time thinking about this. My conclusion is
             | that the simplest way to achieve this, at least in GCC, is
             | to create a #copy directive that allows a macro, together
             | with its stack, to be copied to another. GCC already allows
             | stack expansion with push and pop but it can only be
             | expanded once; the #copy directive would fix that.
             | 
             | If you get anything close to that working, that would be a
             | godsend. It is the last remaining piece of the puzzle for
             | me to implement complete RTTI in C. It would certainly help
             | to minimize GLib boilerplate code too.
             | 
             | I'd really like it to be part of C2x, but I think it is too
             | late now. If it is implemented by either GCC or Clang, the
             | other would certainly follow, since it is so useful. So
             | getting it to work in either of these would be good enough
             | for me.
             | 
             | How can I track/follow your progress?
        
             | tinkersleep wrote:
             | There is __VA_OPT__ in C++2a, which handles recursion
             | termination in macro expansion. This will probably be in
             | future C, too, right?
             | 
             | And if there was also __EVAL__ to force the macro
             | preprocessor into another evaluation level, you could write
             | recursive macros quite easily, e.g., to wrap every argument
             | into a function call:
             | 
             |     #define EACH(f,x,...) f(x) __VA_OPT__(, __EVAL__(EACH(f, __VA_ARGS__)))
             | 
             | This would make the macro magic for this library trivial:
             | you could process lists recursively.
             | 
             | Edit: added missing paren
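There is no `__EVAL__`, but a known workaround gets a similar effect with `__VA_OPT__` alone: nest a pass-through macro to force a fixed number of rescans, and defer the recursive call so each rescan unrolls one more step. A minimal sketch, assuming a preprocessor with `__VA_OPT__` (standard in C++20 and C23, and typically accepted by recent GCC and Clang in other modes as an extension); all macro names are hypothetical. The rescan budget is finite, so the list length is bounded.

```c
/* Each EXPANDn level rescans its argument several times; nesting them gives
   a fixed budget of a few hundred rescans. */
#define EXPAND(...)  EXPAND4(EXPAND4(EXPAND4(EXPAND4(__VA_ARGS__))))
#define EXPAND4(...) EXPAND3(EXPAND3(EXPAND3(EXPAND3(__VA_ARGS__))))
#define EXPAND3(...) EXPAND2(EXPAND2(EXPAND2(EXPAND2(__VA_ARGS__))))
#define EXPAND2(...) EXPAND1(EXPAND1(EXPAND1(EXPAND1(__VA_ARGS__))))
#define EXPAND1(...) __VA_ARGS__

/* PARENS keeps FOR_EACH_AGAIN from being invoked in the same scan, so the
   recursive call is only picked up by the next rescan. */
#define PARENS ()
#define FOR_EACH_AGAIN() FOR_EACH_HELPER

/* Like EACH(f, x, ...) above: f(x), f(y), ... separated by commas;
   __VA_OPT__ stops the recursion once the list is empty. */
#define FOR_EACH(f, ...) __VA_OPT__(EXPAND(FOR_EACH_HELPER(f, __VA_ARGS__)))
#define FOR_EACH_HELPER(f, x, ...) \
    f(x) __VA_OPT__(, FOR_EACH_AGAIN PARENS (f, __VA_ARGS__))

/* Usage: apply a hypothetical SQUARE to each element. */
#define SQUARE(x) ((x) * (x))
static const int squares[] = { FOR_EACH(SQUARE, 1, 2, 3) };
```

Here `squares` expands to `{ ((1) * (1)), ((2) * (2)), ((3) * (3)) }`.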
        
           | tinkersleep wrote:
           | > - I wanted a form to register new types, so it could work
           | for user-defined types;
           | 
           | Yes, I had the same urge. You can easily fall into the trap
           | of too many features on the list. I settled on keeping user
           | types out: you can always write a stringify() and pass that
           | to the printf. Not the same, I know. But a more finite
           | project.
           | 
           | > - the C pre-processor knows nothing about lists that can be
           | expanded multiple times;
           | 
           | Yeah, that's a hack. Look at the 'VA_EXP()' macros in
           | include/va_print/base.h. Ugly. Incomprehensible.
           | 
           | > - variadic C macros are ugly hacks.
           | 
           | Absolutely. But I think there is no other way in C.
           | 
           | > Actually, if the C pre-processor supported lists, it would
           | be possible to implement RTTI in C.
           | 
           | I couldn't resist putting in '%t', which prints the C type
           | of the argument...
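A minimal sketch of how a '%t'-style feature can work, assuming C11: `_Generic` maps the static type of an expression to a string. `TYPE_NAME` is a hypothetical name, not the library's implementation, and typedefs such as `size_t` report their underlying type.

```c
/* Hypothetical: spell out the static type of an expression. _Bool, plain
   char, signed char and unsigned char are all distinct types here. */
#define TYPE_NAME(x) _Generic((x),                                    \
    _Bool: "_Bool", char: "char",                                     \
    signed char: "signed char", unsigned char: "unsigned char",       \
    short: "short", unsigned short: "unsigned short",                 \
    int: "int", unsigned int: "unsigned int",                         \
    long: "long", unsigned long: "unsigned long",                     \
    long long: "long long", unsigned long long: "unsigned long long", \
    float: "float", double: "double", long double: "long double",     \
    char *: "char *", const char *: "const char *",                   \
    default: "other")
```

`TYPE_NAME('a')` yields `"int"`, because character constants have type int in C. With GCC and Clang, which apply the array-to-pointer conversion to the controlling expression, `TYPE_NAME("hi")` yields `"char *"`.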
        
       ___________________________________________________________________
       (page generated 2021-12-14 23:01 UTC)