https://queue.acm.org/detail.cfm?id=3588242

Drill Bits
March 29, 2023
Volume 21, issue 1

Catch-23: The New C Standard Sets the World on Fire

Terence Kelly with Special Guest Borer Yekai Pan

A new major revision of the C language standard, C23, is due out this year. We'll tour the highs and lows of the latest draft^9 and then drill down on the mother of all breaking changes. Sidebars celebrate C idioms and undefined behavior with code and song, respectively.

The Good News

Like the previous major revision, C11,^7 the latest standard introduces several useful features. The most important, if not the most exciting, make it easier to write safe, correct, and secure code. For example, the new <stdckdint.h> header standardizes checked integer arithmetic:

    int i =...; unsigned long ul =...; signed char sc =...;
    bool surprise = ckd_add(&i, ul, sc);

The type-generic macro ckd_add() computes the sum of ul and sc "as if both operands were represented in a signed integer type with infinite range." If the mathematically correct sum fits into a signed int, it is stored in i and the macro returns false, indicating "no surprise"; otherwise, i ends up with the sum wrapped in a well-defined way and the macro returns true. Similar macros handle multiplication and subtraction. The ckd_* macros steer a refreshingly sane path around arithmetic pitfalls including C's "usual arithmetic conversions."

C23 also adds new features to protect secrets from prying eyes and programmers from themselves. The new memset_explicit() function is for erasing sensitive in-memory data; unlike ordinary memset, it is intended to prevent optimizations from eliding the erasure. Good old calloc(size_t n, size_t s) still allocates a zero'd array of n objects of size s, but C23 requires that it return a null pointer if n*s would overflow.
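For readers without a C23 toolchain, the two safety checks described above can be approximated in older C. The following is a minimal sketch, not the standard's implementation: add_int_checked() and calloc_checked() are hypothetical names invented for illustration, the adder handles only int operands (the real ckd_add() is type-generic), and on overflow it leaves the result untouched rather than storing a wrapped sum.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical int-only stand-in for ckd_add(): returns false ("no
   surprise") and stores a + b in *sum when the mathematical sum fits
   in an int.  Returns true on overflow and, unlike ckd_add(), leaves
   *sum unchanged instead of storing a wrapped value. */
bool add_int_checked(int *sum, int a, int b)
{
    assert(sum != NULL);
    /* Test against the limits before adding, so the signed addition
       below can never overflow. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return true;
    *sum = a + b;
    return false;
}

/* Hypothetical wrapper that makes explicit the check C23 requires of
   calloc(): return a null pointer if n * s would overflow size_t. */
void *calloc_checked(size_t n, size_t s)
{
    if (s != 0 && n > SIZE_MAX / s)
        return NULL;
    return calloc(n, s);
}
```

With a C23 compiler and library, the standard ckd_add() and calloc() make both helpers unnecessary; the sketch only spells out the comparisons that the new requirements imply.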
In addition to these new correctness and safety aids, C23 provides many new conveniences:

Constants true, false, and nullptr are now language keywords; mercifully, they mean what you expect.
The new typeof feature makes it easier to harmonize variable declarations.
The preprocessor can now #embed arbitrary binary data in source files.
Zero-initializing stack-allocated structures and variable-length arrays is a snap with the new standard "={}" syntax.
C23 understands binary literals and permits apostrophe as a digit separator, so you can declare int j = 0b10'01'10, and the printf family supports a new conversion specifier for printing unsigned types as binary ("01010101").
The right solution to the classic job interview problem "Count the 1 bits in a given int" is now stdc_count_ones().

Sadly, good news isn't the only news about C23. The new standard's nonfeatures, misfeatures, and defeatures are sufficiently numerous and severe that programmers should not "upgrade" without carefully weighing risks against benefits. Older standards such as C99 and C11 weren't perfect, but detailed analysis will sometimes conclude that they are preferable to C23. After reviewing C23's problems, we'll discuss strategies for peaceful coexistence with existing code and hazard mitigation in new code.

    C inventor Dennis Ritchie pointed to several flaws in [ANSI C] ... which he said is a licence for the compiler to undertake aggressive optimisations that are completely legal by the committee's rules, but make hash of apparently safe programs; the confused attempt to improve optimisation ... spoils the language.
    --Dennis Ritchie on the first C standard^4,27

Unfilled Potholes and Festering Sores

Laws should be freely available, intelligible, and agreeable to the governed, and they should keep pace with changing times. C23 lacks these virtues.
Standard C hides behind a paywall: The official standard currently costs more than $200, so most coders make do with unofficial drafts.^1 The standard routinely confuses its own authors, and crucial parts mystify even experienced and well-educated programmers^27; baffled silence is not consent. Developers who manage to figure out what the standard actually means are frequently appalled.^25,27 Standard C advances slowly (e.g., 30 years and five revisions to define zero equal to zero^26) and sometimes not at all.

Progress means draining swamps and fencing off tar pits, but C23 actually expands one of C's most notorious traps for the unwary. All C standards from C89 onward have permitted compilers to delete code paths containing undefined operations--which compilers merrily do, much to the surprise and outrage of coders.^16 C23 introduces a new mechanism for astonishing elision: By marking a code path with the new unreachable annotation,^12 the programmer assures the compiler that control will never reach it and thereby explicitly invites the compiler to elide the marked path. C23 furthermore gives the compiler license to use an unreachable annotation on one code path to justify removing, without notice or warning, an entirely different code path that is not marked unreachable: see the discussion of puts() in Example 1 on page 316 of N3054.^9

Major disappointments of inaction involve the pillar of C programming: pointers. Comparing pointers to different objects (different arrays or dynamically allocated blocks of memory) is still undefined behavior, which is a polite way of saying that the standard permits the compiler to run mad and the machine to catch fire at run time.^16 The standard's pointer comparison restrictions, rooted in forgotten ancient hardware architectures, have surprising consequences. The seemingly innocent sequence a=malloc(...) then b=malloc(...) then if (a [...]

[...] If you see something like (-INT_MAX - 1), why isn't it more straightforward?
See page 46 of Gustedt.^11

7. C17^8 purports to be a bug-fix revision of C11. Does the word "toto" on page 1 indicate (a) the editor's musical tastes; (b) that nobody bothered to spell-check the document; (c) that we're not in Kansas anymore; or (d) none of the above?

8. Programmer Yossarian's application requires the new C23 memset_explicit function but also requires realloc(p,0) to be well defined. If both functions live in libc.so, is Yossarian caught in a Catch-23? What should he do?

9. Following Shiffman,^24 write a Socratic dialogue in which C inventor Dennis Ritchie interrogates the C standards committee. See Yodaiken^27 for talking points.

Idioms and Fluency

I decry C23's ban on the elegant idiomatic use of a classic memory allocation function. Why should you care? What's so important about idioms?

Fluent, idiomatic code expresses the programmer's intentions more accurately, more clearly, and often more succinctly than rookie code. C's idioms are not excessively numerous or abstruse, but to master them you must climb a learning curve. For example, the snippets above show how most C programmers learn to collapse a numeric variable to a Boolean. The "bang-bang" idiom at the bottom isn't best in every situation, but it's often handy and you must recognize it to read expert code.

Idiomatic expression proves its practicality when cluttery alternatives would complicate an inherently simple chore. For example, imagine a hotel that charges extra for pets: The first dog costs $10 and each additional pooch $5 more; likewise for other animals but with different constants. Idiomatic code is natural, easy to write, compact, and obvious to fluent readers:

    petFee = !!ndogs * (10 + (ndogs-1) * 5)   // risk: rugs
           + !!ncats * ( 7 + (ncats-1) * 3)   //   "   furniture
           + !!nfish * (47 + (nfish-1) * 1);  //   "   floods

Bang-bang, like most C idioms, is based not on esoteric knowledge but rather on a thorough understanding of fundamentals.
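The hotel formula above can be tried out as a small function. This is a minimal sketch under stated assumptions: the names species_fee() and pet_fee() are hypothetical, and the constants are simply those of the formula. Because !!count is 0 for zero animals and 1 otherwise, it acts as a switch on each species' term, so a guest with no dogs never pays the nonsensical "first dog minus one extra" amount.

```c
#include <assert.h>

/* Hypothetical helper: fee for one species.  !!count collapses any
   nonzero count to 1 and zero to 0, so the whole term vanishes when
   the guest brings none of that animal. */
static int species_fee(int count, int first, int each_additional)
{
    assert(count >= 0);
    return !!count * (first + (count - 1) * each_additional);
}

/* Hypothetical wrapper around the formula in the text. */
int pet_fee(int ndogs, int ncats, int nfish)
{
    return species_fee(ndogs, 10, 5)    /* risk: rugs      */
         + species_fee(ncats,  7, 3)    /* risk: furniture */
         + species_fee(nfish, 47, 1);   /* risk: floods    */
}
```

For instance, three dogs and nothing else cost 10 + 2 * 5 = 20, while two dogs, one cat, and one fish cost 15 + 7 + 47 = 69.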
Beginners who overlook the bang-bang option know about logical NOT, but perhaps they assume that double negation can't be useful. Fluent programmers, however, appreciate the peculiar nuances of the "!" operator. Mastering double-bang is largely a matter of fully understanding single-bang.

Kernighan and Pike discuss programming idioms at length.^17 Klemens describes cool idioms enabled by C11 features.^19 Yodaiken explains how aspects of the C standards intended to enable performance optimizations undermine systems programming idioms.^27

Exercise: Amend the petFee formula above to add premiums for combinations of species that don't play nice together: $30 for any numbers of dogs and cats, because noise; and $20 for any numbers of cats and fish, because splashes. Hint: What happens when you multiply Booleans?

Undefined Behavior Acid Trip

This parody of Jefferson Airplane's classic song "White Rabbit" is about programming psychedelia--undefined behavior in C. The title refers to the empty assembly-code file you get when the compiler elides code paths with UB.^16 Helgrind is a tool in the Valgrind suite. Chris Lattner created the Clang compiler.

white.s

One flag makes it faster and one flag makes it small and the deprecated -Wchkp doesn't do anything at all.
Go ask Lattner if we should use -Wall.

And if you go comparing pointers across segments you're going to fall.
That's how a hookah-smoking working group has standardized it all.
Go ask Lattner did they make the right call?

When your loops and expressions get up from where you said they go and Clang just had some kind of warning and Valgrind is moving slow, go ask Lattner; I hope he'll know.

When the logic of -O3 is calling your code dead and the main() task is writing backwards while the workers race ahead, remember what the Helgrind said:
Lock your thread. Lock your thread.
Acknowledgments

Jon Bentley, Hans Boehm, John Dilley, Kevin O'Malley, and Charlotte Zhuang reviewed drafts and provided helpful feedback. Dilley and O'Malley scrutinized the example code and recommended useful improvements. Dhruva Chakrabarti and Pramod Joisha fielded technical questions. We thank C23 standards committee members Aaron Ballman, Robert Seacord, and JeanHeyd Meneide for exchanges of correspondence.

References

1. C Standards Committee (Working Group 14). Documents; https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm.
2. C89 Standard; https://web.archive.org/web/20161223125339/http://flash-gordon.me.uk/ansi.c.txt.
3. C89 Rationale, ANSI X3J11/88-151, November 1988. Available via https://en.wikipedia.org/wiki/ANSI_C.
4. Computer Business Review staff. 1988. Proposed ANSI C language standard draws criticism as comment period ends; https://techmonitor.ai/technology/proposed_ansi_c_language_standard_draws_criticism_as_comment_period_ends.
5. C99 Standard (draft n1256). 2007. https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf.
6. C99 Rationale. 2003. https://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf.
7. C11 Standard (draft n1570). 2011. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf.
8. C17 Standard (draft n2176). https://web.archive.org/web/20181230041359/http:/www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf.
9. C23 Standard (draft n3054). 2022. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf.
10. Ballman, A. 2022. WG14 document n3065: C xor C++ programming; https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3065.pdf. [Also available at Reference 1.]
11. Gustedt, J. 2019. Modern C, second edition. Manning; https://gustedt.gitlabpages.inria.fr/modern-c/.
12. Gustedt, J. 2021. WG14 document n2826v2. Add annotations for unreachable control flow; https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2826.pdf. [Also available at Reference 1.]
13. Harbison, S. P., Steele III, G. L. 2002. C: A Reference Manual, fifth edition. Prentice Hall.
14. Hatton, L. 1995. Safer C: Developing Software for High-Integrity and Safety-Critical Systems. McGraw-Hill.
15. Kelly, T. 2022. Literate executables. acmqueue 20(5); https://queue.acm.org/detail.cfm?id=3570938.
16. Kelly, T., Gu, W., Maksimovski, V. 2021. Schrodinger's code: undefined behavior in theory and practice. acmqueue 19(2); https://queue.acm.org/detail.cfm?id=3468263.
17. Kernighan, B., Pike, R. 1999. The Practice of Programming. Addison-Wesley.
18. Kernighan, B. W., Ritchie, D. M. 1988. The C Programming Language, second edition. Prentice Hall.
19. Klemens, B. 2014. 21st Century C, second edition. O'Reilly Media.
20. Marsaglia, G. 2003. Xorshift RNGs. Journal of Statistical Software 8(14); https://www.jstatsoft.org/index.php/jss/article/view/v008i14/916.
21. McKenney, P. E., Michael, M., Mauer, J., Sewell, P., Uecker, M., Boehm, H., Tong, H., Douglas, N., Rodgers, T., Deacon, W., Wong, M. 2019. WG14 document n2443: Lifetime-end pointer zap; https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2443.pdf. [Also available at Reference 1.]
22. Plauger, P. J. 1992. The Standard C Library. Prentice Hall.
23. Seacord, R. C. 2019. WG14 document n2464: Zero-size reallocations are undefined behavior; https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf. [Also available at Reference 1.]
24. Shiffman, M. 2022. A man of all reasons. Harper's Magazine (April), 15-16; https://harpers.org/archive/2022/04/steven-pinker-meets-socrates/.
25. Torvalds, L. 2018. Linux kernel mailing list posting; https://lkml.org/lkml/2018/6/5/769.
26. C FP Group. 2021. WG14 document n2670: Zeros compare equal; https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2670.pdf. [Also available at Reference 1.]
27. Yodaiken, V. 2021. How ISO C became unusable for operating systems development. 11th Workshop on Programming Languages and Operating Systems (PLOS '21). https://doi.org/10.1145/3477113.3487274.

Terence Kelly and Yekai Pan enjoy surveying the status quo, sprinkling most of it with holy water, and consigning to flame the parts they don't like.

Copyright (c) 2023 held by owner/author. Publication rights licensed to ACM.

Originally published in Queue vol. 21, no. 1