[HN Gopher] My review of the C standard library in practice
___________________________________________________________________
My review of the C standard library in practice
Author : djoldman
Score : 108 points
Date : 2023-02-11 14:10 UTC (8 hours ago)
(HTM) web link (nullprogram.com)
(TXT) w3m dump (nullprogram.com)
| schemescape wrote:
| What are some good, cross-platform, permissively licensed
| alternatives?
|
| I also mostly try to avoid the C standard library because I can't
| be bothered to remember which functions are flawed and which are
| (mostly) safe to use. But I really don't like to write my own
| platform-specific wrappers for e.g. converting between string
| encodings.
| matheusmoreira wrote:
| > In general, when working in C I avoid the standard library,
| libc, as much as possible. If possible I won't even link it.
|
| Completely agree: freestanding C is a superior language. I'm so
| happy to discover I'm not alone in thinking like this. The author
| is very thorough in his criticism of libc, I learned a lot from
| the post.
|
| > The platform code is small in comparison: mostly unportable
| code, perhaps raw system calls, graphics functions, or even
| assembly.
|
| For me this is what made programming fun again! I don't even
| bother with multiplatform implementations anymore, I go straight
| for Linux system calls. Turns out to be a much better interface
| compared to libc.
|
| > On some platforms it will still link libc anyway because it's
| got useful platform-specific features, or because it's mandatory.
|
| If anyone would like to know why such a thing would be mandatory,
| I began writing about this subject literally just a few days ago.
|
| https://www.matheusmoreira.com/linux/system-calls
|
| Still very much a work in progress but it does provide context
| for his claim.
| jcalvinowens wrote:
| (Shamelessly plugging my own project)
|
| It's sort of amazing how little userspace code a crude
| webserver using Linux syscalls can be:
| https://github.com/jcalvinowens/asmhttpd
|
| One 4K page of code! No stack!
| stefanos82 wrote:
| Amazing job mate! +1
|
| How much traffic can it handle? Have you benchmarked it?
| agumonkey wrote:
| highly readable, kudos
| asveikau wrote:
| I agree with a bunch of points here, but this author reveals a
| spectacular gap when they mention qsort and bsearch without
| mentioning the high cost of the repeated function pointer calls on
| a modern CPU.
|
| A good qsort or bsearch mechanism would allow the comparisons to
| be inlined. Performance critical code can't use the libc versions
| due to this.
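|
| A minimal sketch of the contrast, with hypothetical helper names
| (sort_with_qsort, sort_ints_inline) that are not from the article:
| the first routes every comparison through a function pointer, the
| second is a type-specific insertion sort whose comparison is a
| plain `<` the compiler is free to inline. Insertion sort is only
| used here to keep the example short; it is quadratic.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparator for qsort: invoked through a function pointer
   once per comparison. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids overflow of x - y */
}

/* Hypothetical helper: sort ints with the libc qsort. */
void sort_with_qsort(int *v, size_t n)
{
    if (n > 1) {
        qsort(v, n, sizeof *v, cmp_int);
    }
}

/* Hypothetical helper: type-specific insertion sort. The
   comparison is written inline, so no indirect call is needed. */
void sort_ints_inline(int *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;
        while (j > 0 && key < v[j - 1]) {
            v[j] = v[j - 1];    /* shift larger elements right */
            j--;
        }
        v[j] = key;
    }
}
```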
| quelsolaar wrote:
| I agree with lots of points in this article, but one advantage
| libc has is that many compilers implement intrinsic versions of
| memcpy, sin, sqrt and so on that will be faster than anything you
| can conceivably implement yourself in C. Still, the smaller
| dependency footprint you have the better, and that includes libc.
| megous wrote:
| These will/can be used even if you don't use libc.
| schemescape wrote:
| But wouldn't that require splitting your implementation per
| platform? I'd rather leave that to someone else.
| badsectoracula wrote:
| If you avoid libc you'd need to do that anyway, otherwise
| how would you open files, allocate memory, etc.?
| Someone wrote:
| More or less. I don't think declaring your own _memcpy_
| 'inline' will get you all the benefits of using the 'real'
| one.
|
| If you want your compiler to know what _memcpy_ does, so
| that, for example, it can compile it as an inlined "load
| register, store register" instruction pair _if_ it knows it's
| copying 8 bytes, you have to _#include <string.h>_ (with the
| angle brackets; _#include "string.h"_ won't do)
|
| So, if you want optimal performance, you can't do without the
| compiler header.
|
| You're still free to link with your own implementation of
| memcpy, of course, but don't expect it to be called in all
| places where the source code contains calls to it.
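|
| A small sketch of the fixed-size case described above, assuming
| a compiler that treats memcpy as a builtin. The helper names
| load_u64 and store_u64 are hypothetical. With a constant size of
| 8, compilers commonly lower each call to a single load or store
| rather than a library call, though that is an optimization and
| not something the language guarantees.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 64-bit value from a buffer that
   may not be suitably aligned for uint64_t. */
uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);    /* size is a compile-time constant */
    return v;
}

/* Hypothetical helper: write a 64-bit value in native byte order. */
void store_u64(unsigned char *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```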
| forrestthewoods wrote:
| libc is an absolute dumpster fire.
|
| He mentions the problems caused by the criminal use of globals in
| locales. That problem has led to numerous blog posts that have
| hit the HN front page.
|
| This one comes to mind:
| https://aras-p.info/blog/2022/02/25/Curious-lack-of-sprintf-...
| gavinhoward wrote:
| While I'm not entirely sure some of his stuff works correctly, I
| find that I'm also a fan of getting rid of libc in my code. I
| already wrote a memcpy, and I already implemented my own buffered
| I/O. I might as well eliminate everything else that I can.
|
| setjmp and longjmp will have to stay, though.
| flohofwoe wrote:
| Arguably the mem*() functions are part of the "useful stdlib
| core" functions. Also be aware that the standard memcpy() and a
| handful of other C stdlib functions are essentially treated like a
| builtin by compilers, e.g. the compiler can add or remove
| memcpy calls as it sees fit.
| mgaunard wrote:
| The mem functions are available as compiler built-ins, no
| need for libc.
| Karellen wrote:
| Well, that depends on your compiler.
| mgaunard wrote:
| GCC is the standard.
| Karellen wrote:
| Huh. Last time I looked at the standard, I didn't see
| where that was mentioned. Could you point me at the
| relevant section, please?
| dezgeg wrote:
| No. While the compiler can optimize memcpy calls to direct
| loads/stores for small arguments (when inlining the
| loads/stores would be smaller than the cost of calling
| memcpy), for bigger or not statically known sizes it WILL
| call to memcpy in libc. And in fact it may even emit memcpy
| calls for things like struct copies.
| mgaunard wrote:
| whether you want the built-in to have the ability to
| refer to an external definition is a compiler flag.
| Asooka wrote:
| memcpy is usually treated as an intrinsic and compiled to
| efficient code when the size is a compile-time constant even in
| -O0. It is one of the few functions in the standard library
| that I see as already written to be the best version possible.
| The one deficiency is that if your ranges overlap, the result
| is undefined, but you can use memmove instead.
| jenadine wrote:
| And vice versa. The compiler can detect a loop and replace it
| by a call to memcpy
| https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
| CyberDildonics wrote:
| This is actually a big problem and a silly transformation
| in my opinion. You can write a one line loop that copies
| memory and suddenly your compiler needs to link with the
| standard library even if you're trying to avoid it.
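|
| For illustration, this is the kind of loop being discussed; the
| name copy_bytes is hypothetical. At higher optimization levels
| GCC may recognise the pattern and emit a call to memcpy in its
| place. GCC documents an option, -fno-tree-loop-distribute-patterns,
| that turns this pattern replacement off; whether that is enough
| for a given freestanding build is an assumption to verify against
| your own toolchain.

```c
#include <stddef.h>

/* Hypothetical byte-copy loop written without libc. The source
   code contains no call to memcpy, but an optimizing compiler is
   allowed to generate one for it. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```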
| stephc_int13 wrote:
| setjmp and longjmp are very low-level and kind of borderline
| use-case.
|
| I almost never use those.
| gavinhoward wrote:
| And that makes sense.
|
| For me, I have a stack allocator that knows about cleaning
| up, as well as about setjmp and longjmp, so I have a function
| I can call that will deallocate everything in the stack
| allocator and do the jump. It's basically a safe exception
| because everything will be cleaned up properly (as long as
| everything I allocate lives on the stack allocator or is rooted
| in it, which it is).
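|
| A rough sketch of that idea, not the commenter's actual code:
| instead of a real stack allocator it tracks malloc'd pointers in
| a fixed array, and every name here (cleanup_stack, tracked_alloc,
| release_all, fail, run_job) is hypothetical. The point is only
| that the function doing the longjmp releases everything first.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 16

/* Hypothetical bookkeeping: live allocations plus the jump target. */
struct cleanup_stack {
    void *ptrs[MAX_TRACKED];
    int count;
    jmp_buf on_error;
};

/* Allocate and remember the pointer so it can be released later. */
void *tracked_alloc(struct cleanup_stack *cs, size_t size)
{
    if (cs->count == MAX_TRACKED) {
        return NULL;
    }
    void *p = malloc(size);
    if (p != NULL) {
        cs->ptrs[cs->count++] = p;
    }
    return p;
}

/* Free every tracked allocation, newest first. */
void release_all(struct cleanup_stack *cs)
{
    while (cs->count > 0) {
        free(cs->ptrs[--cs->count]);
    }
}

/* The "safe exception": clean up, then jump to the setjmp site. */
void fail(struct cleanup_stack *cs, int code)
{
    release_all(cs);
    longjmp(cs->on_error, code);
}

/* Returns 0 on success, -1 if fail() was called along the way. */
int run_job(struct cleanup_stack *cs, int should_fail)
{
    cs->count = 0;
    if (setjmp(cs->on_error) != 0) {
        return -1;              /* memory was released by fail() */
    }
    char *a = tracked_alloc(cs, 64);
    char *b = tracked_alloc(cs, 64);
    if (a == NULL || b == NULL || should_fail) {
        fail(cs, 1);            /* does not return */
    }
    release_all(cs);
    return 0;
}
```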
| anthomtb wrote:
| The author rambled about the annoyances of string.h functions.
| And I do not disagree with many of the points made. Particularly
| strtok's global state and wholly unintuitive usage.
|
| But what is the alternative? Rolling your own equivalents?
| mqus wrote:
| particularly strtok is easy to replace imho. And if you've done
| so before, you can just accumulate and use your own "libc".
| Maybe not an entirely reusable one but enough of it. I can
| entirely imagine "rolling your own", as libc is not that big.
|
| But then again, I'm not a C dev by trade.
| kevin_thibedeau wrote:
| strtok_r() is usually available as is the case for the other
| lib functions with global state. If not, it is a simple
| function to clone with whatever improvements you wish. I have a
| version that doesn't insert NULs so it can be used on read-only
| strings.
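|
| One way such a non-mutating tokenizer could look; this is a
| sketch, not the commenter's version. The slice type and the
| next_token name are hypothetical. It keeps its position in a
| caller-owned cursor (no global state) and returns pointer plus
| length, so the input can be a read-only string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slice: pointer and length, not NUL-terminated. */
struct slice {
    const char *ptr;
    size_t len;
};

/* Return the next token at or after *cursor and advance *cursor
   past it. A slice with ptr == NULL means no tokens remain.
   The input string is never modified. */
struct slice next_token(const char **cursor, const char *delims)
{
    struct slice tok = {NULL, 0};
    const char *p = *cursor;

    p += strspn(p, delims);         /* skip leading delimiters */
    if (*p == '\0') {
        *cursor = p;
        return tok;
    }
    tok.ptr = p;
    tok.len = strcspn(p, delims);   /* up to next delimiter or end */
    *cursor = p + tok.len;
    return tok;
}
```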
| Gibbon1 wrote:
| Reminds me I need to rework mine to use slices.
| pavlov wrote:
| My approach is to use OS-specific C libraries like Apple's
| CoreFoundation wherever possible. CFString is greatly superior
| to bad old C strings.
|
| If you need cross-platform, depending on your application it
| may be reasonable to create wrappers for your string usage with
| platform-specific implementations that use the best option
| available on each OS and fall back to POSIX.
| randomNumber7 wrote:
| The focus on setjmp/longjmp is strange. In my book you should
| have _very_ good reasons to use that.
|
| Of course you want it in there in case you need it, but writing
| so much about them when you say "standard library in practice" is
| a bit suspect.
| eschneider wrote:
| There are a lot of (ok, several :) very good reasons to use
| setjmp/longjmp in commercial applications and when you need
| it, you NEED it.
| [deleted]
| dcreager wrote:
| For those who haven't seen it, CCAN [1] is a great collection of
| reusable C code, much of which specifically exists to work around
| the kinds of issues mentioned in OP. It turns "oh crap I should
| probably roll my own" into "wait I bet someone has already re-
| rolled this".
|
| [1] https://ccodearchive.net/
| simplotek wrote:
| > reusable C code,
|
| What's the rationale of providing this as a random set of code
| snippets instead of putting together a library?
| zeroonetwothree wrote:
| No expectation of support?
| simplotek wrote:
| There is zero expectation of support in FLOSS projects. You
| get what you paid for.
| flykespice wrote:
| The thing that impressed me the most was the reserved identifiers
| rule in C, I didn't even know that existed and wow it's worse
| than I thought, it's so ill-thought-out that it covers most
| generic words you can use in your code, I wouldn't be surprised
| if >90% of the existing C codebase out there is already violating
| it unaware.
|
| But just like most pitfalls in the language, it's a consequence
| of the early developers not putting much thought into assigning
| "namespaces" on the C library functions/macros.
|
| As for the non-standard locale behavior in GNU tools, it's not
| strange: the GNU project has historically deviated from the POSIX
| standard.
| afc wrote:
| > A null pointer legitimately and usefully points to a zero-sized
| object.
|
| I agree with many points from the article but strongly disagree
| with this one.
|
| It makes much more sense to treat a null pointer as an invalid,
| not present, object. Treating null pointer as an empty string
| (perhaps by expecting the first N pages to contain just \0
| bytes, as various older Unix systems predating Linux did) feels
| like a recipe to hide bugs.
|
| For example, I want to distinguish the output from getenv of "the
| variable was found but has an empty value" from "the variable was
| not defined".
|
| If you have some random struct Foo, it's very clear that a null
| Foo pointer just means "no valid Foo value" and attempting to
| dereference it should crash (rather than optimistically return
| ~random data and naively continuing program execution, making
| bugs harder to detect).
|
| I expect any reasonable programmer would suggest treating a
| string (or array of any types) identically.
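|
| The getenv distinction can be sketched as follows; the enum and
| the function name env_state_of are hypothetical. A null return
| means the variable is not defined, while a non-null pointer to an
| empty string means it is defined with an empty value.

```c
#include <stdlib.h>

/* Hypothetical classification of an environment variable. */
enum env_state { ENV_UNSET, ENV_EMPTY, ENV_VALUE };

enum env_state env_state_of(const char *name)
{
    const char *v = getenv(name);
    if (v == NULL) {
        return ENV_UNSET;       /* variable was not defined */
    }
    return v[0] == '\0' ? ENV_EMPTY : ENV_VALUE;
}
```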
| jart wrote:
| I think Chris is right. I was able to build the world's tiniest
| programming language by redefining NULL to be a NUL-terminated
| string that says "NIL". https://justine.lol/sectorlisp2/#memory
| mgaunard wrote:
| In C++, std::string_view() refers to an empty string, with a
| null pointer and size of 0.
| dezgeg wrote:
| > Treating null pointer as an empty string (perhaps by
| expecting the first N pages to contain just \0 bytes, as
| various older Unix systems predating Linux did) feels like a
| recipe to hide bugs.
|
| Empty string is not a 'zero-sized object'. It's an object with
| size of one byte that is zero.
| leni536 wrote:
| It's not an empty string, it's an empty range. The author wrote
| "zero-sized object". An empty, \0 terminated string is 1 byte.
|
| There is nothing wrong in treating [NULL, NULL) as a valid,
| 0-sized range. As a sibling comment pointed out, this is
| already valid in C++ for std::string_view. Or more
| appropriately, for std::copy as well.
| asveikau wrote:
| But a lot of APIs take a void* and size_t pair, where one is a
| buffer and another is a length. It is pretty unreasonable for
| ptr=NULL, len=0 to be UB, if the code will not follow the
| pointer when len == 0.
|
| TFA says that is the case for some libc functions. I agree that
| this is unreasonable. Eg. memcpy shouldn't care if pointers it
| should not follow are invalid; a copy of length 0 ought to be
| spec'd as a no-op.
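|
| Under the rules being discussed here, a caller that may hold a
| (NULL, 0) buffer has to guard the call itself. A minimal sketch,
| with the hypothetical wrapper name copy_n:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: a zero-length copy is a no-op, so a
   (NULL, 0) pair never reaches memcpy. */
void *copy_n(void *dst, const void *src, size_t len)
{
    if (len != 0) {
        memcpy(dst, src, len);
    }
    return dst;
}
```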
| none_to_remain wrote:
| Author didn't say "treat null pointer as an empty string".
|
| To treat a C string as empty, I have to dereference the pointer
| and see that the first byte is \0. If the pointer is 0x0 then I
| crash.
|
| But if I want to copy 0 bytes to or from 0x0, I do not have to
| dereference 0x0.
|
| I would want to spend way too long with the standard before I
| opined on the correctness of the author's statement, but "zero-
| sized object" is definitely different from "zero-length
| string".
| eqvinox wrote:
| > When using fread, some implementations use the entire buffer as
| a temporary work space, even if it returns a length less than
| the entire buffer. So the following won't work reliably:
|
|     char buf[N] = {0};
|     fread(buf, N-1, 1, f);
|     puts(buf);
|
| This doesn't work reliably because the arguments are wrong. It's
| "fread(buf, size_of_item, number_of_items, file_ptr)". You're
| telling it to read one item of size N-1. If there's not enough
| data to read for a complete item, the rest of the "item" might be
| written with garbage. If you do it the right way around - reading
| N-1 items of size 1, you actually get the result you expect.
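|
| A sketch of that "right way around" call, using a hypothetical
| helper name read_text. It passes item size 1 and count cap-1, and
| terminates the string from the returned count instead of relying
| on the buffer having been zeroed beforehand.

```c
#include <stdio.h>

/* Hypothetical helper: read up to cap-1 bytes from f into buf
   and NUL-terminate. Returns the number of bytes read. */
size_t read_text(FILE *f, char *buf, size_t cap)
{
    if (cap == 0) {
        return 0;
    }
    size_t n = fread(buf, 1, cap - 1, f);   /* size 1, count cap-1 */
    buf[n] = '\0';      /* terminate using the returned count */
    return n;
}
```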
|
| Is this an unnecessary footgun in the standard C library API?
| Probably yes. But if you complain about it, I really rather you
| complain about the actual problem.
|
| Other than this, there's a few things I would agree with, and
| then a few other things that are issues with _specific_ standard
| C libraries, e.g. assert just exiting the process. Mixing C
| library API / POSIX problems with specific C library problems
| isn't particularly great for an article like this.
|
| Anyway, the article registers appropriately "uncooked" for
| someone reinventing pkg-config; as far as my personal "reputation
| database" is concerned this is strike 2 for the author.
| jdefr89 wrote:
| I too am surprised. The author generally writes clean code,
| most of it seems to rely heavily on libC... I am confused...
| eqvinox wrote:
| The article is indeed rather fuzzy about what "libc" is. It
| seems to exclude "things provided by the compiler" while
| talking about varargs, but later calls out atomics with no
| such note.
| arka2147483647 wrote:
| The subject is messy. For example; traditionally memset()
| and memcpy() are clib functions, but of course nowadays any
| good compiler will treat those as intrinsics or optimize
| them. Same for atomics.
| matheusmoreira wrote:
| Honestly I don't blame him. Perhaps he's used to _read_ where
| the number of bytes to read follows the buffer:
|
| https://man7.org/linux/man-pages/man2/read.2.html
| ssize_t read(int fd, void *buf, size_t count);
|
| That's how it works in every sane I/O API I've seen. What's
| even the point of the size_of_item argument?
| eqvinox wrote:
| I agree it's a weird API. Not sure where it comes from, maybe
| some '70s thing with reading blocks?
|
| However, this functions as a red flag for the article itself,
| as in: in an article dismissing something with very little
| nuance, finding one of the arguments exhibits a lack of good
| understanding of that thing raises an interrupt for me that
| its other arguments may also be flawed.
|
| The article overall just rubs me as hubris. If you go about
| dismissing decades of engineering history, you better get
| your facts right. It's easy to write things like this when
| you're at the wrong end of the Dunning-Kruger bathtub, so you
| have to show you are in fact at the right end of it. (While
| in the middle, people tend to not write articles like this in
| my experience.)
| jstimpfle wrote:
| > as far as my personal "reputation database" is concerned
| this is strike 2 for the author.
|
| Wow, so cocky.
|
| > If you go about dismissing decades of engineering
| history, you better get your facts right.
|
| I find most of the opinions on that page are well
| explained, and they aren't spectacularly bold anyway. You
| will be hard pressed finding people that think locales are
| usable and most of string.h is not historical baggage.
|
| I also learned something new, for example the thing about
| isXXX() taking unsigned values.
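|
| A short sketch of what that rule means in practice; the helper
| name count_digits is hypothetical. The <ctype.h> functions expect
| a value representable as unsigned char (or EOF), so a plain char
| that may be negative is cast before the call.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count decimal digits in a string. */
size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Cast first: where char is signed, bytes above 0x7f are
           negative and must not be passed to isdigit directly. */
        if (isdigit((unsigned char)*s)) {
            n++;
        }
    }
    return n;
}
```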
|
| > finding one of the arguments exhibits a lack of good
| understanding of that thing raises an interrupt for me that
| its other arguments may also be flawed.
|
| You could try the thing about reading in a benevolent way.
| Don't choose the interpretation that would most upset you.
|
| From what I can see you found one nit where the author
| might not have a complete understanding on the issue, or
| maybe they just swapped the two arguments. (I'm not quite
| sure what that explanation was about, and I didn't bother
| to research. You're right of course about how fread should
| be used). And then you went back to HN to tear the article
| to pieces (btw. I find the post to be well above average
| quality).
|
| As far as I'm concerned, this is strike 1 on my list.
| shanebellone wrote:
| "wrong end of the Dunning-Kruger bathtub, so you have to
| show you are in fact at the right end of it."
|
| It seems so simple. Yet certainty eludes me.
|
| "Dunning-Kruger bathtub"
|
| This phrasing is gold btw.
| arka2147483647 wrote:
| I have heard from somewhere that there once was an idea of
| making computers work in terms of objects, instead of
| streams. Ie, the storage layer would store individual
| objects, not bytes. And that the sizeof object, count of
| objects, params in traditional read and write functions
| comes from that.
| theteapot wrote:
| > This doesn't work reliably because the arguments are wrong.
| It's "fread(buf, size_of_item, number_of_items, file_ptr)".
|
| No it isn't. What C standard library are you using? Above code
| is correct in my C standard library ... man fread:
|
|     SYNOPSIS
|         #include <stdio.h>
|         size_t fread(void *ptr, size_t size, size_t nmemb,
|                      FILE *stream);
|     ....
|     The function fread() reads nmemb items of data, each size
|     bytes long, from the stream pointed to by stream, storing
|     them at the location given by ptr.
| ntrz wrote:
| How does the man page's "size_t fread(void *ptr, size_t size,
| size_t nmemb, FILE *stream);" disagree with the GP's
| "fread(buf, size_of_item, number_of_items, file_ptr)"? Both
| seem to support that "fread(buf, N-1, 1, f);" is "telling it
| to read one item of size N-1".
| theteapot wrote:
| Oh it doesn't. Sorry. Too early not enough sleep to be
| commenting ...
| eqvinox wrote:
| https://pubs.opengroup.org/onlinepubs/007904975/functions/fr...
|
| "The fread() function shall read into the array pointed to by
| ptr up to nitems elements whose size is specified by size in
| bytes, from the stream pointed to by stream. For each object,
| size calls shall be made to the fgetc() function and the
| results stored, in the order read, in an array of unsigned
| char exactly overlaying the object. [...] If a partial
| element is read, its value is unspecified."
|
| It doesn't say you stop when fgetc() returns EOF (or an
| error), so by a strict reading the remainder of partial items
| is filled with fgetc()'s "EOF" return value. But the more
| relevant thing is that it says the value of a partial element
| is unspecified.
|
| (Edited to point out the "unspecified" aspect.)
| tom_ wrote:
| Another way of looking at it: if you don't want fread to
| overwrite N-1 bytes, don't tell it it can overwrite N-1 bytes!
|
| (Also, check the return value...)
| Karellen wrote:
| I think the author does want fread() to overwrite N-1 bytes,
| iff there are N-1 bytes to read.
|
| The objection is to overwriting N-1 bytes when only N-2 bytes
| were read.
|
| Which... doesn't sound completely unreasonable?
|
| On the other hand (especially with SIMD CPUs, but maybe also
| with simpler instruction sets) I can understand that copying
| blocks of e.g. 8 bytes at a time might be faster than the
| special case at the end of a read for copying only 7 bytes.
| (Because you need to follow an "unlikely" branch to get to
| the 7-byte code which stalls the CPU, maybe.) So if the
| output buffer has space for it, it might be more performant
| to copy 8 bytes including a junk byte and just say you copied
| 7, than copying only 7 bytes.
|
| And the author does seem to object to a number of other
| design decisions which could hurt performance.
| [deleted]
| WalterBright wrote:
| I started the article thinking it's suspect. But it's very good,
| and worth reading for any C programmer.
| stephc_int13 wrote:
| I also fully agree with the author and I am usually following a
| similar practice.
|
| The libc is the weakest point of the C programming language,
| outdated, bad naming conventions, and in some cases harmful APIs.
|
| It also has its merits, simplicity and availability, it is good
| as a fallback, but I think it is generally preferable to have a
| full API coverage when using a framework, including basic things
| such as printf/malloc/fopen as there are better ways to make
| those and it is quite important to maintain similar convention
| across the whole codebase.
| jhallenworld wrote:
| No mention of printf and scanf. Printf is annoying because of the
| PRIu32 macros in case you want to print uint32_t in a portable
| way. How about conversion specifiers for the stdints.. %u32 %d32
| or whatever.
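|
| For reference, this is what the portable form looks like today;
| the helper name format_u32 is hypothetical. PRIu32 comes from
| <inttypes.h> and expands to a string literal that is concatenated
| into the format string.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format a uint32_t portably into buf.
   Returns what snprintf returns. */
int format_u32(char *buf, size_t cap, uint32_t v)
{
    return snprintf(buf, cap, "value=%" PRIu32, v);
}
```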
|
| As for scanf: I want it to either succeed in parsing all of the
| arguments, or fail, but in this case leave the file position
| unchanged (so random access is required on input). This would
| allow trying to parse with another call to scanf, but with a
| different format string. For lists involving an arbitrary number
| of items, provide a conversion specifier that leaves the input
| alone and returns success: this way you can iterate the list.
|
| For example, these would be equivalent:
|
|     if (scanf(f, "%d %d %d", &a, &b, &c))
|         printf("Success!\n");
|
|     /* %e means more input expected, without it scanf fails if
|        no EOF at that point */
|     if (scanf(f, "%d %e", &a) && scanf(f, "%d %e", &b) &&
|         scanf(f, "%d", &c))
|         printf("Success!\n");
|
| I should be able to provide my own parsing functions in it:
| imagine a conversion specifier for JSON that returns a tree.
|
| The existence of sprintf and sscanf point to bad design: I should
| be able to open a block of memory as a FILE, which would make
| sprintf and sscanf redundant. Also I should be able to open some
| kind of block device: meaning the FILE has pointers to user
| provided read and write functions. With this I could printf and
| scanf to some indirect device such as an EEPROM.
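|
| Outside ISO C, POSIX does offer part of this: fmemopen opens a
| block of memory as a FILE. A minimal sketch, assuming a POSIX.1-2008
| system; the helper name parse_pair is hypothetical. Streams backed
| by user-provided read and write callbacks exist only as
| non-standard extensions (fopencookie in glibc, funopen on BSDs),
| which this sketch does not use.

```c
#define _POSIX_C_SOURCE 200809L   /* for fmemopen */
#include <stdio.h>

/* Hypothetical helper: parse two ints from an in-memory buffer
   by reading it through a FILE stream. Returns 1 on success. */
int parse_pair(char *text, size_t len, int *a, int *b)
{
    FILE *f = fmemopen(text, len, "r");
    if (f == NULL) {
        return 0;
    }
    int got = fscanf(f, "%d %d", a, b);
    fclose(f);
    return got == 2;
}
```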
|
| We should be using scanf for the program's argument list. I mean
| the program should not get an array of pointers, but instead a
| FILE that contains the arguments. The memory layout is then
| abstracted. The pre-conversion of the command line to an array is
| totally unnecessary, only exists because libc's scanf sucks.
|
| Arbitrary length strings could have been handled with FILEs..
| leni536 wrote:
| From the article:
|
| > As for formatted input, don't ever bother with scanf.
|
| With an entire other article about scanf as the link.
|
| edit:
|
| As for your other point, files may not even be seekable, let
| alone random access. How should that be implemented for a pipe?
| jhallenworld wrote:
| Ack, you are right, I read it too fast.
|
| If the input is really not random access capable, provide
| buffering. You could imagine a version of FILE that
| automatically does this. There could be a some kind of
| truncate call to make it forget all past input, once you are
| sure it's not needed.
|
| In an OS you will have the resources to do this (malloc..).
| In other contexts (embedded), you may not.. but likely in
| those cases you don't have pipes.
| jdefr89 wrote:
| I am confused. Having gone over quite a bit of the author's C code
| on GitHub, it seems he uses libC quite often. I read/follow his
| blog posts; most are well written and insightful. His code is
| rather elegant as well, making his code bases good for studying
| (I recommend them to mentees)... What am I missing here. I think
| libc is quite alright given its time in operation, I bet libC is
| the most widely used standard library on the planet by a large
| margin. If not libC then what? Musl, glib? Those are great as
| well but I think avoiding libC is somewhat silly..
| woodruffw wrote:
| The post doesn't say he _doesn't_ use libc; it says that he
| tries to minimize his use of it. It's very difficult to
| entirely avoid libc.
|
| musl and glibc are implementations _of_ libc, not alternatives
| _to_ it.
| eqvinox wrote:
| glib != glibc; glib is Gtk/GNOME's OS-independent wrapping
| layer and in fact an alternative to using libc functions
| directly. E.g. here is glib's take on strcmp() as entry
| point: https://docs.gtk.org/glib/func.strcmp0.html
| woodruffw wrote:
| I know that; I assumed from context ("Musl, glib") that the
| GP has probably just dropped the "c" by accident, since
| musl _is_ a libc implementation.
| mgaunard wrote:
| how is it difficult to avoid libc?
|
| All you need are the system calls of your kernel. You don't
| need libc to call those.
| woodruffw wrote:
| Relatively little of libc overlaps with the system call
| space: there's some I/O and a little bit of stuff that
| _could_ be done with a system call (like time), but
| traditionally aren't for both concern separation and
| performance reasons.
|
| As a small example: `strtok` is simultaneously a very
| useful function and one that's annoying to use correctly
| (much less independently implement correctly). It isn't a
| system call.
| mgaunard wrote:
| strtok is a trivial and not particularly good function,
| in part due to its poor API. You can write a better one
| in minutes (which is incidentally less than the time to
| understand the corner cases of this function's behaviour)
| flohofwoe wrote:
| You use your own or someone else's non-standard wrapper
| libraries for specific OS services. For instance if you want to
| do asynchronous IO, or use virtual memory features, the C
| standard library is entirely useless anyway, so you create your
| own thin cross-platform wrappers over OS specific system API
| calls, or even call those OS specific functions directly (e.g.
| POSIX functions on UNIX-flavoured operating systems or Win32
| functions on Windows).
|
| The C stdlib is essentially an SDK for very simple 70's UNIX-
| style command line tools, but operating systems have moved on,
| while the C standard library is unfortunately stuck in the
| past.
|
| (IMHO a "C stdlib v2" would be much more important than any new
| language features, but it doesn't look like the C committee
| wants to go down that road).
| jwilk wrote:
| > the typical implementation doesn't have the courtesy to trap in
| the macro itself
|
| What does it mean?
___________________________________________________________________
(page generated 2023-02-11 23:01 UTC)